Bedrock Prompts MCP Server

by mslangford
QUICK_REFERENCE.md
# Quick Reference Guide

## Common Use Cases

### 1. Generate Survey Responses at Scale

**Problem**: You need to simulate 100 survey responses with demographic variation.

**Solution**: Use batch invocation with different variable sets.

```python
# In Claude Desktop, just say:
"Batch invoke prompt survey-respondent with 100 different demographic profiles"

# The server will:
# 1. Run 100 parallel invocations (with max_workers=10)
# 2. Each with different variables like:
#    {"age": "35", "location": "TX", "income": "75000"}
#    {"age": "52", "location": "CA", "income": "120000"}
#    etc.
```

**Expected Output**:

```json
{
  "total_invocations": 100,
  "successful": 100,
  "failed": 0,
  "results": [...]
}
```

---

### 2. Test Multiple Questions on Same Persona

**Problem**: You have one persona prompt and want to test 10 different questions.

**Solution**: Use batch invocation with the same persona and different questions.

```python
# In Claude Desktop:
"Batch invoke my-persona prompt with these 10 questions: [list questions]"

# Or programmatically:
variable_sets = [
    {"question": "How do you feel about the economy?"},
    {"question": "What are your hiring plans?"},
    {"question": "How do you view inflation?"},
    # ... 7 more
]
batch_invoke_prompt("PROMPT_ID", variable_sets, max_workers=5)
```

---

### 3. Generate Long-Form Content with Streaming

**Problem**: Your prompt generates a detailed report and you want to see progress.

**Solution**: Use streaming invocation.

```python
# In Claude Desktop:
"Stream the response from my-report-generator prompt"

# Returns chunks as they're generated:
{
  "completion": "Full text...",
  "chunks": ["Executive ", "Summary:\n", "The analysis...", ...],
  "chunk_count": 150
}
```

**Best for**:
- Long reports
- Detailed analysis
- Multi-paragraph responses
- When you want to show progress to users

---

### 4. A/B Test Different Prompt Versions

**Problem**: You have two versions of a prompt and want to compare their outputs.

**Solution**: Use version management.

```python
# In Claude Desktop:
"Compare version 1 and version 2 of my-prompt with the same input"

# Programmatically:
result_v1 = invoke_prompt("PROMPT_ID", variables, prompt_version="1")
result_v2 = invoke_prompt("PROMPT_ID", variables, prompt_version="2")

# Compare:
print(f"V1: {result_v1['completion'][:100]}...")
print(f"V2: {result_v2['completion'][:100]}...")
```

---

### 5. Work with Multiple Model Types

**Problem**: You want to test the same prompt across different models.

**Solution**: The server automatically handles the different model formats.

```python
# Create prompts with different models in Bedrock:
# - prompt-claude: Uses claude-3-5-sonnet
# - prompt-titan: Uses amazon.titan-text-express
# - prompt-llama: Uses meta.llama3-70b

# Invoke each:
claude_result = invoke_prompt("prompt-claude", variables)
titan_result = invoke_prompt("prompt-titan", variables)
llama_result = invoke_prompt("prompt-llama", variables)

# Server handles request/response formatting automatically
```

**Supported Models**:
- Claude (Anthropic)
- Titan (Amazon)
- Llama (Meta)
- Mistral AI
- Cohere
- AI21 Labs

---

### 6. Debug Prompt Templates

**Problem**: You're not sure which variables your prompt expects.

**Solution**: Get the prompt details first.

```python
# In Claude Desktop:
"Show me the details and required variables for prompt VSWZVOISYG"

# Returns:
{
  "name": "survey-respondent",
  "template": "Answer this: {{question}}...",
  "inputVariables": [
    {"name": "question"}
  ]
}
```
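Once you can see a prompt's `inputVariables`, it is worth checking each variable set against them before spending an invocation on it. The helper below is a hypothetical sketch, not part of the server's tool set; it only assumes the details dictionary has the `inputVariables` shape shown above.

```python
# Hypothetical pre-flight check (not provided by the server).
# `details` mirrors the structure returned in example 6.
def missing_variables(details: dict, variables: dict) -> list:
    """Names of declared input variables absent from this variable set."""
    expected = {v["name"] for v in details.get("inputVariables", [])}
    return sorted(expected - variables.keys())

details = {
    "name": "survey-respondent",
    "template": "Answer this: {{question}}...",
    "inputVariables": [{"name": "question"}],
}

print(missing_variables(details, {"age": "35"}))                      # ['question']
print(missing_variables(details, {"question": "How is business?"}))   # []
```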
---

### 7. Parallel Processing for Performance

**Problem**: You need to generate 50 responses quickly.

**Solution**: Use batch invocation with higher max_workers.

```python
# Default (5 workers) - ~10 seconds per batch of 5
batch_invoke_prompt("ID", variable_sets, max_workers=5)

# Faster (10 workers) - ~10 seconds per batch of 10
batch_invoke_prompt("ID", variable_sets, max_workers=10)

# Note: AWS has rate limits, so don't go too high
```

**Performance Tips**:
- max_workers=5: Good default, safe rate limits
- max_workers=10: Faster, watch for throttling
- Each invocation ~2-10 seconds depending on response length

---

### 8. Handle Errors Gracefully

**Problem**: Some invocations might fail due to throttling or validation.

**Solution**: Batch invocation tracks successes and failures separately.

```python
result = batch_invoke_prompt("ID", variable_sets)

print(f"Successful: {result['successful']}/{result['total_invocations']}")

# Check failures:
if result['failed'] > 0:
    for error in result['errors']:
        print(f"Failed: {error['variables']} - {error['error']}")

# Retry just the failed ones:
retry_sets = [e['variables'] for e in result['errors']]
retry_result = batch_invoke_prompt("ID", retry_sets)
```
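If the failures are mostly throttling, an immediate retry may just hit the same limit. Below is a minimal sketch of retrying only the failed variable sets with exponential backoff; it assumes `batch_invoke_prompt` returns the `results`/`errors` structure shown above, and the attempt count and sleep intervals are arbitrary placeholder values.

```python
import time

def batch_invoke_with_retry(prompt_id, variable_sets, max_workers=5, attempts=3):
    """Sketch: re-run only the failed variable sets, backing off between attempts."""
    pending = variable_sets
    results, errors = [], []
    for attempt in range(attempts):
        batch = batch_invoke_prompt(prompt_id, pending, max_workers=max_workers)
        results.extend(batch.get("results", []))
        errors = batch.get("errors", [])
        if not errors or attempt == attempts - 1:
            break
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... between attempts
        pending = [e["variables"] for e in errors]
    return {"results": results, "errors": errors}
```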
---

## Performance Benchmarks

Based on typical usage:

| Operation | Time | Notes |
|-----------|------|-------|
| Single invoke | 2-5s | Depends on response length |
| Batch invoke (5) | 5-10s | Parallel, limited by slowest |
| Batch invoke (50) | 50-100s | 10 batches × max_workers=5 |
| Streaming | 2-10s | Same as regular, but shows progress |
| List prompts | <1s | Fast, cached |
| Get details | <1s | Fast, metadata only |

---

## Best Practices

### 1. Variable Naming
- Use descriptive names: `question`, `demographic_data`, `context`
- Match names exactly (case-sensitive): `{{Question}}` ≠ `{{question}}`
- Both `{{var}}` and `{var}` syntax are supported

### 2. Batch Size
- Small batches (5-10): Quick results, good for testing
- Medium batches (20-50): Production use
- Large batches (100+): Split into multiple calls

### 3. Error Handling
- Always check `result["success"]`
- Batch results have a separate `errors` array
- Implement retry logic for transient failures

### 4. Model Selection
- Claude: Best for complex reasoning, multi-turn
- Titan: Fast, cost-effective for simple tasks
- Llama: Open-source alternative
- Match model to task complexity

### 5. Streaming
- Use for responses >500 words
- Good UX for real-time display
- Slightly higher latency overhead
- Not all models support streaming equally

---

## Troubleshooting

### "No credentials found"
```bash
aws configure
# or
export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=xxx
```

### "Throttling exception"
- Reduce max_workers from 10 → 5 → 3
- Add delays between batches
- Request a limit increase from AWS

### "Variable not found in template"
- Check variable name spelling (case-sensitive)
- Get the prompt details to see the expected variables
- Use `{{var}}` or `{var}` syntax consistently

### "Model not supported in region"
- Check model availability: `aws bedrock list-foundation-models`
- Some models are available only in us-east-1 or us-west-2
- Update model_id in the prompt configuration

---

## Advanced: Custom Model Support

If you need to support a new model type:

1. Add to `get_model_type()` detection
2. Create a `build_request_body_<model>()` function
3. Create a `parse_response_<model>()` function
4. Add to the `build_request_body()` and `parse_model_response()` mappings

Example for a hypothetical "NewModel":

```python
def build_request_body_newmodel(filled_template, inference_config, additional_fields):
    return {
        "input": filled_template,
        "parameters": {
            "max_length": int(inference_config.get("maxTokens", 2000)),
            "temp": float(inference_config.get("temperature", 1.0))
        }
    }

def parse_response_newmodel(response_body):
    return response_body.get("output", {}).get("text", "")
```
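How steps 1 and 4 look in practice depends on how the server organizes its dispatch code, which this guide does not show. The sketch below assumes the mappings are plain dicts keyed by a model-type string and that detection matches on the Bedrock model ID; treat it as an illustration of the wiring, not the actual implementation.

```python
# Illustrative wiring only (assumed structure, not necessarily the server's code).
# Reuses build_request_body_newmodel / parse_response_newmodel from the example above.

REQUEST_BUILDERS = {
    # "claude": build_request_body_claude, ... existing model families ...
    "newmodel": build_request_body_newmodel,   # step 4: register the builder
}

RESPONSE_PARSERS = {
    # "claude": parse_response_claude, ... existing model families ...
    "newmodel": parse_response_newmodel,       # step 4: register the parser
}

def get_model_type(model_id: str) -> str:
    # ... existing detection for claude, titan, llama, etc. ...
    if "newmodel" in model_id.lower():         # step 1: new detection branch
        return "newmodel"
    raise ValueError(f"Unrecognized model_id: {model_id}")

# With these pieces in place, build_request_body() and parse_model_response()
# can look up the right function via REQUEST_BUILDERS[model_type] and
# RESPONSE_PARSERS[model_type].
```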
