Skip to main content
Glama
pydantic

mcp-run-python

Official
by pydantic
bedrock.md11.9 kB
# Bedrock ## Install To use `BedrockConverseModel`, you need to either install `pydantic-ai`, or install `pydantic-ai-slim` with the `bedrock` optional group: ```bash pip/uv-add "pydantic-ai-slim[bedrock]" ``` ## Configuration To use [AWS Bedrock](https://aws.amazon.com/bedrock/), you'll need an AWS account with Bedrock enabled and appropriate credentials. You can use either AWS credentials directly or a pre-configured boto3 client. `BedrockModelName` contains a list of available Bedrock models, including models from Anthropic, Amazon, Cohere, Meta, and Mistral. ## Environment variables You can set your AWS credentials as environment variables ([among other options](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables)): ```bash export AWS_BEARER_TOKEN_BEDROCK='your-api-key' # or: export AWS_ACCESS_KEY_ID='your-access-key' export AWS_SECRET_ACCESS_KEY='your-secret-key' export AWS_DEFAULT_REGION='us-east-1' # or your preferred region ``` You can then use `BedrockConverseModel` by name: ```python from pydantic_ai import Agent agent = Agent('bedrock:anthropic.claude-3-sonnet-20240229-v1:0') ... ``` Or initialize the model directly with just the model name: ```python from pydantic_ai import Agent from pydantic_ai.models.bedrock import BedrockConverseModel model = BedrockConverseModel('anthropic.claude-3-sonnet-20240229-v1:0') agent = Agent(model) ... ``` ## Customizing Bedrock Runtime API You can customize the Bedrock Runtime API calls by adding additional parameters, such as [guardrail configurations](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html) and [performance settings](https://docs.aws.amazon.com/bedrock/latest/userguide/latency-optimized-inference.html). For a complete list of configurable parameters, refer to the documentation for [`BedrockModelSettings`][pydantic_ai.models.bedrock.BedrockModelSettings]. ```python {title="customize_bedrock_model_settings.py"} from pydantic_ai import Agent from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings # Define Bedrock model settings with guardrail and performance configurations bedrock_model_settings = BedrockModelSettings( bedrock_guardrail_config={ 'guardrailIdentifier': 'v1', 'guardrailVersion': 'v1', 'trace': 'enabled' }, bedrock_performance_configuration={ 'latency': 'optimized' } ) model = BedrockConverseModel(model_name='us.amazon.nova-pro-v1:0') agent = Agent(model=model, model_settings=bedrock_model_settings) ``` ## Prompt Caching Bedrock supports [prompt caching](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html) on Anthropic models so you can reuse expensive context across requests. Pydantic AI provides four ways to use prompt caching: 1. **Cache User Messages with [`CachePoint`][pydantic_ai.messages.CachePoint]**: Insert a `CachePoint` marker to cache everything before it in the current user message. 2. **Cache System Instructions**: Enable [`BedrockModelSettings.bedrock_cache_instructions`][pydantic_ai.models.bedrock.BedrockModelSettings.bedrock_cache_instructions] to append a cache point after the system prompt. 3. **Cache Tool Definitions**: Enable [`BedrockModelSettings.bedrock_cache_tool_definitions`][pydantic_ai.models.bedrock.BedrockModelSettings.bedrock_cache_tool_definitions] to cache your tool schemas. 4. **Cache All Messages**: Set [`BedrockModelSettings.bedrock_cache_messages`][pydantic_ai.models.bedrock.BedrockModelSettings.bedrock_cache_messages] to `True` to automatically cache the last user message. !!! note "No TTL Support" Unlike the direct Anthropic API, Bedrock manages cache TTL automatically. All cache settings are boolean only — no `'5m'` or `'1h'` options. !!! note "Minimum Token Threshold" AWS only serves cached content once a segment crosses the provider-specific minimum token thresholds (see the [Bedrock prompt caching docs](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html)). Short prompts or tool definitions below those limits will bypass the cache, so don't expect savings for tiny payloads. ### Example 1: Automatic Message Caching Use `bedrock_cache_messages` to automatically cache the last user message: ```python {test="skip"} from pydantic_ai import Agent from pydantic_ai.models.bedrock import BedrockModelSettings agent = Agent( 'bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0', system_prompt='You are a helpful assistant.', model_settings=BedrockModelSettings( bedrock_cache_messages=True, # Automatically caches the last message ), ) # The last message is automatically cached - no need for manual CachePoint result1 = agent.run_sync('What is the capital of France?') # Subsequent calls with similar conversation benefit from cache result2 = agent.run_sync('What is the capital of Germany?') print(f'Cache write: {result1.usage().cache_write_tokens}') print(f'Cache read: {result2.usage().cache_read_tokens}') ``` ### Example 2: Comprehensive Caching Strategy Combine multiple cache settings for maximum savings: ```python {test="skip"} from pydantic_ai import Agent, RunContext from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings model = BedrockConverseModel('us.anthropic.claude-sonnet-4-5-20250929-v1:0') agent = Agent( model, system_prompt='Detailed instructions...', model_settings=BedrockModelSettings( bedrock_cache_instructions=True, # Cache system instructions bedrock_cache_tool_definitions=True, # Cache tool definitions bedrock_cache_messages=True, # Also cache the last message ), ) @agent.tool def search_docs(ctx: RunContext, query: str) -> str: """Search documentation.""" return f'Results for {query}' result = agent.run_sync('Search for Python best practices') print(result.output) ``` ### Example 3: Fine-Grained Control with CachePoint Use manual `CachePoint` markers to control cache locations precisely: ```python {test="skip"} from pydantic_ai import Agent, CachePoint agent = Agent( 'bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0', system_prompt='Instructions...', ) # Manually control cache points for specific content blocks result = agent.run_sync([ 'Long context from documentation...', CachePoint(), # Cache everything up to this point 'First question' ]) print(result.output) ``` ### Accessing Cache Usage Statistics Access cache usage statistics via [`RequestUsage`][pydantic_ai.usage.RequestUsage]: ```python {test="skip"} from pydantic_ai import Agent, CachePoint agent = Agent('bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0') async def main(): result = await agent.run( [ 'Reference material...', CachePoint(), 'What changed since last time?', ] ) usage = result.usage() print(f'Cache writes: {usage.cache_write_tokens}') print(f'Cache reads: {usage.cache_read_tokens}') ``` ### Cache Point Limits Bedrock enforces a maximum of 4 cache points per request. Pydantic AI automatically manages this limit to ensure your requests always comply without errors. #### How Cache Points Are Allocated Cache points can be placed in three locations: 1. **System Prompt**: Via `bedrock_cache_instructions` setting (adds cache point to last system prompt block) 2. **Tool Definitions**: Via `bedrock_cache_tool_definitions` setting (adds cache point to last tool definition) 3. **Messages**: Via `CachePoint` markers or `bedrock_cache_messages` setting (adds cache points to message content) Each setting uses **at most 1 cache point**, but you can combine them. #### Automatic Cache Point Limiting When cache points from all sources (settings + `CachePoint` markers) exceed 4, Pydantic AI automatically removes excess cache points from **older message content** (keeping the most recent ones). ```python {test="skip"} from pydantic_ai import Agent, CachePoint from pydantic_ai.models.bedrock import BedrockModelSettings agent = Agent( 'bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0', system_prompt='Instructions...', model_settings=BedrockModelSettings( bedrock_cache_instructions=True, # 1 cache point bedrock_cache_tool_definitions=True, # 1 cache point ), ) @agent.tool_plain def search() -> str: return 'data' # Already using 2 cache points (instructions + tools) # Can add 2 more CachePoint markers (4 total limit) result = agent.run_sync([ 'Context 1', CachePoint(), # Oldest - will be removed 'Context 2', CachePoint(), # Will be kept (3rd point) 'Context 3', CachePoint(), # Will be kept (4th point) 'Question' ]) # Final cache points: instructions + tools + Context 2 + Context 3 = 4 print(result.output) ``` **Key Points**: - System and tool cache points are **always preserved** - The cache point created by `bedrock_cache_messages` is **always preserved** (as it's the newest message cache point) - Additional `CachePoint` markers in messages are removed from oldest to newest when the limit is exceeded - This ensures critical caching (instructions/tools) is maintained while still benefiting from message-level caching ## `provider` argument You can provide a custom `BedrockProvider` via the `provider` argument. This is useful when you want to specify credentials directly or use a custom boto3 client: ```python from pydantic_ai import Agent from pydantic_ai.models.bedrock import BedrockConverseModel from pydantic_ai.providers.bedrock import BedrockProvider # Using AWS credentials directly model = BedrockConverseModel( 'anthropic.claude-3-sonnet-20240229-v1:0', provider=BedrockProvider( region_name='us-east-1', aws_access_key_id='your-access-key', aws_secret_access_key='your-secret-key', ), ) agent = Agent(model) ... ``` You can also pass a pre-configured boto3 client: ```python import boto3 from pydantic_ai import Agent from pydantic_ai.models.bedrock import BedrockConverseModel from pydantic_ai.providers.bedrock import BedrockProvider # Using a pre-configured boto3 client bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1') model = BedrockConverseModel( 'anthropic.claude-3-sonnet-20240229-v1:0', provider=BedrockProvider(bedrock_client=bedrock_client), ) agent = Agent(model) ... ``` ## Configuring Retries Bedrock uses boto3's built-in retry mechanisms. You can configure retry behavior by passing a custom boto3 client with retry settings: ```python import boto3 from botocore.config import Config from pydantic_ai import Agent from pydantic_ai.models.bedrock import BedrockConverseModel from pydantic_ai.providers.bedrock import BedrockProvider # Configure retry settings config = Config( retries={ 'max_attempts': 5, 'mode': 'adaptive' # Recommended for rate limiting } ) bedrock_client = boto3.client( 'bedrock-runtime', region_name='us-east-1', config=config ) model = BedrockConverseModel( 'us.amazon.nova-micro-v1:0', provider=BedrockProvider(bedrock_client=bedrock_client), ) agent = Agent(model) ``` ### Retry Modes - `'legacy'` (default): 5 attempts, basic retry behavior - `'standard'`: 3 attempts, more comprehensive error coverage - `'adaptive'`: 3 attempts with client-side rate limiting (recommended for handling `ThrottlingException`) For more details on boto3 retry configuration, see the [AWS boto3 documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html). !!! note Unlike other providers that use httpx for HTTP requests, Bedrock uses boto3's native retry mechanisms. The retry strategies described in [HTTP Request Retries](../retries.md) do not apply to Bedrock.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pydantic/pydantic-ai'

If you have feedback or need assistance with the MCP directory API, please join our Discord server