# Embeddings

Embeddings are vector representations of text that capture semantic meaning. They're essential for building:

- **Semantic search** — Find documents based on meaning, not just keyword matching
- **RAG (Retrieval-Augmented Generation)** — Retrieve relevant context for your AI agents
- **Similarity detection** — Find similar documents, detect duplicates, or cluster content
- **Classification** — Use embeddings as features for downstream ML models

Pydantic AI provides a unified interface for generating embeddings across multiple providers.

## Quick Start

The [`Embedder`][pydantic_ai.embeddings.Embedder] class is the high-level interface for generating embeddings:

```python {title="embeddings_quickstart.py"}
from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    # Embed a search query
    result = await embedder.embed_query('What is machine learning?')
    print(f'Embedding dimensions: {len(result.embeddings[0])}')
    #> Embedding dimensions: 1536

    # Embed multiple documents at once
    docs = [
        'Machine learning is a subset of AI.',
        'Deep learning uses neural networks.',
        'Python is a programming language.',
    ]
    result = await embedder.embed_documents(docs)
    print(f'Embedded {len(result.embeddings)} documents')
    #> Embedded 3 documents
```

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_

!!! tip "Queries vs Documents"
    Some embedding models optimize differently for queries and documents. Use [`embed_query()`][pydantic_ai.embeddings.Embedder.embed_query] for search queries and [`embed_documents()`][pydantic_ai.embeddings.Embedder.embed_documents] for content you're indexing.
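Putting the two together, here's a minimal semantic-search sketch: it embeds a handful of documents, embeds a query, and ranks the documents by cosine similarity. The `cosine_similarity` helper is plain Python written for this illustration — in practice you'd typically use `numpy` or a vector database for scoring:

```python {title="semantic_search.py"}
import math

from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


async def main():
    docs = [
        'Machine learning is a subset of AI.',
        'Deep learning uses neural networks.',
        'Python is a programming language.',
    ]
    doc_result = await embedder.embed_documents(docs)
    query_result = await embedder.embed_query('What is machine learning?')
    query_embedding = query_result.embeddings[0]

    # Rank documents by similarity to the query, best match first
    ranked = sorted(
        zip(docs, doc_result.embeddings),
        key=lambda pair: cosine_similarity(query_embedding, pair[1]),
        reverse=True,
    )
    print(ranked[0][0])
```

With these inputs you'd expect the machine-learning document to rank first. _(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_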
## Embedding Result

All embed methods return an [`EmbeddingResult`][pydantic_ai.embeddings.EmbeddingResult] containing the embeddings along with useful metadata. For convenience, you can access embeddings either by index (`result[0]`) or by the original input text (`result['Hello world']`).

```python {title="embedding_result.py"}
from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    result = await embedder.embed_query('Hello world')

    # Access embeddings - each is a sequence of floats
    embedding = result.embeddings[0]  # By index via .embeddings
    embedding = result[0]  # Or directly via __getitem__
    embedding = result['Hello world']  # Or by original input text
    print(f'Dimensions: {len(embedding)}')
    #> Dimensions: 1536

    # Check usage
    print(f'Tokens used: {result.usage.input_tokens}')
    #> Tokens used: 2

    # Calculate cost (requires `genai-prices` to have pricing data for the model)
    cost = result.cost()
    print(f'Cost: ${cost.total_price:.6f}')
    #> Cost: $0.000000
```

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_

## Providers

### OpenAI

[`OpenAIEmbeddingModel`][pydantic_ai.embeddings.openai.OpenAIEmbeddingModel] works with OpenAI's embeddings API and any [OpenAI-compatible provider](models/openai.md#openai-compatible-models).

#### Install

To use OpenAI embedding models, you need to either install `pydantic-ai`, or install `pydantic-ai-slim` with the `openai` optional group:

```bash
pip/uv-add "pydantic-ai-slim[openai]"
```

#### Configuration

To use `OpenAIEmbeddingModel` with the OpenAI API, go to [platform.openai.com](https://platform.openai.com/) and follow your nose until you find the place to generate an API key. Once you have the API key, you can set it as an environment variable:

```bash
export OPENAI_API_KEY='your-api-key'
```

You can then use the model:

```python {title="openai_embeddings.py"}
from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    result = await embedder.embed_query('Hello world')
    print(len(result.embeddings[0]))
    #> 1536
```

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_

See [OpenAI's embedding models](https://platform.openai.com/docs/guides/embeddings) for available models.

#### Dimension Control

OpenAI's `text-embedding-3-*` models support dimension reduction via the `dimensions` setting:

```python {title="openai_dimensions.py"}
from pydantic_ai import Embedder
from pydantic_ai.embeddings import EmbeddingSettings

embedder = Embedder(
    'openai:text-embedding-3-small',
    settings=EmbeddingSettings(dimensions=256),
)


async def main():
    result = await embedder.embed_query('Hello world')
    print(len(result.embeddings[0]))
    #> 256
```

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_

#### OpenAI-Compatible Providers {#openai-compatible}

Since [`OpenAIEmbeddingModel`][pydantic_ai.embeddings.openai.OpenAIEmbeddingModel] uses the same provider system as [`OpenAIChatModel`][pydantic_ai.models.openai.OpenAIChatModel], you can use it with any [OpenAI-compatible provider](models/openai.md#openai-compatible-models):

```python {title="openai_compatible_embeddings.py"}
# Using Azure OpenAI
from openai import AsyncAzureOpenAI

from pydantic_ai import Embedder
from pydantic_ai.embeddings.openai import OpenAIEmbeddingModel
from pydantic_ai.providers.openai import OpenAIProvider

azure_client = AsyncAzureOpenAI(
    azure_endpoint='https://your-resource.openai.azure.com',
    api_version='2024-02-01',
    api_key='your-azure-key',
)
model = OpenAIEmbeddingModel(
    'text-embedding-3-small',
    provider=OpenAIProvider(openai_client=azure_client),
)
embedder = Embedder(model)

# Using any OpenAI-compatible API
model = OpenAIEmbeddingModel(
    'your-model-name',
    provider=OpenAIProvider(
        base_url='https://your-provider.com/v1',
        api_key='your-api-key',
    ),
)
embedder = Embedder(model)
```

For providers with dedicated provider classes (like [`OllamaProvider`][pydantic_ai.providers.ollama.OllamaProvider] or [`AzureProvider`][pydantic_ai.providers.azure.AzureProvider]), you can use the shorthand syntax:

```python
from pydantic_ai import Embedder

embedder = Embedder('azure:text-embedding-3-small')
embedder = Embedder('ollama:nomic-embed-text')
```

See [OpenAI-compatible Models](models/openai.md#openai-compatible-models) for the full list of supported providers.

### Cohere

[`CohereEmbeddingModel`][pydantic_ai.embeddings.cohere.CohereEmbeddingModel] provides access to Cohere's embedding models, which offer multilingual support and various model sizes.

#### Install

To use Cohere embedding models, you need to either install `pydantic-ai`, or install `pydantic-ai-slim` with the `cohere` optional group:

```bash
pip/uv-add "pydantic-ai-slim[cohere]"
```

#### Configuration

To use `CohereEmbeddingModel`, go to [dashboard.cohere.com/api-keys](https://dashboard.cohere.com/api-keys) and follow your nose until you find the place to generate an API key.
Once you have the API key, you can set it as an environment variable:

```bash
export CO_API_KEY='your-api-key'
```

You can then use the model:

```python {title="cohere_embeddings.py"}
from pydantic_ai import Embedder

embedder = Embedder('cohere:embed-v4.0')


async def main():
    result = await embedder.embed_query('Hello world')
    print(len(result.embeddings[0]))
    #> 1536
```

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_

See the [Cohere Embed documentation](https://docs.cohere.com/docs/cohere-embed) for available models.

#### Cohere-Specific Settings

Cohere models support additional settings via [`CohereEmbeddingSettings`][pydantic_ai.embeddings.cohere.CohereEmbeddingSettings]:

```python {title="cohere_settings.py"}
from pydantic_ai import Embedder
from pydantic_ai.embeddings.cohere import CohereEmbeddingSettings

embedder = Embedder(
    'cohere:embed-v4.0',
    settings=CohereEmbeddingSettings(
        dimensions=512,
        cohere_truncate='END',  # Truncate long inputs instead of erroring
        cohere_max_tokens=256,  # Limit tokens per input
    ),
)
```

### Sentence Transformers (Local)

[`SentenceTransformerEmbeddingModel`][pydantic_ai.embeddings.sentence_transformers.SentenceTransformerEmbeddingModel] runs embeddings locally using the [sentence-transformers](https://www.sbert.net/) library. This is ideal for:

- **Privacy** — Data never leaves your infrastructure
- **Cost** — No API charges for high-volume workloads
- **Offline use** — No internet connection required after model download

#### Install

To use Sentence Transformers embedding models, you need to install `pydantic-ai-slim` with the `sentence-transformers` optional group:

```bash
pip/uv-add "pydantic-ai-slim[sentence-transformers]"
```

#### Usage

```python {title="sentence_transformers_embeddings.py"}
from pydantic_ai import Embedder

# Model is downloaded from Hugging Face on first use
embedder = Embedder('sentence-transformers:all-MiniLM-L6-v2')


async def main():
    result = await embedder.embed_query('Hello world')
    print(len(result.embeddings[0]))
    #> 384
```

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_

See the [Sentence-Transformers pretrained models](https://www.sbert.net/docs/sentence_transformer/pretrained_models.html) documentation for available models.

#### Device Selection

Control which device to use for inference:

```python {title="sentence_transformers_device.py"}
from pydantic_ai import Embedder
from pydantic_ai.embeddings.sentence_transformers import (
    SentenceTransformersEmbeddingSettings,
)

embedder = Embedder(
    'sentence-transformers:all-MiniLM-L6-v2',
    settings=SentenceTransformersEmbeddingSettings(
        sentence_transformers_device='cuda',  # Use GPU
        sentence_transformers_normalize_embeddings=True,  # L2 normalize
    ),
)
```

#### Using an Existing Model Instance

If you need more control over model initialization:

```python {title="sentence_transformers_instance.py"}
from sentence_transformers import SentenceTransformer

from pydantic_ai import Embedder
from pydantic_ai.embeddings.sentence_transformers import (
    SentenceTransformerEmbeddingModel,
)

# Create and configure the model yourself
st_model = SentenceTransformer('all-MiniLM-L6-v2', device='cpu')

# Wrap it for use with Pydantic AI
model = SentenceTransformerEmbeddingModel(st_model)
embedder = Embedder(model)
```

## Settings

[`EmbeddingSettings`][pydantic_ai.embeddings.EmbeddingSettings] provides common configuration options that work across providers.
Settings can be specified at the embedder level (applied to all calls) or per-call:

```python {title="embedding_settings.py"}
from pydantic_ai import Embedder
from pydantic_ai.embeddings import EmbeddingSettings

# Default settings for all calls
embedder = Embedder(
    'openai:text-embedding-3-small',
    settings=EmbeddingSettings(dimensions=512),
)


async def main():
    # Override for a specific call
    result = await embedder.embed_query(
        'Hello world',
        settings=EmbeddingSettings(dimensions=256),
    )
    print(len(result.embeddings[0]))
    #> 256
```

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_

## Token Counting

You can check token counts before embedding to avoid exceeding model limits:

```python {title="token_counting.py"}
from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    text = 'Hello world, this is a test.'

    # Count tokens in text
    token_count = await embedder.count_tokens(text)
    print(f'Tokens: {token_count}')
    #> Tokens: 8

    # Check model's maximum input tokens (returns None if unknown)
    max_tokens = await embedder.max_input_tokens()
    print(f'Max tokens: {max_tokens}')
    #> Max tokens: 8191
```

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_
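If an input might exceed the limit, you can split it into chunks before embedding. The `chunk_by_tokens` helper below is a naive sketch written for this example, built only on the `count_tokens()` and `max_input_tokens()` methods shown above — re-counting tokens as each word is added is simple but slow, and a real chunker would tokenize once and split on token or sentence boundaries:

```python {title="chunked_embedding.py"}
from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def chunk_by_tokens(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of at most max_tokens."""
    chunks: list[str] = []
    current: list[str] = []
    for word in text.split():
        candidate = ' '.join([*current, word])
        if current and await embedder.count_tokens(candidate) > max_tokens:
            # Adding this word would exceed the limit; flush the current chunk
            chunks.append(' '.join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(' '.join(current))
    return chunks


async def main():
    long_text = 'Pydantic AI supports many embedding providers. ' * 2000
    # Fall back to OpenAI's documented 8191-token limit if the model's limit is unknown
    limit = await embedder.max_input_tokens() or 8191
    chunks = await chunk_by_tokens(long_text, limit)
    result = await embedder.embed_documents(chunks)
    print(f'Embedded {len(result.embeddings)} chunks')
```

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_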
## Testing

Use [`TestEmbeddingModel`][pydantic_ai.embeddings.TestEmbeddingModel] for testing without making API calls:

```python {title="testing_embeddings.py"}
from pydantic_ai import Embedder
from pydantic_ai.embeddings import TestEmbeddingModel


async def test_my_rag_system():
    embedder = Embedder('openai:text-embedding-3-small')
    test_model = TestEmbeddingModel()

    with embedder.override(model=test_model):
        result = await embedder.embed_query('test query')
        # TestEmbeddingModel returns deterministic embeddings
        assert result.embeddings[0] == [1.0] * 8

        # Check what settings were used
        assert test_model.last_settings is not None
```

## Instrumentation

Enable OpenTelemetry instrumentation for debugging and monitoring:

```python {title="instrumented_embeddings.py"}
import logfire

from pydantic_ai import Embedder

logfire.configure()

# Instrument a specific embedder
embedder = Embedder('openai:text-embedding-3-small', instrument=True)

# Or instrument all embedders globally
Embedder.instrument_all()
```

See the [Debugging and Monitoring guide](logfire.md) for more details on using Logfire with Pydantic AI.

## Building Custom Embedding Models

To integrate a custom embedding provider, subclass [`EmbeddingModel`][pydantic_ai.embeddings.EmbeddingModel]:

```python {title="custom_embedding_model.py"}
from collections.abc import Sequence

from pydantic_ai.embeddings import EmbeddingModel, EmbeddingResult, EmbeddingSettings
from pydantic_ai.embeddings.result import EmbedInputType


class MyCustomEmbeddingModel(EmbeddingModel):
    @property
    def model_name(self) -> str:
        return 'my-custom-model'

    @property
    def system(self) -> str:
        return 'my-provider'

    async def embed(
        self,
        inputs: str | Sequence[str],
        *,
        input_type: EmbedInputType,
        settings: EmbeddingSettings | None = None,
    ) -> EmbeddingResult:
        inputs, settings = self.prepare_embed(inputs, settings)
        # Call your embedding API here
        embeddings = [[0.1, 0.2, 0.3] for _ in inputs]  # Placeholder
        return EmbeddingResult(
            embeddings=embeddings,
            inputs=inputs,
            input_type=input_type,
            model_name=self.model_name,
            provider_name=self.system,
        )
```

Use [`WrapperEmbeddingModel`][pydantic_ai.embeddings.WrapperEmbeddingModel] if you want to wrap an existing model to add custom behavior like caching or logging.
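For example, here's a minimal sketch of an in-memory caching wrapper. It assumes `WrapperEmbeddingModel` accepts the model to wrap in its constructor and delegates `embed()` to the wrapped model — check the API reference for the exact signature. For brevity, the cache key here ignores `input_type` and settings:

```python {title="caching_embedding_model.py"}
from collections.abc import Sequence

from pydantic_ai import Embedder
from pydantic_ai.embeddings import (
    EmbeddingResult,
    EmbeddingSettings,
    WrapperEmbeddingModel,
)
from pydantic_ai.embeddings.openai import OpenAIEmbeddingModel
from pydantic_ai.embeddings.result import EmbedInputType


class CachingEmbeddingModel(WrapperEmbeddingModel):
    """Caches results for single-string inputs, keyed by the input text."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._cache: dict[str, EmbeddingResult] = {}

    async def embed(
        self,
        inputs: str | Sequence[str],
        *,
        input_type: EmbedInputType,
        settings: EmbeddingSettings | None = None,
    ) -> EmbeddingResult:
        # Only cache single-string inputs to keep the sketch small;
        # a real cache key would also include input_type and settings
        if isinstance(inputs, str) and inputs in self._cache:
            return self._cache[inputs]
        result = await super().embed(inputs, input_type=input_type, settings=settings)
        if isinstance(inputs, str):
            self._cache[inputs] = result
        return result


embedder = Embedder(
    CachingEmbeddingModel(OpenAIEmbeddingModel('text-embedding-3-small'))
)
```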
