Skip to main content
Glama

AVS Document Search System

by patw
create_manage_indexes.md7.51 kB
# How to Index Fields for Vector Search in MongoDB MongoDB's Atlas Vector Search enables you to perform semantic searches on your data by indexing vector embeddings. You can use the `vectorSearch` type to index fields that contain vector data, along with additional fields for pre-filtering your results. ## Creating Vector Search Indexes You can create Atlas Vector Search indexes using several methods: the Atlas UI, Atlas Administration API, Atlas CLI, `mongosh`, or supported MongoDB drivers. The index definition requires at least one vector-type field and can include additional filter fields. Here's how to define a basic vector search index: ```python # Python example using PyMongo from pymongo.operations import SearchIndexModel # Basic vector index definition index_model = SearchIndexModel( definition={ "fields": [ { "type": "vector", "path": "plot_embedding", "numDimensions": 1536, "similarity": "dotProduct" } ] }, name="vector_index", type="vectorSearch" ) # Create the index result = collection.create_search_index(model=index_model) print(f"Created index: {result}") ``` ## Understanding Index Fields Your vector search index definition contains several key components: ### Vector Type Fields Vector fields must contain numeric arrays representing vector embeddings. You can specify different data types: - BSON `double` - Binary `BinData` with subtype `float32`, `int1`, or `int8` - Arrays of floats ```javascript // MongoDB Shell example db.embedded_movies.createSearchIndex( "vector_index", "vectorSearch", { "fields": [ { "type": "vector", "path": "plot_embedding", "numDimensions": 1536, "similarity": "dotProduct" } ] } ); ``` ### Filter Type Fields You can add filter fields to narrow down your search results before performing vector similarity calculations: ```python # Python example with filter fields index_model = SearchIndexModel( definition={ "fields": [ { "type": "vector", "path": "plot_embedding", "numDimensions": 1536, "similarity": "dotProduct" }, { "type": "filter", "path": "genres" }, { "type": "filter", "path": "year" } ] }, name="vector_index", type="vectorSearch" ) ``` ## Similarity Functions MongoDB supports three similarity functions for vector searches: 1. **euclidean** - Measures distance between vectors 2. **cosine** - Measures similarity based on angle between vectors 3. **dotProduct** - Measures similarity like cosine but considers vector magnitude ```python # Example using different similarity functions index_definition = { "fields": [ { "type": "vector", "path": "embedding", "numDimensions": 512, "similarity": "cosine" # or "euclidean", "dotProduct" } ] } ``` ## Pre-filtering Data Pre-filtering helps optimize query performance by reducing the number of vectors that need to be compared. You can filter on various data types: ```python # Java example with filter fields import com.mongodb.client.model.SearchIndexModel; import org.bson.Document; Bson definition = new Document( "fields", Arrays.asList( new Document("type", "vector") .append("path", "plot_embedding") .append("numDimensions", 1536) .append("similarity", "dotProduct"), new Document("type", "filter") .append("path", "genres"), new Document("type", "filter") .append("path", "year") ) ); ``` ## Index Considerations ### Data Types and Dimensions - Vector dimensions must be less than or equal to 8192 - Different data types (float32, int1, int8) have specific requirements - For automatic quantization, use `scalar` or `binary` options ### Performance Optimization The HNSW (Hierarchical Navigable Small Worlds) graph parameters can be tuned for better performance: ```python # Example with HNSW options index_definition = { "fields": [ { "type": "vector", "path": "embedding", "numDimensions": 1536, "similarity": "dotProduct", "hnswOptions": { "maxEdges": 20, # Higher values improve recall "numEdgeCandidates": 200 # Higher values improve quality } } ] } ``` ## Supported Clients You can create and manage Atlas Vector Search indexes through multiple clients: - **Atlas UI**: Web-based interface - **mongosh**: Command-line shell - **Atlas CLI**: Command-line interface - **Atlas Administration API**: REST API - **MongoDB Drivers**: Multiple language drivers ## View Existing Indexes ```python # Python example to view indexes cursor = collection.list_search_indexes() for index in cursor: print(f"Index: {index['name']}") print(f"Type: {index['type']}") print(f"Status: {index['status']}") ``` ## Edit Existing Indexes You can modify existing vector search indexes without renaming or changing types: ```python # Python example to update an index definition = { "fields": [ { "type": "vector", "path": "plot_embedding", "numDimensions": 1536, "similarity": "dotProduct", "quantization": "scalar" }, { "type": "filter", "path": "genres" } ] } # Update the existing index collection.update_search_index("vector_index", definition) ``` ## Delete Indexes ```python # Python example to delete an index collection.drop_search_index("vector_index") ``` ## Index Status Management Indexes go through several states during creation and updates: 1. **Not Started**: Index creation has not begun 2. **Initial Sync**: Index is being built 3. **Active**: Index is ready for queries 4. **Recovering**: Replication issues occurred 5. **Failed**: Index creation failed 6. **Delete in Progress**: Index is being removed ## Best Practices 1. **Choose the right similarity function** based on your embedding model 2. **Use pre-filtering** to improve performance 3. **Consider quantization** for memory optimization 4. **Monitor index status** during building process 5. **Plan for index maintenance** as your data grows ```python # Complete example combining all concepts from pymongo.operations import SearchIndexModel # Create a comprehensive vector search index index_model = SearchIndexModel( definition={ "fields": [ { "type": "vector", "path": "plot_embedding", "numDimensions": 1536, "similarity": "dotProduct", "quantization": "scalar" }, { "type": "filter", "path": "genres" }, { "type": "filter", "path": "year" } ] }, name="movie_vector_index", type="vectorSearch" ) # Create the index result = collection.create_search_index(model=index_model) print(f"Created vector search index: {result}") ``` This approach ensures efficient vector search operations while maintaining good performance through proper indexing and filtering strategies.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/patw/avs-docs-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server