index_document

Index documents into Elasticsearch with duplicate prevention, intelligent ID generation, and schema validation. Use AI similarity checks to ensure accurate indexing while maintaining knowledge base integrity.

Instructions

Index a document into Elasticsearch with smart duplicate prevention and intelligent document ID generation. 💡 RECOMMENDED: Use 'create_document_template' tool first to generate a proper document structure and avoid validation errors.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| check_duplicates | No | Check for existing documents with a similar title before indexing | True |
| doc_id | No | Optional document ID - if not provided, a smart ID will be generated | None |
| document | Yes | Document data to index as a JSON object. 💡 RECOMMENDED: Use 'create_document_template' tool first to generate proper document format. | (required) |
| force_index | No | Force indexing even if potential duplicates are found. 💡 TIP: Set to True if content is genuinely new and not in the knowledge base, to avoid multiple tool calls | False |
| index | Yes | Name of the Elasticsearch index to store the document | (required) |
| use_ai_similarity | No | Use AI to analyze content similarity and provide intelligent recommendations | True |
| validate_schema | No | Whether to validate document structure for knowledge base format | True |
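Putting the schema together: only `index` and `document` are required, and the other parameters fall back to their documented defaults. The index name and document contents below are purely illustrative, not values from this server:

```python
# Hypothetical arguments for an index_document call. Only "index" and
# "document" are required; the flags are shown at their documented defaults.
args = {
    "index": "knowledge_base",  # assumed index name (illustrative)
    "document": {
        "title": "Deploying with Docker",
        "content": "Step-by-step notes on container deployment...",
    },
    "check_duplicates": True,
    "use_ai_similarity": True,
    "force_index": False,
}

required = {"index", "document"}
missing = sorted(required - set(args))
print(missing)  # → []
```

In practice the document dict would come from the `create_document_template` tool, as the schema recommends.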

Implementation Reference

  • The core implementation of the index_document tool handler. This FastMCP tool function performs document validation, smart duplicate detection using title matching and optional AI content similarity analysis, intelligent document ID generation, Elasticsearch indexing, and comprehensive error handling with user-friendly guidance.
```python
@app.tool(
    description=(
        "Index a document into Elasticsearch with smart duplicate prevention and "
        "intelligent document ID generation. 💡 RECOMMENDED: Use 'create_document_template' "
        "tool first to generate a proper document structure and avoid validation errors."
    ),
    tags={"elasticsearch", "index", "document", "validation", "duplicate-prevention"},
)
async def index_document(
    index: Annotated[str, Field(description="Name of the Elasticsearch index to store the document")],
    document: Annotated[Dict[str, Any], Field(description="Document data to index as JSON object. 💡 RECOMMENDED: Use 'create_document_template' tool first to generate proper document format.")],
    doc_id: Annotated[Optional[str], Field(description="Optional document ID - if not provided, smart ID will be generated")] = None,
    validate_schema: Annotated[bool, Field(description="Whether to validate document structure for knowledge base format")] = True,
    check_duplicates: Annotated[bool, Field(description="Check for existing documents with similar title before indexing")] = True,
    force_index: Annotated[bool, Field(description="Force indexing even if potential duplicates are found. 💡 TIP: Set to True if content is genuinely new and not in knowledge base to avoid multiple tool calls")] = False,
    use_ai_similarity: Annotated[bool, Field(description="Use AI to analyze content similarity and provide intelligent recommendations")] = True,
    ctx: Context = None
) -> str:
    """Index a document into Elasticsearch with smart duplicate prevention."""
    try:
        es = get_es_client()

        # Smart duplicate checking if enabled
        if check_duplicates and not force_index:
            title = document.get('title', '')
            content = document.get('content', '')
            if title:
                # First check simple title duplicates
                dup_check = check_title_duplicates(es, index, title)
                if dup_check['found']:
                    duplicates_info = "\n".join([
                        f"   📄 {dup['title']} (ID: {dup['id']})\n      📝 {dup['summary']}\n      📅 {dup['last_modified']}"
                        for dup in dup_check['duplicates'][:3]
                    ])

                    # Use AI similarity analysis if enabled and content is substantial
                    if use_ai_similarity and content and len(content) > 200 and ctx:
                        try:
                            ai_analysis = await check_content_similarity_with_ai(es, index, title, content, ctx)
                            action = ai_analysis.get('action', 'CREATE')
                            confidence = ai_analysis.get('confidence', 0.5)
                            reasoning = ai_analysis.get('reasoning', 'AI analysis completed')
                            target_doc = ai_analysis.get('target_document_id', '')

                            ai_message = f"\n\n🤖 **AI Content Analysis** (Confidence: {confidence:.0%}):\n"
                            ai_message += f"   🎯 **Recommended Action**: {action}\n"
                            ai_message += f"   💭 **AI Reasoning**: {reasoning}\n"

                            if action == "UPDATE" and target_doc:
                                ai_message += f"   📄 **Target Document**: {target_doc}\n"
                                ai_message += f"   💡 **Suggestion**: Update existing document instead of creating new one\n"
                            elif action == "DELETE":
                                ai_message += f"   🗑️ **AI Recommendation**: Existing content is superior, consider not creating this document\n"
                            elif action == "MERGE" and target_doc:
                                ai_message += f"   🔄 **Merge Target**: {target_doc}\n"
                                ai_message += f"   📝 **Strategy**: {ai_analysis.get('merge_strategy', 'Combine unique information from both documents')}\n"
                            elif action == "CREATE":
                                ai_message += f"   ✅ **AI Approval**: Content is sufficiently unique to create new document\n"
                                # If AI says CREATE, allow automatic indexing
                                pass

                            # Show similar documents found by AI
                            similar_docs = ai_analysis.get('similar_docs', [])
                            if similar_docs:
                                ai_message += f"\n   📋 **Similar Documents Analyzed**:\n"
                                for i, doc in enumerate(similar_docs[:2], 1):
                                    ai_message += f"      {i}. {doc['title']} (Score: {doc.get('elasticsearch_score', 0):.1f})\n"

                            # If AI recommends CREATE with high confidence, proceed automatically
                            if action == "CREATE" and confidence > 0.8:
                                # Continue with indexing - don't return early
                                pass
                            else:
                                # Return AI analysis for user review
                                return (
                                    f"⚠️ **Potential Duplicates Found** - {dup_check['count']} similar document(s):\n\n" +
                                    f"{duplicates_info}\n" +
                                    f"{ai_message}\n\n" +
                                    f"🤔 **What would you like to do?**\n" +
                                    f"   1️⃣ **FOLLOW AI RECOMMENDATION**: {action} as suggested by AI\n" +
                                    f"   2️⃣ **UPDATE existing document**: Modify one of the above instead\n" +
                                    f"   3️⃣ **SEARCH for more**: Use search tool to find all related content\n" +
                                    f"   4️⃣ **FORCE CREATE anyway**: Set force_index=True if this is truly unique\n\n" +
                                    f"💡 **AI Recommendation**: {reasoning}\n" +
                                    f"🔍 **Next Step**: Search for '{title}' to see all related documents\n\n" +
                                    f"⚡ **To force indexing**: Call again with force_index=True")
                        except Exception as ai_error:
                            # Fallback to simple duplicate check if AI fails
                            return (
                                f"⚠️ **Potential Duplicates Found** - {dup_check['count']} similar document(s):\n\n" +
                                f"{duplicates_info}\n\n" +
                                f"⚠️ **AI Analysis Failed**: {str(ai_error)}\n\n" +
                                f"🤔 **What would you like to do?**\n" +
                                f"   1️⃣ **UPDATE existing document**: Modify one of the above instead\n" +
                                f"   2️⃣ **SEARCH for more**: Use search tool to find all related content\n" +
                                f"   3️⃣ **FORCE CREATE anyway**: Set force_index=True if this is truly unique\n\n" +
                                f"💡 **Recommendation**: Update existing documents to prevent knowledge base bloat\n" +
                                f"🔍 **Next Step**: Search for '{title}' to see all related documents\n\n" +
                                f"⚡ **To force indexing**: Call again with force_index=True")
                    else:
                        # Simple duplicate check without AI
                        return (
                            f"⚠️ **Potential Duplicates Found** - {dup_check['count']} similar document(s):\n\n" +
                            f"{duplicates_info}\n\n" +
                            f"🤔 **What would you like to do?**\n" +
                            f"   1️⃣ **UPDATE existing document**: Modify one of the above instead\n" +
                            f"   2️⃣ **SEARCH for more**: Use search tool to find all related content\n" +
                            f"   3️⃣ **FORCE CREATE anyway**: Set force_index=True if this is truly unique\n\n" +
                            f"💡 **Recommendation**: Update existing documents to prevent knowledge base bloat\n" +
                            f"🔍 **Next Step**: Search for '{title}' to see all related documents\n\n" +
                            f"⚡ **To force indexing**: Call again with force_index=True")

        # Generate smart document ID if not provided
        if not doc_id:
            existing_ids = get_existing_document_ids(es, index)
            doc_id = generate_smart_doc_id(
                document.get('title', 'untitled'),
                document.get('content', ''),
                existing_ids
            )
            document['id'] = doc_id  # Ensure document has the ID

        # Validate document structure if requested
        if validate_schema:
            try:
                # Check if this looks like a knowledge base document
                if isinstance(document, dict) and "id" in document and "title" in document:
                    validated_doc = validate_document_structure(document)
                    document = validated_doc
                    # Use the document ID from the validated document if not provided earlier
                    if not doc_id:
                        doc_id = document.get("id")
                else:
                    # For non-knowledge base documents, still validate with strict mode if enabled
                    validated_doc = validate_document_structure(document, is_knowledge_doc=False)
                    document = validated_doc
            except DocumentValidationError as e:
                return f"❌ Validation failed:\n\n{format_validation_error(e)}"
            except Exception as e:
                return f"❌ Validation error: {str(e)}"

        # Index the document
        result = es.index(index=index, id=doc_id, body=document)
        success_message = f"✅ Document indexed successfully:\n\n{json.dumps(result, indent=2, ensure_ascii=False)}"

        # Add smart guidance based on indexing result
        if result.get('result') == 'created':
            success_message += f"\n\n🎉 **New Document Created**:\n"
            success_message += f"   📄 **Document ID**: {doc_id}\n"
            success_message += f"   🆔 **ID Strategy**: {'User-provided' if 'doc_id' in locals() and doc_id else 'Smart-generated'}\n"
            if check_duplicates:
                success_message += f"   ✅ **Duplicate Check**: Passed - no similar titles found\n"
        else:
            success_message += f"\n\n🔄 **Document Updated**:\n"
            success_message += f"   📄 **Document ID**: {doc_id}\n"
            success_message += f"   ⚡ **Action**: Replaced existing document with same ID\n"

        success_message += (
            f"\n\n💡 **Smart Duplicate Prevention Active**:\n" +
            f"   🔍 **Auto-Check**: {'Enabled' if check_duplicates else 'Disabled'} - searches for similar titles\n" +
            f"   🤖 **AI Analysis**: {'Enabled' if use_ai_similarity else 'Disabled'} - intelligent content similarity detection\n" +
            f"   🆔 **Smart IDs**: Auto-generated from title with collision detection\n" +
            f"   ⚡ **Force Option**: Use force_index=True to bypass duplicate warnings\n" +
            f"   🔄 **Update Recommended**: Modify existing documents instead of creating duplicates\n\n" +
            f"🤝 **Best Practices**:\n" +
            f"   • Search before creating: 'search(index=\"{index}\", query=\"your topic\")'\n" +
            f"   • Update existing documents when possible\n" +
            f"   • Use descriptive titles for better smart ID generation\n" +
            f"   • AI will analyze content similarity for intelligent recommendations\n" +
            f"   • Set force_index=True only when content is truly unique")

        return success_message

    except Exception as e:
        # Provide detailed error messages for different types of Elasticsearch errors
        error_message = "❌ Document indexing failed:\n\n"
        error_str = str(e).lower()

        if "connection" in error_str or "refused" in error_str:
            error_message += "🔌 **Connection Error**: Cannot connect to Elasticsearch server\n"
            error_message += f"📍 Check if Elasticsearch is running at the configured address\n"
            error_message += f"💡 Try: Use 'setup_elasticsearch' tool to start Elasticsearch\n\n"
        elif ("index" in error_str and "not found" in error_str) or "index_not_found_exception" in error_str:
            error_message += f"📁 **Index Error**: Index '{index}' does not exist\n"
            error_message += f"📍 The target index has not been created yet\n"
            error_message += f"💡 **Suggestions for agents**:\n"
            error_message += f"   1. Use 'create_index' tool to create the index first\n"
            error_message += f"   2. Use 'list_indices' to see available indices\n"
            error_message += f"   3. Check the correct index name for your data type\n\n"
        elif "mapping" in error_str or "field" in error_str:
            error_message += f"🗂️ **Mapping Error**: Document structure conflicts with index mapping\n"
            error_message += f"📍 Document fields don't match the expected index schema\n"
            error_message += f"💡 Try: Adjust document structure or update index mapping\n\n"
        elif "version" in error_str or "conflict" in error_str:
            error_message += f"⚡ **Version Conflict**: Document already exists with different version\n"
            error_message += f"📍 Another process modified this document simultaneously\n"
            error_message += f"💡 Try: Use 'get_document' first, then update with latest version\n\n"
        elif "timeout" in error_str:
            error_message += "⏱️ **Timeout Error**: Indexing operation timed out\n"
            error_message += f"📍 Document may be too large or index overloaded\n"
            error_message += f"💡 Try: Reduce document size or retry later\n\n"
        else:
            error_message += f"⚠️ **Unknown Error**: {str(e)}\n\n"

        error_message += f"🔍 **Technical Details**: {str(e)}"
        return error_message
```
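The handler leans on helpers (`check_title_duplicates`, `get_existing_document_ids`, `generate_smart_doc_id`, `validate_document_structure`) that are not reproduced on this page. As a rough sketch only, assuming the real implementation slugifies the title and appends a numeric suffix on collision, smart ID generation might look like:

```python
import re

def generate_smart_doc_id(title: str, content: str, existing_ids: set) -> str:
    """Hypothetical sketch of the referenced helper: slugify the title,
    then append a numeric suffix until the ID no longer collides."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"
    doc_id = slug
    suffix = 2
    while doc_id in existing_ids:
        doc_id = f"{slug}-{suffix}"
        suffix += 1
    return doc_id

print(generate_smart_doc_id("Deploying with Docker!", "", {"deploying-with-docker"}))
# → deploying-with-docker-2
```

The actual helper also receives the document content, so it may derive IDs from more than the title; treat this as an illustration of the collision-detection idea, not the server's algorithm.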
  • Registers the index_document tool by importing the document_app from sub_servers.elasticsearch_document and mounting it into the unified Elasticsearch server app. This exposes index_document (along with delete_document and get_document) as part of the Elasticsearch toolset.
```python
from .sub_servers.elasticsearch_document import app as document_app
from .sub_servers.elasticsearch_index import app as index_app
from .sub_servers.elasticsearch_search import app as search_app
from .sub_servers.elasticsearch_batch import app as batch_app

# Create unified FastMCP application
app = FastMCP(
    name="AgentKnowledgeMCP-Elasticsearch",
    version="2.0.0",
    instructions="Unified Elasticsearch tools for comprehensive knowledge management via modular server mounting"
)

# ================================
# SERVER MOUNTING - MODULAR ARCHITECTURE
# ================================
print("🏗️ Mounting Elasticsearch sub-servers...")

# Mount all sub-servers into unified interface
app.mount(snapshots_app)       # 3 tools: snapshot management
app.mount(index_metadata_app)  # 3 tools: metadata governance
app.mount(document_app)        # 3 tools: document operations
app.mount(index_app)           # 3 tools: index management
app.mount(search_app)          # 2 tools: search & validation
app.mount(batch_app)           # 2 tools: batch operations
```
  • Top-level registration of the Elasticsearch tools including index_document by mounting the elasticsearch_server_app into the main AgentKnowledgeMCP server. Provides backward-compatible access to index_document without prefixes.
```python
# Mount Elasticsearch server with 'es' prefix
# This provides: es_search, es_index_document, es_create_index, etc.
app.mount(elasticsearch_server_app)

# Mount Administrative operations server with 'admin' prefix
# This provides: admin_get_config, admin_update_config, admin_server_status, etc.
app.mount(admin_server_app)

# Mount Prompt server for AgentKnowledgeMCP guidance
# This provides: usage_guide, help_request (prompts for LLM assistance)
app.mount(prompt_server_app)
```
  • Pydantic schema definition via Annotated parameters for the index_document tool inputs: index (str), document (Dict), doc_id (Optional[str]), validate_schema (bool), check_duplicates (bool), force_index (bool), use_ai_similarity (bool).
```python
async def index_document(
    index: Annotated[str, Field(description="Name of the Elasticsearch index to store the document")],
    document: Annotated[Dict[str, Any], Field(description="Document data to index as JSON object. 💡 RECOMMENDED: Use 'create_document_template' tool first to generate proper document format.")],
    doc_id: Annotated[Optional[str], Field(description="Optional document ID - if not provided, smart ID will be generated")] = None,
    validate_schema: Annotated[bool, Field(description="Whether to validate document structure for knowledge base format")] = True,
    check_duplicates: Annotated[bool, Field(description="Check for existing documents with similar title before indexing")] = True,
    force_index: Annotated[bool, Field(description="Force indexing even if potential duplicates are found. 💡 TIP: Set to True if content is genuinely new and not in knowledge base to avoid multiple tool calls")] = False,
    use_ai_similarity: Annotated[bool, Field(description="Use AI to analyze content similarity and provide intelligent recommendations")] = True,
    ctx: Context = None
) -> str:
```
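FastMCP derives the JSON input schema from these `Annotated[..., Field(...)]` hints via Pydantic, which reads the `Field` metadata attached to each annotation. The mechanism can be illustrated with the standard library alone; the `Field` dataclass below is a stand-in for `pydantic.Field`, not the real class:

```python
from typing import Annotated, get_args, get_type_hints
from dataclasses import dataclass

# Stand-in for pydantic.Field: just a metadata carrier (illustrative).
@dataclass
class Field:
    description: str

def index_document(
    index: Annotated[str, Field(description="Name of the Elasticsearch index to store the document")],
) -> str: ...

# include_extras=True preserves the Annotated wrapper so the metadata survives
hints = get_type_hints(index_document, include_extras=True)
base_type, meta = get_args(hints["index"])
print(base_type.__name__, "->", meta.description)
```

Pydantic walks the annotations the same way, turning each parameter's base type and `Field` metadata into a property in the tool's JSON Schema.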

MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/itshare4u/AgentKnowledgeMCP'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.