GEDCOM MCP Server

by airy10
TECHNICAL_SPECIFICATION.md
# GEDCOM MCP Server - Technical Specification

## Overview

This document provides technical specifications for implementing key improvements to the GEDCOM MCP server, focusing on enhanced search capabilities, better error handling, and performance improvements.

## 1. Enhanced Search Capabilities

### 1.1 Fuzzy Search Implementation

#### Requirements

- Integrate a fuzzy string matching library (fuzzywuzzy or rapidfuzz)
- Maintain backward compatibility with the existing exact search
- Provide configurable similarity thresholds
- Support name, place, and other text fields

#### Implementation Plan

```python
# Add to requirements.txt:
#   fuzzywuzzy>=0.18.0
#   python-levenshtein>=0.12.0

# New tool in fastmcp_server.py
@mcp.tool()
async def fuzzy_search_person(name: str, ctx: Context, threshold: int = 80, max_results: int = 50) -> list:
    """Search for persons with fuzzy name matching.

    Args:
        name: Search term to match against person names
        threshold: Minimum similarity score (0-100)
        max_results: Maximum number of results to return
    """
    from fuzzywuzzy import process

    gedcom_ctx = get_gedcom_context(ctx)
    if not gedcom_ctx.gedcom_parser:
        return [{"error": "No GEDCOM file loaded. Please load a GEDCOM file first."}]

    # Prepare the list of names for fuzzy matching
    choices = []
    person_lookup = {}
    for person_id, individual in gedcom_ctx.individual_lookup.items():
        person_name = individual.get_name()
        if isinstance(person_name, tuple):
            person_name = " ".join(str(part) for part in person_name if part)
        else:
            person_name = str(person_name) if person_name else ""
        if person_name:  # Only include non-empty names
            choices.append(person_name)
            # Note: identical names overwrite earlier entries; map to a list of IDs if duplicates matter
            person_lookup[person_name] = person_id

    # Perform fuzzy search
    results = process.extract(name, choices, limit=max_results)

    # Filter by threshold and format results
    matches = []
    for match_name, score in results:
        if score >= threshold:
            person_id = person_lookup[match_name]
            person = get_person_details_internal(person_id, gedcom_ctx)
            if person:
                matches.append({
                    "person": person.model_dump(),
                    "similarity_score": score
                })
    return matches
```

#### Testing

- Unit tests for various name variations
- Performance tests with large datasets
- Integration tests with existing search functionality

### 1.2 Advanced Querying

#### Requirements

- Support for complex query conditions
- Logical operators (AND, OR, NOT)
- Comparison operators ($gt, $lt, $eq, $ne, $in, $nin)
- Regular expression support

#### Implementation Plan

```python
@mcp.tool()
async def query_people_advanced(ctx: Context, filters: str = "{}", page: int = 1, page_size: int = 100) -> dict:
    """Advanced querying with complex conditions and aggregations.

    Args:
        filters: JSON string containing query filters
        page: Page number for pagination
        page_size: Number of results per page
    """
    import json
    import operator

    gedcom_ctx = get_gedcom_context(ctx)
    if not gedcom_ctx.gedcom_parser:
        return {"error": "No GEDCOM file loaded. Please load a GEDCOM file first."}

    try:
        query_filters = json.loads(filters) if filters else {}
    except json.JSONDecodeError as e:
        return {"error": f"Invalid JSON in filters parameter: {e}"}

    # Operator mapping
    ops = {
        "$gt": operator.gt,
        "$lt": operator.lt,
        "$gte": operator.ge,
        "$lte": operator.le,
        "$eq": operator.eq,
        "$ne": operator.ne,
        "$in": lambda x, y: x in y,
        "$nin": lambda x, y: x not in y
    }

    matching_people = []
    for person_id, individual in gedcom_ctx.individual_lookup.items():
        person = get_person_details_internal(person_id, gedcom_ctx)
        if person and _matches_advanced_criteria(person, query_filters, ops):
            matching_people.append(person)

    # Sort and paginate
    matching_people.sort(key=lambda p: p.id)
    total_count = len(matching_people)
    total_pages = (total_count + page_size - 1) // page_size
    start_index = (page - 1) * page_size
    end_index = min(start_index + page_size, total_count)
    page_people = matching_people[start_index:end_index]

    return {
        "total_count": total_count,
        "total_pages": total_pages,
        "current_page": page,
        "page_size": page_size,
        "people": [person.model_dump() for person in page_people]
    }


def _matches_advanced_criteria(person, filters, ops):
    """Check if a person matches advanced query criteria."""
    for field, condition in filters.items():
        person_value = getattr(person, field, None)
        if isinstance(condition, dict):
            # Handle operator conditions
            for op_name, op_value in condition.items():
                if op_name in ops:
                    if not ops[op_name](person_value, op_value):
                        return False
                elif op_name == "$regex":
                    import re
                    if not re.search(op_value, str(person_value), re.IGNORECASE):
                        return False
        else:
            # Handle direct value matching
            if person_value != condition:
                return False
    return True
```
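To illustrate the filter grammar above, the matching logic can be exercised standalone. This is a minimal sketch: the `matches` helper mirrors `_matches_advanced_criteria`, and the record uses a `SimpleNamespace` stand-in with hypothetical fields (`name`, `birth_year`, `birth_place`) rather than the real `Person` model.

```python
import json
import operator
import re
from types import SimpleNamespace

# Operator mapping mirroring the one in query_people_advanced
ops = {
    "$gt": operator.gt, "$lt": operator.lt,
    "$gte": operator.ge, "$lte": operator.le,
    "$eq": operator.eq, "$ne": operator.ne,
    "$in": lambda x, y: x in y, "$nin": lambda x, y: x not in y,
}

def matches(person, filters):
    """Same shape as _matches_advanced_criteria: dict conditions use
    operators, bare values mean direct equality."""
    for field, condition in filters.items():
        value = getattr(person, field, None)
        if isinstance(condition, dict):
            for op_name, op_value in condition.items():
                if op_name in ops and not ops[op_name](value, op_value):
                    return False
                if op_name == "$regex" and not re.search(op_value, str(value), re.IGNORECASE):
                    return False
        elif value != condition:
            return False
    return True

# A stand-in record; the real tool operates on Person models
person = SimpleNamespace(name="John Smith", birth_year=1842, birth_place="Boston")

# The `filters` argument arrives as a JSON string, exactly as a client would send it
filters = json.loads('{"birth_year": {"$gte": 1800, "$lt": 1900}, "name": {"$regex": "smith$"}}')
print(matches(person, filters))  # True: born 1800-1899 and name ends in "smith"
```

Combining conditions on one field is an implicit AND; explicit `$and`/`$or`/`$not` operators would still need to be layered on top to satisfy the logical-operator requirement.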
## 2. Better Error Handling

### 2.1 Structured Error Implementation

#### Requirements

- Consistent error format across all tools
- Error codes for programmatic handling
- Recovery suggestions for users
- Logging for debugging

#### Implementation Plan

```python
class GedcomError(Exception):
    """Base exception for GEDCOM operations."""

    def __init__(self, message: str, error_code: str = None, recovery_suggestion: str = None):
        self.message = message
        self.error_code = error_code or "UNKNOWN_ERROR"
        self.recovery_suggestion = recovery_suggestion
        super().__init__(self.message)

    def to_dict(self) -> dict:
        """Convert error to dictionary format."""
        result = {
            "error": self.message,
            "error_code": self.error_code
        }
        if self.recovery_suggestion:
            result["recovery_suggestion"] = self.recovery_suggestion
        return result


# Example usage in a tool
@mcp.tool()
async def load_gedcom(file_path: str, ctx: Context) -> dict:
    """Load and parse a GEDCOM file."""
    from pathlib import Path

    # Validate file path
    if not file_path:
        raise GedcomError(
            "File path is required",
            error_code="MISSING_FILE_PATH",
            recovery_suggestion="Provide a valid file path to a GEDCOM file"
        )

    # Check if the file exists
    path = Path(file_path)
    if not path.exists():
        raise GedcomError(
            f"File not found: {file_path}",
            error_code="FILE_NOT_FOUND",
            recovery_suggestion="Check that the file path is correct and the file exists"
        )

    # Check that it's a file (not a directory)
    if not path.is_file():
        raise GedcomError(
            f"Path is not a file: {file_path}",
            error_code="NOT_A_FILE",
            recovery_suggestion="Provide a path to a GEDCOM file, not a directory"
        )

    # Try to load the file
    gedcom_ctx = get_gedcom_context(ctx)
    try:
        success = load_gedcom_file(file_path, gedcom_ctx)
        if success:
            gedcom_ctx.gedcom_file_path = file_path
            return {
                "status": "success",
                "message": f"Successfully loaded GEDCOM file: {file_path}",
                "individuals": len(gedcom_ctx.individual_lookup),
                "families": len(gedcom_ctx.family_lookup)
            }
        else:
            raise GedcomError(
                f"Failed to parse GEDCOM file: {file_path}",
                error_code="PARSE_ERROR",
                recovery_suggestion="Check that the file is a valid GEDCOM format"
            )
    except GedcomError:
        # Re-raise structured errors unchanged instead of rewrapping them as LOAD_ERROR
        raise
    except Exception as e:
        raise GedcomError(
            f"Error loading GEDCOM file: {e}",
            error_code="LOAD_ERROR",
            recovery_suggestion="Check file permissions and format"
        )
```

## 3. Performance Improvements

### 3.1 Progress Indicators

#### Requirements

- Progress tracking for long-running operations
- Regular status updates
- Non-blocking operation where possible

#### Implementation Plan

```python
import time
import logging

logger = logging.getLogger(__name__)


class ProgressTracker:
    """Track progress of long-running operations."""

    def __init__(self, total_items: int, description: str):
        self.total_items = total_items
        self.processed = 0
        self.description = description
        self.start_time = time.time()
        self.last_update = 0.0

    def update(self, increment: int = 1, force: bool = False) -> None:
        """Update the progress counter."""
        self.processed += increment
        current_time = time.time()
        # Report if forced or if at least a second has passed since the last report
        if force or (current_time - self.last_update) >= 1.0:
            self._report_progress()
            self.last_update = current_time

    def _report_progress(self) -> None:
        """Report current progress."""
        if self.total_items > 0:
            percentage = (self.processed / self.total_items) * 100
            elapsed = time.time() - self.start_time
            # Estimate remaining time
            if self.processed > 0 and elapsed > 0:
                rate = self.processed / elapsed
                remaining = (self.total_items - self.processed) / rate
            else:
                remaining = 0
            logger.info(
                f"{self.description}: {percentage:.1f}% complete "
                f"({self.processed}/{self.total_items}) - "
                f"Elapsed: {elapsed:.1f}s, Remaining: {remaining:.1f}s"
            )

    def finish(self) -> None:
        """Mark the operation as complete."""
        self.processed = self.total_items
        self._report_progress()
        total_time = time.time() - self.start_time
        logger.info(f"{self.description}: Complete in {total_time:.1f}s")


# Example usage in _rebuild_lookups
def _rebuild_lookups(gedcom_ctx: GedcomContext):
    """Rebuild lookup dictionaries with progress tracking."""
    logger.info("Rebuilding lookup dictionaries...")

    # Clear existing lookups
    gedcom_ctx.individual_lookup.clear()
    gedcom_ctx.family_lookup.clear()
    gedcom_ctx.source_lookup.clear()
    gedcom_ctx.note_lookup.clear()

    # Get root elements
    root_elements = gedcom_ctx.gedcom_parser.get_root_child_elements()
    total_elements = len(root_elements)

    # Create progress tracker
    progress = ProgressTracker(total_elements, "Rebuilding lookups")

    # Process elements
    for elem in root_elements:
        pointer = elem.get_pointer()
        tag = elem.get_tag()
        if isinstance(elem, IndividualElement):
            gedcom_ctx.individual_lookup[pointer] = elem
        elif isinstance(elem, FamilyElement):
            gedcom_ctx.family_lookup[pointer] = elem
        elif tag == "SOUR":
            gedcom_ctx.source_lookup[pointer] = elem
        elif tag == "NOTE":
            gedcom_ctx.note_lookup[pointer] = elem
        # Cheap per-element update: the tracker logs at most once per second
        progress.update()

    progress.finish()
    logger.info(
        f"Rebuilt lookup dictionaries: {len(gedcom_ctx.individual_lookup)} individuals, "
        f"{len(gedcom_ctx.family_lookup)} families, {len(gedcom_ctx.source_lookup)} sources, "
        f"{len(gedcom_ctx.note_lookup)} notes"
    )
```

## 4. Batch Operations

### 4.1 Batch Update Implementation

#### Requirements

- Efficient processing of multiple updates
- Transaction-like behavior with rollback
- Detailed reporting of results

#### Implementation Plan

```python
@mcp.tool()
async def batch_update_person_attributes(updates: str, ctx: Context) -> dict:
    """Update multiple person attributes in a single operation.

    Args:
        updates: JSON string containing a list of updates.
            Each update must have: person_id, attribute_tag, new_value
    """
    import json

    gedcom_ctx = get_gedcom_context(ctx)
    if not gedcom_ctx.gedcom_parser:
        return {"error": "No GEDCOM file loaded. Please load a GEDCOM file first."}

    try:
        update_list = json.loads(updates) if updates else []
    except json.JSONDecodeError as e:
        return {"error": f"Invalid JSON in updates parameter: {e}"}

    if not isinstance(update_list, list):
        return {"error": "Updates parameter must be a JSON array"}

    results = {
        "total_updates": len(update_list),
        "successful": 0,
        "failed": 0,
        "errors": []
    }

    # Process updates
    for i, update in enumerate(update_list):
        try:
            # Validate update structure
            if not isinstance(update, dict):
                results["failed"] += 1
                results["errors"].append({
                    "index": i,
                    "error": "Update must be a JSON object"
                })
                continue

            person_id = update.get("person_id")
            attribute_tag = update.get("attribute_tag")
            new_value = update.get("new_value")

            if not person_id or not attribute_tag or new_value is None:
                results["failed"] += 1
                results["errors"].append({
                    "index": i,
                    "error": "Missing required fields: person_id, attribute_tag, new_value"
                })
                continue

            # Perform the update
            result = _update_person_attribute_internal(gedcom_ctx, person_id, attribute_tag, new_value)
            if "Error" in result:
                results["failed"] += 1
                results["errors"].append({
                    "index": i,
                    "person_id": person_id,
                    "error": result
                })
            else:
                results["successful"] += 1
        except Exception as e:
            results["failed"] += 1
            results["errors"].append({
                "index": i,
                "error": str(e)
            })

    # Clear caches after successful updates
    if results["successful"] > 0:
        gedcom_ctx.clear_caches()
        _rebuild_lookups(gedcom_ctx)

    return results
```
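Because the tool accepts `updates` as a JSON string, a caller-side sketch of building and pre-validating the payload may be useful. The person IDs and values below are illustrative; `OCCU` and `RELI` are standard GEDCOM attribute tags, and `validate_updates` simply mirrors the per-item checks the tool performs server-side.

```python
import json

# Hypothetical batch payload for batch_update_person_attributes
updates = [
    {"person_id": "@I1@", "attribute_tag": "OCCU", "new_value": "Blacksmith"},
    {"person_id": "@I2@", "attribute_tag": "RELI", "new_value": "Quaker"},
]

def validate_updates(payload):
    """Pre-flight check mirroring the per-item validation in the tool."""
    errors = []
    for i, update in enumerate(payload):
        if not isinstance(update, dict):
            errors.append({"index": i, "error": "Update must be a JSON object"})
            continue
        if (not update.get("person_id") or not update.get("attribute_tag")
                or update.get("new_value") is None):
            errors.append({"index": i, "error": "Missing required fields"})
    return errors

assert validate_updates(updates) == []
# The tool expects a JSON string, not a Python list:
payload = json.dumps(updates)
```

Validating client-side keeps malformed items from consuming a slot in the batch; the server still re-validates, since the two ends may disagree on what a well-formed update is.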
## 5. Testing Strategy

### 5.1 Unit Tests

```python
# tests/test_enhanced_search.py
import asyncio
import os
import sys
import unittest
from unittest.mock import patch, MagicMock

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from src.gedcom_mcp.gedcom_context import GedcomContext
from src.gedcom_mcp.fastmcp_server import fuzzy_search_person


class TestEnhancedSearch(unittest.TestCase):
    def setUp(self):
        self.ctx = MagicMock()
        self.gedcom_ctx = GedcomContext()

    @patch('src.gedcom_mcp.fastmcp_server.get_gedcom_context')
    @patch('src.gedcom_mcp.fastmcp_server.get_person_details_internal')
    def test_fuzzy_search_exact_match(self, mock_get_person, mock_get_context):
        # Test setup
        mock_get_context.return_value = self.gedcom_ctx
        self.gedcom_ctx.gedcom_parser = MagicMock()

        # Mock individual lookup
        mock_individual = MagicMock()
        mock_individual.get_name.return_value = "John Smith"
        self.gedcom_ctx.individual_lookup = {"@I1@": mock_individual}

        # Mock person details
        mock_person = MagicMock()
        mock_person.id = "@I1@"
        mock_person.name = "John Smith"
        mock_get_person.return_value = mock_person

        # Test execution (the tool is a coroutine, so drive it with an event loop)
        result = asyncio.run(fuzzy_search_person("John Smith", self.ctx, threshold=90))

        # Assertions
        self.assertIsInstance(result, list)
        # Note: This is a simplified test - the actual implementation would require
        # more complex mocking of the fuzzywuzzy library


if __name__ == '__main__':
    unittest.main()
```

### 5.2 Integration Tests

```python
# tests/test_batch_operations.py
import asyncio
import json
import os
import sys
import unittest
from unittest.mock import patch, MagicMock

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from src.gedcom_mcp.fastmcp_server import batch_update_person_attributes


class TestBatchOperations(unittest.TestCase):
    def setUp(self):
        self.ctx = MagicMock()

    @patch('src.gedcom_mcp.fastmcp_server.get_gedcom_context')
    @patch('src.gedcom_mcp.fastmcp_server._update_person_attribute_internal')
    def test_batch_update_success(self, mock_update_attr, mock_get_context):
        # Test setup
        mock_gedcom_ctx = MagicMock()
        mock_get_context.return_value = mock_gedcom_ctx
        mock_gedcom_ctx.gedcom_parser = MagicMock()
        mock_update_attr.return_value = "Successfully updated attribute"

        # Test data
        updates = json.dumps([
            {"person_id": "@I1@", "attribute_tag": "OCCU", "new_value": "Engineer"},
            {"person_id": "@I2@", "attribute_tag": "RELI", "new_value": "Christian"}
        ])

        # Test execution (the tool is a coroutine, so drive it with an event loop)
        result = asyncio.run(batch_update_person_attributes(updates, self.ctx))

        # Assertions
        self.assertIsInstance(result, dict)
        self.assertEqual(result["total_updates"], 2)
        self.assertEqual(result["successful"], 2)
        self.assertEqual(result["failed"], 0)
        self.assertEqual(len(result["errors"]), 0)


if __name__ == '__main__':
    unittest.main()
```

## 6. Deployment Considerations

### 6.1 Dependency Management

Update `requirements.txt`:

```
fastmcp>=0.1.0
python-gedcom>=0.1.0
pydantic>=2.0.0
cachetools>=4.0.0
unidecode>=1.3.0
nameparser>=1.1.3
fuzzywuzzy>=0.18.0
python-levenshtein>=0.12.0
```

Update `pyproject.toml` (note that `dependencies` is an array under the `[project]` table, not a table of its own):

```toml
[project]
dependencies = [
    # ... existing dependencies ...
    "fuzzywuzzy>=0.18.0",
    "python-levenshtein>=0.12.0",
]

[project.optional-dependencies]
dev = [
    # ... existing dev dependencies ...
    "pytest-asyncio>=0.21.0",
]
```

### 6.2 Performance Monitoring

Add performance monitoring to critical functions:

```python
import functools
import time


def performance_monitor(func):
    """Decorator to monitor function performance."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        logger.info(f"{func.__name__} executed in {end_time - start_time:.3f}s")
        return result
    return wrapper


# Apply to critical functions
@performance_monitor
def _rebuild_lookups(gedcom_ctx: GedcomContext):
    # ... implementation ...
    pass
```

## 7. Documentation Updates

### 7.1 API Documentation

Update the help documentation in `fastmcp_server.py`:

```python
# Add to the help template
"""
## Enhanced Search Tools:
- **fuzzy_search_person**(name, threshold, max_results) - Search for persons with fuzzy name matching
- **query_people_advanced**(filters, page, page_size) - Advanced querying with complex conditions

## Batch Operations:
- **batch_update_person_attributes**(updates) - Update multiple person attributes in a single operation

## Data Quality Tools:
- **validate_gedcom_data**() - Validate GEDCOM data integrity and consistency
"""
```

This technical specification provides a roadmap for implementing key improvements to the GEDCOM MCP server while maintaining backward compatibility and ensuring robust testing.
