docs-mcp-server

by arabold

TypeScript

MIT License


data-storage.md•6.93 kB

# Data Storage

## Overview

The storage system uses SQLite with a normalized schema design for efficient document storage, retrieval, and version management.

## Database Schema

### Libraries Table

Core library metadata and organization:

```sql
CREATE TABLE libraries (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL UNIQUE,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

**Purpose:** Library name normalization and metadata storage.

### Versions Table

Version tracking with comprehensive status and configuration:

```sql
CREATE TABLE versions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  library_id INTEGER NOT NULL,
  version TEXT,
  status TEXT NOT NULL DEFAULT 'pending',
  indexed_at DATETIME,
  error_message TEXT,
  -- Job state fields
  job_status TEXT DEFAULT 'queued',
  progress_current INTEGER DEFAULT 0,
  progress_total INTEGER DEFAULT 0,
  -- Configuration storage
  scraper_config TEXT,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (library_id) REFERENCES libraries (id)
);
```

**Purpose:** Job state management, progress tracking, and scraper configuration persistence.

### Documents Table

Document content with embeddings and metadata:

```sql
CREATE TABLE documents (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  version_id INTEGER NOT NULL,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  url TEXT NOT NULL,
  order_index INTEGER NOT NULL,
  embedding BLOB,
  metadata TEXT,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (version_id) REFERENCES versions (id)
);
```

**Purpose:** Content storage with vector embeddings and search metadata.

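
The `embedding BLOB` column in the documents table holds a vector in binary form. The source does not specify the byte layout, so the following is only a minimal sketch of one plausible encoding, assuming 32-bit little-endian floats. The helper names `embeddingToBlob` and `blobToEmbedding` are hypothetical and are not taken from the project.

```typescript
// Hypothetical helpers. Assumption: embeddings are stored as packed
// 32-bit little-endian floats; the actual project format may differ.

/** Pack a numeric vector into bytes suitable for a BLOB column. */
function embeddingToBlob(vector: number[]): Uint8Array {
  const bytes = new Uint8Array(vector.length * 4);
  const view = new DataView(bytes.buffer);
  vector.forEach((value, i) => view.setFloat32(i * 4, value, true));
  return bytes;
}

/** Unpack a BLOB into a vector; null models a row without an embedding. */
function blobToEmbedding(blob: Uint8Array | null): number[] | null {
  if (blob === null) return null;
  if (blob.byteLength % 4 !== 0) {
    throw new Error("BLOB length is not a multiple of 4 bytes");
  }
  // Respect byteOffset in case the bytes are a view into a larger buffer.
  const view = new DataView(blob.buffer, blob.byteOffset, blob.byteLength);
  const vector: number[] = [];
  for (let offset = 0; offset < blob.byteLength; offset += 4) {
    vector.push(view.getFloat32(offset, true));
  }
  return vector;
}
```

Under this assumed layout, a 1536-dimensional vector would occupy 6144 bytes per row. Values are rounded to 32-bit precision on the way in, which is usually acceptable for similarity search.
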
## Schema Evolution

### Migration System

Sequential SQL migrations in `db/migrations/`:

- `000-initial-schema.sql`: Base schema creation
- `001-add-indexed-at-column.sql`: Indexing timestamp
- `002-normalize-library-table.sql`: Library normalization
- `003-normalize-vector-table.sql`: Vector storage optimization
- `004-complete-normalization.sql`: Full schema normalization
- `005-add-status-tracking.sql`: Job status tracking
- `006-add-scraper-options.sql`: Configuration persistence

### Migration Application

Automatic migration execution:

- Check current schema version
- Apply pending migrations in sequence
- Validate schema integrity
- Handle migration failures gracefully

## Data Location

### Storage Directory Resolution

Database location determined by priority:

1. Project-local `.store` directory
2. OS-specific application data directory
3. Temporary directory as fallback

### Cross-Platform Support

Platform-specific paths:

- **macOS:** `~/Library/Application Support/docs-mcp-server/`
- **Linux:** `~/.local/share/docs-mcp-server/`
- **Windows:** `%APPDATA%/docs-mcp-server/`

## Document Management

### DocumentManagementService

Handles document lifecycle operations:

**Core Operations:**

- Document addition and removal
- Version management and cleanup
- Library organization
- Duplicate detection

**Version Resolution:**

- Exact version matching
- Semantic version ranges
- Latest version fallback
- Version conflict resolution

### Document Storage Flow

1. Create or resolve library record
2. Create version record with job configuration
3. Process and store document chunks
4. Generate and store embeddings
5. Update version status and metadata

## Embedding Management

### Vector Storage

Embeddings stored as BLOB data:

- Consistent 1536-dimensional vectors
- Provider-agnostic storage format
- Efficient binary serialization
- Null handling for missing embeddings

### EmbeddingFactory

Centralized embedding generation:

- Multiple provider support (OpenAI, Google, Azure, AWS)
- Consistent vector dimensions
- Error handling and retry logic
- Rate limiting and quota management

### Provider Configuration

Support for multiple embedding providers:

**OpenAI:**

- `text-embedding-3-small` (default)
- `text-embedding-3-large`
- Custom API endpoints (Ollama compatibility)

**Google:**

- Gemini embedding models
- Vertex AI integration
- Service account authentication

**Azure:**

- Azure OpenAI service
- Custom deployment support
- Region-specific endpoints

**AWS:**

- Bedrock embedding models
- IAM-based authentication
- Regional deployment support

## Search Implementation

### DocumentRetrieverService

Handles search and retrieval operations:

**Search Methods:**

- Vector similarity search
- Full-text search
- Hybrid search combining both
- Context-aware result ranking

**Context Retrieval:**

- Parent-child chunk relationships
- Sibling chunk context
- Document-level metadata
- Sequential ordering preservation

### Search Optimization

Performance optimizations:

- Vector similarity indexing
- Full-text search indexes
- Query result caching
- Batch retrieval operations

## Data Consistency

### Write-Through Architecture

Immediate persistence of state changes:

- Job status updates
- Progress tracking
- Configuration changes
- Error information

### Transaction Management

Database transactions for consistency:

- Atomic document storage
- Version state transitions
- Batch operations
- Error rollback handling

### Concurrent Access

Safe concurrent database access:

- Connection pooling
- Transaction isolation
- Lock management
- Deadlock prevention

## Performance Considerations

### Index Strategy

Database indexes for performance:

- Primary keys on all tables
- Foreign key indexes
- Search-specific indexes
- Composite indexes for common queries

### Query Optimization

Efficient query patterns:

- Prepared statements
- Batch operations
- Result pagination
- Query plan optimization

### Storage Efficiency

Space-efficient storage:

- Text compression for large content
- Binary embedding storage
- Metadata JSON optimization
- Garbage collection for deleted records

## Backup and Recovery

### Data Export

Export functionality for data portability:

- Complete database export
- Library-specific export
- Version-specific export
- Metadata preservation

### Data Import

Import from various sources:

- Previous database versions
- External documentation sources
- Configuration-based restoration
- Duplicate detection during import

### Disaster Recovery

Recovery mechanisms:

- Database integrity checks
- Automatic backup creation
- Transaction log recovery
- Schema validation and repair

## Monitoring and Maintenance

### Database Health

Health monitoring capabilities:

- Storage space utilization
- Query performance metrics
- Connection pool status
- Error rate tracking

### Maintenance Operations

Regular maintenance tasks:

- Vacuum operations for SQLite
- Index rebuilding
- Orphaned record cleanup
- Performance analysis

### Diagnostics

Debugging and diagnostic tools:

- Query execution analysis
- Storage space breakdown
- Relationship integrity checks
- Performance bottleneck identification

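
The relationship integrity checks and orphaned record cleanup listed above matter because SQLite enforces foreign keys only when `PRAGMA foreign_keys` is enabled, so a document row can end up pointing at a version that no longer exists. The sketch below illustrates the check in memory. The row interfaces and the `findOrphanedDocuments` function are hypothetical, and the project itself may well run this as a SQL query instead.

```typescript
// Hypothetical row shapes carrying only the columns needed for the check.
interface VersionRow {
  id: number;
  library_id: number;
}

interface DocumentRow {
  id: number;
  version_id: number;
}

/** Return documents whose version_id matches no existing version row. */
function findOrphanedDocuments(
  versions: VersionRow[],
  documents: DocumentRow[],
): DocumentRow[] {
  // A Set gives constant-time membership tests for each document.
  const knownVersionIds = new Set(versions.map((version) => version.id));
  return documents.filter((doc) => !knownVersionIds.has(doc.version_id));
}
```

The same logic applies one level up, between `versions.library_id` and `libraries.id`.
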
