# Architecture Guide

This document describes the internal architecture of the Claude Skills MCP Server v1.0.0, which uses a two-package design to solve the Cursor timeout issue while maintaining a simple user experience.

## Overview

The v1.0.0 system uses a **frontend-backend architecture** with two separate packages:

- **Frontend** (`claude-skills-mcp`): Lightweight MCP proxy that starts instantly
- **Backend** (`claude-skills-mcp-backend`): Heavy server with vector search and skill loading

The backend consists of seven core components working together to provide intelligent skill discovery through the Model Context Protocol.

## Core Components

### 1. Configuration System (`config.py`)

**Purpose**: Manages server configuration and provides sensible defaults.

**Key Features**:

- Default configuration with both the Anthropic and K-Dense AI scientific skills repositories
- JSON-based config loading with validation
- Fallback to defaults if the config is unavailable or malformed
- Config printer (`--example-config`) showing the default configuration
- Support for multiple skill sources (GitHub repos and local directories)

**Configuration Flow**:

```
Command line --config option
    ↓
Load JSON file
    ↓
Merge with defaults
    ↓
Pass to skill loader and search engine
```

### 2. Skill Loader (`skill_loader.py`)

**Purpose**: Load skills from various sources and parse their content.

**Key Features**:

- **GitHub repository loading** via the API (no authentication required)
- **Local directory scanning** for development and custom skills
- **YAML frontmatter parsing** to extract skill metadata
- **Support for multiple formats**:
  - Direct skills (SKILL.md files)
  - Claude Code plugin repositories
- **Robust error handling**:
  - Network issues with retries
  - Missing files and malformed content
  - Rate limiting (60 requests/hour unauthenticated)
- **Automatic caching** of GitHub API responses (24-hour validity)
- **Document loading**: Scripts, references, images, and other assets

**Loading Process**:

```
Source Configuration
    ↓
GitHub API → Cache → Parse SKILL.md
    ↓
Extract: name, description, content
    ↓
Load additional documents
    ↓
Return Skill objects
```

**Caching Mechanism**:

- Cache location: System temp directory (`/tmp/claude_skills_mcp_cache/`)
- Cache key: MD5 hash of URL + branch
- Cache validity: 24 hours
- Automatic invalidation after expiry
- Dramatically reduces GitHub API usage

**Lazy Document Loading**:

To solve startup timeout issues (60+ seconds), documents are loaded lazily.

**Problem**: Fetching all documents for 90 skills at startup caused timeouts.

**Solution**:

1. **Startup**: Load only SKILL.md files + document metadata (paths, sizes, types, URLs)
2. **On-Demand**: Fetch document content when `read_skill_document` is called
3. **Memory Cache**: Cache in the Skill object for repeated access
4. **Disk Cache**: Persist to `/tmp/claude_skills_mcp_cache/documents/` for future runs

A sketch of the resulting lookup chain appears after the cache directory layout below.

**Performance Impact**:

- Startup time: 60s → 15s (4x improvement)
- Network requests at startup: 300+ → 90 (SKILL.md only)
- First document access: ~200ms (network fetch + cache)
- Subsequent access: <1ms (memory or disk cache)

**Cache Directory Structure**:

```
/tmp/claude_skills_mcp_cache/
├── {md5_hash}.json         # GitHub API tree cache (24h TTL)
└── documents/
    ├── {md5_hash}.cache    # Individual document cache (permanent)
    ├── {md5_hash}.cache
    └── ...
```
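To make the memory → disk → network fallback concrete, here is a minimal sketch of what a lazy document accessor could look like. The `Skill` class shape, the `get_document` method name, and the MD5-of-URL cache key are illustrative assumptions modeled on the description above, not the package's actual API.

```python
import hashlib
from pathlib import Path

import httpx

CACHE_DIR = Path("/tmp/claude_skills_mcp_cache/documents")


class Skill:
    """Illustrative sketch only -- field and method names are assumptions."""

    def __init__(self, name: str, document_urls: dict[str, str]):
        self.name = name
        self.document_urls = document_urls   # path -> raw GitHub URL (metadata only)
        self._documents: dict[str, str] = {} # in-memory cache

    def get_document(self, path: str) -> str:
        # 1. Memory cache: repeated access within one run is free.
        if path in self._documents:
            return self._documents[path]

        # 2. Disk cache: survives restarts, keyed by MD5 of the source URL.
        url = self.document_urls[path]
        cache_file = CACHE_DIR / (hashlib.md5(url.encode()).hexdigest() + ".cache")
        if cache_file.exists():
            content = cache_file.read_text()
        else:
            # 3. Network fetch: raw content access does not count
            #    against the GitHub API rate limit.
            resp = httpx.get(url)
            resp.raise_for_status()
            content = resp.text
            CACHE_DIR.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(content)

        self._documents[path] = content
        return content
```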
**Document Access Flow**:

```
read_skill_document called
    ↓
Match documents by pattern
    ↓
For each matched document:
    Already fetched? → Yes: Use existing content
    ↓ No
    skill.get_document(path)
    ↓
    Check memory cache → Found: Return
    ↓ Not found
    Check disk cache → Found: Return
    ↓ Not found
    Fetch from GitHub
    ↓
    Save to disk cache
    ↓
    Cache in memory
    ↓
    Return content
```

### 3. Search Engine (`search_engine.py`)

**Purpose**: Enable semantic search over skill descriptions using vector embeddings.

**Key Features**:

- **Sentence-transformers** for local embeddings
- **Default model**: `all-MiniLM-L6-v2` (384 dimensions, ~90MB)
- **Vector indexing** at startup for fast queries
- **Cosine similarity** search algorithm
- **Configurable top-K** results
- **No API keys** required - fully local operation

**Search Process** (see the sketch at the end of this section):

```
Startup: Load skills → Generate embeddings → Build index
Query:   Encode query → Compute similarity → Rank → Return top-K
```

**Performance Characteristics**:

- Startup: 5-10 seconds (load model + index skills)
- Query time: <1 second
- Memory: ~500MB (model + embeddings)
- Scales well: tested with 70+ skills

**Embedding Model Details**:

- **all-MiniLM-L6-v2**: Fast, good quality, 384 dimensions
- **all-mpnet-base-v2**: Higher quality, slower, 768 dimensions (optional)
- Models are cached after first download
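The embed-at-startup, rank-at-query split is small enough to sketch in full. This is a minimal illustration of the process described above, not the package's actual code; the class and method names are assumptions, while the sentence-transformers calls and the cosine-via-normalized-dot-product trick are standard.

```python
import numpy as np
from sentence_transformers import SentenceTransformer


class SkillSearchEngine:
    """Minimal sketch of embed-at-startup, rank-at-query semantic search."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.skills: list[dict] = []
        self.embeddings: np.ndarray | None = None

    def index(self, skills: list[dict]) -> None:
        # Startup: embed every skill description once and keep the matrix.
        self.skills = skills
        texts = [f"{s['name']}: {s['description']}" for s in skills]
        # Normalized embeddings make the dot product below equal cosine similarity.
        self.embeddings = self.model.encode(texts, normalize_embeddings=True)

    def search(self, query: str, top_k: int = 5) -> list[tuple[float, dict]]:
        # Query: encode, score against all skills, return the top-K.
        q = self.model.encode([query], normalize_embeddings=True)[0]
        scores = self.embeddings @ q  # cosine similarity for every skill at once
        best = np.argsort(scores)[::-1][:top_k]
        return [(float(scores[i]), self.skills[i]) for i in best]


# Example usage
engine = SkillSearchEngine()
engine.index([
    {"name": "pdf-extraction", "description": "Extract text and tables from PDF files"},
    {"name": "rna-seq", "description": "Differential expression analysis for RNA-seq data"},
])
print(engine.search("pull tables out of a PDF report", top_k=1))
```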
### 4. MCP Handlers (`mcp_handlers.py`) and HTTP Server (`http_server.py`)

**Purpose**: Implement the Model Context Protocol specification over Streamable HTTP.

**Key Features**:

- **Standard MCP protocol** implementation
- **Three tools** with optimized descriptions:
  1. `find_helpful_skills` - Semantic search
  2. `read_skill_document` - Access skill files
  3. `list_skills` - Browse all skills
- **Progressive disclosure** of skill content:
  - Tool descriptions (always visible)
  - Skill metadata (on search)
  - Full content (when relevant)
  - Referenced files (on demand)
- **Content truncation** (configurable)
- **Streamable HTTP transport** for remote access
- **Frontend proxy** (`mcp_proxy.py`, `backend_manager.py`):
  - Stdio MCP server for Cursor
  - HTTP MCP client for the backend
  - Backend process management
  - Auto-downloads the backend via `uvx`
- **Formatted output** with:
  - Relevance scores
  - Source links
  - Document metadata

**Tool Invocation Flow (v1.0.0)**:

```
AI Assistant (e.g., Cursor)
    ↓
Frontend Proxy (stdio MCP server)
    ↓
HTTP Client → Backend (streamable HTTP MCP server)
    ↓
MCP Handler (find_helpful_skills, read_skill_document, list_skills)
    ↓
Search Engine / Skill Loader
    ↓
Format Response
    ↓
Frontend Proxy → AI Assistant
```

**Progressive Disclosure Implementation**:

1. **Level 1**: Tool names/descriptions (always in context)
2. **Level 2**: Skill names/descriptions (search results)
3. **Level 3**: Full SKILL.md content (when a skill is relevant)
4. **Level 4**: Additional documents (scripts, data, references)

This architecture minimizes context window usage while ensuring all necessary information is available when needed.

### 5. Entry Point (`__main__.py`)

**Purpose**: Provide the CLI interface and manage the server lifecycle.

**Key Features**:

- **CLI argument parsing**:
  - `--config`: Custom configuration file
  - `--example-config`: Print example config
  - `--verbose`: Enable debug logging
- **Async server lifecycle**:
  - Load configuration
  - Initialize components
  - Run server
  - Handle shutdown gracefully
- **Comprehensive error handling**:
  - Missing dependencies
  - Network failures
  - Invalid configuration
- **Logging configuration**:
  - Info level by default
  - Debug level with `--verbose`
  - Structured log messages

**Startup Sequence**:

```
Parse CLI arguments
    ↓
Load configuration
    ↓
Initialize skill loader
    ↓
Load skills from sources (background thread)
    ↓
Initialize search engine
    ↓
Index skills (incremental)
    ↓
Initialize auto-update system (if enabled)
    ↓
Start MCP server
    ↓
Listen for tool calls
```

### 6. Auto-Update System (`update_checker.py`, `scheduler.py`, `state_manager.py`)

**Purpose**: Automatically detect and reload skills when remote or local sources change.

**Key Features**:

- **Hourly scheduling** synchronized to exact clockface hours (e.g., 12:00, 13:00)
- **Efficient change detection**:
  - GitHub: Commit SHA comparison (1 API call per repo)
  - Local: File modification time tracking
- **State persistence** that survives server restarts
- **API rate limit awareness** (60 req/hr unauthenticated, 5000 with a token)
- **Graceful error handling** with retry on the next cycle

**Components**:

**UpdateChecker**:
- Orchestrates update checking across all sources
- Tracks API usage and warns when approaching limits
- Returns a list of changed sources for selective reloading

**GitHubSourceTracker** (a sketch follows this section):
- Checks the HEAD commit SHA via the GitHub API
- Compares against the last known SHA from persistent state
- Only triggers an update if the SHA has changed
- Respects rate limits and tracks usage

**LocalSourceTracker**:
- Scans for `SKILL.md` files in configured directories
- Tracks modification times of all skill files
- Detects new, modified, or deleted files

**HourlyScheduler**:
- Calculates the time until the next exact hour on startup
- Runs update checks at configured intervals (default: 60 min)
- Aligns to clockface hours for predictable scheduling
- Handles cancellation and errors gracefully

**StateManager**:
- Persists commit SHAs and modification times to disk
- Cache location: `/tmp/claude_skills_mcp_cache/state/`
- Prevents false positives on the first check after a restart

**Update Flow**:

```
Hourly Scheduler (wait until :00)
    ↓
Check GitHub sources (commit SHA)
    ↓
Check local sources (mtime)
    ↓
Changes detected?
├─ No → Log and continue
└─ Yes → Reload all skills
             ↓
         Re-index embeddings
             ↓
         Update complete
```

**Configuration**:

```json
{
  "auto_update_enabled": true,
  "auto_update_interval_minutes": 60,
  "github_api_token": null
}
```

**API Budget** (with the default 2 GitHub sources):

- Commit checks: 2 calls/hour
- On change: +2 tree API calls = 4 total/hour
- Remaining: 56 calls/hour for other operations
- Raw content access: Unlimited (doesn't count against the limit)

See [auto-update.md](auto-update.md) for detailed documentation.
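As an illustration of the one-call change check, here is a minimal sketch of how a GitHub source tracker could compare commit SHAs against persisted state. The endpoint is GitHub's public commits API; the function shape and state-file layout are assumptions, not the package's actual code.

```python
import json
from pathlib import Path

import httpx

STATE_DIR = Path("/tmp/claude_skills_mcp_cache/state")


def has_changed(owner: str, repo: str, branch: str = "main") -> bool:
    """Return True if the branch HEAD moved since the last check (one API call)."""
    # One request against the public commits API; counts toward the 60/hr limit.
    resp = httpx.get(
        f"https://api.github.com/repos/{owner}/{repo}/commits/{branch}",
        headers={"Accept": "application/vnd.github+json"},
    )
    resp.raise_for_status()
    head_sha = resp.json()["sha"]

    # Persist the SHA so a server restart does not produce a false positive.
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    state_file = STATE_DIR / f"{owner}_{repo}_{branch}.json"
    last_sha = (
        json.loads(state_file.read_text())["sha"] if state_file.exists() else None
    )
    state_file.write_text(json.dumps({"sha": head_sha}))
    return last_sha is not None and last_sha != head_sha
```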
## Data Flow

### Complete Request Flow

```
User Query
    ↓
AI Assistant (Claude, GPT, etc.)
    ↓
MCP Client
    ↓
MCP Server (stdio)
    ↓
Tool Handler (find_helpful_skills)
    ↓
Search Engine
    ↓
Cosine Similarity Computation
    ↓
Rank Skills
    ↓
Format Response
    ↓
Return to AI Assistant
    ↓
AI uses skill content
```

### Skill Loading Flow

```
Configuration
    ↓
Skill Loader
├─ GitHub API (with caching)
│     ↓
│  Download tree
│     ↓
│  Find SKILL.md files
│     ↓
│  Download content
│     ↓
│  Load documents
│
└─ Local Filesystem
      ↓
   Scan directories
      ↓
   Find SKILL.md files
      ↓
   Read content
      ↓
   Load documents
      ↓
Parse SKILL.md (YAML + Markdown)
      ↓
Create Skill objects
      ↓
Return to Search Engine for indexing
```

## Design Decisions

### Why Local Embeddings?

- **No API keys required**: Easier setup for users
- **Privacy**: All processing happens locally
- **Cost**: No per-query charges
- **Speed**: <1 second per query
- **Offline**: Works without internet (after initial setup)

**Trade-off**: Slightly lower quality than large cloud models, but excellent for this use case.

### Why Progressive Disclosure?

Following Anthropic's Agent Skills architecture:

1. **Context Window Efficiency**: Don't load all skills upfront
2. **Relevance Filtering**: Only show skills that match the task
3. **On-Demand Detail**: Load full content only when needed
4. **Scalability**: Works with hundreds of skills without overwhelming the context

### Why Caching?

**Problem**: The GitHub API has a 60 requests/hour limit for unauthenticated access.

**Solution**:

- Cache tree API responses (the rate-limited call)
- 24-hour validity is reasonable for skill repositories
- Dramatically speeds up development and testing
- Cache in the temp directory for automatic cleanup

**Benefits**:

- First run: ~20-30 seconds (with API calls + lazy document loading)
- Subsequent runs: ~10-15 seconds (from cache)
- No rate limit issues during development

### Why Three Tools?

1. **`find_helpful_skills`**: Task-oriented discovery (main use case)
2. **`read_skill_document`**: Access scripts and assets (progressive disclosure)
3. **`list_skills`**: Exploration and debugging (understand what's available)

This separation allows AI assistants to:

- Find relevant skills efficiently
- Access additional resources on demand
- Debug configuration issues
- Explore available capabilities

## Extension Points

The architecture is designed to be extensible.

### Adding New Skill Sources

Implement in `skill_loader.py`:

```python
def load_from_custom_source(url: str, config: dict) -> list[Skill]:
    # Your implementation
    return skills
```

Then add to `load_all_skills()`.

### Adding New Embedding Models

Update `config.py`:

```python
"embedding_model": "your-model-name"
```

The search engine automatically loads any sentence-transformers model.

### Adding New Tools

1. Define the tool in `server.py` `list_tools()`
2. Implement a handler method `_handle_your_tool()`
3. Add a route in `call_tool()`

### Custom Skill Formats

Extend `parse_skill_md()` in `skill_loader.py` to handle custom frontmatter fields or markdown extensions. A sketch of the baseline parse appears below.
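For orientation when extending it, here is a minimal sketch of what a baseline `parse_skill_md()` could look like: split the `---`-delimited YAML frontmatter from the markdown body and extract the fields the search engine indexes. The returned dict shape and any fields beyond `name` and `description` are assumptions.

```python
import yaml  # PyYAML


def parse_skill_md(text: str) -> dict:
    """Split YAML frontmatter from the markdown body of a SKILL.md file."""
    if not text.startswith("---"):
        raise ValueError("SKILL.md is missing YAML frontmatter")

    # Frontmatter is everything between the first two `---` delimiters.
    _, frontmatter, body = text.split("---", 2)
    meta = yaml.safe_load(frontmatter) or {}

    return {
        "name": meta["name"],                # required by the skill format
        "description": meta["description"],  # what the search engine embeds
        "content": body.strip(),             # full markdown, served on demand
        "extra": {k: v for k, v in meta.items()
                  if k not in ("name", "description")},
    }
```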
## Performance Optimization

### Startup Time

- **Fast path**: Use local directories instead of GitHub
- **Lazy loading**: Consider implementing lazy skill loading
- **Smaller model**: Use `all-MiniLM-L6-v2` instead of larger models

### Query Time

Already very fast (<1 second). Further optimization is possible:

- **Approximate search**: Use FAISS or similar for 1000+ skills
- **Caching**: Cache frequent queries (not implemented)

### Memory Usage

- **Model size**: 90-420MB depending on choice
- **Skills**: ~1KB per skill
- **Documents**: Varies with skill complexity

Total: ~500MB typical, scaling linearly with skill count.

## Security Considerations

### GitHub API

- **No authentication**: Uses the public API (60 req/hour limit)
- **No credentials**: Never stores or transmits auth tokens
- **HTTPS only**: All GitHub requests use TLS

### Local Files

- **Path validation**: Checks for directory traversal
- **Home directory expansion**: Supports `~` in paths
- **Error isolation**: Failed skills don't crash the server

### MCP Protocol

- **Stdio transport**: Isolated from the network
- **No code execution**: Skills are data only (markdown)
- **Sandboxing**: Running via MCP provides OS-level isolation

**Note**: The `read_skill_document` tool can access Python scripts and other files. These are loaded as text/data only, never executed.

## Testing Strategy

See the [Testing Guide](testing.md) for comprehensive testing instructions.

**Architecture tests**:

- Unit tests for each component
- Integration tests for data flow
- End-to-end tests with real repositories

**Coverage**:

- Configuration: 86%
- Skill Loader: 68%
- Search Engine: 100%
- Server: Tested via integration tests

## Troubleshooting

### Server won't start

Check:

1. Python 3.12 is installed (not 3.13)
2. Dependencies are installed (`uv sync`)
3. The configuration is valid JSON

### Skills not loading

Check:

1. GitHub rate limit (wait an hour)
2. The repository exists and is public
3. SKILL.md files have valid frontmatter
4. Use `--verbose` to see detailed logs

### Search returns poor results

Check:

1. Skill descriptions are specific and descriptive
2. The query uses domain-appropriate terminology
3. Increase `top_k` to see more results
4. Consider using a larger embedding model

### High memory usage

Solutions:

1. Use a smaller embedding model
2. Limit the number of skill sources
3. Use subpath filtering to load fewer skills

## Package Structure (v1.0.0)

### Frontend Package (`claude-skills-mcp`)

**Location**: `packages/frontend/`

**Modules**:

- `__main__.py`: CLI entry point with argument forwarding
- `mcp_proxy.py`: MCP stdio server + HTTP client proxy
- `backend_manager.py`: Backend process lifecycle management

**Dependencies**: `mcp`, `httpx` (~15 MB total)

### Backend Package (`claude-skills-mcp-backend`)

**Location**: `packages/backend/`

**Modules**:

- `__main__.py`: CLI entry point
- `http_server.py`: Streamable HTTP server with Starlette/Uvicorn
- `mcp_handlers.py`: MCP tool implementations
- `search_engine.py`: Vector search with sentence-transformers
- `skill_loader.py`: GitHub and local skill loading
- `config.py`: Configuration management

**Dependencies**: `mcp`, `torch`, `sentence-transformers`, `starlette`, `uvicorn`, `httpx`, `numpy` (~250 MB total)
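To make the frontend-backend split concrete, here is a minimal sketch of the kind of request the proxy forwards: an MCP `tools/call` JSON-RPC message posted to the backend's Streamable HTTP endpoint. The URL, port, and tool argument names are illustrative assumptions; only the JSON-RPC envelope and the `tools/call` method name come from the MCP specification.

```python
import httpx

# Hypothetical backend address -- the real port/path are chosen by the frontend.
BACKEND_URL = "http://127.0.0.1:8000/mcp"


def call_backend_tool(tool: str, arguments: dict) -> dict:
    """Forward one MCP tools/call request to the backend over Streamable HTTP."""
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",  # standard MCP method name
        "params": {"name": tool, "arguments": arguments},
    }
    resp = httpx.post(
        BACKEND_URL,
        json=request,
        # Streamable HTTP servers may answer with plain JSON or an SSE stream;
        # this sketch assumes a plain JSON response.
        headers={"Accept": "application/json, text/event-stream"},
    )
    resp.raise_for_status()
    return resp.json()


# Example (the argument name is an assumption):
# call_backend_tool("find_helpful_skills", {"query": "analyze RNA-seq data"})
```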
