# Architecture Guide

This document provides detailed information about the internal architecture of the Claude Skills MCP Server.

## Overview

The server consists of five core components working together to provide intelligent skill discovery through the Model Context Protocol.

## Core Components

### 1. Configuration System (`config.py`)

**Purpose**: Manages server configuration and provides sensible defaults.

**Key Features**:

- Default configuration with both Anthropic and K-Dense AI scientific skills repositories
- JSON-based config loading with validation
- Fallback to defaults if the config is unavailable or malformed
- Config printer (`--example-config`) showing the default configuration
- Support for multiple skill sources (GitHub repos and local directories)

**Configuration Flow**:

```
Command line --config option
        ↓
Load JSON file
        ↓
Merge with defaults
        ↓
Pass to skill loader and search engine
```

### 2. Skill Loader (`skill_loader.py`)

**Purpose**: Load skills from various sources and parse their content.

**Key Features**:

- **GitHub repository loading** via API (no authentication required)
- **Local directory scanning** for development and custom skills
- **YAML frontmatter parsing** to extract skill metadata
- **Support for multiple formats**:
  - Direct skills (SKILL.md files)
  - Claude Code plugin repositories
- **Robust error handling**:
  - Network issues with retries
  - Missing files and malformed content
  - Rate limiting (60 requests/hour)
- **Automatic caching** of GitHub API responses (24-hour validity)
- **Document loading**: Scripts, references, images, and other assets

**Loading Process**:

```
Source Configuration
        ↓
GitHub API → Cache → Parse SKILL.md
        ↓
Extract: name, description, content
        ↓
Load additional documents
        ↓
Return Skill objects
```

**Caching Mechanism**:

- Cache location: System temp directory (`/tmp/claude_skills_mcp_cache/`)
- Cache key: MD5 hash of URL + branch
- Cache validity: 24 hours
- Automatic invalidation after expiry
- Dramatically reduces GitHub API usage

**Lazy Document Loading**:

To solve startup timeout issues (60+ seconds), documents are loaded lazily.

**Problem**: Fetching all documents for 90 skills at startup caused timeouts.

**Solution**:

1. **Startup**: Load only SKILL.md files plus document metadata (paths, sizes, types, URLs)
2. **On-Demand**: Fetch document content when `read_skill_document` is called
3. **Memory Cache**: Cache in the Skill object for repeated access
4. **Disk Cache**: Persist to `/tmp/claude_skills_mcp_cache/documents/` for future runs

**Performance Impact**:

- Startup time: 60s → 15s (4x improvement)
- Network requests at startup: 300+ → 90 (SKILL.md only)
- First document access: ~200ms (network fetch + cache)
- Subsequent access: <1ms (memory or disk cache)

**Cache Directory Structure**:

```
/tmp/claude_skills_mcp_cache/
├── {md5_hash}.json        # GitHub API tree cache (24h TTL)
└── documents/
    ├── {md5_hash}.cache   # Individual document cache (permanent)
    ├── {md5_hash}.cache
    └── ...
```

**Document Access Flow**:

```
read_skill_document called
        ↓
Match documents by pattern
        ↓
For each matched document:
        ↓
Already fetched? → Yes: Use existing content
        ↓ No
skill.get_document(path)
        ↓
Check memory cache → Found: Return
        ↓ Not found
Check disk cache → Found: Return
        ↓ Not found
Fetch from GitHub
        ↓
Save to disk cache
        ↓
Cache in memory
        ↓
Return content
```
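The two-level cache lookup can be sketched in a few lines of Python. The `Skill` attribute layout, the `get_document` signature, and the use of `requests` below are illustrative assumptions, not the actual `skill_loader.py` code:

```python
import hashlib
import tempfile
from pathlib import Path

import requests

# Cache location matches the layout described above; class shape is hypothetical.
CACHE_DIR = Path(tempfile.gettempdir()) / "claude_skills_mcp_cache" / "documents"


class Skill:
    def __init__(self, document_urls: dict[str, str]):
        self.document_urls = document_urls    # path -> raw GitHub URL (metadata only)
        self._documents: dict[str, str] = {}  # in-memory cache, filled on demand

    def get_document(self, path: str) -> str:
        # Level 1: memory cache (sub-millisecond on repeat access)
        if path in self._documents:
            return self._documents[path]

        # Level 2: disk cache, keyed by the MD5 hash of the document URL
        url = self.document_urls[path]
        cache_file = CACHE_DIR / (hashlib.md5(url.encode()).hexdigest() + ".cache")
        if cache_file.exists():
            content = cache_file.read_text()
        else:
            # Level 3: network fetch, then persist for future runs
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            content = response.text
            CACHE_DIR.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(content)

        self._documents[path] = content
        return content
```

The key point is that only the cheapest layer that already holds the document is consulted, which is what turns the ~200ms first access into the <1ms repeat access quoted above.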
### 3. Search Engine (`search_engine.py`)

**Purpose**: Enable semantic search over skill descriptions using vector embeddings.

**Key Features**:

- **Sentence-transformers** for local embeddings
- **Default model**: `all-MiniLM-L6-v2` (384 dimensions, ~90MB)
- **Vector indexing** at startup for fast queries
- **Cosine similarity** search algorithm
- **Configurable top-K** results
- **No API keys** required: fully local operation

**Search Process**:

```
Startup: Load skills → Generate embeddings → Build index
Query:   Encode query → Compute similarity → Rank → Return top-K
```

**Performance Characteristics**:

- Startup: 5-10 seconds (load model + index skills)
- Query time: <1 second
- Memory: ~500MB (model + embeddings)
- Scales well: tested with 70+ skills

**Embedding Model Details**:

- **all-MiniLM-L6-v2**: Fast, good quality, 384 dimensions
- **all-mpnet-base-v2**: Higher quality, slower, 768 dimensions (optional)
- Models are cached after the first download
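A condensed sketch of this index-and-query loop follows. The `SkillSearchEngine` class is a simplified stand-in for `search_engine.py`, not its actual implementation:

```python
import numpy as np
from sentence_transformers import SentenceTransformer


class SkillSearchEngine:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.skills: list[dict] = []
        self.embeddings: np.ndarray | None = None

    def index(self, skills: list[dict]) -> None:
        # Embed each skill's name + description once at startup.
        texts = [f"{s['name']}: {s['description']}" for s in skills]
        self.skills = skills
        # normalize_embeddings=True makes a dot product equal cosine similarity.
        self.embeddings = self.model.encode(texts, normalize_embeddings=True)

    def search(self, query: str, top_k: int = 5) -> list[tuple[dict, float]]:
        q = self.model.encode([query], normalize_embeddings=True)[0]
        scores = self.embeddings @ q               # cosine similarity per skill
        best = np.argsort(scores)[::-1][:top_k]    # rank and keep top-K
        return [(self.skills[i], float(scores[i])) for i in best]
```

Because the skill embeddings are computed once at startup, each query costs only one model forward pass plus a single matrix-vector product, which is why query time stays under a second.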
### 4. MCP Server (`server.py`)

**Purpose**: Implement the Model Context Protocol specification.

**Key Features**:

- **Standard MCP protocol** implementation
- **Three tools** with optimized descriptions:
  1. `search_skills` - Semantic search
  2. `read_skill_document` - Access skill files
  3. `list_skills` - Browse all skills
- **Progressive disclosure** of skill content:
  - Tool descriptions (always visible)
  - Skill metadata (on search)
  - Full content (when relevant)
  - Referenced files (on demand)
- **Content truncation** (configurable)
- **Stdio transport** for easy integration
- **Formatted output** with:
  - Relevance scores
  - Source links
  - Document metadata

**Tool Invocation Flow**:

```
AI Assistant
        ↓
MCP Client
        ↓
Tool Call (search_skills, read_skill_document, list_skills)
        ↓
Server Handler
        ↓
Search Engine / Skill Loader
        ↓
Format Response
        ↓
Return to AI Assistant
```

**Progressive Disclosure Implementation**:

1. **Level 1**: Tool names/descriptions (always in context)
2. **Level 2**: Skill names/descriptions (search results)
3. **Level 3**: Full SKILL.md content (when a skill is relevant)
4. **Level 4**: Additional documents (scripts, data, references)

This architecture minimizes context window usage while ensuring all necessary information is available when needed.

### 5. Entry Point (`__main__.py`)

**Purpose**: Provide the CLI interface and manage the server lifecycle.

**Key Features**:

- **CLI argument parsing**:
  - `--config`: Custom configuration file
  - `--example-config`: Print example config
  - `--verbose`: Enable debug logging
- **Async server lifecycle**:
  - Load configuration
  - Initialize components
  - Run server
  - Handle shutdown gracefully
- **Comprehensive error handling**:
  - Missing dependencies
  - Network failures
  - Invalid configuration
- **Logging configuration**:
  - Info level by default
  - Debug level with `--verbose`
  - Structured log messages

**Startup Sequence**:

```
Parse CLI arguments
        ↓
Load configuration
        ↓
Initialize skill loader
        ↓
Load skills from sources
        ↓
Initialize search engine
        ↓
Index skills
        ↓
Start MCP server
        ↓
Listen for tool calls
```

## Data Flow

### Complete Request Flow

```
User Query
        ↓
AI Assistant (Claude, GPT, etc.)
        ↓
MCP Client
        ↓
MCP Server (stdio)
        ↓
Tool Handler (search_skills)
        ↓
Search Engine
        ↓
Cosine Similarity Computation
        ↓
Rank Skills
        ↓
Format Response
        ↓
Return to AI Assistant
        ↓
AI uses skill content
```

### Skill Loading Flow

```
Configuration
        ↓
Skill Loader
   ├─ GitHub API (with caching)
   │        ↓
   │   Download tree
   │        ↓
   │   Find SKILL.md files
   │        ↓
   │   Download content
   │        ↓
   │   Load documents
   │
   └─ Local Filesystem
            ↓
       Scan directories
            ↓
       Find SKILL.md files
            ↓
       Read content
            ↓
       Load documents
        ↓
Parse SKILL.md (YAML + Markdown)
        ↓
Create Skill objects
        ↓
Return to Search Engine for indexing
```

## Design Decisions

### Why Local Embeddings?

- **No API keys required**: Easier setup for users
- **Privacy**: All processing happens locally
- **Cost**: No per-query charges
- **Speed**: <1 second per query
- **Offline**: Works without internet (after initial setup)

**Trade-off**: Slightly lower quality than large cloud models, but excellent for this use case.

### Why Progressive Disclosure?

Following Anthropic's Agent Skills architecture:

1. **Context Window Efficiency**: Don't load all skills upfront
2. **Relevance Filtering**: Only show skills that match the task
3. **On-Demand Detail**: Load full content only when needed
4. **Scalability**: Works with hundreds of skills without overwhelming the context

### Why Caching?

**Problem**: The GitHub API has a 60 requests/hour limit for unauthenticated access.

**Solution**:

- Cache tree API responses (the rate-limited call)
- 24-hour validity is reasonable for skill repositories
- Dramatically speeds up development and testing
- Cache in the temp directory for automatic cleanup

**Benefits**:

- First run: ~20-30 seconds (with API calls + lazy document loading)
- Subsequent runs: ~10-15 seconds (from cache)
- No rate limit issues during development

### Why Three Tools?

1. **`search_skills`**: Task-oriented discovery (main use case)
2. **`read_skill_document`**: Access scripts and assets (progressive disclosure)
3. **`list_skills`**: Exploration and debugging (understand what's available)

This separation allows AI assistants to:

- Find relevant skills efficiently
- Access additional resources on demand
- Debug configuration issues
- Explore available capabilities

## Extension Points

The architecture is designed to be extensible:

### Adding New Skill Sources

Implement in `skill_loader.py`:

```python
def load_from_custom_source(url: str, config: dict) -> list[Skill]:
    # Your implementation
    return skills
```

Then add it to `load_all_skills()`.

### Adding New Embedding Models

Update `config.py`:

```python
"embedding_model": "your-model-name"
```

The search engine automatically loads any sentence-transformers model.

### Adding New Tools

1. Define the tool in `server.py` `list_tools()`
2. Implement a handler method `_handle_your_tool()`
3. Add a route in `call_tool()`
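These three steps might look roughly as follows with the low-level MCP Python SDK's decorator API; the `count_skills` tool and its handler are hypothetical examples, and the real `server.py` wiring may differ:

```python
from mcp import types
from mcp.server import Server

server = Server("claude-skills-mcp")


@server.list_tools()
async def list_tools() -> list[types.Tool]:
    # Step 1: advertise the tool and its input schema.
    return [
        types.Tool(
            name="count_skills",
            description="Return the number of loaded skills.",
            inputSchema={"type": "object", "properties": {}},
        )
    ]


async def _handle_count_skills(arguments: dict) -> list[types.TextContent]:
    # Step 2: handler containing the tool's actual logic.
    return [types.TextContent(type="text", text="90 skills loaded")]


@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    # Step 3: route the incoming call to the matching handler.
    if name == "count_skills":
        return await _handle_count_skills(arguments)
    raise ValueError(f"Unknown tool: {name}")
```

### Custom Skill Formats

Extend `parse_skill_md()` in `skill_loader.py` to handle custom frontmatter fields or markdown extensions.

## Performance Optimization

### Startup Time

- **Fast path**: Use local directories instead of GitHub
- **Lazy loading**: Consider implementing lazy skill loading
- **Smaller model**: Use `all-MiniLM-L6-v2` instead of larger models

### Query Time

Queries are already very fast (<1 second). Further optimization is possible:

- **Approximate search**: Use FAISS or similar for 1000+ skills
- **Caching**: Cache frequent queries (not implemented)

### Memory Usage

- **Model size**: 90-420MB depending on the model chosen
- **Skills**: ~1KB per skill
- **Documents**: Varies based on skill complexity

Total: ~500MB typical, scaling linearly with skill count.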
## Security Considerations

### GitHub API

- **No authentication**: Uses the public API (60 req/hour limit)
- **No credentials**: Never stores or transmits auth tokens
- **HTTPS only**: All GitHub requests use TLS

### Local Files

- **Path validation**: Checks for directory traversal
- **Home directory expansion**: Supports `~` in paths
- **Error isolation**: Failed skills don't crash the server

### MCP Protocol

- **Stdio transport**: Isolated from the network
- **No code execution**: Skills are data only (markdown)
- **Sandboxing**: Running via MCP provides OS-level isolation

**Note**: The `read_skill_document` tool can access Python scripts and other files. These are loaded as text/data only, not executed.

## Testing Strategy

See the [Testing Guide](testing.md) for details.

**Architecture tests**:

- Unit tests for each component
- Integration tests for data flow
- End-to-end tests with real repositories

**Coverage**:

- Configuration: 86%
- Skill Loader: 68%
- Search Engine: 100%
- Server: Tested via integration tests

## Troubleshooting

### Server won't start

Check:

1. Python 3.12 is installed (not 3.13)
2. Dependencies are installed (`uv sync`)
3. The configuration is valid JSON

### Skills not loading

Check:

1. GitHub rate limit (wait an hour)
2. The repository exists and is public
3. SKILL.md files have valid frontmatter
4. Use `--verbose` to see detailed logs

### Search returns poor results

Check:

1. Skill descriptions are specific and descriptive
2. The query uses domain-appropriate terminology
3. Increase `top_k` to see more results
4. Consider using a larger embedding model

### High memory usage

Solutions:

1. Use a smaller embedding model
2. Limit the number of skill sources
3. Use subpath filtering to load fewer skills

## Future Architecture Considerations

See the [Roadmap](roadmap.md) for planned features.

**Potential changes**:

- **MCP Sampling**: Use LLM reasoning instead of RAG
- **Skill execution**: Sandboxed Python execution
- **Binary tools**: Execute scientific software
- **Distributed skills**: Load from multiple servers
- **Caching layer**: Redis/disk cache for queries (a minimal sketch follows)
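As a rough illustration of that last item, the simplest in-process form of a query cache is an `lru_cache` wrapper around the search call. This is a speculative sketch of a not-yet-implemented feature, not planned code; `engine` is assumed to expose a `search(query, top_k)` method like the `SkillSearchEngine` stand-in shown earlier:

```python
from functools import lru_cache


def make_cached_search(engine, maxsize: int = 256):
    """Wrap an engine's search method with an in-process LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached_search(query: str, top_k: int = 5) -> tuple:
        # Return a tuple so the cached value stays immutable across callers.
        return tuple(engine.search(query, top_k))

    return cached_search


# Usage: search = make_cached_search(engine); search("plot RNA-seq volcano")
```

A Redis-backed variant would replace the decorator with an external store keyed on the query string, which would let multiple server processes share one cache.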
