# Architecture Guide

This document provides detailed information about the internal architecture of the Claude Skills MCP Server.

## Overview

The server consists of five core components working together to provide intelligent skill discovery through the Model Context Protocol.

## Core Components

### 1. Configuration System (`config.py`)

**Purpose**: Manages server configuration and provides sensible defaults.

**Key Features**:

- Default configuration with both Anthropic and K-Dense AI scientific skills repositories
- JSON-based config loading with validation
- Fallback to defaults if the config is unavailable or malformed
- Config printer (`--example-config`) showing the default configuration
- Support for multiple skill sources (GitHub repos and local directories)

**Configuration Flow**:

```
Command line --config option
        ↓
Load JSON file
        ↓
Merge with defaults
        ↓
Pass to skill loader and search engine
```

### 2. Skill Loader (`skill_loader.py`)

**Purpose**: Load skills from various sources and parse their content.

**Key Features**:

- **GitHub repository loading** via API (no authentication required)
- **Local directory scanning** for development and custom skills
- **YAML frontmatter parsing** to extract skill metadata
- **Support for multiple formats**:
  - Direct skills (SKILL.md files)
  - Claude Code plugin repositories
- **Robust error handling**:
  - Network issues with retries
  - Missing files and malformed content
  - Rate limiting (60 requests/hour)
- **Automatic caching** of GitHub API responses (24-hour validity)
- **Document loading**: Scripts, references, images, and other assets

**Loading Process**:

```
Source Configuration
        ↓
GitHub API → Cache → Parse SKILL.md
        ↓
Extract: name, description, content
        ↓
Load additional documents
        ↓
Return Skill objects
```

**Caching Mechanism**:

- Cache location: System temp directory (`/tmp/claude_skills_mcp_cache/`)
- Cache key: MD5 hash of URL + branch
- Cache validity: 24 hours
- Automatic invalidation after expiry
- Dramatically reduces GitHub API usage

**Lazy Document Loading**:

To solve startup timeout issues (60+ seconds), documents are loaded lazily.

**Problem**: Fetching all documents for 90 skills at startup caused timeouts.

**Solution**:

1. **Startup**: Load only SKILL.md files plus document metadata (paths, sizes, types, URLs)
2. **On-Demand**: Fetch document content when `read_skill_document` is called
3. **Memory Cache**: Cache in the Skill object for repeated access
4. **Disk Cache**: Persist to `/tmp/claude_skills_mcp_cache/documents/` for future runs

**Performance Impact**:

- Startup time: 60s → 15s (4x improvement)
- Network requests at startup: 300+ → 90 (SKILL.md only)
- First document access: ~200ms (network fetch + cache)
- Subsequent access: <1ms (memory or disk cache)

**Cache Directory Structure**:

```
/tmp/claude_skills_mcp_cache/
├── {md5_hash}.json        # GitHub API tree cache (24h TTL)
└── documents/
    ├── {md5_hash}.cache   # Individual document cache (permanent)
    ├── {md5_hash}.cache
    └── ...
```

**Document Access Flow**:

```
read_skill_document called
        ↓
Match documents by pattern
        ↓
For each matched document:
        ↓
Already fetched? → Yes: Use existing content
        ↓ No
skill.get_document(path)
        ↓
Check memory cache → Found: Return
        ↓ Not found
Check disk cache → Found: Return
        ↓ Not found
Fetch from GitHub
        ↓
Save to disk cache
        ↓
Cache in memory
        ↓
Return content
```
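The two-level cache lookup can be sketched in a few lines of Python. The `Skill` attribute layout, the `get_document` signature, and the use of `requests` below are illustrative assumptions, not the actual `skill_loader.py` code:

```python
import hashlib
import tempfile
from pathlib import Path

import requests

# Cache location matches the layout described above; class shape is hypothetical.
CACHE_DIR = Path(tempfile.gettempdir()) / "claude_skills_mcp_cache" / "documents"


class Skill:
    def __init__(self, document_urls: dict[str, str]):
        self.document_urls = document_urls    # path -> raw GitHub URL (metadata only)
        self._documents: dict[str, str] = {}  # in-memory cache, filled on demand

    def get_document(self, path: str) -> str:
        # Level 1: memory cache (sub-millisecond on repeat access)
        if path in self._documents:
            return self._documents[path]

        # Level 2: disk cache, keyed by the MD5 hash of the document URL
        url = self.document_urls[path]
        cache_file = CACHE_DIR / (hashlib.md5(url.encode()).hexdigest() + ".cache")
        if cache_file.exists():
            content = cache_file.read_text()
        else:
            # Level 3: network fetch, then persist for future runs
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            content = response.text
            CACHE_DIR.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(content)

        self._documents[path] = content
        return content
```

The key point is that only the cheapest layer that already holds the document is consulted, which is what turns the ~200ms first access into the <1ms repeat access quoted above.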
### 3. Search Engine (`search_engine.py`)

**Purpose**: Enable semantic search over skill descriptions using vector embeddings.

**Key Features**:

- **Sentence-transformers** for local embeddings
- **Default model**: `all-MiniLM-L6-v2` (384 dimensions, ~90MB)
- **Vector indexing** at startup for fast queries
- **Cosine similarity** search algorithm
- **Configurable top-K** results
- **No API keys** required: fully local operation

**Search Process**:

```
Startup: Load skills → Generate embeddings → Build index
Query:   Encode query → Compute similarity → Rank → Return top-K
```

**Performance Characteristics**:

- Startup: 5-10 seconds (load model + index skills)
- Query time: <1 second
- Memory: ~500MB (model + embeddings)
- Scales well: tested with 70+ skills

**Embedding Model Details**:

- **all-MiniLM-L6-v2**: Fast, good quality, 384 dimensions
- **all-mpnet-base-v2**: Higher quality, slower, 768 dimensions (optional)
- Models are cached after the first download
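A condensed sketch of this index-and-query loop follows. The `SkillSearchEngine` class is a simplified stand-in for `search_engine.py`, not its actual implementation:

```python
import numpy as np
from sentence_transformers import SentenceTransformer


class SkillSearchEngine:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.skills: list[dict] = []
        self.embeddings: np.ndarray | None = None

    def index(self, skills: list[dict]) -> None:
        # Embed each skill's name + description once at startup.
        texts = [f"{s['name']}: {s['description']}" for s in skills]
        self.skills = skills
        # normalize_embeddings=True makes a dot product equal cosine similarity.
        self.embeddings = self.model.encode(texts, normalize_embeddings=True)

    def search(self, query: str, top_k: int = 5) -> list[tuple[dict, float]]:
        q = self.model.encode([query], normalize_embeddings=True)[0]
        scores = self.embeddings @ q               # cosine similarity per skill
        best = np.argsort(scores)[::-1][:top_k]    # rank and keep top-K
        return [(self.skills[i], float(scores[i])) for i in best]
```

Because the skill embeddings are computed once at startup, each query costs only one model forward pass plus a single matrix-vector product, which is why query time stays under a second.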
### 4. MCP Server (`server.py`)

**Purpose**: Implement the Model Context Protocol specification.

**Key Features**:

- **Standard MCP protocol** implementation
- **Three tools** with optimized descriptions:
  1. `search_skills` - Semantic search
  2. `read_skill_document` - Access skill files
  3. `list_skills` - Browse all skills
- **Progressive disclosure** of skill content:
  - Tool descriptions (always visible)
  - Skill metadata (on search)
  - Full content (when relevant)
  - Referenced files (on demand)
- **Content truncation** (configurable)
- **Stdio transport** for easy integration
- **Formatted output** with:
  - Relevance scores
  - Source links
  - Document metadata

**Tool Invocation Flow**:

```
AI Assistant
        ↓
MCP Client
        ↓
Tool Call (search_skills, read_skill_document, list_skills)
        ↓
Server Handler
        ↓
Search Engine / Skill Loader
        ↓
Format Response
        ↓
Return to AI Assistant
```

**Progressive Disclosure Implementation**:

1. **Level 1**: Tool names/descriptions (always in context)
2. **Level 2**: Skill names/descriptions (search results)
3. **Level 3**: Full SKILL.md content (when a skill is relevant)
4. **Level 4**: Additional documents (scripts, data, references)

This architecture minimizes context window usage while ensuring all necessary information is available when needed.

### 5. Entry Point (`__main__.py`)

**Purpose**: Provide the CLI interface and manage the server lifecycle.

**Key Features**:

- **CLI argument parsing**:
  - `--config`: Custom configuration file
  - `--example-config`: Print example config
  - `--verbose`: Enable debug logging
- **Async server lifecycle**:
  - Load configuration
  - Initialize components
  - Run server
  - Handle shutdown gracefully
- **Comprehensive error handling**:
  - Missing dependencies
  - Network failures
  - Invalid configuration
- **Logging configuration**:
  - Info level by default
  - Debug level with `--verbose`
  - Structured log messages

**Startup Sequence**:

```
Parse CLI arguments
        ↓
Load configuration
        ↓
Initialize skill loader
        ↓
Load skills from sources
        ↓
Initialize search engine
        ↓
Index skills
        ↓
Start MCP server
        ↓
Listen for tool calls
```

## Data Flow

### Complete Request Flow

```
User Query
        ↓
AI Assistant (Claude, GPT, etc.)
        ↓
MCP Client
        ↓
MCP Server (stdio)
        ↓
Tool Handler (search_skills)
        ↓
Search Engine
        ↓
Cosine Similarity Computation
        ↓
Rank Skills
        ↓
Format Response
        ↓
Return to AI Assistant
        ↓
AI uses skill content
```

### Skill Loading Flow

```
Configuration
        ↓
Skill Loader
   ├─ GitHub API (with caching)
   │        ↓
   │   Download tree
   │        ↓
   │   Find SKILL.md files
   │        ↓
   │   Download content
   │        ↓
   │   Load documents
   │
   └─ Local Filesystem
            ↓
       Scan directories
            ↓
       Find SKILL.md files
            ↓
       Read content
            ↓
       Load documents
        ↓
Parse SKILL.md (YAML + Markdown)
        ↓
Create Skill objects
        ↓
Return to Search Engine for indexing
```

## Design Decisions

### Why Local Embeddings?

- **No API keys required**: Easier setup for users
- **Privacy**: All processing happens locally
- **Cost**: No per-query charges
- **Speed**: <1 second per query
- **Offline**: Works without internet (after initial setup)

**Trade-off**: Slightly lower quality than large cloud models, but excellent for this use case.

### Why Progressive Disclosure?

Following Anthropic's Agent Skills architecture:

1. **Context Window Efficiency**: Don't load all skills upfront
2. **Relevance Filtering**: Only show skills that match the task
3. **On-Demand Detail**: Load full content only when needed
4. **Scalability**: Works with hundreds of skills without overwhelming the context

### Why Caching?

**Problem**: The GitHub API has a 60 requests/hour limit for unauthenticated access.

**Solution**:

- Cache tree API responses (the rate-limited call)
- 24-hour validity is reasonable for skill repositories
- Dramatically speeds up development and testing
- Cache in the temp directory for automatic cleanup

**Benefits**:

- First run: ~20-30 seconds (with API calls + lazy document loading)
- Subsequent runs: ~10-15 seconds (from cache)
- No rate limit issues during development

### Why Three Tools?

1. **`search_skills`**: Task-oriented discovery (main use case)
2. **`read_skill_document`**: Access scripts and assets (progressive disclosure)
3. **`list_skills`**: Exploration and debugging (understand what's available)

This separation allows AI assistants to:

- Find relevant skills efficiently
- Access additional resources on demand
- Debug configuration issues
- Explore available capabilities

## Extension Points

The architecture is designed to be extensible:

### Adding New Skill Sources

Implement in `skill_loader.py`:

```python
def load_from_custom_source(url: str, config: dict) -> list[Skill]:
    # Your implementation
    return skills
```

Then add it to `load_all_skills()`.

### Adding New Embedding Models

Update `config.py`:

```python
"embedding_model": "your-model-name"
```

The search engine automatically loads any sentence-transformers model.

### Adding New Tools

1. Define the tool in `server.py` `list_tools()`
2. Implement a handler method `_handle_your_tool()`
3. Add a route in `call_tool()`
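These three steps might look roughly as follows with the low-level MCP Python SDK's decorator API; the `count_skills` tool and its handler are hypothetical examples, and the real `server.py` wiring may differ:

```python
from mcp import types
from mcp.server import Server

server = Server("claude-skills-mcp")


@server.list_tools()
async def list_tools() -> list[types.Tool]:
    # Step 1: advertise the tool and its input schema.
    return [
        types.Tool(
            name="count_skills",
            description="Return the number of loaded skills.",
            inputSchema={"type": "object", "properties": {}},
        )
    ]


async def _handle_count_skills(arguments: dict) -> list[types.TextContent]:
    # Step 2: handler containing the tool's actual logic.
    return [types.TextContent(type="text", text="90 skills loaded")]


@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    # Step 3: route the incoming call to the matching handler.
    if name == "count_skills":
        return await _handle_count_skills(arguments)
    raise ValueError(f"Unknown tool: {name}")
```

### Custom Skill Formats

Extend `parse_skill_md()` in `skill_loader.py` to handle custom frontmatter fields or markdown extensions.

## Performance Optimization

### Startup Time

- **Fast path**: Use local directories instead of GitHub
- **Lazy loading**: Consider implementing lazy skill loading
- **Smaller model**: Use `all-MiniLM-L6-v2` instead of larger models

### Query Time

Queries are already very fast (<1 second). Further optimization is possible:

- **Approximate search**: Use FAISS or similar for 1000+ skills
- **Caching**: Cache frequent queries (not implemented)

### Memory Usage

- **Model size**: 90-420MB depending on the model chosen
- **Skills**: ~1KB per skill
- **Documents**: Varies based on skill complexity

Total: ~500MB typical, scaling linearly with skill count.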
## Security Considerations

### GitHub API

- **No authentication**: Uses the public API (60 req/hour limit)
- **No credentials**: Never stores or transmits auth tokens
- **HTTPS only**: All GitHub requests use TLS

### Local Files

- **Path validation**: Checks for directory traversal
- **Home directory expansion**: Supports `~` in paths
- **Error isolation**: Failed skills don't crash the server

### MCP Protocol

- **Stdio transport**: Isolated from the network
- **No code execution**: Skills are data only (markdown)
- **Sandboxing**: Running via MCP provides OS-level isolation

**Note**: The `read_skill_document` tool can access Python scripts and other files. These are loaded as text/data only, not executed.

## Testing Strategy

See the [Testing Guide](testing.md) for details.

**Architecture tests**:

- Unit tests for each component
- Integration tests for data flow
- End-to-end tests with real repositories

**Coverage**:

- Configuration: 86%
- Skill Loader: 68%
- Search Engine: 100%
- Server: Tested via integration tests

## Troubleshooting

### Server won't start

Check:

1. Python 3.12 is installed (not 3.13)
2. Dependencies are installed (`uv sync`)
3. The configuration is valid JSON

### Skills not loading

Check:

1. GitHub rate limit (wait an hour)
2. The repository exists and is public
3. SKILL.md files have valid frontmatter
4. Use `--verbose` to see detailed logs

### Search returns poor results

Check:

1. Skill descriptions are specific and descriptive
2. The query uses domain-appropriate terminology
3. Increase `top_k` to see more results
4. Consider using a larger embedding model

### High memory usage

Solutions:

1. Use a smaller embedding model
2. Limit the number of skill sources
3. Use subpath filtering to load fewer skills

## Future Architecture Considerations

See the [Roadmap](roadmap.md) for planned features.

**Potential changes**:

- **MCP Sampling**: Use LLM reasoning instead of RAG
- **Skill execution**: Sandboxed Python execution
- **Binary tools**: Execute scientific software
- **Distributed skills**: Load from multiple servers
- **Caching layer**: Redis/disk cache for queries (a minimal sketch follows)
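As a rough illustration of that last item, the simplest in-process form of a query cache is an `lru_cache` wrapper around the search call. This is a speculative sketch of a not-yet-implemented feature, not planned code; `engine` is assumed to expose a `search(query, top_k)` method like the `SkillSearchEngine` stand-in shown earlier:

```python
from functools import lru_cache


def make_cached_search(engine, maxsize: int = 256):
    """Wrap an engine's search method with an in-process LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached_search(query: str, top_k: int = 5) -> tuple:
        # Return a tuple so the cached value stays immutable across callers.
        return tuple(engine.search(query, top_k))

    return cached_search


# Usage: search = make_cached_search(engine); search("plot RNA-seq volcano")
```

A Redis-backed variant would replace the decorator with an external store keyed on the query string, which would let multiple server processes share one cache.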
