# 🦙 Llama 4 Maverick MCP Server (Python)
**Author**: Yobie Benjamin
**Version**: 0.9
**Date**: August 1, 2025
A Python implementation of the Model Context Protocol (MCP) server that bridges Llama models with Claude Desktop through Ollama. This pure Python solution offers clean architecture, high performance, and easy extensibility.
## 📋 Table of Contents
- [What Would You Use This Llama MCP Server For?](#-what-would-you-use-this-llama-mcp-server-for)
- [Why Python?](#-why-python)
- [Features](#-features)
- [System Requirements](#-system-requirements)
- [Quick Start](#-quick-start)
- [Detailed Installation](#-detailed-installation)
- [Configuration](#-configuration)
- [Available Tools](#-available-tools)
- [Usage Examples](#-usage-examples)
- [Real-World Applications](#-real-world-applications)
- [Development](#-development)
- [Performance Optimization](#-performance-optimization)
- [Troubleshooting](#-troubleshooting)
- [Contributing](#-contributing)
## 🎯 What Would You Use This Llama MCP Server For?
### The Revolution of Local AI + Claude Desktop
This Python MCP server creates a powerful bridge between Claude Desktop's sophisticated interface and your locally-hosted Llama models. Here's what makes this combination revolutionary:
### 1. **Privacy-First AI Operations** 🔒
**The Challenge**: Organizations handling sensitive data can't use cloud AI due to privacy concerns.
**The Solution**: This MCP server keeps everything local while providing enterprise-grade AI capabilities.
**Real-World Applications**:
- **Healthcare**: A hospital can analyze patient records with AI while remaining HIPAA-compliant
- **Legal**: Law firms can process confidential client documents with complete privacy
- **Finance**: Banks can analyze transaction data without exposing customer information
- **Government**: Agencies can process classified documents on air-gapped systems
**Example Implementation**:
```python
# Process sensitive medical records locally
async def analyze_patient_data(patient_file):
    # Data never leaves your server
    content = await tool_manager.execute("read_file", {"path": patient_file})

    # Use specialized medical model
    analysis = await llama_service.complete(
        prompt=f"Analyze patient data for risk factors: {content}",
        model="medical-llama:latest",  # Your HIPAA-compliant fine-tuned model
        temperature=0.1,  # Low temperature for medical accuracy
    )

    # Store results locally with encryption
    await secure_storage.save(analysis, encrypted=True)
```
### 2. **Custom Model Deployment** 🎯
**The Challenge**: Generic models don't understand your domain-specific language and requirements.
**The Solution**: Deploy your own fine-tuned models through the MCP interface.
**Real-World Applications**:
- **Research Labs**: Use models trained on proprietary research data
- **Enterprises**: Deploy models fine-tuned on company documentation
- **Educational Institutions**: Use models trained on curriculum-specific content
- **Industry-Specific**: Legal, medical, financial, or technical domain models
**Example Implementation**:
```python
# Switch between specialized models based on task
class ModelSelector:
    def __init__(self):
        self.models = {
            "general": "llama3:latest",
            "code": "codellama:latest",
            "medical": "medical-llama:13b",
            "legal": "legal-llama:7b",
            "finance": "finance-llama:13b",
        }

    async def select_and_query(self, domain: str, prompt: str):
        model = self.models.get(domain, "llama3:latest")
        return await llama_service.complete(
            prompt=prompt,
            model=model,
            temperature=0.3 if domain in ["medical", "legal"] else 0.7,
        )
```
### 3. **Hybrid Intelligence Systems** 🔄
**The Challenge**: No single AI model excels at everything.
**The Solution**: Combine Claude's reasoning with Llama's generation capabilities.
**Real-World Applications**:
- **Software Development**: Claude plans architecture, Llama generates implementation
- **Content Creation**: Claude creates outlines, Llama writes detailed content
- **Data Analysis**: Claude interprets results, Llama generates reports
- **Research**: Claude formulates hypotheses, Llama explores implications
**Example Implementation**:
```python
# Hybrid workflow combining Claude and Llama
class HybridAI:
    async def complex_task(self, requirement: str):
        # Step 1: Use Claude for high-level planning
        plan = await claude.create_plan(requirement)

        # Step 2: Use local Llama for detailed implementation
        implementation = await llama_service.complete(
            prompt=f"Implement this plan: {plan}",
            model="codellama:34b",
            max_tokens=4096,
        )

        # Step 3: Use Claude for review and refinement
        refined = await claude.review_and_refine(implementation)
        return refined
```
### 4. **Offline and Edge Computing** 🌐
**The Challenge**: Many environments lack reliable internet or prohibit cloud connections.
**The Solution**: Full AI capabilities without any internet requirement.
**Real-World Applications**:
- **Remote Operations**: Oil rigs, ships, remote research stations
- **Industrial IoT**: Factory floors with real-time requirements
- **Field Work**: Geological surveys, wildlife research, disaster response
- **Secure Facilities**: Military bases, research labs, government buildings
**Example Implementation**:
```python
# Edge deployment for industrial quality control
class EdgeQualityControl:
    def __init__(self):
        self.config = Config(
            llama_model_name="quality-control:latest",
            enable_streaming=True,
            max_context_length=8192,  # Optimized for edge devices
        )

    async def inspect_product(self, sensor_data: dict):
        # Process sensor data locally
        analysis = await llama_service.complete(
            prompt=f"Analyze sensor readings for defects: {sensor_data}",
            temperature=0.1,  # Consistent results needed
            max_tokens=256,  # Quick response for real-time processing
        )

        # Trigger local actions based on analysis
        if "defect" in analysis.lower():
            await self.trigger_alert(analysis)
        return analysis
```
### 5. **Experimentation and Research** 🧪
**The Challenge**: Researchers need reproducible results and full control over model behavior.
**The Solution**: Complete transparency and control over every aspect of the AI pipeline.
**Real-World Applications**:
- **Academic Research**: Reproducible experiments for papers
- **Model Comparison**: A/B testing different models and parameters
- **Behavior Analysis**: Understanding how models respond to different inputs
- **Prompt Engineering**: Developing optimal prompts for specific tasks
**Example Implementation**:
```python
# Research experiment framework
class ExperimentRunner:
    async def run_experiment(self, hypothesis: str, test_cases: list):
        results = []

        # Test multiple models
        for model in ["llama3:7b", "llama3:13b", "llama3:70b"]:
            # Test multiple parameters
            for temp in [0.1, 0.5, 0.9, 1.5]:
                model_results = []
                for test in test_cases:
                    response = await llama_service.complete(
                        prompt=test,
                        model=model,
                        temperature=temp,
                        seed=42,  # Reproducible results
                    )
                    model_results.append({
                        "input": test,
                        "output": response,
                        "model": model,
                        "temperature": temp,
                        "timestamp": datetime.now(),
                    })
                results.append(model_results)

        # Analyze and save results
        analysis = self.analyze_results(results)
        await self.save_experiment(hypothesis, results, analysis)
        return analysis
```
### 6. **Cost-Effective Scaling** 💰
**The Challenge**: API costs can become prohibitive for high-volume applications.
**The Solution**: One-time hardware investment for unlimited usage.
**Real-World Applications**:
- **Startups**: Prototype without burning through funding
- **Education**: Provide AI access to all students without budget concerns
- **Non-profits**: Leverage AI without ongoing costs
- **High-volume Processing**: Batch jobs, data analysis, content generation
**Cost Analysis Example**:
```python
# Cost comparison calculator
class CostAnalyzer:
    def calculate_savings(self, monthly_tokens: int):
        # API costs (approximate)
        api_cost_per_million = 15.00  # USD
        monthly_api_cost = (monthly_tokens / 1_000_000) * api_cost_per_million

        # Local costs (one-time hardware)
        hardware_cost = 2000  # Good GPU setup
        electricity_monthly = 50  # Approximate

        # Calculate break-even
        months_to_break_even = hardware_cost / (monthly_api_cost - electricity_monthly)

        return {
            "monthly_api_cost": monthly_api_cost,
            "monthly_local_cost": electricity_monthly,
            "monthly_savings": monthly_api_cost - electricity_monthly,
            "break_even_months": months_to_break_even,
            "first_year_savings": (monthly_api_cost * 12) - (hardware_cost + electricity_monthly * 12),
        }
```
### 7. **Real-Time Processing** ⚡
**The Challenge**: Network latency makes cloud AI unsuitable for real-time applications.
**The Solution**: Sub-second response times with local processing.
**Real-World Applications**:
- **Trading Systems**: Analyze market data in milliseconds
- **Gaming**: Real-time NPC dialogue and behavior
- **Robotics**: Immediate response to sensor inputs
- **Live Translation**: Instant language translation
**Example Implementation**:
```python
# Real-time stream processing
class StreamProcessor:
    def __init__(self):
        self.buffer = []
        self.processing = False

    async def process_stream(self, data_stream):
        async for chunk in data_stream:
            self.buffer.append(chunk)

            if not self.processing and len(self.buffer) > 0:
                self.processing = True

                # Process immediately without network delay
                result = await llama_service.complete(
                    prompt=f"Analyze: {self.buffer[-1]}",
                    model="tinyllama:latest",  # Fast model for real-time
                    max_tokens=50,
                    stream=True,
                )
                async for token in result:
                    yield token  # Stream results immediately

                self.processing = False
```
### 8. **Custom Tool Integration** 🛠️
**The Challenge**: Generic AI can't interact with your specific systems and databases.
**The Solution**: Build custom tools that integrate with your infrastructure.
**Real-World Applications**:
- **DevOps**: AI that can manage your specific infrastructure
- **Database Management**: Query and manage your databases via natural language
- **System Administration**: Automate complex administrative tasks
- **Business Intelligence**: Connect to your BI tools and data warehouses
**Example Implementation**:
```python
# Custom tool for database operations
class DatabaseTool(BaseTool):
    @property
    def name(self) -> str:
        return "company_database"

    @property
    def description(self) -> str:
        return "Query and manage company database"

    async def execute(self, query: str, operation: str = "select") -> ToolResult:
        # Connect to your specific database
        async with get_company_db() as db:
            # Fetch once so both operations can use the results
            results = await db.fetch(query)

            if operation == "select":
                return ToolResult(success=True, data=results)
            elif operation == "analyze":
                # Use Llama to analyze query results
                analysis = await llama_service.complete(
                    prompt=f"Analyze this data: {results}",
                    temperature=0.3,
                )
                return ToolResult(success=True, data=analysis)
```
### 9. **Compliance and Governance** 📋
**The Challenge**: Regulatory requirements demand complete control and audit trails.
**The Solution**: Full transparency and logging of all AI operations.
**Real-World Applications**:
- **Healthcare**: HIPAA compliance with audit trails
- **Finance**: SOX compliance with transaction monitoring
- **Legal**: Attorney-client privilege protection
- **Government**: Security clearance requirements
**Example Implementation**:
```python
# Compliance-aware AI system
class ComplianceAI:
    def __init__(self):
        self.audit_logger = AuditLogger()
        self.encryption = EncryptionService()

    async def process_regulated_data(self, data: str, user: str, purpose: str):
        # Log access for audit
        audit_id = await self.audit_logger.log_access(
            user=user,
            data_type="regulated",
            purpose=purpose,
            timestamp=datetime.now(),
        )

        # Encrypt data in transit
        encrypted = self.encryption.encrypt(data)

        # Process with local model (data never leaves premises)
        result = await llama_service.complete(
            prompt=f"Process: {encrypted}",
            model="compliance-llama:latest",
        )

        # Log completion
        await self.audit_logger.log_completion(
            audit_id=audit_id,
            success=True,
            result_hash=hashlib.sha256(result.encode()).hexdigest(),
        )

        return self.encryption.encrypt(result)
```
### 10. **Educational Environments** 🎓
**The Challenge**: Educational institutions need affordable AI access for all students.
**The Solution**: Single deployment serves unlimited students without per-use costs.
**Real-World Applications**:
- **Computer Science**: Teaching AI/ML concepts hands-on
- **Research Projects**: Student research without budget constraints
- **Writing Centers**: AI-assisted writing for all students
- **Language Learning**: Personalized language practice
**Example Implementation**:
```python
# Educational AI assistant
class EducationalAssistant:
    def __init__(self):
        self.student_profiles = {}
        self.learning_analytics = LearningAnalytics()

    async def personalized_tutoring(self, student_id: str, subject: str, question: str):
        # Get student's learning profile (create one only if it doesn't exist yet)
        profile = self.student_profiles.get(student_id) or self.create_profile(student_id)

        # Adjust response based on student level
        response = await llama_service.complete(
            prompt=f"""
            Student Level: {profile['level']}
            Subject: {subject}
            Question: {question}
            Provide an explanation appropriate for this student's level.
            """,
            temperature=0.7,
            model="education-llama:latest",
        )

        # Track learning progress
        await self.learning_analytics.record_interaction(
            student_id=student_id,
            subject=subject,
            question=question,
            response=response,
        )
        return response
```
## 🐍 Why Python?
### Advantages Over TypeScript/Node.js
| Aspect | Python Advantage | Use Case |
|--------|------------------|----------|
| **Scientific Computing** | NumPy, SciPy, Pandas integration | Data analysis, research |
| **ML Ecosystem** | Direct integration with PyTorch, TensorFlow | Model experimentation |
| **Simplicity** | Cleaner async/await syntax | Faster development |
| **Libraries** | Vast ecosystem of AI/ML tools | Extended functionality |
| **Debugging** | Better error messages and debugging tools | Easier troubleshooting |
| **Performance** | uvloop for high-performance async | Better concurrency |
| **Type Safety** | Type hints + Pydantic validation | Runtime validation |
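The type-safety row above is more than static hints: with Pydantic, request parameters can be validated at runtime before they ever reach the model. A minimal sketch (the `GenerationParams` model below is illustrative, not part of the server API):

```python
from pydantic import BaseModel, Field, ValidationError


class GenerationParams(BaseModel):
    """Illustrative request model: invalid values are rejected at runtime."""

    prompt: str = Field(..., min_length=1)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)
    max_tokens: int = Field(default=256, gt=0)


try:
    GenerationParams(prompt="Hello", temperature=3.5)  # outside the 0.0-2.0 range
except ValidationError as err:
    print(err)  # reports that temperature must be <= 2.0
```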
## ✨ Features
### Core Capabilities
- 🚀 **High Performance**: Async/await with uvloop support
- 🛠️ **10+ Built-in Tools**: Web search, file ops, calculations, and more
- 📝 **Prompt Templates**: Pre-defined prompts for common tasks
- 📚 **Resource Management**: Access templates and documentation
- 🌊 **Streaming Support**: Real-time token generation
- 🔧 **Highly Configurable**: Environment-based configuration
- 📊 **Structured Logging**: Comprehensive debugging support
- 🧪 **Fully Tested**: Pytest test suite included
### Python-Specific Features
- 🐼 **Data Science Integration**: Works with Pandas, NumPy
- 🤖 **ML Framework Compatible**: Integrate with PyTorch, TensorFlow
- 📊 **Analytics Built-in**: Performance metrics and monitoring
- 🔌 **Plugin System**: Easy to extend with Python packages
- 🎯 **Type Safety**: Pydantic models for validation
- 🔒 **Security**: Built-in sanitization and validation
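As a sketch of the data-science integration, a custom tool can return a Pandas summary through the same `BaseTool`/`ToolResult` interface used in the custom-tool example later in this README (the import path, CSV handling, and tool name here are illustrative assumptions):

```python
# Hypothetical tool: summarize a CSV file with Pandas.
import pandas as pd

from llama4_maverick_mcp.tools.base import BaseTool, ToolResult  # assumed module path


class CsvSummaryTool(BaseTool):
    @property
    def name(self) -> str:
        return "csv_summary"

    @property
    def description(self) -> str:
        return "Summarize a CSV file with Pandas"

    async def execute(self, path: str) -> ToolResult:
        df = pd.read_csv(path)
        summary = {
            "rows": len(df),
            "columns": list(df.columns),
            "numeric_stats": df.describe().to_dict(),  # per-column count/mean/std/min/max
        }
        return ToolResult(success=True, data=summary)
```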
## 💻 System Requirements
### Minimum Requirements
| Component | Minimum | Recommended | Optimal |
|-----------|---------|-------------|---------|
| **Python** | 3.9+ | 3.11+ | Latest |
| **CPU** | 4 cores | 8 cores | 16+ cores |
| **RAM** | 8GB | 16GB | 32GB+ |
| **Storage** | 10GB SSD | 50GB SSD | 100GB NVMe |
| **OS** | Linux/macOS/Windows | Ubuntu 22.04 | Latest Linux |
### Model Requirements
| Model | Parameters | RAM | Use Case |
|-------|------------|-----|----------|
| `tinyllama` | 1.1B | 2GB | Testing, quick responses |
| `llama3:7b` | 7B | 8GB | General purpose |
| `llama3:13b` | 13B | 16GB | Advanced tasks |
| `llama3:70b` | 70B | 48GB | Professional use |
| `codellama` | 7-34B | 8-32GB | Code generation |
## 🚀 Quick Start
```bash
# Clone the repository
git clone https://github.com/yobieben/llama4-maverick-mcp-python.git
cd llama4-maverick-mcp-python
# Run setup (handles everything)
python setup.py
# Start the server
python -m llama4_maverick_mcp.server
```
That's it! The server is now running and ready to connect to Claude Desktop.
## 📦 Detailed Installation
### Step 1: Python Setup
```bash
# Check Python version
python --version # Should be 3.9+
# Create virtual environment (recommended)
python -m venv venv
# Activate virtual environment
# Linux/macOS:
source venv/bin/activate
# Windows:
venv\Scripts\activate
```
### Step 2: Install Dependencies
```bash
# Install the package in development mode
pip install -e .
# For development with testing tools
pip install -e .[dev]
```
### Step 3: Install Ollama
```bash
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows
# Download from https://ollama.com/download/windows
```
### Step 4: Configure Environment
```bash
# Copy example configuration
cp .env.example .env
# Edit configuration
nano .env # or your preferred editor
```
### Step 5: Download Models
```bash
# Start Ollama service
ollama serve
# In another terminal, pull models
ollama pull llama3:latest
ollama pull codellama:latest
ollama pull tinyllama:latest
```
### Step 6: Configure Claude Desktop
Add to Claude Desktop configuration:
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
```json
{
  "mcpServers": {
    "llama4-python": {
      "command": "python",
      "args": ["-m", "llama4_maverick_mcp.server"],
      "cwd": "/path/to/llama4-maverick-mcp-python",
      "env": {
        "PYTHONPATH": "/path/to/llama4-maverick-mcp-python/src",
        "LLAMA_MODEL_NAME": "llama3:latest"
      }
    }
  }
}
```
## ⚙️ Configuration
### Environment Variables
Create a `.env` file:
```bash
# Ollama Configuration
LLAMA_API_URL=http://localhost:11434
LLAMA_MODEL_NAME=llama3:latest
LLAMA_API_KEY= # Optional
# Server Configuration
MCP_LOG_LEVEL=INFO
MCP_SERVER_HOST=localhost
MCP_SERVER_PORT=3000
# Features
ENABLE_STREAMING=true
ENABLE_FUNCTION_CALLING=true
ENABLE_VISION=false
ENABLE_CODE_EXECUTION=false # Security risk
ENABLE_WEB_SEARCH=true
# Model Parameters
TEMPERATURE=0.7 # 0.0-2.0
TOP_P=0.9 # 0.0-1.0
TOP_K=40 # 1-100
REPEAT_PENALTY=1.1
SEED=42 # For reproducibility
# File System
FILE_SYSTEM_BASE_PATH=/safe/path
ALLOW_FILE_WRITES=true
# Performance
MAX_CONTEXT_LENGTH=128000
MAX_CONCURRENT_REQUESTS=10
REQUEST_TIMEOUT_MS=30000
CACHE_TTL=3600
CACHE_MAX_SIZE=1000
# Debug
DEBUG=false
VERBOSE_LOGGING=false
```
### Configuration Classes
```python
from llama4_maverick_mcp.config import Config

# Create custom configuration
config = Config(
    llama_model_name="codellama:latest",
    temperature=0.3,
    enable_code_execution=True,
)

# Access configuration
print(config.llama_model_name)
print(config.get_model_params())
```
## 🛠️ Available Tools
### Built-in Tools
| Tool | Description | Example |
|------|-------------|---------|
| `calculator` | Mathematical calculations | `2 + 2`, `sqrt(16)` |
| `datetime` | Date/time operations | Current time, date math |
| `json_tool` | JSON manipulation | Parse, extract, transform |
| `web_search` | Search the web | Query for information |
| `file_read` | Read files | Access local files |
| `file_write` | Write files | Save data locally |
| `list_files` | List directories | Browse file system |
| `code_executor` | Run code | Execute Python/JS/Bash |
| `http_request` | HTTP calls | API interactions |
### Creating Custom Tools
```python
# src/llama4_maverick_mcp/tools/custom/my_tool.py
from pydantic import BaseModel, Field

from ..base import BaseTool, ToolResult


class MyToolParams(BaseModel):
    """Parameters for my custom tool."""

    input_text: str = Field(..., description="Text to process")
    option: str = Field(default="default", description="Processing option")


class MyCustomTool(BaseTool):
    @property
    def name(self) -> str:
        return "my_custom_tool"

    @property
    def description(self) -> str:
        return "Performs custom processing on text"

    @property
    def parameters(self) -> type[BaseModel]:
        return MyToolParams

    async def execute(self, input_text: str, option: str = "default") -> ToolResult:
        # Your custom logic here
        result = f"Processed: {input_text} with option: {option}"
        return ToolResult(
            success=True,
            data={"result": result, "length": len(input_text)},
        )
```
## 📚 Usage Examples
### Basic Usage
```python
import asyncio

from llama4_maverick_mcp import MCPServer, Config


async def main():
    # Create server with custom config
    config = Config(
        llama_model_name="llama3:latest",
        temperature=0.7,
    )
    server = MCPServer(config)

    # Run the server
    await server.run()


if __name__ == "__main__":
    asyncio.run(main())
```
### Direct API Usage
```python
from llama4_maverick_mcp import LlamaService, Config


async def generate_text():
    config = Config()
    llama = LlamaService(config)
    await llama.initialize()

    # Simple completion
    result = await llama.complete(
        prompt="Explain quantum computing",
        temperature=0.5,
        max_tokens=200,
    )
    print(result)

    # Chat completion
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is Python?"},
    ]
    response = await llama.complete_chat(messages)
    print(response)
```
### Tool Execution
```python
from llama4_maverick_mcp.tools import ToolManager


async def use_tools():
    manager = ToolManager(Config())
    await manager.initialize()

    # Execute calculator
    result = await manager.execute_tool(
        "calculator",
        {"expression": "factorial(5) + sqrt(16)"},
    )
    print(result)

    # Read file
    content = await manager.execute_tool(
        "file_read",
        {"path": "config.json"},
    )
    print(content)
```
## 🌍 Real-World Applications
### 1. Document Analysis Pipeline
```python
class DocumentAnalyzer:
    def __init__(self):
        self.config = Config(temperature=0.3)
        self.llama = LlamaService(self.config)
        self.tools = ToolManager(self.config)

    async def analyze_documents(self, directory: str):
        # List all documents
        files = await self.tools.execute_tool(
            "list_files",
            {"path": directory, "recursive": True},
        )

        results = []
        for file in files["data"]["files"]:
            if file.endswith((".txt", ".md", ".pdf")):
                # Read document
                content = await self.tools.execute_tool(
                    "file_read",
                    {"path": file},
                )

                # Analyze with Llama
                analysis = await self.llama.complete(
                    prompt=f"Summarize and extract key points: {content['data']}",
                    max_tokens=500,
                )

                results.append({
                    "file": file,
                    "analysis": analysis,
                })
        return results
```
### 2. Code Review System
```python
class CodeReviewer:
    async def review_code(self, code: str, language: str = "python"):
        prompt = f"""
        Review this {language} code for:
        1. Security vulnerabilities
        2. Performance issues
        3. Best practices
        4. Potential bugs

        Code:
        {code}

        Provide specific suggestions for improvement.
        """
        review = await llama_service.complete(
            prompt=prompt,
            model="codellama:latest",
            temperature=0.3,
        )
        return self.parse_review(review)
```
### 3. Research Assistant
```python
class ResearchAssistant:
    async def research_topic(self, topic: str):
        # Search for information
        search_results = await self.tools.execute_tool(
            "web_search",
            {"query": topic, "max_results": 10},
        )

        # Analyze sources
        analysis = await self.llama.complete(
            prompt=f"Analyze these sources about {topic}: {search_results}",
            temperature=0.5,
        )

        # Generate report
        report = await self.llama.complete(
            prompt=f"Write a comprehensive report on {topic} based on: {analysis}",
            temperature=0.7,
            max_tokens=2000,
        )

        # Save report
        await self.tools.execute_tool(
            "file_write",
            {
                "path": f"research_{topic}_{datetime.now().strftime('%Y%m%d')}.md",
                "content": report,
            },
        )
        return report
```
## 🧪 Development
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=llama4_maverick_mcp
# Run specific test
pytest tests/test_llama_service.py
# Run with verbose output
pytest -v
```
### Code Quality
```bash
# Format code with Black
black src/
# Lint with Ruff
ruff check src/
# Type checking with mypy
mypy src/
# All quality checks
make quality
```
### Creating Tests
```python
# tests/test_my_tool.py
import pytest

from llama4_maverick_mcp.tools.custom.my_tool import MyCustomTool


@pytest.mark.asyncio
async def test_my_custom_tool():
    tool = MyCustomTool()
    result = await tool.execute(
        input_text="Hello, world!",
        option="uppercase",
    )
    assert result.success
    assert "Hello, world!" in result.data["result"]
    assert result.data["length"] == 13
```
## 📈 Performance Optimization
### 1. Use uvloop (Linux/macOS)
```bash
# Automatically enabled if available
# 2-4x performance improvement for async operations
pip install uvloop
```
### 2. Model Optimization
```python
# Use smaller models for simple tasks
config = Config(
    llama_model_name="tinyllama:latest",  # 1.1B params, very fast
    max_context_length=4096,  # Reduce context for speed
    temperature=0.1,  # Lower temperature for consistency
)
```
### 3. Caching Strategy
```python
from cachetools import TTLCache


class CachedLlamaService(LlamaService):
    def __init__(self, config):
        super().__init__(config)
        self.cache = TTLCache(maxsize=1000, ttl=3600)

    async def complete(self, prompt: str, **kwargs):
        cache_key = f"{prompt}:{kwargs}"
        if cache_key in self.cache:
            return self.cache[cache_key]

        result = await super().complete(prompt, **kwargs)
        self.cache[cache_key] = result
        return result
```
### 4. Batch Processing
```python
import asyncio


async def batch_process(prompts: list):
    # Process multiple prompts concurrently
    tasks = [
        llama_service.complete(prompt, temperature=0.5)
        for prompt in prompts
    ]

    # Limit concurrency to avoid overwhelming the system
    semaphore = asyncio.Semaphore(5)

    async def limited_task(task):
        async with semaphore:
            return await task

    results = await asyncio.gather(*[limited_task(t) for t in tasks])
    return results
```
## 🔧 Troubleshooting
### Common Issues
| Issue | Solution |
|-------|----------|
| **ImportError** | Check Python path: `export PYTHONPATH=$PYTHONPATH:$(pwd)/src` |
| **Ollama not found** | Install: `curl -fsSL https://ollama.com/install.sh \| sh` |
| **Model not available** | Pull model: `ollama pull llama3:latest` |
| **Permission denied** | Check file permissions and base path configuration |
| **Memory error** | Use smaller model or increase system RAM |
| **Timeout errors** | Increase `REQUEST_TIMEOUT_MS` in configuration |
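Before digging into server logs, it often helps to confirm that Ollama itself is reachable and the expected model is pulled. A small self-contained check, assuming `httpx` is installed and Ollama is listening on its default endpoint at `http://localhost:11434`:

```python
import asyncio

import httpx


async def check_ollama(base_url: str = "http://localhost:11434") -> None:
    async with httpx.AsyncClient(timeout=5.0) as client:
        resp = await client.get(f"{base_url}/api/tags")  # lists locally pulled models
        resp.raise_for_status()
        models = [m["name"] for m in resp.json().get("models", [])]
        print("Ollama reachable; models:", models)
        if "llama3:latest" not in models:
            print("Run `ollama pull llama3:latest` before starting the server.")


if __name__ == "__main__":
    asyncio.run(check_ollama())
```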
### Debug Mode
```python
# Enable detailed logging
config = Config(
    debug_mode=True,
    verbose_logging=True,
    log_level="DEBUG",
)

# Or set the equivalent environment variables before starting the server:
#   export DEBUG=true
#   export MCP_LOG_LEVEL=DEBUG
#   export VERBOSE_LOGGING=true
```
### Health Check
```python
async def health_check():
    """Check system health."""
    checks = {
        "python_version": sys.version,
        "ollama_connected": config.validate_ollama_connection(),
        "models_available": await llama_service.list_models(),
        "tools_loaded": len(await tool_manager.get_tools()),
        "memory_usage": psutil.virtual_memory().percent,
        "disk_usage": psutil.disk_usage("/").percent,
    }
    return {
        "status": "healthy" if all(checks.values()) else "degraded",
        "checks": checks,
        "timestamp": datetime.now().isoformat(),
    }
```
## 🤝 Contributing
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### Areas for Contribution
- 🛠️ New tools and integrations
- 📚 Documentation improvements
- 🐛 Bug fixes
- 🚀 Performance optimizations
- 🧪 Test coverage
- 🌍 Internationalization
### Development Workflow
```bash
# Fork and clone
git clone https://github.com/YOUR_USERNAME/llama4-maverick-mcp-python.git
# Create branch
git checkout -b feature/your-feature
# Make changes and test
pytest
# Commit with conventional commits
git commit -m "feat: add new amazing feature"
# Push and create PR
git push origin feature/your-feature
```
## 📄 License
MIT License - See [LICENSE](LICENSE) file
## 👨‍💻 Author
**Yobie Benjamin**
Version 0.9
August 1, 2025
## 🙏 Acknowledgments
- Anthropic for the MCP protocol
- Ollama team for local model hosting
- Meta for Llama models
- Python community for excellent libraries
## 📞 Support
- **Issues**: [GitHub Issues](https://github.com/yobieben/llama4-maverick-mcp-python/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yobieben/llama4-maverick-mcp-python/discussions)
- **Documentation**: [Wiki](https://github.com/yobieben/llama4-maverick-mcp-python/wiki)
---
**Ready to experience the power of local AI?** Start with Llama 4 Maverick MCP Python today! 🦙🚀🐍