Llama 4 Maverick MCP Server (Python)
Author: Yobie Benjamin
Version: 0.9
Date: August 1, 2025
A Python implementation of the Model Context Protocol (MCP) server that bridges Llama models with Claude Desktop through Ollama. This pure Python solution offers clean architecture, high performance, and easy extensibility.
What Would You Use This Llama MCP Server For?
The Revolution of Local AI + Claude Desktop
This Python MCP server creates a powerful bridge between Claude Desktop's sophisticated interface and your locally-hosted Llama models. Here's what makes this combination revolutionary:
1. Privacy-First AI Operations
The Challenge: Organizations handling sensitive data can't use cloud AI due to privacy concerns.
The Solution: This MCP server keeps everything local while providing enterprise-grade AI capabilities.
Real-World Applications:
Healthcare: A hospital can analyze patient records using AI without violating HIPAA compliance
Legal: Law firms can process confidential client documents with complete privacy
Finance: Banks can analyze transaction data without exposing customer information
Government: Agencies can process classified documents on air-gapped systems
Example Implementation:
# Process sensitive medical records locally
async def analyze_patient_data(patient_file):
    # Data never leaves your server
    content = await tool_manager.execute("read_file", {"path": patient_file})

    # Use specialized medical model
    analysis = await llama_service.complete(
        prompt=f"Analyze patient data for risk factors: {content}",
        model="medical-llama:latest",  # Your HIPAA-compliant fine-tuned model
        temperature=0.1,  # Low temperature for medical accuracy
    )

    # Store results locally with encryption
    await secure_storage.save(analysis, encrypted=True)
2. Custom Model Deployment
The Challenge: Generic models don't understand your domain-specific language and requirements.
The Solution: Deploy your own fine-tuned models through the MCP interface.
Real-World Applications:
Research Labs: Use models trained on proprietary research data
Enterprises: Deploy models fine-tuned on company documentation
Educational Institutions: Use models trained on curriculum-specific content
Industry-Specific: Legal, medical, financial, or technical domain models
Example Implementation:
# Switch between specialized models based on task
class ModelSelector:
    def __init__(self):
        self.models = {
            "general": "llama3:latest",
            "code": "codellama:latest",
            "medical": "medical-llama:13b",
            "legal": "legal-llama:7b",
            "finance": "finance-llama:13b",
        }

    async def select_and_query(self, domain: str, prompt: str):
        model = self.models.get(domain, "llama3:latest")
        return await llama_service.complete(
            prompt=prompt,
            model=model,
            temperature=0.3 if domain in ["medical", "legal"] else 0.7,
        )
3. Hybrid Intelligence Systems
The Challenge: No single AI model excels at everything.
The Solution: Combine Claude's reasoning with Llama's generation capabilities.
Real-World Applications:
Software Development: Claude plans architecture, Llama generates implementation
Content Creation: Claude creates outlines, Llama writes detailed content
Data Analysis: Claude interprets results, Llama generates reports
Research: Claude formulates hypotheses, Llama explores implications
Example Implementation:
# Hybrid workflow combining Claude and Llama
class HybridAI:
    async def complex_task(self, requirement: str):
        # Step 1: Use Claude for high-level planning
        plan = await claude.create_plan(requirement)

        # Step 2: Use local Llama for detailed implementation
        implementation = await llama_service.complete(
            prompt=f"Implement this plan: {plan}",
            model="codellama:34b",
            max_tokens=4096,
        )

        # Step 3: Use Claude for review and refinement
        refined = await claude.review_and_refine(implementation)
        return refined
4. Offline and Edge Computing
The Challenge: Many environments lack reliable internet or prohibit cloud connections.
The Solution: Full AI capabilities without any internet requirement.
Real-World Applications:
Remote Operations: Oil rigs, ships, remote research stations
Industrial IoT: Factory floors with real-time requirements
Field Work: Geological surveys, wildlife research, disaster response
Secure Facilities: Military bases, research labs, government buildings
Example Implementation:
# Edge deployment for industrial quality control
class EdgeQualityControl:
    def __init__(self):
        self.config = Config(
            llama_model_name="quality-control:latest",
            enable_streaming=True,
            max_context_length=8192,  # Optimized for edge devices
        )

    async def inspect_product(self, sensor_data: dict):
        # Process sensor data locally
        analysis = await llama_service.complete(
            prompt=f"Analyze sensor readings for defects: {sensor_data}",
            temperature=0.1,  # Consistent results needed
            max_tokens=256,  # Quick response for real-time processing
        )

        # Trigger local actions based on analysis
        if "defect" in analysis.lower():
            await self.trigger_alert(analysis)
        return analysis
5. Experimentation and Research
The Challenge: Researchers need reproducible results and full control over model behavior.
The Solution: Complete transparency and control over every aspect of the AI pipeline.
Real-World Applications:
Academic Research: Reproducible experiments for papers
Model Comparison: A/B testing different models and parameters
Behavior Analysis: Understanding how models respond to different inputs
Prompt Engineering: Developing optimal prompts for specific tasks
Example Implementation:
# Research experiment framework
class ExperimentRunner:
    async def run_experiment(self, hypothesis: str, test_cases: list):
        results = []
        # Test multiple models
        for model in ["llama3:7b", "llama3:13b", "llama3:70b"]:
            # Test multiple parameters
            for temp in [0.1, 0.5, 0.9, 1.5]:
                model_results = []
                for test in test_cases:
                    response = await llama_service.complete(
                        prompt=test,
                        model=model,
                        temperature=temp,
                        seed=42,  # Reproducible results
                    )
                    model_results.append({
                        "input": test,
                        "output": response,
                        "model": model,
                        "temperature": temp,
                        "timestamp": datetime.now(),
                    })
                results.append(model_results)

        # Analyze and save results
        analysis = self.analyze_results(results)
        await self.save_experiment(hypothesis, results, analysis)
        return analysis
6. Cost-Effective Scaling
The Challenge: API costs can become prohibitive for high-volume applications.
The Solution: One-time hardware investment for unlimited usage.
Real-World Applications:
Startups: Prototype without burning through funding
Education: Provide AI access to all students without budget concerns
Non-profits: Leverage AI without ongoing costs
High-volume Processing: Batch jobs, data analysis, content generation
Cost Analysis Example:
# Cost comparison calculator
class CostAnalyzer:
    def calculate_savings(self, monthly_tokens: int):
        # API costs (approximate)
        api_cost_per_million = 15.00  # USD
        monthly_api_cost = (monthly_tokens / 1_000_000) * api_cost_per_million

        # Local costs (one-time hardware)
        hardware_cost = 2000  # Good GPU setup
        electricity_monthly = 50  # Approximate

        # Calculate break-even
        months_to_break_even = hardware_cost / (monthly_api_cost - electricity_monthly)

        return {
            "monthly_api_cost": monthly_api_cost,
            "monthly_local_cost": electricity_monthly,
            "monthly_savings": monthly_api_cost - electricity_monthly,
            "break_even_months": months_to_break_even,
            "first_year_savings": (monthly_api_cost * 12) - (hardware_cost + electricity_monthly * 12),
        }
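For a quick sense of scale (using the illustrative constants baked into the class above, not real quotes): at 100 million tokens per month the API estimate is about $1,500/month versus roughly $50/month in electricity, so the $2,000 hardware pays for itself in under two months.

analyzer = CostAnalyzer()
savings = analyzer.calculate_savings(monthly_tokens=100_000_000)
# savings["break_even_months"] is roughly 1.4
# savings["first_year_savings"] is roughly 15,400 USD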
7. Real-Time Processing
The Challenge: Network latency makes cloud AI unsuitable for real-time applications.
The Solution: Sub-second response times with local processing.
Real-World Applications:
Trading Systems: Analyze market data in milliseconds
Gaming: Real-time NPC dialogue and behavior
Robotics: Immediate response to sensor inputs
Live Translation: Instant language translation
Example Implementation:
# Real-time stream processing
class StreamProcessor:
    def __init__(self):
        self.buffer = []
        self.processing = False

    async def process_stream(self, data_stream):
        async for chunk in data_stream:
            self.buffer.append(chunk)
            if not self.processing and len(self.buffer) > 0:
                self.processing = True
                # Process immediately without network delay
                result = await llama_service.complete(
                    prompt=f"Analyze: {self.buffer[-1]}",
                    model="tinyllama:latest",  # Fast model for real-time
                    max_tokens=50,
                    stream=True,
                )
                async for token in result:
                    yield token  # Stream results immediately
                self.processing = False
8. Custom Tool Integration
The Challenge: Generic AI can't interact with your specific systems and databases.
The Solution: Build custom tools that integrate with your infrastructure.
Real-World Applications:
DevOps: AI that can manage your specific infrastructure
Database Management: Query and manage your databases via natural language
System Administration: Automate complex administrative tasks
Business Intelligence: Connect to your BI tools and data warehouses
Example Implementation:
# Custom tool for database operations
class DatabaseTool(BaseTool):
    @property
    def name(self) -> str:
        return "company_database"

    @property
    def description(self) -> str:
        return "Query and manage company database"

    async def execute(self, query: str, operation: str = "select") -> ToolResult:
        # Connect to your specific database
        async with get_company_db() as db:
            results = await db.fetch(query)
            if operation == "select":
                return ToolResult(success=True, data=results)
            elif operation == "analyze":
                # Use Llama to analyze query results
                analysis = await llama_service.complete(
                    prompt=f"Analyze this data: {results}",
                    temperature=0.3,
                )
                return ToolResult(success=True, data=analysis)
9. Compliance and Governance
The Challenge: Regulatory requirements demand complete control and audit trails.
The Solution: Full transparency and logging of all AI operations.
Real-World Applications:
Healthcare: HIPAA compliance with audit trails
Finance: SOX compliance with transaction monitoring
Legal: Attorney-client privilege protection
Government: Security clearance requirements
Example Implementation:
import hashlib

# Compliance-aware AI system
class ComplianceAI:
    def __init__(self):
        self.audit_logger = AuditLogger()
        self.encryption = EncryptionService()

    async def process_regulated_data(self, data: str, user: str, purpose: str):
        # Log access for audit
        audit_id = await self.audit_logger.log_access(
            user=user,
            data_type="regulated",
            purpose=purpose,
            timestamp=datetime.now(),
        )

        # Encrypt data in transit
        encrypted = self.encryption.encrypt(data)

        # Process with local model (data never leaves premises)
        result = await llama_service.complete(
            prompt=f"Process: {encrypted}",
            model="compliance-llama:latest",
        )

        # Log completion
        await self.audit_logger.log_completion(
            audit_id=audit_id,
            success=True,
            result_hash=hashlib.sha256(result.encode()).hexdigest(),
        )
        return self.encryption.encrypt(result)
10. Educational Environments
The Challenge: Educational institutions need affordable AI access for all students.
The Solution: Single deployment serves unlimited students without per-use costs.
Real-World Applications:
Computer Science: Teaching AI/ML concepts hands-on
Research Projects: Student research without budget constraints
Writing Centers: AI-assisted writing for all students
Language Learning: Personalized language practice
Example Implementation:
# Educational AI assistant
class EducationalAssistant:
    def __init__(self):
        self.student_profiles = {}
        self.learning_analytics = LearningAnalytics()

    async def personalized_tutoring(self, student_id: str, subject: str, question: str):
        # Get student's learning profile
        profile = self.student_profiles.get(student_id, self.create_profile(student_id))

        # Adjust response based on student level
        response = await llama_service.complete(
            prompt=f"""
            Student Level: {profile['level']}
            Subject: {subject}
            Question: {question}
            Provide an explanation appropriate for this student's level.
            """,
            temperature=0.7,
            model="education-llama:latest",
        )

        # Track learning progress
        await self.learning_analytics.record_interaction(
            student_id=student_id,
            subject=subject,
            question=question,
            response=response,
        )
        return response
Why Python?
Advantages Over TypeScript/Node.js
| Aspect | Python Advantage | Use Case |
|---|---|---|
| Scientific Computing | NumPy, SciPy, Pandas integration | Data analysis, research |
| ML Ecosystem | Direct integration with PyTorch, TensorFlow | Model experimentation |
| Simplicity | Cleaner async/await syntax | Faster development |
| Libraries | Vast ecosystem of AI/ML tools | Extended functionality |
| Debugging | Better error messages and debugging tools | Easier troubleshooting |
| Performance | uvloop for high-performance async | Better concurrency |
| Type Safety | Type hints + Pydantic validation | Runtime validation |
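As a small illustration of the type-safety row above, here is a minimal sketch of Pydantic's runtime validation. CompletionParams is an illustrative model, not part of this package's API:

from pydantic import BaseModel, Field, ValidationError

class CompletionParams(BaseModel):
    prompt: str = Field(..., min_length=1)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)

try:
    CompletionParams(prompt="Hello", temperature=3.5)  # out of range
except ValidationError as exc:
    print(exc)  # reports that temperature must be <= 2.0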
Features
Core Capabilities
High Performance: Async/await with uvloop support
10+ Built-in Tools: Web search, file ops, calculations, and more
Prompt Templates: Pre-defined prompts for common tasks
Resource Management: Access templates and documentation
Streaming Support: Real-time token generation
Highly Configurable: Environment-based configuration
Structured Logging: Comprehensive debugging support
Fully Tested: Pytest test suite included
Python-Specific Features
Data Science Integration: Works with Pandas, NumPy (see the sketch after this list)
ML Framework Compatible: Integrate with PyTorch, TensorFlow
Analytics Built-in: Performance metrics and monitoring
Plugin System: Easy to extend with Python packages
Type Safety: Pydantic models for validation
Security: Built-in sanitization and validation
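A minimal sketch of the data-science integration point: it reuses LlamaService and Config exactly as shown in the Usage Examples section below; the DataFrame and the function name are made up for illustration.

import pandas as pd
from llama4_maverick_mcp import LlamaService, Config

async def summarize_dataframe():
    # Illustrative data; in practice this would come from your own pipeline
    df = pd.DataFrame({"region": ["EU", "US"], "revenue": [1.2, 3.4]})

    llama = LlamaService(Config())
    await llama.initialize()

    # Pass a compact text rendering of the DataFrame to the local model
    summary = await llama.complete(
        prompt=f"Summarize this table:\n{df.to_string(index=False)}",
        temperature=0.3,
    )
    return summary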
System Requirements
Minimum Requirements
| Component | Minimum | Recommended | Optimal |
|---|---|---|---|
| Python | 3.9+ | 3.11+ | Latest |
| CPU | 4 cores | 8 cores | 16+ cores |
| RAM | 8GB | 16GB | 32GB+ |
| Storage | 10GB SSD | 50GB SSD | 100GB NVMe |
| OS | Linux/macOS/Windows | Ubuntu 22.04 | Latest Linux |
Model Requirements
| Model | Parameters | RAM | Use Case |
|---|---|---|---|
| tinyllama | 1.1B | 2GB | Testing, quick responses |
| llama3:7b | 7B | 8GB | General purpose |
| llama3:13b | 13B | 16GB | Advanced tasks |
| llama3:70b | 70B | 48GB | Professional use |
| codellama | 7-34B | 8-32GB | Code generation |
Quick Start
# Clone the repository
git clone https://github.com/yobieben/llama4-maverick-mcp-python.git
cd llama4-maverick-mcp-python
# Run setup (handles everything)
python setup.py
# Start the server
python -m llama4_maverick_mcp.server
That's it! The server is now running and ready to connect to Claude Desktop.
Detailed Installation
Step 1: Python Setup
# Check Python version
python --version # Should be 3.9+
# Create virtual environment (recommended)
python -m venv venv
# Activate virtual environment
# Linux/macOS:
source venv/bin/activate
# Windows:
venv\Scripts\activate
Step 2: Install Dependencies
# Install the package in development mode
pip install -e .
# For development with testing tools
pip install -e .[dev]
Step 3: Install Ollama
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows
# Download from https://ollama.com/download/windows
Step 4: Configure Environment
# Copy example configuration
cp .env.example .env
# Edit configuration
nano .env # or your preferred editor
Step 5: Download Models
# Start Ollama service
ollama serve
# In another terminal, pull models
ollama pull llama3:latest
ollama pull codellama:latest
ollama pull tinyllama:latest
Step 6: Configure Claude Desktop
Add to Claude Desktop configuration:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "llama4-python": {
      "command": "python",
      "args": ["-m", "llama4_maverick_mcp.server"],
      "cwd": "/path/to/llama4-maverick-mcp-python",
      "env": {
        "PYTHONPATH": "/path/to/llama4-maverick-mcp-python/src",
        "LLAMA_MODEL_NAME": "llama3:latest"
      }
    }
  }
}
Configuration
Environment Variables
Create a .env file:
# Ollama Configuration
LLAMA_API_URL=http://localhost:11434
LLAMA_MODEL_NAME=llama3:latest
LLAMA_API_KEY= # Optional
# Server Configuration
MCP_LOG_LEVEL=INFO
MCP_SERVER_HOST=localhost
MCP_SERVER_PORT=3000
# Features
ENABLE_STREAMING=true
ENABLE_FUNCTION_CALLING=true
ENABLE_VISION=false
ENABLE_CODE_EXECUTION=false # Security risk
ENABLE_WEB_SEARCH=true
# Model Parameters
TEMPERATURE=0.7 # 0.0-2.0
TOP_P=0.9 # 0.0-1.0
TOP_K=40 # 1-100
REPEAT_PENALTY=1.1
SEED=42 # For reproducibility
# File System
FILE_SYSTEM_BASE_PATH=/safe/path
ALLOW_FILE_WRITES=true
# Performance
MAX_CONTEXT_LENGTH=128000
MAX_CONCURRENT_REQUESTS=10
REQUEST_TIMEOUT_MS=30000
CACHE_TTL=3600
CACHE_MAX_SIZE=1000
# Debug
DEBUG=false
VERBOSE_LOGGING=false
Configuration Classes
from llama4_maverick_mcp.config import Config

# Create custom configuration
config = Config(
    llama_model_name="codellama:latest",
    temperature=0.3,
    enable_code_execution=True,
)

# Access configuration
print(config.llama_model_name)
print(config.get_model_params())
Available Tools
Built-in Tools
| Tool | Description | Example |
|---|---|---|
| calculator | Mathematical calculations | 2 + 2, sqrt(16) |
| datetime | Date/time operations | Current time, date math |
| json_tool | JSON manipulation | Parse, extract, transform |
| web_search | Search the web | Query for information |
| file_read | Read files | Access local files |
| file_write | Write files | Save data locally |
| list_files | List directories | Browse file system |
| code_executor | Run code | Execute Python/JS/Bash |
| http_request | HTTP calls | API interactions |
Creating Custom Tools
# src/llama4_maverick_mcp/tools/custom/my_tool.py
from pydantic import BaseModel, Field

from ..base import BaseTool, ToolResult

class MyToolParams(BaseModel):
    """Parameters for my custom tool."""

    input_text: str = Field(..., description="Text to process")
    option: str = Field(default="default", description="Processing option")

class MyCustomTool(BaseTool):
    @property
    def name(self) -> str:
        return "my_custom_tool"

    @property
    def description(self) -> str:
        return "Performs custom processing on text"

    @property
    def parameters(self) -> type[BaseModel]:
        return MyToolParams

    async def execute(self, input_text: str, option: str = "default") -> ToolResult:
        # Your custom logic here
        result = f"Processed: {input_text} with option: {option}"
        return ToolResult(
            success=True,
            data={"result": result, "length": len(input_text)},
        )
Usage Examples
Basic Usage
import asyncio

from llama4_maverick_mcp import MCPServer, Config

async def main():
    # Create server with custom config
    config = Config(
        llama_model_name="llama3:latest",
        temperature=0.7,
    )
    server = MCPServer(config)

    # Run the server
    await server.run()

if __name__ == "__main__":
    asyncio.run(main())
Direct API Usage
from llama4_maverick_mcp import LlamaService, Config

async def generate_text():
    config = Config()
    llama = LlamaService(config)
    await llama.initialize()

    # Simple completion
    result = await llama.complete(
        prompt="Explain quantum computing",
        temperature=0.5,
        max_tokens=200,
    )
    print(result)

    # Chat completion
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is Python?"},
    ]
    response = await llama.complete_chat(messages)
    print(response)
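Streaming Completion
The server advertises streaming support (see ENABLE_STREAMING above). The sketch below assumes that complete() with stream=True returns an async iterator of tokens, as in the real-time processing example earlier in this README:

async def stream_text():
    llama = LlamaService(Config())
    await llama.initialize()

    # Assumption: stream=True yields tokens as they are generated
    stream = await llama.complete(
        prompt="Write a haiku about local AI",
        stream=True,
        max_tokens=64,
    )
    async for token in stream:
        print(token, end="", flush=True)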
Tool Execution
from llama4_maverick_mcp.tools import ToolManager

async def use_tools():
    manager = ToolManager(Config())
    await manager.initialize()

    # Execute calculator
    result = await manager.execute_tool(
        "calculator",
        {"expression": "factorial(5) + sqrt(16)"},
    )
    print(result)

    # Read file
    content = await manager.execute_tool(
        "file_read",
        {"path": "config.json"},
    )
    print(content)
Real-World Applications
1. Document Analysis Pipeline
class DocumentAnalyzer:
    def __init__(self):
        self.config = Config(temperature=0.3)
        self.llama = LlamaService(self.config)
        self.tools = ToolManager(self.config)

    async def analyze_documents(self, directory: str):
        # List all documents
        files = await self.tools.execute_tool(
            "list_files",
            {"path": directory, "recursive": True},
        )

        results = []
        for file in files['data']['files']:
            if file.endswith(('.txt', '.md', '.pdf')):
                # Read document
                content = await self.tools.execute_tool(
                    "file_read",
                    {"path": file},
                )

                # Analyze with Llama
                analysis = await self.llama.complete(
                    prompt=f"Summarize and extract key points: {content['data']}",
                    max_tokens=500,
                )

                results.append({
                    "file": file,
                    "analysis": analysis,
                })
        return results
2. Code Review System
class CodeReviewer:
    async def review_code(self, code: str, language: str = "python"):
        prompt = f"""
        Review this {language} code for:
        1. Security vulnerabilities
        2. Performance issues
        3. Best practices
        4. Potential bugs

        Code:
        ```{language}
        {code}
        ```

        Provide specific suggestions for improvement.
        """

        review = await llama_service.complete(
            prompt=prompt,
            model="codellama:latest",
            temperature=0.3,
        )
        return self.parse_review(review)
3. Research Assistant
class ResearchAssistant:
    async def research_topic(self, topic: str):
        # Search for information
        search_results = await self.tools.execute_tool(
            "web_search",
            {"query": topic, "max_results": 10},
        )

        # Analyze sources
        analysis = await self.llama.complete(
            prompt=f"Analyze these sources about {topic}: {search_results}",
            temperature=0.5,
        )

        # Generate report
        report = await self.llama.complete(
            prompt=f"Write a comprehensive report on {topic} based on: {analysis}",
            temperature=0.7,
            max_tokens=2000,
        )

        # Save report
        await self.tools.execute_tool(
            "file_write",
            {
                "path": f"research_{topic}_{datetime.now().strftime('%Y%m%d')}.md",
                "content": report,
            },
        )
        return report
Development
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=llama4_maverick_mcp
# Run specific test
pytest tests/test_llama_service.py
# Run with verbose output
pytest -v
Code Quality
# Format code with Black
black src/
# Lint with Ruff
ruff check src/
# Type checking with mypy
mypy src/
# All quality checks
make quality
Creating Tests
# tests/test_my_tool.py
import pytest

from llama4_maverick_mcp.tools.custom.my_tool import MyCustomTool

@pytest.mark.asyncio
async def test_my_custom_tool():
    tool = MyCustomTool()
    result = await tool.execute(
        input_text="Hello, world!",
        option="uppercase",
    )
    assert result.success
    assert "Hello, world!" in result.data["result"]
    assert result.data["length"] == 13
Performance Optimization
1. Use uvloop (Linux/macOS)
# Automatically enabled if available
# 2-4x performance improvement for async operations
pip install uvloop
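To opt in explicitly rather than rely on auto-detection, here is a minimal sketch using plain uvloop (this is standard uvloop usage, not project-specific code; the server setup mirrors the Basic Usage example above):

import asyncio

from llama4_maverick_mcp import MCPServer, Config

try:
    import uvloop
    uvloop.install()  # swap in the uvloop event loop policy
except ImportError:
    pass  # e.g. on Windows, fall back to the default asyncio loop

async def main():
    await MCPServer(Config()).run()

asyncio.run(main())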
2. Model Optimization
# Use smaller models for simple tasks
config = Config(
    llama_model_name="tinyllama:latest",  # 1.1B params, very fast
    max_context_length=4096,  # Reduce context for speed
    temperature=0.1,  # Lower temperature for consistency
)
3. Caching Strategy
from cachetools import TTLCache

class CachedLlamaService(LlamaService):
    def __init__(self, config):
        super().__init__(config)
        self.cache = TTLCache(maxsize=1000, ttl=3600)

    async def complete(self, prompt: str, **kwargs):
        cache_key = f"{prompt}:{kwargs}"
        if cache_key in self.cache:
            return self.cache[cache_key]

        result = await super().complete(prompt, **kwargs)
        self.cache[cache_key] = result
        return result
4. Batch Processing
import asyncio

async def batch_process(prompts: list):
    # Process multiple prompts concurrently
    tasks = [
        llama_service.complete(prompt, temperature=0.5)
        for prompt in prompts
    ]

    # Limit concurrency to avoid overwhelming the system
    semaphore = asyncio.Semaphore(5)

    async def limited_task(task):
        async with semaphore:
            return await task

    results = await asyncio.gather(*[limited_task(t) for t in tasks])
    return results
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| ImportError | Check Python path: export PYTHONPATH=$PYTHONPATH:$(pwd)/src |
| Ollama not found | Install: curl -fsSL https://ollama.com/install.sh \| sh |
| Model not available | Pull model: ollama pull llama3:latest |
| Permission denied | Check file permissions and base path configuration |
| Memory error | Use a smaller model or increase system RAM |
| Timeout errors | Increase REQUEST_TIMEOUT_MS in configuration |
Debug Mode
# Enable detailed logging
config = Config(
    debug_mode=True,
    verbose_logging=True,
    log_level="DEBUG",
)

# Or via environment
export DEBUG=true
export MCP_LOG_LEVEL=DEBUG
export VERBOSE_LOGGING=true
Health Check
import sys
from datetime import datetime

import psutil

# Assumes config, llama_service, and tool_manager are already initialized
async def health_check():
    """Check system health."""
    checks = {
        "python_version": sys.version,
        "ollama_connected": config.validate_ollama_connection(),
        "models_available": await llama_service.list_models(),
        "tools_loaded": len(await tool_manager.get_tools()),
        "memory_usage": psutil.virtual_memory().percent,
        "disk_usage": psutil.disk_usage('/').percent,
    }
    return {
        "status": "healthy" if all(checks.values()) else "degraded",
        "checks": checks,
        "timestamp": datetime.now().isoformat(),
    }
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas for Contribution
New tools and integrations
Documentation improvements
Bug fixes
Performance optimizations
Test coverage
Internationalization
Development Workflow
# Fork and clone
git clone https://github.com/YOUR_USERNAME/llama4-maverick-mcp-python.git
# Create branch
git checkout -b feature/your-feature
# Make changes and test
pytest
# Commit with conventional commits
git commit -m "feat: add new amazing feature"
# Push and create PR
git push origin feature/your-feature
License
MIT License - See LICENSE file
Author
Yobie Benjamin
Version 0.9
August 1, 2025
Acknowledgments
Anthropic for the MCP protocol
Ollama team for local model hosting
Meta for Llama models
Python community for excellent libraries
Support
Ready to experience the power of local AI? Start with Llama 4 Maverick MCP Python today!