<!-- SPDX-License-Identifier: MIT OR Apache-2.0 -->
<!-- Copyright (c) 2025 Pierre Fitness Intelligence -->
# LLM Provider Integration
This document describes Pierre's LLM (Large Language Model) provider abstraction layer, which enables pluggable AI model integration with streaming support for chat functionality and recipe generation.
## Overview
The LLM module provides a trait-based abstraction that allows Pierre to integrate with multiple AI providers through a unified interface. The design mirrors the fitness provider SPI pattern for consistency.
```
┌─────────────────────────────────────────────────────────────────┐
│                          ChatProvider                           │
│              Runtime provider selector (from env)               │
│        PIERRE_LLM_PROVIDER=groq|gemini|local|ollama|vllm        │
└────────────────────────────────┬────────────────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
  ┌─────────────┐         ┌─────────────┐       ┌─────────────────┐
  │   Gemini    │         │    Groq     │       │     OpenAI-     │
  │  Provider   │         │  Provider   │       │   Compatible    │
  │  (vision,   │         │  (fast LPU  │       │ (Ollama, vLLM,  │
  │   tools)    │         │  inference) │       │    LocalAI)     │
  └──────┬──────┘         └──────┬──────┘       └────────┬────────┘
         │                       │                       │
         │                       │            ┌──────────┴──────────┐
         │                       │            │                     │
         │                       │       ┌────┴────┐           ┌────┴────┐
         │                       │       │ Ollama  │           │  vLLM   │
         │                       │       │localhost│           │localhost│
         │                       │       │ :11434  │           │ :8000   │
         │                       │       └────┬────┘           └────┬────┘
         └───────────────────────┴─────┬──────┴──────────────────────┘
                                       │
                                       ▼
                 ┌───────────────────────────────────┐
                 │         LlmProvider Trait         │
                 │        (shared interface)         │
                 └───────────────────────────────────┘
```
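For orientation, a minimal sketch of the selection `ChatProvider::from_env()` performs is shown below. The alias handling and the error type returned by the constructors are assumptions based on the environment-variable table in the Configuration Reference; see the actual logic in `src/llm/provider.rs`.

```rust
use pierre_mcp_server::llm::{AppError, ChatProvider};

// Sketch of runtime provider selection, assuming the explicit constructors
// shown in "Basic Usage" and that they return Result<ChatProvider, AppError>.
fn select_provider() -> Result<ChatProvider, AppError> {
    // PIERRE_LLM_PROVIDER defaults to "groq" when unset (see Configuration Reference).
    let choice = std::env::var("PIERRE_LLM_PROVIDER").unwrap_or_else(|_| "groq".to_string());
    match choice.as_str() {
        "gemini" => ChatProvider::gemini(),
        "local" | "ollama" | "vllm" | "localai" => ChatProvider::local(),
        _ => ChatProvider::groq(),
    }
}
```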
## Quick Start
### Option 1: Cloud Providers (No Local Setup)
```bash
# Groq (default, cost-effective, fast)
export GROQ_API_KEY="your-groq-api-key"
export PIERRE_LLM_PROVIDER=groq

# Gemini (full-featured with vision)
export GEMINI_API_KEY="your-gemini-api-key"
export PIERRE_LLM_PROVIDER=gemini
```
### Option 2: Local LLM (Privacy-First, No API Costs)
```bash
# Use local Ollama instance
export PIERRE_LLM_PROVIDER=local
export LOCAL_LLM_MODEL=qwen2.5:14b-instruct
# Start Pierre
./bin/start-server.sh
```
---
## Local LLM Setup Guide
Running a local LLM gives you complete privacy, eliminates API costs, and works offline. This section covers setting up Ollama (recommended) on macOS.
### Hardware Requirements
| Model Size | RAM Required | GPU VRAM | Recommended Hardware |
|------------|--------------|----------|---------------------|
| 7B-8B (Q4) | 8GB+ | 8GB | MacBook Air M1/M2 16GB |
| 14B (Q4) | 12GB+ | 12GB | MacBook Air M2 24GB, MacBook Pro |
| 32B (Q4) | 20GB+ | 20-24GB | MacBook Pro M2/M3 Pro 32GB+ |
| 70B (Q4) | 40GB+ | 40-48GB | Mac Studio, High-end workstation |
**Example: Apple Silicon with 24GB unified memory:**
- ✅ Qwen 2.5 7B (~30 tokens/sec)
- ✅ Qwen 2.5 14B (~15-20 tokens/sec) **← Recommended**
- ⚠️ Qwen 2.5 32B (~5-8 tokens/sec, tight fit)
### Step 1: Install Ollama
```bash
# macOS (Homebrew)
brew install ollama
# Or download from https://ollama.ai/download
```
### Step 2: Start Ollama Server
```bash
# Start the Ollama service (runs in background)
ollama serve
# Verify it's running
curl http://localhost:11434/api/version
# Should return: {"version":"0.x.x"}
```
### Step 3: Pull a Model
**Recommended models for function calling:**
```bash
# Best for 24GB RAM (recommended)
ollama pull qwen2.5:14b-instruct
# Faster, lighter alternative
ollama pull qwen2.5:7b-instruct
# If you have 32GB+ RAM
ollama pull qwen2.5:32b-instruct
# Alternative: Llama 3.1 (also excellent)
ollama pull llama3.1:8b-instruct
```
### Step 4: Test the Model
```bash
# Interactive test
ollama run qwen2.5:14b-instruct "What are the benefits of interval training?"
# API test
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:14b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
### Step 5: Configure Pierre
```bash
# Set environment variables
export PIERRE_LLM_PROVIDER=local
export LOCAL_LLM_BASE_URL=http://localhost:11434/v1
export LOCAL_LLM_MODEL=qwen2.5:14b-instruct
# Or add to .envrc:
echo 'export PIERRE_LLM_PROVIDER=local' >> .envrc
echo 'export LOCAL_LLM_MODEL=qwen2.5:14b-instruct' >> .envrc
direnv allow
```
### Step 6: Start Pierre and Test
```bash
# Start Pierre server
./bin/start-server.sh
# Test chat endpoint
curl -X POST http://localhost:8081/api/chat/conversations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"title": "Test Chat"}'
```
---
## Model Recommendations
### For Function Calling (Pierre's 14+ Tools)
| Model | Size | Function Calling | Speed | Notes |
|-------|------|------------------|-------|-------|
| **Qwen 2.5 14B-Instruct** | 14B | ⭐⭐⭐⭐⭐ | Fast | Best balance for most hardware |
| Qwen 2.5 32B-Instruct | 32B | ⭐⭐⭐⭐⭐ | Medium | Best quality, needs 24GB+ |
| Qwen 2.5 7B-Instruct | 7B | ⭐⭐⭐⭐ | Very Fast | Good for lighter hardware |
| Llama 3.1 8B-Instruct | 8B | ⭐⭐⭐⭐ | Very Fast | Meta's latest, excellent |
| Llama 3.3 70B-Instruct | 70B | ⭐⭐⭐⭐⭐ | Slow | Best quality, needs 48GB+ |
| Mistral 7B-Instruct | 7B | ⭐⭐⭐⭐ | Very Fast | Fast and versatile |
### Ollama Model Commands
```bash
# List installed models
ollama list
# Pull a model
ollama pull qwen2.5:14b-instruct
# Remove a model
ollama rm qwen2.5:7b-instruct
# Show model info
ollama show qwen2.5:14b-instruct
```
---
## Configuration Reference
### Environment Variables
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `PIERRE_LLM_PROVIDER` | Provider: `groq`, `gemini`, `local`, `ollama`, `vllm`, `localai` | `groq` | No |
| `GROQ_API_KEY` | Groq API key | - | Yes (for Groq) |
| `GEMINI_API_KEY` | Google Gemini API key | - | Yes (for Gemini) |
| `LOCAL_LLM_BASE_URL` | Local LLM API endpoint | `http://localhost:11434/v1` | No |
| `LOCAL_LLM_MODEL` | Model name for local provider | `qwen2.5:14b-instruct` | No |
| `LOCAL_LLM_API_KEY` | API key for local provider | (empty) | No |
### Provider Capabilities
| Capability | Groq | Gemini | Local (Ollama) |
|------------|------|--------|----------------|
| Streaming | ✅ | ✅ | ✅ |
| Function/Tool Calling | ✅ | ✅ | ✅ |
| Vision/Image Input | ❌ | ✅ | ❌ |
| JSON Mode | ✅ | ✅ | ❌ |
| System Messages | ✅ | ✅ | ✅ |
| Offline Operation | ❌ | ❌ | ✅ |
| Privacy (No Data Sent) | ❌ | ❌ | ✅ |
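When more than one backend may be configured, branch on the advertised capability flags rather than the provider name. A minimal sketch (assumes the standard bitflags `contains` check; `supports_streaming()` and `display_name()` appear in the API reference below):

```rust
use pierre_mcp_server::llm::{ChatProvider, LlmCapabilities};

// Choose a code path from capability flags instead of hard-coding providers.
let provider = ChatProvider::from_env()?;
let caps = provider.capabilities();

if caps.supports_streaming() {
    // Stream tokens to the chat UI as they arrive.
} else {
    // Fall back to a single blocking completion.
}

if !caps.contains(LlmCapabilities::FUNCTION_CALLING) {
    eprintln!("warning: {} cannot drive Pierre's tools", provider.display_name());
}
```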
### Supported Models by Provider
#### Groq (Cloud)
| Model | Description | Default |
|-------|-------------|---------|
| `llama-3.3-70b-versatile` | High-quality general purpose | ✓ |
| `llama-3.1-8b-instant` | Fast responses for simple tasks | |
| `llama-3.1-70b-versatile` | Versatile 70B model | |
| `mixtral-8x7b-32768` | Long context window (32K tokens) | |
| `gemma2-9b-it` | Google's Gemma 2 instruction-tuned | |
**Rate Limits**: The free tier has a 12,000 tokens-per-minute limit.
#### Gemini (Cloud)
| Model | Description | Default |
|-------|-------------|---------|
| `gemini-2.5-flash` | Latest fast model with improved capabilities | ✓ |
| `gemini-2.0-flash-exp` | Experimental fast model | |
| `gemini-1.5-pro` | Advanced reasoning capabilities | |
| `gemini-1.5-flash` | Balanced performance and cost | |
| `gemini-1.0-pro` | Legacy pro model | |
#### Local (Ollama/vLLM)
| Model | Description | Recommended For |
|-------|-------------|-----------------|
| `qwen2.5:14b-instruct` | Excellent function calling | 24GB RAM (default) |
| `qwen2.5:7b-instruct` | Fast, good function calling | 16GB RAM |
| `qwen2.5:32b-instruct` | Best quality function calling | 32GB+ RAM |
| `llama3.1:8b-instruct` | Meta's latest 8B | 16GB RAM |
| `llama3.1:70b-instruct` | Meta's latest 70B | 48GB+ RAM |
| `mistral:7b-instruct` | Fast and versatile | 16GB RAM |
---
## Testing
### Run All LLM Tests
```bash
# LLM module unit tests
cargo test --test llm_test -- --nocapture
# LLM provider abstraction tests
cargo test --test llm_provider_test -- --nocapture
```
### Test Local Provider Specifically
```bash
# Ensure Ollama is running first
ollama serve &
# Test provider initialization
cargo test test_llm_provider_type -- --nocapture
# Test chat functionality (requires running server)
cargo test --test llm_local_integration_test -- --nocapture
```
### Manual Testing
```bash
# 1. Start Ollama
ollama serve
# 2. Pull test model
ollama pull qwen2.5:7b-instruct
# 3. Set environment
export PIERRE_LLM_PROVIDER=local
export LOCAL_LLM_MODEL=qwen2.5:7b-instruct
# 4. Start Pierre
./bin/start-server.sh
# 5. Test health endpoint
curl http://localhost:8081/health
# 6. Test chat (requires authentication)
# Create admin token first:
cargo run --bin admin-setup -- generate-token --service test --expires-days 1
```
### Validation Checklist for Local LLM
Before deploying with local LLM, verify:
- [ ] Ollama server is running (`curl http://localhost:11434/api/version`)
- [ ] Model is pulled (`ollama list`)
- [ ] Model supports function calling (use Qwen 2.5 or Llama 3.1)
- [ ] Environment variables are set correctly
- [ ] Pierre can connect to Ollama (`curl http://localhost:8081/health`)
- [ ] Chat streaming works
- [ ] Tool execution works (test with fitness tools)
---
## Alternative Local Backends
### vLLM (Production)
For production deployments with high throughput:
```bash
# Install vLLM
pip install vllm
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-14B-Instruct \
  --port 8000
# Configure Pierre
export PIERRE_LLM_PROVIDER=vllm
export LOCAL_LLM_BASE_URL=http://localhost:8000/v1
export LOCAL_LLM_MODEL=Qwen/Qwen2.5-14B-Instruct
```
**vLLM advantages:**
- Parallel function calls
- Streaming tool calls
- Higher throughput via PagedAttention
- Better for multiple concurrent users
### LocalAI
```bash
# Run LocalAI with Docker
docker run -p 8080:8080 localai/localai
# Configure Pierre
export PIERRE_LLM_PROVIDER=localai
export LOCAL_LLM_BASE_URL=http://localhost:8080/v1
```
---
## Basic Usage
### Using ChatProvider (Recommended)
The `ChatProvider` enum automatically selects the provider based on environment configuration:
```rust
use pierre_mcp_server::llm::{ChatProvider, ChatMessage, ChatRequest};
// Create provider from environment (reads PIERRE_LLM_PROVIDER)
let provider = ChatProvider::from_env()?;
// Build a chat request
let request = ChatRequest::new(vec![
    ChatMessage::system("You are a helpful fitness assistant."),
    ChatMessage::user("What's a good warm-up routine?"),
])
.with_temperature(0.7)
.with_max_tokens(1000);
// Get a response
let response = provider.complete(&request).await?;
println!("{}", response.content);
```
### Explicit Provider Selection
```rust
// Force Gemini
let provider = ChatProvider::gemini()?;
// Force Groq
let provider = ChatProvider::groq()?;
// Force Local
let provider = ChatProvider::local()?;
```
### Streaming Responses
```rust
use futures_util::StreamExt;
let request = ChatRequest::new(vec![
    ChatMessage::user("Explain the benefits of interval training"),
])
.with_streaming();
let mut stream = provider.complete_stream(&request).await?;
while let Some(chunk) = stream.next().await {
    match chunk {
        Ok(chunk) => {
            print!("{}", chunk.delta);
            if chunk.is_final {
                println!("\n[Done]");
            }
        }
        Err(e) => eprintln!("Error: {e}"),
    }
}
```
### Tool/Function Calling
All three providers (Gemini, Groq, Local) support tool calling:
```rust
use pierre_mcp_server::llm::{Tool, FunctionDeclaration};
let tools = vec![Tool {
    function_declarations: vec![FunctionDeclaration {
        name: "get_weather".to_string(),
        description: "Get current weather for a location".to_string(),
        parameters: Some(serde_json::json!({
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        })),
    }],
}];
let response = provider.complete_with_tools(&request, Some(tools)).await?;
if response.has_function_calls() {
    for call in response.function_calls.unwrap() {
        println!("Call function: {} with args: {}", call.name, call.args);
    }
}
```
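The provider only returns the requested call; executing it and feeding the result back to the model is up to the caller. A sketch of the dispatch step, assuming `call.args` is a `serde_json::Value`:

```rust
// Dispatch a model-requested tool call. Returning the tool result to the model
// afterwards is provider-specific and not shown here.
if let Some(calls) = response.function_calls {
    for call in calls {
        match call.name.as_str() {
            "get_weather" => {
                // Assumes call.args is a serde_json::Value matching the schema above.
                let location = call.args["location"].as_str().unwrap_or("unknown");
                println!("would fetch weather for: {location}");
            }
            other => eprintln!("model requested an unknown tool: {other}"),
        }
    }
}
```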
---
## Recipe Generation Integration
Pierre uses LLM providers for the "Combat des Chefs" recipe generation architecture. The workflow differs based on whether the client has LLM capabilities:
### LLM Clients (Claude, ChatGPT, etc.)
When an LLM client connects to Pierre, the client generates recipes itself:
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  LLM Client  │────▶│  Pierre MCP  │────▶│     USDA     │
│   (Claude)   │     │    Server    │     │   Database   │
└──────────────┘     └──────────────┘     └──────────────┘
       │                    │                    │
       │ 1. get_recipe_     │                    │
       │    constraints     │                    │
       │───────────────────▶│                    │
       │                    │                    │
       │ 2. Returns macro   │                    │
       │    targets, hints  │                    │
       │◀───────────────────│                    │
       │                    │                    │
       │ [LLM generates     │                    │
       │  recipe locally]   │                    │
       │                    │                    │
       │ 3. validate_       │                    │
       │    recipe          │                    │
       │───────────────────▶│                    │
       │                    │ Lookup nutrition   │
       │                    │───────────────────▶│
       │                    │◀───────────────────│
       │ 4. Validation      │                    │
       │    result + macros │                    │
       │◀───────────────────│                    │
       │                    │                    │
       │ 5. save_recipe     │                    │
       │───────────────────▶│                    │
```
### Non-LLM Clients
For clients without LLM capabilities, Pierre uses its internal LLM (via `ChatProvider`):
```rust
// The suggest_recipe tool uses Pierre's configured LLM
let provider = ChatProvider::from_env()?;
let recipe = generate_recipe_with_llm(&provider, constraints).await?;
```
### Recipe Tools
| Tool | Description |
|------|-------------|
| `get_recipe_constraints` | Get macro targets and prompt hints for LLM recipe generation |
| `validate_recipe` | Validate recipe nutrition via USDA FoodData Central |
| `suggest_recipe` | Uses Pierre's internal LLM to generate recipes |
| `save_recipe` | Save validated recipes to user collection |
| `list_recipes` | List user's saved recipes |
| `get_recipe` | Get recipe by ID |
| `search_recipes` | Search recipes by name, tags, or ingredients |
---
## API Reference
### LlmCapabilities
Bitflags indicating provider features:
| Flag | Description |
|------|-------------|
| `STREAMING` | Supports streaming responses |
| `FUNCTION_CALLING` | Supports function/tool calling |
| `VISION` | Supports image input |
| `JSON_MODE` | Supports structured JSON output |
| `SYSTEM_MESSAGES` | Supports system role messages |
```rust
// Check capabilities
let caps = provider.capabilities();
if caps.supports_streaming() {
    // Use streaming API
}
```
### ChatMessage
Message structure for conversations:
```rust
// Constructor methods
let system = ChatMessage::system("You are helpful");
let user = ChatMessage::user("Hello!");
let assistant = ChatMessage::assistant("Hi there!");
```
### ChatRequest
Request configuration with builder pattern:
```rust
let request = ChatRequest::new(messages)
    .with_model("gemini-1.5-pro")  // Override default model
    .with_temperature(0.7)         // 0.0 to 1.0
    .with_max_tokens(2000)         // Max output tokens
    .with_streaming();             // Enable streaming
```
### ChatResponse
Response structure:
| Field | Type | Description |
|-------|------|-------------|
| `content` | `String` | Generated text |
| `model` | `String` | Model used |
| `usage` | `Option<TokenUsage>` | Token counts |
| `finish_reason` | `Option<String>` | Why generation stopped |
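Example of inspecting response metadata (the `TokenUsage` field names used here are assumptions for illustration; check the type definition in `src/llm/mod.rs`):

```rust
let response = provider.complete(&request).await?;
println!("model: {}", response.model);

if let Some(usage) = &response.usage {
    // Field names are illustrative only.
    println!("prompt tokens: {}, completion tokens: {}",
        usage.prompt_tokens, usage.completion_tokens);
}
if let Some(reason) = &response.finish_reason {
    println!("finished because: {reason}");
}
```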
### StreamChunk
Streaming chunk structure:
| Field | Type | Description |
|-------|------|-------------|
| `delta` | `String` | Incremental text |
| `is_final` | `bool` | Whether this is the last chunk |
| `finish_reason` | `Option<String>` | Reason if final |
---
## Module Structure
```
src/llm/
├── mod.rs                # Trait definitions, types, registry, exports
├── provider.rs           # ChatProvider enum (runtime selector)
├── gemini.rs             # Google Gemini implementation
├── groq.rs               # Groq LPU implementation
├── openai_compatible.rs  # Generic OpenAI-compatible provider (Ollama, vLLM, LocalAI)
└── prompts/
    └── mod.rs            # System prompts (pierre_system.md)
```
---
## Adding New Providers
To implement a new LLM provider:
1. **Implement the trait**:
```rust
use async_trait::async_trait;
use pierre_mcp_server::llm::{
    LlmProvider, LlmCapabilities, ChatRequest, ChatResponse,
    ChatStream, AppError,
};

pub struct MyProvider {
    api_key: String,
    // ...
}

#[async_trait]
impl LlmProvider for MyProvider {
    fn name(&self) -> &'static str {
        "myprovider"
    }

    fn display_name(&self) -> &'static str {
        "My Custom Provider"
    }

    fn capabilities(&self) -> LlmCapabilities {
        LlmCapabilities::STREAMING | LlmCapabilities::SYSTEM_MESSAGES
    }

    fn default_model(&self) -> &'static str {
        "my-model-v1"
    }

    fn available_models(&self) -> &'static [&'static str] {
        &["my-model-v1", "my-model-v2"]
    }

    async fn complete(&self, request: &ChatRequest) -> Result<ChatResponse, AppError> {
        // Implementation
    }

    async fn complete_stream(&self, request: &ChatRequest) -> Result<ChatStream, AppError> {
        // Implementation
    }

    async fn health_check(&self) -> Result<bool, AppError> {
        // Implementation
    }
}
```
2. **Add to ChatProvider enum** in `src/llm/provider.rs`:
```rust
pub enum ChatProvider {
    Gemini(GeminiProvider),
    Groq(GroqProvider),
    Local(OpenAiCompatibleProvider),
    MyProvider(MyProvider), // Add variant
}
```
3. **Update environment config** in `src/config/environment.rs`
4. **Register tests** in `tests/llm_test.rs`
---
## Error Handling
All provider methods return `Result<T, AppError>`:
```rust
match provider.complete(&request).await {
    Ok(response) => println!("{}", response.content),
    Err(AppError { code, message, .. }) => match code {
        ErrorCode::RateLimitExceeded => {
            // Handle rate limit (e.g., back off and retry)
        }
        ErrorCode::AuthenticationFailed => {
            // Handle auth error (e.g., check the provider API key)
        }
        _ => {
            // Handle other errors
            eprintln!("LLM error: {message}");
        }
    },
}
```
### Common Local LLM Errors
| Error | Cause | Solution |
|-------|-------|----------|
| "Cannot connect to Ollama" | Ollama not running | Run `ollama serve` |
| "Model not found" | Model not pulled | Run `ollama pull MODEL_NAME` |
| "Connection refused" | Wrong port/URL | Check `LOCAL_LLM_BASE_URL` |
| "Timeout" | Model loading or slow inference | Wait, or use smaller model |
---
## Troubleshooting
### Ollama Won't Start
```bash
# Check if already running
pgrep -f ollama
# Kill existing instance
pkill ollama
# Start fresh
ollama serve
```
### Model Too Slow
```bash
# Use a smaller quantization
ollama pull qwen2.5:14b-instruct-q4_K_M
# Or use a smaller model
ollama pull qwen2.5:7b-instruct
```
### Out of Memory
```bash
# Check model size
ollama show qwen2.5:14b-instruct --modelfile
# Use smaller model
ollama pull qwen2.5:7b-instruct
# Or reduce context length in requests
```
### Function Calling Not Working
- Ensure you're using a model trained for function calling (Qwen 2.5, Llama 3.1)
- Verify the model is the instruct/chat variant, not base
- Check tool definitions are valid JSON Schema
---
## See Also
- [Tools Reference - Recipe Management](tools-reference.md#recipe-management)
- [Configuration Guide](configuration.md)
- [Architecture Documentation](architecture.md)