# AI & LLM Features
The robotics webapp includes a comprehensive AI stack with LLM management and chat capabilities for both local and cloud providers.
## Overview
The AI/LLM system provides:
- **Local LLM Support**: Ollama and LM Studio integration
- **Cloud LLM Support**: OpenAI, Anthropic, Google AI
- **Model Management**: List, load, unload, and pull models
- **Chat Interface**: Interactive chatbot with selectable personalities
- **Personality System**: 6 built-in personalities + custom personality support
- **Settings Management**: Comprehensive configuration interface
## Quick Start
### 1. Access AI/LLM Management
Navigate to **AI & LLM** → **LLM Management** from the sidebar, or go to `/ai-llm`.
### 2. Set Up Local LLM MCP Server
The AI features require the `local-llm-mcp` server:
```powershell
# Clone the repository
cd D:\Dev\repos
git clone https://github.com/sandraschi/local-llm-mcp.git
cd local-llm-mcp
# Install dependencies
pip install -e ".[dev]"
# Start the server
python -m src.llm_mcp.main --host 127.0.0.1 --port 8007
```
### 3. Configure Backend
Set the environment variable in `backend/.env`:
```env
LOCAL_LLM_MCP_URL=http://localhost:8007
```
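As a sketch of how the backend might consume this variable (the fallback default and the `mcp_endpoint` helper below are illustrative, not the actual `backend` code):

```python
import os

# Fall back to the documented default when backend/.env has not been loaded.
LOCAL_LLM_MCP_URL = os.environ.get("LOCAL_LLM_MCP_URL", "http://localhost:8007")

def mcp_endpoint(path: str) -> str:
    """Join a request path onto the configured MCP base URL."""
    return f"{LOCAL_LLM_MCP_URL.rstrip('/')}/{path.lstrip('/')}"
```

With the variable unset, `mcp_endpoint("health")` resolves to `http://localhost:8007/health`.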
### 4. Start Using AI Features
1. **View Models**: The AI/LLM page shows all available models from Ollama, LM Studio, and cloud providers
2. **Load Models**: Click "Load Model" to load a model for inference
3. **Open Chatbot**: Click "Open Chatbot" to start chatting with loaded models
4. **Configure Settings**: Go to Settings → LLM Settings to configure defaults
## Features
### Model Management
#### List Models
- View all available models across providers
- Filter by provider (All, Ollama, LM Studio)
- See model details (context length, max tokens, description)
- Real-time status updates
#### Load/Unload Models
- **Load**: Load a model into memory for inference
- **Unload**: Free memory by unloading models
- **Active Status**: See which models are currently loaded
#### Pull Models (Ollama)
- Download models from Ollama's registry
- Monitor download progress
- Automatic refresh after pull completes
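The pull action can also be scripted against the backend API. A minimal stdlib-only sketch, assuming the backend listens on port 8000 (adjust `BACKEND` for your deployment) and returns JSON:

```python
import json
import urllib.request

BACKEND = "http://localhost:8000"  # assumed backend port; adjust to your deployment

def pull_url(model_id: str) -> str:
    """Build the documented pull endpoint URL for a given model id."""
    return f"{BACKEND}/api/llm/models/{model_id}/pull"

def pull_model(model_id: str) -> dict:
    """Kick off a model download through the backend. The response shape
    is an assumption; the MCP server reports actual download progress."""
    req = urllib.request.Request(pull_url(model_id), method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```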
### Chatbot Interface
#### Accessing the Chatbot
- Click "Open Chatbot" button on AI/LLM page
- Or use the chatbot modal from any page (when implemented)
#### Features
- **Model Selection**: Choose which model to use for chat
- **Personality Selection**: Select from 6 built-in personalities:
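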
  - **Assistant**: Helpful, harmless, honest assistant
  - **Robotics Expert**: Technical expert in robotics and automation
  - **Code Assistant**: Expert software engineer
  - **Creative Writer**: Creative and imaginative writer
  - **Data Analyst**: Data analysis and insights
  - **Teacher**: Patient and knowledgeable teacher
- **Message History**: View conversation history
- **Settings Panel**: Collapsible settings for model/personality selection
- **Keyboard Shortcuts**:
  - `Enter` to send
  - `Shift+Enter` for new line
### Personality System
#### Built-in Personalities
Each personality includes:
- **System Prompt**: Defines the AI's behavior and expertise
- **Temperature**: Controls creativity (0.0-2.0)
- **Max Tokens**: Maximum response length
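A hypothetical personality expressed as data, with a small range check matching the three fields above (the key names are illustrative; the real schema belongs to `local-llm-mcp`):

```python
# Hypothetical personality record; field names are illustrative only.
robotics_expert = {
    "name": "Robotics Expert",
    "system_prompt": "You are a technical expert in robotics and automation.",
    "temperature": 0.4,   # lower = more deterministic, valid range 0.0-2.0
    "max_tokens": 2000,   # upper bound on response length
}

def validate_personality(p: dict) -> bool:
    """Basic sanity checks matching the documented field ranges."""
    return (
        isinstance(p.get("system_prompt"), str)
        and 0.0 <= p.get("temperature", -1.0) <= 2.0
        and p.get("max_tokens", 0) > 0
    )
```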
#### Custom Personalities
Add custom personalities via Settings → LLM Settings or the API:
```typescript
await llmService.addPersonality(
  'my_personality',                                 // personality id
  'My Personality Name',                            // display name
  'You are a helpful assistant specialized in...',  // system prompt
  0.7,  // temperature
  2000  // max tokens
)
```
### Settings
#### LLM Settings Tab
Configure:
- **Default Model**: Model to use by default
- **Default Personality**: Personality to use by default
- **Temperature**: Default temperature (0.0-2.0)
- **Max Tokens**: Default maximum tokens
- **Provider URLs**:
  - Ollama URL (default: `http://localhost:11434`)
  - LM Studio URL (default: `http://localhost:1234`)
- **API Keys**:
  - OpenAI API Key (optional)
  - Anthropic API Key (optional)
  - Google API Key (optional)
#### MCP Servers Tab
Configure MCP server connections:
- Enable/disable servers
- Set custom URLs
- View connection status
## API Endpoints
### Model Management
- `GET /api/llm/models` - List all models
- `GET /api/llm/models/ollama` - List Ollama models
- `GET /api/llm/models/lmstudio` - List LM Studio models
- `POST /api/llm/models/{model_id}/load` - Load a model
- `POST /api/llm/models/{model_id}/unload` - Unload a model
- `POST /api/llm/models/{model_id}/pull` - Pull/download a model
- `GET /api/llm/models/active` - Get active models
### Generation
- `POST /api/llm/generate` - Generate text from prompt
- `POST /api/llm/chat` - Chat completion with messages
### System
- `GET /api/llm/health` - Health check
- `GET /api/llm/system-info` - System information
- `GET /api/llm/personalities` - Get personalities
- `POST /api/llm/personalities` - Add custom personality
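A minimal stdlib-only sketch of calling the chat endpoint, assuming the backend listens on port 8000 and accepts an OpenAI-style `model`/`messages` body (both assumptions; check `backend/llm_service.py` for the actual schema):

```python
import json
import urllib.request

BACKEND = "http://localhost:8000"  # assumed backend port; adjust to your deployment

def build_chat_payload(model: str, messages: list[dict], temperature: float = 0.7) -> dict:
    """Assemble the JSON body for POST /api/llm/chat.
    Key names are assumptions modeled on common chat-completion APIs."""
    return {"model": model, "messages": messages, "temperature": temperature}

def chat(model: str, messages: list[dict]) -> dict:
    """Send the request to the backend and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BACKEND}/api/llm/chat",
        data=json.dumps(build_chat_payload(model, messages)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```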
## Provider Setup
### Ollama
1. **Install Ollama**: Download from [ollama.ai](https://ollama.ai)
2. **Start Ollama**: Ollama runs on `http://localhost:11434` by default
3. **Pull Models**: Use Ollama CLI or the webapp interface
```powershell
ollama pull llama3
ollama pull mistral
```
### LM Studio
1. **Install LM Studio**: Download from [lmstudio.ai](https://lmstudio.ai)
2. **Start LM Studio**: Ensure the local server is running
3. **Load Models**: Load models in LM Studio interface
4. **Configure URL**: Default is `http://localhost:1234`
### Cloud Providers
#### OpenAI
- Get API key from [platform.openai.com](https://platform.openai.com)
- Add key in Settings → LLM Settings → OpenAI API Key
- Models: `gpt-4o`, `gpt-4-turbo`, `gpt-3.5-turbo`
#### Anthropic
- Get API key from [console.anthropic.com](https://console.anthropic.com)
- Add key in Settings → LLM Settings → Anthropic API Key
- Models: `claude-3-opus`, `claude-3-sonnet`, `claude-3-haiku`
#### Google AI
- Get API key from [makersuite.google.com](https://makersuite.google.com)
- Add key in Settings → LLM Settings → Google API Key
- Models: `gemini-pro`, `gemini-ultra`
## Troubleshooting
### Models Not Showing
1. **Check MCP Server**: Ensure `local-llm-mcp` is running on port 8007
2. **Check Provider**: Verify Ollama/LM Studio is running
3. **Check Health**: View health status on AI/LLM page
4. **Check Logs**: Review backend logs for connection errors
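Checks 1-2 above can be run from a terminal with a small probe script. The MCP `/health` path is an assumption based on the health endpoint listed earlier; the Ollama and LM Studio paths are their standard model-listing endpoints:

```python
import urllib.error
import urllib.request

# Default ports from this guide; adjust if you changed them in Settings.
SERVICES = {
    "local-llm-mcp": "http://localhost:8007/health",   # path is an assumption
    "Ollama": "http://localhost:11434/api/tags",
    "LM Studio": "http://localhost:1234/v1/models",
}

def probe(url: str, timeout: float = 3.0) -> bool:
    """Return True when the service answers with HTTP 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        return False

for name, url in SERVICES.items():
    print(f"{name}: {'up' if probe(url) else 'DOWN'}  ({url})")
```

Any service reported as DOWN is the place to start: restart it, then recheck the AI/LLM page.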
### Chatbot Not Responding
1. **Model Loaded**: Ensure a model is loaded (shows "Active" badge)
2. **Model Selected**: Verify model is selected in chatbot settings
3. **Check Backend**: Ensure backend is running and connected to MCP server
4. **Check Logs**: Review backend logs for generation errors
### Model Loading Fails
1. **Provider Running**: Ensure Ollama/LM Studio is running
2. **Model Exists**: Verify model is available in provider
3. **Memory Available**: Check system memory (large models need RAM)
4. **Check Logs**: Review backend logs for detailed error messages
## Best Practices
1. **Load Models On-Demand**: Only load models when needed to save memory
2. **Use Appropriate Models**: Choose models based on task complexity
3. **Monitor Resources**: Watch CPU/RAM usage with large models
4. **Save Settings**: Configure defaults in Settings for convenience
5. **Use Personalities**: Leverage personalities for consistent behavior
## Architecture
The AI/LLM system uses a three-layer architecture:
1. **MCP Server Layer**: `local-llm-mcp` provides unified interface to all LLM providers
2. **Backend Layer**: FastAPI backend (`backend/llm_service.py`) manages LLM operations
3. **Frontend Layer**: React components (`src/app/ai-llm/`, `src/components/ChatbotModal.tsx`)
### Data Flow
```
Frontend (React)
    ↓ HTTP API
Backend (FastAPI)
    ↓ HTTP API
Local LLM MCP Server
    ↓ Provider APIs
Ollama / LM Studio / Cloud APIs
```
## Code Quality
The backend Python code follows strict quality standards:
- **Ruff Linting**: All code passes `ruff check`
- **Ruff Formatting**: Code formatted with `ruff format`
- **Type Hints**: Modern Python type annotations (`dict` instead of `Dict`, `X | None` instead of `Optional[X]`)
- **Error Handling**: Proper exception chaining with `from e`
- **Code Style**: Consistent formatting and naming conventions
Run linting:
```powershell
cd backend
ruff check .
ruff format .
```
## Future Enhancements
Planned features:
- Streaming responses for real-time chat
- Model fine-tuning interface
- Embedding generation
- Multi-modal support (images, audio)
- Advanced prompt templates
- Conversation history persistence
- Model performance metrics
## Related Documentation
- [MCP Integration](./MCP_INTEGRATION.md) - Technical architecture
- [MCP Setup](./MCP_SETUP.md) - Server setup guide
- [Settings Guide](./SETTINGS_GUIDE.md) - Configuration details