Server Configuration

Describes the environment variables required to run the server.

GEMINI_API_KEY (optional): Your API key from aistudio.google.com (required for the Gemini cloud backend)
DELIA_JWT_SECRET (optional): Your secure secret for JWT authentication
DELIA_AUTH_ENABLED (optional): Enable authentication for HTTP mode with multiple users
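
A minimal sketch of supplying these variables when launching the server from Python is shown below; the launch command ("python -m delia") and the example values are assumptions, not taken from this page.

  # Hypothetical launch sketch; "python -m delia" is an assumed entry point.
  import os
  import subprocess

  env = dict(os.environ)
  env["DELIA_AUTH_ENABLED"] = "true"     # enable authentication for multi-user HTTP mode
  env["DELIA_JWT_SECRET"] = "change-me"  # secret used for JWT authentication
  env["GEMINI_API_KEY"] = "your-key"     # only needed for the Gemini cloud backend
  subprocess.run(["python", "-m", "delia"], env=env)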

Tools

Functions exposed to the LLM to take actions

delegate

Execute a task on local/remote GPU with intelligent 3-tier model selection. Routes to optimal backend based on content size, task type, and GPU availability.

WHEN TO USE:

  • "locally", "on my GPU", "without API", "privately" → Use this tool

  • Code review, generation, analysis tasks → Use this tool

  • Any task you want processed on local hardware → Use this tool

Args:

  • task: Task type determines model tier:
      - "quick" or "summarize" → quick tier (fast, 14B model)
      - "generate", "review", "analyze" → coder tier (code-optimized 14B)
      - "plan", "critique" → moe tier (deep reasoning 30B+)
  • content: The prompt or content to process (required)
  • file: Optional file path to include in context
  • model: Force specific tier - "quick" | "coder" | "moe" | "thinking" OR natural language: "7b", "14b", "30b", "small", "large", "coder model", "fast", "complex", "thinking"
  • language: Language hint for better prompts - python|typescript|react|nextjs|rust|go
  • context: Serena memory names to include (comma-separated: "architecture,decisions")
  • symbols: Code symbols to focus on (comma-separated: "Foo,Bar/calculate")
  • include_references: True if content includes symbol usages from elsewhere
  • backend_type: Force backend type - "local" | "remote" (default: auto-select)

ROUTING LOGIC:

  1. Content > 32K tokens → Uses backend with largest context window

  2. Prefers local GPUs (lower latency) unless unavailable

  3. Falls back to remote if local circuit breaker is open

  4. Load balances across available backends based on priority weights

Returns: LLM response with a metadata footer showing model, tokens, time, and backend

Examples:

  delegate(task="review", content="", language="python")
  delegate(task="generate", content="Write a REST API", backend_type="local")
  delegate(task="plan", content="Design caching strategy", model="moe")
  delegate(task="analyze", content="Debug this error", model="14b")
  delegate(task="quick", content="Summarize this article", model="fast")
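
The following is an illustrative sketch of the tier selection and routing rules described above; it is not Delia's actual implementation, just the documented behaviour expressed as code.

  # Simplified sketch of delegate's documented routing, not the real source.
  def select_tier(task: str, forced: str | None = None) -> str:
      if forced:
          return forced                      # an explicit model/tier override wins
      if task in ("quick", "summarize"):
          return "quick"                     # fast 14B model
      if task in ("generate", "review", "analyze"):
          return "coder"                     # code-optimized 14B
      return "moe"                           # "plan"/"critique": deep reasoning 30B+

  def select_backend(content_tokens: int, local_healthy: bool) -> str:
      if content_tokens > 32_000:
          return "largest_context"           # backend with the biggest context window
      if local_healthy:
          return "local"                     # prefer local GPUs for lower latency
      return "remote"                        # fall back when the local circuit breaker is open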

think

Deep reasoning for complex problems using a local GPU with extended thinking. Offloads complex analysis to a local LLM at zero API cost.

WHEN TO USE:

  • Complex multi-step problems requiring careful reasoning

  • Architecture decisions, trade-off analysis

  • Debugging strategies, refactoring plans

  • Any situation requiring "thinking through" before acting

Args:

  • problem: The problem or question to think through (required)
  • context: Supporting information - code, docs, constraints (optional)
  • depth: Reasoning depth level:
      - "quick" → Fast answer, no extended thinking (14B model)
      - "normal" → Balanced reasoning with thinking (14B coder)
      - "deep" → Thorough multi-step analysis (30B+ MoE model)

ROUTING:

  • Uses largest available GPU for deep thinking

  • Automatically enables thinking mode for normal/deep

  • Prefers local GPU, falls back to remote if needed

Returns: Structured analysis with step-by-step reasoning and conclusions

Examples:

  think(problem="How should we handle authentication?", depth="deep")
  think(problem="Debug this error", context="", depth="normal")
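
Below is a small sketch of the documented depth-to-tier mapping; the dictionary is illustrative, not Delia's code.

  # Sketch of think()'s documented depth handling.
  def resolve_depth(depth: str) -> dict:
      mapping = {
          "quick":  {"tier": "quick", "thinking": False},  # fast answer, 14B model
          "normal": {"tier": "coder", "thinking": True},   # balanced reasoning, 14B coder
          "deep":   {"tier": "moe",   "thinking": True},   # thorough analysis, 30B+ MoE
      }
      return mapping[depth]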

batch

Execute multiple tasks in PARALLEL across all available GPUs for maximum throughput. Distributes work across local and remote backends intelligently.

WHEN TO USE:

  • Processing multiple files/documents simultaneously

  • Bulk code review, summarization, or analysis

  • Any workload that can be parallelized

Args:

  • tasks: JSON string containing an array of task objects. Each object can have:
      - task: "quick" | "summarize" | "generate" | "review" | "analyze" | "plan" | "critique"
      - content: The content to process (required)
      - file: Optional file path
      - model: Force tier - "quick" | "coder" | "moe"
      - language: Language hint for code tasks

ROUTING LOGIC:

  • Distributes tasks across ALL available GPUs (local + remote)

  • Large content (>32K tokens) → Routes to backend with sufficient context

  • Normal content → Round-robin for parallel execution

  • Respects backend health and circuit breakers

Returns: Combined results from all tasks with timing and routing info

Example:

  batch('[
    {"task": "summarize", "content": "doc1..."},
    {"task": "review", "content": "code2...", "language": "python"},
    {"task": "analyze", "content": "log3..."}
  ]')
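
Because tasks is a JSON string, building it with json.dumps avoids quoting mistakes; the sketch below reuses the placeholder contents from the example above.

  import json

  tasks = json.dumps([
      {"task": "summarize", "content": "doc1..."},
      {"task": "review", "content": "code2...", "language": "python"},
      {"task": "analyze", "content": "log3..."},
  ])
  # Pass the resulting string as the tool's `tasks` argument, e.g. batch(tasks).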

health

Check health status of Delia and all configured GPU backends.

Only checks backends that are enabled in settings.json. Shows availability, loaded models, usage stats, and cost savings.

WHEN TO USE:

  • Verify backends are available before delegating

  • Check which models are currently loaded

  • Monitor usage statistics and cost savings

  • Diagnose connection issues

Returns: JSON with:

  • status: "healthy" | "degraded" | "unhealthy"
  • backends: Array of configured backend status
  • usage: Token counts and call statistics per tier
  • cost_savings: Estimated savings vs cloud API
  • routing: Current routing configuration
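
A minimal sketch of consuming the health report follows; only the top-level keys listed above are documented, so the per-backend "available" field is an assumption.

  import json

  report = json.loads(health_json)   # health_json: the JSON string returned by the health tool
  if report["status"] != "healthy":
      # "available" is a hypothetical per-backend field, shown only for illustration
      down = [b for b in report["backends"] if not b.get("available", True)]
      print("Backends needing attention:", down)
  print("Estimated cost savings:", report["cost_savings"])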

queue_status

Get current status of the model queue system.

Shows loaded models, queued requests, and GPU memory usage. Useful for monitoring queue performance and debugging loading issues.

Returns: JSON with queue status, loaded models, and pending requests
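
Example call (the tool takes no documented arguments):

  queue_status()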

models

List all configured models across all GPU backends. Shows model tiers (quick/coder/moe) and which are currently loaded.

WHEN TO USE:

  • Check which models are available for tasks

  • Verify model configuration across backends

  • Understand task-to-model routing logic

Returns: JSON with:

  • backends: All configured backends with their models
  • currently_loaded: Models in GPU memory (no load time)
  • selection_logic: How tasks map to model tiers
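
A hedged sketch of reading the returned JSON; only the top-level keys listed above are assumed to exist.

  import json

  info = json.loads(models_json)     # models_json: the JSON string returned by the models tool
  print("Loaded in GPU memory:", info["currently_loaded"])
  print("Configured backends:", info["backends"])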

switch_backend

Switch the active LLM backend.

Args: backend_id: ID of the backend to switch to (from settings.json)

Returns: Confirmation message with current status
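
Example (the backend ID is a placeholder; valid IDs come from settings.json):

  switch_backend(backend_id="local-ollama")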

switch_model

Switch the model for a specific tier at runtime.

This allows dynamic model experimentation without restarting the server. Changes are persisted to settings.json for consistency across restarts.

Args:

  • tier: Model tier to change - "quick", "coder", "moe", or "thinking"
  • model_name: New model name (must be available in the current backend)

Returns: Confirmation with model change details and availability status
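
Example (the model name must be available in the current backend; "qwen2.5:14b" is borrowed from the get_model_info_tool example below):

  switch_model(tier="coder", model_name="qwen2.5:14b")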

get_model_info_tool

Get detailed information about a specific model.

Returns VRAM requirements, context window size, and tier classification. For configured models, shows exact values. For unknown models, provides estimates.

Args: model_name: Name of the model to get info for (e.g., "qwen2.5:14b", "llama3.1:70b")

Returns: Formatted model information including VRAM, context, and tier
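
Example, using one of the model names mentioned above:

  get_model_info_tool(model_name="qwen2.5:14b")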

Prompts

Interactive templates invoked by user choice


No prompts

Resources

Contextual data attached and managed by the client


No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zbrdc/delia'
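
The same request from Python, as a sketch assuming the requests package is installed and the endpoint returns JSON:

  import requests

  resp = requests.get("https://glama.ai/api/mcp/v1/servers/zbrdc/delia")
  resp.raise_for_status()
  print(resp.json())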

If you have feedback or need assistance with the MCP directory API, please join our Discord server.