# HydraMCP

## Server Configuration

The environment variables used to configure the server. Both backends are optional; set only the ones you use.
| Name | Required | Description | Default |
|---|---|---|---|
| OLLAMA_URL | No | Base URL of the Ollama backend. Omit if not using Ollama. | http://localhost:11434 |
| CLIPROXYAPI_KEY | No | Local API key/passphrase for CLIProxyAPI; must match the key defined in its config.yaml. | sk-my-local-key |
| CLIPROXYAPI_URL | No | Base URL of the CLIProxyAPI backend. Omit if not using CLIProxyAPI. | http://localhost:8317 |
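For reference, a client-side registration might look like the sketch below. This assumes HydraMCP runs as a stdio MCP server launched by a hypothetical `hydramcp` command in a Claude Code-style `.mcp.json`; the command name, key, and URLs are placeholders to adapt to your setup (JSON allows no comments, so all caveats live here), while the variable names come from the table above:

```json
{
  "mcpServers": {
    "hydramcp": {
      "command": "hydramcp",
      "args": [],
      "env": {
        "OLLAMA_URL": "http://localhost:11434",
        "CLIPROXYAPI_URL": "http://localhost:8317",
        "CLIPROXYAPI_KEY": "sk-my-local-key"
      }
    }
  }
}
```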
## Capabilities

Features and capabilities supported by this server.
| Capability | Details |
|---|---|
| tools | `{ "listChanged": true }` |
## Tools

Functions exposed to the LLM to take actions.
| Name | Description |
|---|---|
| list_models | List all available models across all providers. Run this first to see what you can query. |
| ask_model | Query any AI model with a prompt. Returns the model's response with metadata. OUTPUT: Markdown with the model's response, latency, and token usage. If max_response_tokens is set and compression occurred, includes distillation metadata (original tokens, compressed tokens, compressor model, compressor latency). Shows "Saved: X tokens (Y% smaller)" when compression is active. Shows "(cached)" when the response is served from cache. WHEN TO USE: When you need another model's perspective, analysis, or capabilities. Set max_response_tokens to control how much of your context window the response consumes; the response is distilled by a fast model to fit the budget while preserving code, file paths, errors, and actionable details. Set include_raw=true to see both compressed and original responses for quality verification. See the example request after this table. FAILURE MODES: |
| compare_models | Query 2-5 models in parallel with the same prompt. Returns a side-by-side comparison with latency and token metrics. |
| consensus | Query 3-7 models and aggregate responses using a voting strategy (majority/supermajority/unanimous). Returns a consensus answer with a confidence score. See the example request after this table. |
| synthesize | Query 2-5 models in parallel, then combine their best ideas into one answer. Returns a synthesized response that aims to be better than any single model's. |
| session_recap | Read previous Claude Code sessions from disk and generate a smart-sized recap using a large-context model. Claude never sees the raw session data, only the distilled summary. OUTPUT: Returns markdown starting with "## Session Recap" containing sections: Project State, What Was Built, Key Decisions, Errors Resolved, Unfinished/In Progress, File Map. Empty sections are omitted. Output size is auto-calculated (1K-30K tokens) based on session density. WHEN TO USE: At the start of a new session when the user asks to restore context, recall previous work, or continue where they left off. FAILURE MODES: |
| analyze_file | Offload file analysis to a worker model. The file is read server-side; it never enters your context window. You send a file path and a question, and get back only the analysis. OUTPUT: Markdown with the model's analysis of the file, including file metadata (path, lines, chars), latency, and token usage. If max_response_tokens is set and compression occurred, includes distillation metadata (original tokens, compressed tokens, compressor model, compressor latency). WHEN TO USE: When you need to analyze, review, or search a file but want to avoid reading it yourself. Especially valuable for large files (1000+ lines) where reading would consume significant context. The file is sent to a large-context model (Gemini 1M) that can process the entire file at once. FAILURE MODES: |
| smart_read | Surgical code extraction from files. Returns ONLY relevant code sections with line numbers, not analysis. OUTPUT: Markdown with extracted code sections (verbatim, with line numbers), minimal annotations, file metadata, latency, and token usage. Shows a "Context saved" metric. Unlike analyze_file, which returns prose analysis, smart_read returns actual code you can act on directly. WHEN TO USE: When you need to read a file but only care about specific sections. Use instead of the Read tool when you have a specific intent such as "find the auth logic", "show error handling", or "extract the database schema". Especially valuable for large files (1000+ lines) where reading the whole file wastes context tokens. For general questions about a file, use analyze_file instead. FAILURE MODES: |
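To make the call shape concrete, here is a sketch of a budgeted ask_model invocation as a raw MCP `tools/call` request. The JSON-RPC envelope is standard MCP; max_response_tokens and include_raw come from the description above, while the `model` and `prompt` argument names and their values are illustrative assumptions, not confirmed parameter names:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "ask_model",
    "arguments": {
      "model": "llama3.1:8b",
      "prompt": "Review this error and suggest a fix: TypeError: cannot unpack non-iterable NoneType object",
      "max_response_tokens": 500,
      "include_raw": false
    }
  }
}
```

With max_response_tokens set, the response is distilled to fit the budget and the output reports the original and compressed token counts.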
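Similarly, a consensus query with a majority vote might look like the following sketch. The strategy values (majority/supermajority/unanimous) and the 3-7 model range are documented above, but the `models` and `strategy` argument names and the model identifiers are assumptions for illustration:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "consensus",
    "arguments": {
      "prompt": "Does this SQL migration require a full table lock on Postgres 15?",
      "models": ["llama3.1:8b", "qwen2.5:14b", "mistral:7b"],
      "strategy": "majority"
    }
  }
}
```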
## Prompts

Interactive templates invoked by user choice.
| Name | Description |
|---|---|
| No prompts | |
## Resources

Contextual data attached and managed by the client.
| Name | Description |
|---|---|
| No resources | |
## MCP directory API

We provide all the information about MCP servers via our MCP directory API:

`curl -X GET 'https://glama.ai/api/mcp/v1/servers/Pickle-Pixel/HydraMCP'`

If you have feedback or need assistance with the MCP directory API, please join our Discord server.