EX MCP Server

EXAI_MCP_Overview.md•4.17 KiB

# EXAI MCP Server Overview (Detailed) This document explains how the EXAI MCP server in this repository works, how it integrates with providers (Kimi/Moonshot, GLM/ZhipuAI, etc.), and how to interact with it in practice. ## What is MCP here? - The server implements the Model Context Protocol (MCP) over stdio; the Augment VS Code extension launches it and consumes tools exposed by the server. - Tools include: thinkdeep, analyze, consensus, chat, orchestrate_auto, challenge, listmodels, version, and project-specific utilities. - Requests and responses flow through stdio; the extension shows tool outputs and progress logs. ## Provider architecture - Providers live under `src/providers/` and are registered via `ModelProviderRegistry`. - Kimi (Moonshot): implemented with an OpenAI-compatible client; supports chat, files API (upload/list/content/delete), and embeddings. - GLM (ZhipuAI): uses the official SDK when available, with HTTP fallback for uploads; base URL unified to `https://api.z.ai/api/paas/v4`. - OpenAI-compatible base: shared request/response logic, retries, timeouts, model validation, and vision payload formatting. ## Key recent hardening - Uploads: per-request timeouts for multipart, size guardrails (env-configurable), clear logging. - Single-instance lock: PID mutex in `scripts/mcp_server_wrapper.py` to avoid duplicate servers. - Token awareness: CJK-aware token estimation for better cost/latency planning. ## How tools work - Tools accept a structured prompt plus optional `relevant_files` and `use_websearch=true` to browse official docs. - The tool framework token-budgets file embeddings; when attachments are large, tools degrade gracefully or request trimmed excerpts. - Outputs are summarized into markdown reports under `docs/sweep_reports/...` with timestamps. ## Using file uploads with Kimi - Upload local files with purpose `file-extract`; then retrieve parsed text via `client.files.content(file_id).text`. - Delete files with `client.files.delete(file_id=...)` (OpenAI-compatible DELETE /files/{file_id}). - Use the helper `tools/kimi_files_cleanup.py` to list or delete old files (>N days) with dry-run by default. ## Using embeddings - Near term: use Kimi embeddings via `client.embeddings.create(model=..., input=[...])`. See `tools/kimi_embeddings.py`. - Long term: decouple retrieval; compute/store embeddings in your system (vector DB); at query time, send top-k snippets to Kimi/GLM chat. MCP remains for orchestration and chat. ## Cost- and token-aware routing (typical defaults) - Quick Q&A: `glm-4.5-flash` (fast/economical) - Long-context/deep synthesis: Kimi long-context models - Web research: plan on a cheaper model, synthesize on a longer-context model if needed - Vision: Kimi vision-capable models; compress images prior to upload ## Configuration knobs (env) - `KIMI_API_URL`, `KIMI_API_KEY`, optional `KIMI_EMBED_MODEL` - `GLM_API_URL`, `GLM_API_KEY`, upload timeouts - `KIMI_FILES_MAX_SIZE_MB`, `GLM_FILES_MAX_SIZE_MB`, global `FILE_UPLOAD_TIMEOUT_SECS` - Web browsing limits (max hops, per-hop timeout, domain allowlist) via tool parameters/config ## Typical workflows 1) Doc-grounded analysis: call `thinkdeep` with `use_websearch=true` and pass authoritative URLs; attach only necessary files. 2) File Q&A: upload via Kimi, retrieve content, and include in messages; or pass file_ids where supported. 3) Research: plan → browse with hop/time limits → cite → synthesize. 4) Embeddings: generate embeddings with Kimi now, or plug in external embeddings later; build a retrieval-augmented chat flow. ## Where results go - Reports and analyses saved to `docs/sweep_reports/<phase>/...` and `docs/sweep_reports/thinkdeep/...`. - Personal/system documentation: `docs/personal/`. ## Safety and operations - Dry-run for any destructive action (cleanup) by default. - No dependencies installed or env mutated at runtime without explicit approval. - Logs clearly indicate model routing, file embedding budgets, and failures. ## Next improvements (suggested) - RouterService to centralize cost-aware model selection. - BrowseOrchestrator to formalize plan-first, limited-hops browsing with citations. - ExternalEmbeddingsProvider interface for seamless swap-in of your embedding pipeline.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Zazzles2908/EX_AI-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

EXAI_MCP_Overview.md•4.17 KiB