# EXAI MCP Server Overview (Detailed)

This document explains how the EXAI MCP server in this repository works, how it integrates with providers (Kimi/Moonshot, GLM/ZhipuAI, etc.), and how to interact with it in practice.

## What is MCP here?

- The server implements the Model Context Protocol (MCP) over stdio; the Augment VS Code extension launches it and consumes the tools it exposes.
- Tools include: thinkdeep, analyze, consensus, chat, orchestrate_auto, challenge, listmodels, version, and project-specific utilities.
- Requests and responses flow over stdio; the extension displays tool outputs and progress logs.

## Provider architecture

- Providers live under `src/providers/` and are registered via `ModelProviderRegistry`.
- Kimi (Moonshot): implemented with an OpenAI-compatible client; supports chat, the files API (upload/list/content/delete), and embeddings.
- GLM (ZhipuAI): uses the official SDK when available, with an HTTP fallback for uploads; the base URL is unified to `https://api.z.ai/api/paas/v4`.
- OpenAI-compatible base: shared request/response logic, retries, timeouts, model validation, and vision payload formatting.

## Key recent hardening

- Uploads: per-request timeouts for multipart requests, size guardrails (env-configurable), and clear logging.
- Single-instance lock: a PID mutex in `scripts/mcp_server_wrapper.py` prevents duplicate servers.
- Token awareness: CJK-aware token estimation for better cost/latency planning.

## How tools work

- Tools accept a structured prompt plus optional `relevant_files` and `use_websearch=true` to browse official docs.
- The tool framework applies a token budget to embedded file content; when attachments are large, tools degrade gracefully or request trimmed excerpts.
- Outputs are summarized into markdown reports under `docs/sweep_reports/...` with timestamps.

## Using file uploads with Kimi

- Upload local files with purpose `file-extract`, then retrieve the parsed text via `client.files.content(file_id).text`.
- Delete files with `client.files.delete(file_id=...)` (OpenAI-compatible `DELETE /files/{file_id}`).
- Use the helper `tools/kimi_files_cleanup.py` to list or delete old files (>N days); it runs in dry-run mode by default.
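To make this flow concrete, here is a minimal end-to-end sketch of upload → extract → delete, assuming the OpenAI Python SDK pointed at the Moonshot endpoint. The base-URL fallback and file name are illustrative placeholders; the env variable names follow the configuration knobs listed later in this document.

```python
# Minimal sketch of the Kimi files flow described above.
# Assumes an OpenAI-compatible client pointed at the Moonshot API;
# the base-URL fallback and file name are illustrative placeholders.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["KIMI_API_KEY"],
    base_url=os.environ.get("KIMI_API_URL", "https://api.moonshot.ai/v1"),
)

# 1) Upload a local file for text extraction.
uploaded = client.files.create(file=open("report.pdf", "rb"), purpose="file-extract")

# 2) Retrieve the parsed text and use it as message context.
parsed_text = client.files.content(file_id=uploaded.id).text

# 3) Clean up when the file is no longer needed.
client.files.delete(file_id=uploaded.id)
```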
## Using embeddings

- Near term: use Kimi embeddings via `client.embeddings.create(model=..., input=[...])`. See `tools/kimi_embeddings.py`; a minimal sketch also appears at the end of this document.
- Long term: decouple retrieval entirely; compute and store embeddings in your own system (vector DB), and at query time send the top-k snippets to Kimi/GLM chat. MCP remains in place for orchestration and chat.

## Cost- and token-aware routing (typical defaults)

- Quick Q&A: `glm-4.5-flash` (fast/economical)
- Long-context/deep synthesis: Kimi long-context models
- Web research: plan on a cheaper model; synthesize on a longer-context model if needed
- Vision: Kimi vision-capable models; compress images prior to upload

## Configuration knobs (env)

- `KIMI_API_URL`, `KIMI_API_KEY`, optional `KIMI_EMBED_MODEL`
- `GLM_API_URL`, `GLM_API_KEY`, upload timeouts
- `KIMI_FILES_MAX_SIZE_MB`, `GLM_FILES_MAX_SIZE_MB`, global `FILE_UPLOAD_TIMEOUT_SECS`
- Web browsing limits (max hops, per-hop timeout, domain allowlist) via tool parameters/config

## Typical workflows

1) Doc-grounded analysis: call `thinkdeep` with `use_websearch=true` and pass authoritative URLs; attach only the necessary files.
2) File Q&A: upload via Kimi, retrieve the content, and include it in messages; or pass file_ids where supported.
3) Research: plan → browse with hop/time limits → cite → synthesize.
4) Embeddings: generate embeddings with Kimi now, or plug in external embeddings later; build a retrieval-augmented chat flow.

## Where results go

- Reports and analyses are saved to `docs/sweep_reports/<phase>/...` and `docs/sweep_reports/thinkdeep/...`.
- Personal/system documentation: `docs/personal/`.

## Safety and operations

- Destructive actions (e.g., cleanup) default to dry-run.
- No dependencies are installed and no env is mutated at runtime without explicit approval.
- Logs clearly indicate model routing, file-embedding budgets, and failures.

## Next improvements (suggested)

- RouterService to centralize cost-aware model selection (a minimal sketch follows below).
- BrowseOrchestrator to formalize plan-first, limited-hops browsing with citations.
- ExternalEmbeddingsProvider interface for seamless swap-in of your embedding pipeline.
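To sketch the first suggestion, here is what a minimal cost-aware RouterService could look like, following the routing defaults listed above. The class shape, the token threshold, and the placeholder model names are assumptions, not existing repository code.

```python
# Hypothetical sketch of a RouterService for cost-aware model selection.
# Class shape, threshold, and placeholder model names are assumptions;
# the defaults mirror the "Cost- and token-aware routing" section above.
from dataclasses import dataclass

@dataclass
class RouteRequest:
    task: str            # "qa", "synthesis", "research", or "vision"
    est_tokens: int      # CJK-aware token estimate for prompt + attachments
    has_images: bool = False

class RouterService:
    LONG_CONTEXT_THRESHOLD = 32_000  # illustrative cutoff, not a repo value

    def select_model(self, req: RouteRequest) -> str:
        if req.has_images or req.task == "vision":
            return "kimi-vision"        # placeholder for a Kimi vision-capable model
        if req.task == "synthesis" or req.est_tokens > self.LONG_CONTEXT_THRESHOLD:
            return "kimi-long-context"  # placeholder for a Kimi long-context model
        return "glm-4.5-flash"          # fast/economical default for quick Q&A

router = RouterService()
print(router.select_model(RouteRequest(task="qa", est_tokens=1_200)))  # -> glm-4.5-flash
```

A real implementation would also consult provider availability and the CJK-aware token estimator mentioned under hardening.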
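As referenced in the embeddings section, here is a minimal sketch of the near-term flow: generate Kimi embeddings, then do naive top-k retrieval. The `embeddings.create` call follows the usage quoted above; the cosine-similarity search is an illustrative stand-in for a real vector DB, and `KIMI_EMBED_MODEL` follows the env knobs listed earlier.

```python
# Minimal sketch: Kimi embeddings + naive top-k retrieval.
# The embedding model name is read from KIMI_EMBED_MODEL (see env knobs);
# the cosine-similarity search below is illustrative, not the repo's code.
import math
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["KIMI_API_KEY"],
    base_url=os.environ.get("KIMI_API_URL", "https://api.moonshot.ai/v1"),
)

def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model=os.environ["KIMI_EMBED_MODEL"], input=texts)
    return [item.embedding for item in resp.data]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Index a few snippets, then retrieve the top-k for a query.
snippets = ["MCP runs over stdio.", "Kimi supports a files API.", "GLM has an HTTP fallback."]
index = list(zip(snippets, embed(snippets)))
query_vec = embed(["How do file uploads work?"])[0]
top_k = sorted(index, key=lambda item: cosine(item[1], query_vec), reverse=True)[:2]
# Send the top-k snippets to Kimi/GLM chat as retrieval context.
```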
