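# Scripting Docs MCP

This repository provides a minimal working toolchain: it converts the Markdown/MDX documentation in [`ScriptingApp/ScriptingApp.github.io`](https://github.com/ScriptingApp/ScriptingApp.github.io) into a searchable LlamaIndex index and exposes it through the Model Context Protocol (MCP). It also ships a local CLI retrieval script, which makes it easy to feed context into command-line clients such as Codex, Claude, and Gemini.

## Core Features

- **Multilingual indexing**: `--docs-root PATH[:LANG]` can be passed repeatedly and defaults to `docs/en:en`. A single build can produce a vector store covering English, Chinese, and other languages, and the language tag is written into metadata for downstream filtering.
- **Selectable embedding backend**: the default is `BAAI/bge-base-zh-v1.5` from HuggingFace (handles both Chinese and English, runs on CPU or GPU). You can switch to `--embed-backend openai` with any compatible model name, which makes it easy to move between local and cloud setups.
- **Portable storage**: the index is persisted to `storage/llamaindex`. CLI queries and the MCP server share this directory, which simplifies shared deployments.
- **CLI + MCP dual mode**: the output of `query_docs.py` can be fed directly to a CLI model; `mcp_docs_server.py` exposes `scripting_docs_query` over stdio, giving the Codex/Claude/Gemini CLIs one common tool.
- **Minimal-dependency scripts**: three Python files plus one requirements file. The structure is simple, so chunking, extension filtering, and cleanup policy are easy to customize.

## Workflow Overview

```
┌────────────┐   ┌─────────────┐   ┌────────────────────┐   ┌──────────────────┐
│ Markdown   │   │ ingest_docs │   │ storage/llamaindex │   │ query_docs / MCP │
│ docs/en zh │ → │ + Sentence  │ → │ VectorStore+config │ → │ LLM prompt build │
└────────────┘   │   splitter  │   └────────────────────┘   └──────────────────┘
                 │ + Embedding │
                 └─────────────┘
```

1. Read `.md/.mdx` files from `ScriptingApp.github.io/docs/<lang>` and tag them by language.
2. Split them into chunks with SentenceSplitter, embed the chunks through HuggingFace or OpenAI, and write them to a single storage directory.
3. `query_docs.py` or the MCP server loads the same persisted directory, retrieves the top-k chunks, and formats them as context.
4. The output can go straight into a terminal LLM, or be injected into the Codex/Claude/Gemini CLIs through the `scripting_docs_query` tool.

> **Note**: this repository does not contain the actual documentation. Clone the official documentation repository separately and point the ingest script at its `docs/en`, `docs/zh`, and similar directories.

## Directory Structure

```
.
├── README.md               # Usage guide (this file)
├── requirements.txt        # Python dependencies (LlamaIndex, MCP SDK, etc.)
└── scripts/
    ├── ingest_docs.py      # Build a single- or multi-language vector index
    ├── query_docs.py       # Query the index / dispatch to a CLI model
    └── mcp_docs_server.py  # FastMCP server exposing `scripting_docs_query`
```

## Requirements

- Python 3.10+ (verified on 3.12/3.13)
- Access to the English and/or Chinese documentation in the `ScriptingApp.github.io` repository (choose what you need)
- (Optional) An `OPENAI_API_KEY` if you want to use OpenAI embeddings

## Quick Start

1. **Clone the documentation repository**

   ```bash
   git clone https://github.com/ScriptingApp/ScriptingApp.github.io.git
   ```

2. **Install uv (a Rust-based Python package manager)**

   `uv` manages Python environments and dependencies automatically and is recommended as a replacement for `pip` and `venv`. If `uv` is not installed yet, install it with:

   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   # Make sure uv is on your PATH; if needed, add it in your shell configuration file.
   ```

   > If you prefer the `pip`/`venv` workflow, you can still run `uv pip install -r requirements.txt` to reproduce the same dependency set.

   **Shortcut without cloning**: the repository publishes console scripts that can be run remotely. The first run automatically creates an isolated environment and caches the dependencies, and later commands reuse them. After any change to an entry point, it is recommended to run the following smoke test locally to confirm the wiring works:

   ```bash
   uvx --from git+https://github.com/JaxsonWang/Scripting-Docs-MCP ingest-docs --help
   uvx --from git+https://github.com/JaxsonWang/Scripting-Docs-MCP query-docs --help
   uvx --from git+https://github.com/JaxsonWang/Scripting-Docs-MCP mcp-docs-server --help
   ```

3. **Ingest the documentation (with uv run / uvx)**

   ```bash
   uv run scripts/ingest_docs.py \
     --docs-root /path/to/ScriptingApp.github.io/docs/zh:zh \
     --persist-dir storage/llamaindex --clean

   # Or, without cloning this repository:
   uvx --from git+https://github.com/JaxsonWang/Scripting-Docs-MCP ingest-docs \
     --docs-root /path/to/ScriptingApp.github.io/docs/zh:zh \
     --persist-dir storage/llamaindex --clean
   ```

   - `--embed-backend openai` switches to OpenAI embeddings (requires `OPENAI_API_KEY`).
   - `--docs-root` can be passed repeatedly in the form `PATH:LANG`. If `:LANG` is left out, the directory name or `en` is used; if the argument is omitted entirely, `docs/en:en` is loaded by default.
   - `--extensions .md .mdx ...` extends the set of indexed file types; every `--docs-root` shares this filter.

### Model Selection

- `--embedding-model BAAI/bge-base-zh-v1.5` (default): FlagEmbedding's Chinese model. It is moderate in size, runs comfortably on CPU alone, and suits the current Chinese documentation.
- `--embedding-model BAAI/bge-base-en-v1.5`: performs best on English retrieval. If the repository later contains only English content, switching to this model reduces semantic drift.
- `--embedding-model BAAI/bge-m3` (not recommended): the multilingual flagship, but with a very large parameter count and high local inference cost. Consider it only when you need its cross-lingual, long-context, and multi-vector features together.
- For more available models, see FlagEmbedding's [official model list](https://github.com/FlagOpen/FlagEmbedding/blob/master/README.md#model-list). Replace `--embedding-model` as needed and rebuild the index.

4. **Verify from the command line (with uv run / uvx)**

   ```bash
   # Chinese query ("How do I customize the navigation bar?") against the zh index
   uv run scripts/query_docs.py "如何自定义导航栏？" --model raw

   # Or run directly from the Git remote (no clone needed):
   uvx --from git+https://github.com/JaxsonWang/Scripting-Docs-MCP query-docs "如何自定义导航栏？" --model raw
   ```

   Pipe the output into any local model CLI (for example `| ollama run llama3`).

## Script Commands

### `scripts/ingest_docs.py`

```bash
uv run scripts/ingest_docs.py \
  --docs-root /abs/path/docs/en:en \
  --docs-root /abs/path/docs/zh:zh \
  --persist-dir storage/llamaindex --clean \
  --embedding-model BAAI/bge-m3 \
  --embed-backend huggingface

# Equivalent command without cloning:
uvx --from git+https://github.com/JaxsonWang/Scripting-Docs-MCP ingest-docs \
  --docs-root /abs/path/docs/en:en \
  --docs-root /abs/path/docs/zh:zh \
  --persist-dir storage/llamaindex --clean \
  --embedding-model BAAI/bge-m3 \
  --embed-backend huggingface
```

- `--docs-root PATH[:LANG]`: repeatable; defaults to `docs/en:en`. When `:LANG` is omitted, the directory name (or `en`) is used, and the tag is written into metadata for filtering.
- `--persist-dir`: index output directory, default `storage/llamaindex`. Add `--clean` to delete the old index before building.
- `--chunk-size / --chunk-overlap`: chunk size and overlap for SentenceSplitter (default 750/120 tokens).
- `--extensions`: list of file extensions to ingest, default `.md .mdx`.
- `--embedding-model`: see "Model Selection"; it can point to any FlagEmbedding or OpenAI model.
- `--embed-backend`: `huggingface` (local) or `openai`.

### `scripts/query_docs.py`

```bash
uv run scripts/query_docs.py \
  "How do I customize navigation?" \
  --persist-dir storage/llamaindex \
  --k 5 \
  --model codex \
  --embedding-model BAAI/bge-base-en-v1.5

# Remote execution example:
uvx --from git+https://github.com/JaxsonWang/Scripting-Docs-MCP query-docs \
  "How do I customize navigation?" \
  --persist-dir storage/llamaindex \
  --k 5 \
  --model codex \
  --embedding-model BAAI/bge-base-en-v1.5
```

- `question` (positional argument): the user's question.
- `--persist-dir`: the directory written during ingestion, default `storage/llamaindex`.
- `--k`: number of chunks to return, default 4.
- `--model`: `raw` (plain text), `codex`/`claude`/`gemini` (with a prompt header), or `mcp` (JSON on stdout).
- `--embedding-model / --embed-backend`: must match the index; they are used to build the query-side retrieval model so that it matches the stored vectors.
- `--cli-path`: overrides the CLI model command that is executed when `--model` is `codex`, `claude`, or `gemini`.

### `scripts/mcp_docs_server.py`

```bash
uv run scripts/mcp_docs_server.py \
  --persist-dir storage/llamaindex \
  --embedding-model BAAI/bge-base-zh-v1.5 \
  --default-k 4

# Without cloning:
uvx --from git+https://github.com/JaxsonWang/Scripting-Docs-MCP mcp-docs-server \
  --persist-dir storage/llamaindex \
  --embedding-model BAAI/bge-base-zh-v1.5 \
  --default-k 4
```

- `--persist-dir`: index directory (default `storage/llamaindex`).
- `--embedding-model / --embed-backend`: must match the ingestion stage to avoid a vector dimension mismatch.
- `--default-k`: default number of chunks when the MCP client does not pass `k`.
- The server exposes the `scripting_docs_query` tool, which takes `{ "question": "...", "k": 4 }` and returns the top-k passages with their metadata.

## Using the MCP Server

1. **Start the server in stdio mode (with uv run / uvx)**

   ```bash
   uv run --quiet scripts/mcp_docs_server.py --persist-dir storage/llamaindex

   # Or run it remotely:
   uvx --from git+https://github.com/JaxsonWang/Scripting-Docs-MCP mcp-docs-server \
     --persist-dir storage/llamaindex --quiet
   ```

   The process stays resident and exposes a single tool: `scripting_docs_query`.

2. **Connect the Codex / Claude / Gemini CLI**

   Example MCP configuration:

   ```json
   {
     "servers": {
       "scripting_docs": {
         "command": "uvx",
         "args": [
           "--from",
           "git+https://github.com/JaxsonWang/Scripting-Docs-MCP",
           "mcp-docs-server",
           "--persist-dir",
           "/absolute/path/to/Scripting-Docs-MCP/storage/llamaindex"
         ]
       }
     }
   }
   ```

   After restarting the CLI, `scripting_docs_query` appears in the MCP tool list.

---

## License

MIT License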
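
For readers who want to customize ingestion, the `--docs-root PATH[:LANG]` convention documented for `scripts/ingest_docs.py` could be parsed roughly as in the sketch below. This is not taken from the repository's code: the function name `parse_docs_root`, the `fallback_lang` parameter, and the exact precedence (directory name first, then `en`) are assumptions based only on the option description above.

```python
from pathlib import Path


def parse_docs_root(spec: str, fallback_lang: str = "en") -> tuple[Path, str]:
    """Split a `PATH[:LANG]` spec into (path, language tag).

    Hypothetical helper: the real ingest script may behave differently.
    """
    path_part, sep, lang = spec.rpartition(":")
    # Only treat the suffix as a language tag if it is non-empty and does not
    # look like part of a path (for example the tail of a Windows drive path).
    has_lang = bool(sep) and bool(lang) and "/" not in lang and "\\" not in lang
    if has_lang:
        return Path(path_part), lang
    # No usable `:LANG` suffix: fall back to the directory name, then to `en`.
    path = Path(spec.rstrip(":"))
    return path, path.name or fallback_lang


if __name__ == "__main__":
    for spec in ("docs/en:en", "/abs/path/docs/zh:zh", "docs/zh", "."):
        print(spec, "->", parse_docs_root(spec))
```

Using `rpartition` rather than `split` keeps any earlier colons inside the path part, so only the last segment is ever considered as a language tag.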
