Speakeasy Docs MCP

Official

by speakeasy-api

A lightweight, domain-agnostic hybrid search engine for markdown (.md) corpora, exposed via the Model Context Protocol (MCP). While it can index and serve any markdown corpus, it is deeply optimized for serving SDK documentation to AI coding agents. Beta.

Features

- Hybrid search: full-text, phrase proximity, and vector similarity blended via Reciprocal Rank Fusion
- Distributed manifests: per-directory .docs-mcp.json files configure chunking strategy, metadata, and taxonomy independently per subtree
- Faceted taxonomy: metadata keys become enum-injected JSON Schema filters on the search tool
- Vector collapse: deduplicates near-identical cross-language results at search time
- Incremental builds: embedding cache fingerprints each chunk; only changed content is re-embedded
- MCP prompt templates: register reusable prompts from docs using *.template.md (single-message shorthand) or *.template.yaml (multi-message format) with mustache argument rendering
- Graceful degradation
  - Chunking: chunk sizes adapt to the configured embedding provider's context window; falls back to conservative defaults when no provider is set
  - Query: if the embedding API errors at runtime (downtime, expired credits, network issues), the server falls back to FTS-only search with a one-time warning

How It Works

Docs MCP provides a local, in-memory search engine (powered by LanceDB) that runs inside a Node.js MCP server. The following core optimizations make it effective for structured documentation:

Faceted Taxonomy

Metadata keys defined in .docs-mcp.json manifests become enum-injected JSON Schema parameters on the search_docs tool. The agent selects from a strict set of valid filter values (e.g. language: ["typescript", "python", "go"]). On zero results, the server returns structured hints (e.g. "0 results for 'typescript'. Matches found in: ['python']").
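As a rough illustration of enum injection, the sketch below collects the distinct metadata values from a set of chunks, turns each key into an enum-constrained property on a tool input schema, and formats a zero-result hint. All type and function names here (ChunkMetadata, collectTaxonomy, buildSearchInputSchema, zeroResultHint) are hypothetical and are not taken from the Docs MCP source; the real schema likely carries more fields.

```typescript
// Hypothetical shapes for illustration; not the server's real internals.
type ChunkMetadata = Record<string, string>;
type Taxonomy = Record<string, string[]>;

// Collect the distinct values observed for each metadata key.
function collectTaxonomy(chunks: ChunkMetadata[]): Taxonomy {
  const seen = new Map<string, Set<string>>();
  for (const meta of chunks) {
    for (const key of Object.keys(meta)) {
      const values = seen.get(key) ?? new Set<string>();
      values.add(meta[key]);
      seen.set(key, values);
    }
  }
  const taxonomy: Taxonomy = {};
  seen.forEach((values, key) => {
    taxonomy[key] = Array.from(values).sort();
  });
  return taxonomy;
}

// Build a tool input schema in which every taxonomy key is an enum-constrained string.
function buildSearchInputSchema(taxonomy: Taxonomy): Record<string, unknown> {
  const properties: Record<string, unknown> = { query: { type: "string" } };
  for (const key of Object.keys(taxonomy)) {
    properties[key] = { type: "string", enum: taxonomy[key] };
  }
  return { type: "object", properties, required: ["query"] };
}

// Format a zero-result hint from per-value match counts for the unfiltered query.
function zeroResultHint(requested: string, countsByValue: Record<string, number>): string {
  const alternatives = Object.keys(countsByValue)
    .filter((value) => value !== requested && countsByValue[value] > 0)
    .map((value) => `'${value}'`);
  return `0 results for '${requested}'. Matches found in: [${alternatives.join(", ")}]`;
}
```

Because the agent can only pick values that exist in the index, a filter typo cannot silently produce an empty result set; the hint covers the remaining case where the value is valid but nothing matches the query.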

Vector Collapse

SDK documentation for the same API operation across multiple languages produces near-identical embeddings. Vector collapse deduplicates these at search time, keeping only the highest-scoring variant per taxonomy field:

{ "taxonomy": { "language": { "vector_collapse": true } } }

When the agent explicitly filters by language, collapse is automatically skipped — the filter already restricts to a single variant.
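One way to picture vector collapse is the sketch below. It assumes, purely for illustration, that each hit carries its embedding and a language tag, and that two hits count as "near-identical" when their cosine similarity reaches a threshold of 0.95. The Hit shape, the threshold, and the function names are all assumptions; the actual deduplication criterion used by Docs MCP is not described in this document.

```typescript
// Hypothetical search hit; field names are illustrative only.
interface Hit {
  id: string;
  score: number;
  language: string;
  vector: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep the highest-scoring hit among near-identical cross-language variants.
// Collapse is skipped when the caller already filtered by language.
function collapseByLanguage(hits: Hit[], languageFilter?: string, threshold = 0.95): Hit[] {
  if (languageFilter !== undefined) return hits;
  const kept: Hit[] = [];
  const byScore = hits.slice().sort((x, y) => y.score - x.score);
  for (const hit of byScore) {
    const duplicate = kept.some(
      (k) => k.language !== hit.language && cosine(k.vector, hit.vector) >= threshold,
    );
    if (!duplicate) kept.push(hit);
  }
  return kept;
}
```

Sorting by score first means the survivor of each near-duplicate group is always the best-ranked variant, which matches the "highest-scoring variant" behavior described above.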

Hybrid FTS + Semantic Search

Search combines three ranking signals via Reciprocal Rank Fusion:

Full-text search — multi-field matching on headings (boosted 3x) and content
Phrase proximity — rewards results where query terms appear close together
Vector similarity — semantic embedding distance (when an embedding provider is configured)

FTS dominates for exact class names and error codes. Vector similarity lifts conceptual and paraphrased queries. The blend is configurable via RRF weights.
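For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal weighted version. Each ranked list contributes weight / (k + rank) to a document's fused score, with rank starting at 1. The constant k = 60 is the value conventionally used for RRF and is an assumption here, as are the function and field names; this is not the Docs MCP implementation, and it does not model the 3x heading boost.

```typescript
// One ranking signal: chunk ids ordered best-first, plus its fusion weight.
interface RankedList {
  ids: string[];
  weight: number;
}

interface FusedResult {
  id: string;
  score: number;
}

// Weighted Reciprocal Rank Fusion: score(d) = sum over lists of weight / (k + rank(d)).
function reciprocalRankFusion(lists: RankedList[], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + list.weight / (k + rank));
    });
  }
  const fused: FusedResult[] = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  return fused.sort((a, b) => b.score - a.score);
}
```

A chunk that appears in several lists accumulates score from each, so a result ranked second by both full-text and vector search can outrank one that is first in only a single list. Setting a list's weight to 0 removes that signal from the blend.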

Hierarchical Context

Ancestor headings (breadcrumbs like Auth SDK > AcmeAuthClientV2 > Initialization) are prepended to each chunk's embedding input and returned with search results. This enables the calling agent to explore the corpus structure, navigating from high-level concepts down to specific implementation details.
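The breadcrumb idea can be sketched with a heading stack. The helper names below are hypothetical, and the parsing is deliberately naive: it recognizes only ATX headings (lines starting with #) and does not skip fenced code blocks, which a real AST-based parser would handle.

```typescript
// Return the ancestor headings in effect at a given zero-based line of a markdown document.
function breadcrumbsAt(markdown: string, lineIndex: number): string[] {
  const stack: { depth: number; text: string }[] = [];
  const lines = markdown.split("\n");
  for (let i = 0; i <= lineIndex && i < lines.length; i++) {
    const match = /^(#{1,6})\s+(.*)$/.exec(lines[i]);
    if (!match) continue;
    const depth = match[1].length;
    // A new heading closes every open heading at the same or deeper level.
    while (stack.length > 0 && stack[stack.length - 1].depth >= depth) stack.pop();
    stack.push({ depth, text: match[2].trim() });
  }
  return stack.map((heading) => heading.text);
}

// Prepend the breadcrumb trail to the chunk text before embedding it.
function embeddingInput(breadcrumbs: string[], content: string): string {
  return breadcrumbs.length > 0 ? `${breadcrumbs.join(" > ")}\n\n${content}` : content;
}
```

Prepending the trail gives an otherwise generic chunk such as "Pass your API key to the constructor" enough context to be distinguishable from similar chunks under other headings.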

Benchmarks

Measured with docs-mcp-eval benchmark on a realistic ~300-operation API with hand-written guides (~28.8MB corpus, 5 eval categories):

Summary

Metric	none	openai	Takeaway
MRR@5	0.2141	0.2833	Embeddings lift relevant-result ranking by 32%
NDCG@5	0.2536	0.3218	Graded relevance improves 27% with embeddings
Facet Precision	0.3750	0.4375	Embeddings improve filter accuracy by 17%
Search p50 (ms)	5.2	258.4	FTS-only is ~50x faster at median
Search p95 (ms)	6.5	11101.1	Tail latency dominated by embedding API
Build Time (ms)	6022	1569703	Embedding uses batch API for large corpora
Peak RSS (MB)	221.1	283.8	Modest memory overhead
Index Size (corpus 28.8MB)	104.9MB	356.9MB	Vectors ~3.4x the FTS-only index
Embed Cost (est.)	$0	$0.9825	~$1 one-time cost per corpus
Query Cost (est.)	$0	$0.000003	Negligible per-query cost

Per-Category MRR@5

MRR@5 (Mean Reciprocal Rank at 5) measures how high the first relevant result appears in the top 5. 1.0 = always ranked first; 0.0 = never appears in top 5.
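The definition above translates directly into code. This is a generic sketch of the metric, not the evaluation harness's implementation; the EvalCase shape and function name are made up for illustration.

```typescript
// One evaluated query: the ranked result ids and the set of ids judged relevant.
interface EvalCase {
  ranked: string[];
  relevant: Set<string>;
}

// Mean Reciprocal Rank at k: average of 1/rank of the first relevant result
// within the top k, counting 0 when no relevant result appears there.
function mrrAtK(cases: EvalCase[], k = 5): number {
  if (cases.length === 0) return 0;
  let total = 0;
  for (const { ranked, relevant } of cases) {
    const index = ranked.slice(0, k).findIndex((id) => relevant.has(id));
    if (index >= 0) total += 1 / (index + 1);
  }
  return total / cases.length;
}
```

For three queries whose first relevant result lands at rank 1, rank 3, and outside the top 5, the metric is (1 + 1/3 + 0) / 3, roughly 0.44.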

Category	none	openai	Takeaway
clarification	0.3000	0.3000	FTS matches embeddings
cross-service	0.1667	0.3333	Embeddings double rank
exact-name	0.3625	0.3792	FTS nearly matches embeddings
natural-language	0.0731	0.1692	Embeddings lift 130%
workflow	0.3333	0.4444	Embeddings lift 33%

Recommendation

We recommend starting with FTS-only search. Embeddings improve relevance for conceptual and paraphrased queries, but in the benchmark above they raise median query latency roughly 50x (5.2 ms to 258.4 ms) and build time from about 6 seconds to about 26 minutes. For agents that iterate through multiple searches, the faster cycle time of pure FTS has anecdotally proven more valuable than the per-query relevance lift, particularly with modern models capable of query refinement.

Graceful Fallback

No embeddings (--embedding-provider none): FTS-only search, zero cost, zero API keys. Already effective for exact-match and lexical queries.
With embeddings (--embedding-provider openai): Hybrid search with better recall on conceptual and paraphrased queries. ~$1 one-time embedding cost per 28.8MB corpus.
Runtime degradation: If the embedding API is unavailable at query time, the server automatically falls back to FTS-only with a one-time warning.
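The runtime fallback can be pictured as a small wrapper around the embedding call. This is a sketch under assumptions: the Embed type, makeQueryPlanner, and the shape of the returned plan are invented for illustration and do not reflect the server's actual code.

```typescript
// Hypothetical embedding function; rejects when the provider is unavailable.
type Embed = (query: string) => Promise<number[]>;

type QueryPlan = { mode: "hybrid"; vector: number[] } | { mode: "fts" };

// Returns a planner that tries to embed each query and falls back to FTS-only,
// emitting a warning only the first time the embedding call fails.
function makeQueryPlanner(embed: Embed | undefined, warn: (message: string) => void) {
  let warned = false;
  return async function plan(query: string): Promise<QueryPlan> {
    if (!embed) return { mode: "fts" }; // no provider configured
    try {
      return { mode: "hybrid", vector: await embed(query) };
    } catch (error) {
      if (!warned) {
        warned = true;
        warn(`Embedding failed; falling back to FTS-only search: ${String(error)}`);
      }
      return { mode: "fts" };
    }
  };
}
```

Keeping the warned flag in the closure means repeated outages do not flood the log, while every query still gets an answer from the full-text index.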

Supported File Types

The indexer processes .md (Markdown) files. Files are discovered via the **/*.md glob pattern within the configured docs directory. YAML frontmatter is supported for per-file metadata and chunking overrides. Prompt templates are also discovered from *.template.md and *.template.yaml, and are excluded from search indexing.

Corpus Structure

Folder Layout

Documentation corpora use .docs-mcp.json manifests to control chunking and taxonomy. Manifests can be placed at any level of the directory tree:

my-docs/
├── .docs-mcp.json              ← root manifest (applies to guides/)
├── guides/
│   ├── retries.md
│   └── pagination.md
└── sdks/
    ├── typescript/
    │   ├── .docs-mcp.json      ← deeper manifest (exclusive precedence)
    │   └── auth.md
    └── python/
        ├── .docs-mcp.json      ← deeper manifest (exclusive precedence)
        └── auth.md

Deeper manifests take exclusive precedence. A file at sdks/typescript/auth.md is governed only by sdks/typescript/.docs-mcp.json — the root manifest is ignored for that subtree.
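The "deepest manifest wins" rule amounts to walking up from a file's directory until a manifest is found. The sketch below assumes POSIX-style paths relative to the corpus root, with the empty string standing for the root directory; the function name is hypothetical.

```typescript
// Given a file path and the directories that contain a .docs-mcp.json,
// return the directory of the single manifest that governs the file.
function governingManifestDir(filePath: string, manifestDirs: string[]): string | undefined {
  const dirs = new Set(manifestDirs);
  const parts = filePath.split("/").slice(0, -1); // drop the file name
  // Try the deepest directory first, then each ancestor up to the root ("").
  for (let depth = parts.length; depth >= 0; depth--) {
    const dir = parts.slice(0, depth).join("/");
    if (dirs.has(dir)) return dir;
  }
  return undefined; // no manifest applies
}
```

With the layout above, governingManifestDir("sdks/typescript/auth.md", ["", "sdks/typescript", "sdks/python"]) resolves to "sdks/typescript", and "guides/retries.md" resolves to the root. Nothing is merged between levels, which is what "exclusive precedence" implies.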

`.docs-mcp.json`

{
  // Required. Schema version.
  "version": "1",

  // Chunking strategy applied to all files in this directory tree.
  "strategy": {
    "chunk_by": "h2", // Split at ## headings. Options: h1, h2, h3, file
    "max_chunk_size": 8000, // Oversized chunks split recursively at finer headings
    "min_chunk_size": 200, // Tiny trailing chunks merge into preceding chunk
  },

  // Key-value pairs attached to every chunk. Each key becomes a filterable
  // enum parameter on the search_docs tool.
  "metadata": {
    "language": "typescript",
    "scope": "sdk-specific",
  },

  // Per-field search behavior. vector_collapse deduplicates cross-language
  // variants at search time (only active when no filter is set for that field).
  "taxonomy": {
    "language": { "vector_collapse": true },
  },

  // Custom instructions sent to MCP clients during initialization.
  // Helps coding agents understand what this server provides and how to use it.
  "mcpServerInstructions": "This server provides SDK documentation for Acme Corp...",

  // File-pattern overrides. Evaluated top-to-bottom; last match wins.
  // Override metadata merges with root (override keys win).
  // Override strategy replaces root strategy entirely.
  "overrides": [
    {
      "pattern": "models/**/*.md",
      "strategy": { "chunk_by": "file" },
    },
  ],
}

Full schema: schemas/docs-mcp.schema.json
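To make the chunk_by and min_chunk_size settings concrete, here is a simplified splitter. It is a sketch only: it measures size in characters (the manifest's unit is not stated here), merges only a tiny final chunk, does not implement the recursive splitting of oversized chunks, and ignores fenced code blocks. The function name is hypothetical.

```typescript
// Split markdown at headings of the given level (2 for "h2") and merge a
// too-small trailing chunk into the chunk before it.
function chunkByHeading(markdown: string, level: number, minChunkSize: number): string[] {
  const marker = "#".repeat(level) + " ";
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    // Start a new chunk whenever a heading of the target level begins.
    if (line.startsWith(marker) && current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  // Merge a tiny trailing chunk into its predecessor.
  if (chunks.length > 1 && chunks[chunks.length - 1].length < minChunkSize) {
    const tail = chunks.pop() as string;
    chunks[chunks.length - 1] += "\n" + tail;
  }
  return chunks;
}
```

Content before the first target-level heading (for example the h1 title and introduction) becomes its own leading chunk in this sketch.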

Individual files can also override their manifest via YAML frontmatter (mcp_chunking_hint, metadata keys). Frontmatter takes highest precedence. See the manifest contract for full resolution rules.
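The override rules in the manifest comments (last match wins, metadata merges, strategy replaces) might be resolved roughly as follows. This sketch makes several assumptions that should be checked against the manifest contract: patterns are matched against paths relative to the manifest directory, "last match wins" selects one whole override entry, and the glob matcher supports only * and **/. Frontmatter is not modeled. All names are illustrative.

```typescript
interface Strategy { chunk_by: string; max_chunk_size?: number; min_chunk_size?: number }
interface Override { pattern: string; strategy?: Strategy; metadata?: Record<string, string> }
interface Manifest { strategy: Strategy; metadata: Record<string, string>; overrides?: Override[] }

// Minimal glob support: "**/" matches any number of directories, "*" matches within one segment.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    if (glob.startsWith("**/", i)) {
      source += "(?:.*/)?";
      i += 2; // skip the rest of "**/"
    } else if (glob[i] === "*") {
      source += "[^/]*";
    } else {
      source += glob[i].replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + source + "$");
}

// Resolve the effective strategy and metadata for one file.
function resolveForFile(manifest: Manifest, relativePath: string) {
  let winner: Override | undefined;
  for (const override of manifest.overrides ?? []) {
    if (globToRegExp(override.pattern).test(relativePath)) winner = override; // last match wins
  }
  return {
    // An override strategy replaces the root strategy entirely (no field-level merge).
    strategy: winner?.strategy ?? manifest.strategy,
    // Override metadata merges over root metadata; override keys win.
    metadata: { ...manifest.metadata, ...(winner?.metadata ?? {}) },
  };
}
```

Note the asymmetry: an override with "strategy": { "chunk_by": "file" } drops the root max_chunk_size and min_chunk_size, whereas an override that sets one metadata key leaves the other root keys in place.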

Architecture

Structured as a Turborepo with four packages:

Package	Role
`@speakeasy-api/docs-mcp-cli`	CLI for validation, manifest bootstrap (`fix`), and deterministic indexing (`build`)
`@speakeasy-api/docs-mcp-core`	Core retrieval primitives, AST parsing, chunking, and LanceDB queries
`@speakeasy-api/docs-mcp-server`	Lean runtime MCP server surface
`@speakeasy-api/docs-mcp-eval`	Standalone evaluation harness — search-quality benchmarks and end-to-end agent evaluation

                +---------------------------+
                |     Agent / MCP Host      |
                +-------------+-------------+
                              |
                              | Dynamic Tool Schema (with Enums)
                              v
                +---------------------------+
                | @speakeasy-api/           |
                |   docs-mcp-server         |
                | search_docs, get_doc      |
                +-------------+-------------+
                              |
                              v
                +---------------------------+
                | @speakeasy-api/           |
                |   docs-mcp-core           |
                | LanceDB Engine            |
                | Memory-Mapped IO          |
                +-------------+-------------+
                              |
                              v
                     +-----------------+
                     | .lancedb/ index |
                     +-----------------+

MCP Tools

The tools exposed to the agent are dynamically generated based on your corpus_description and indexed metadata.

Tool	What it does
`search_docs`	Performs hybrid search. Tool names and descriptions are user-configurable. Parameters are dynamically generated with valid taxonomy injected as JSON Schema `enum`s. Supports stateless cursor pagination. Returns fallback hints on zero results.
`get_doc`	Returns a specific chunk, plus `context: N` neighboring chunks for surrounding detail.
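A plausible reading of the context: N parameter is "return the requested chunk plus up to N chunks on either side within the same document". That interpretation is an assumption, as is the helper name; the sketch only shows the window arithmetic.

```typescript
// Return the chunk at `index` together with up to `context` neighbors on each side,
// clamped to the bounds of the document's chunk list.
function withContext<T>(chunks: T[], index: number, context: number): T[] {
  const start = Math.max(0, index - context);
  const end = Math.min(chunks.length, index + context + 1);
  return chunks.slice(start, end);
}
```

For a five-chunk document, requesting chunk 0 with a context of 1 yields two chunks, while requesting chunk 2 yields three.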

MCP Prompts

Prompt definitions are discovered at build time and exposed over MCP via prompts/list and prompts/get.

See docs/prompt-templates.md for full usage guidelines and examples.

*.template.md — Markdown body shorthand for a single user text message
*.template.yaml — Structured prompt format for multiple messages/content parts
Mustache templating is applied to prompt text content at runtime
If both formats exist for the same prompt name, YAML is preferred and a warning is emitted during build and validate
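The format-preference rule and the argument substitution can be sketched as below. Two caveats: deriving the prompt name from the file path minus its suffix is an assumption, and renderVariables handles only plain {{name}} placeholders, whereas real Mustache also supports sections, partials, and escaping. Both function names are hypothetical.

```typescript
const YAML_SUFFIX = ".template.yaml";
const MD_SUFFIX = ".template.md";

// Map each prompt name to the file that defines it, preferring YAML when both exist.
function selectPromptFiles(files: string[], warn: (message: string) => void): Map<string, string> {
  const chosen = new Map<string, string>();
  for (const file of files) {
    if (file.endsWith(YAML_SUFFIX)) chosen.set(file.slice(0, -YAML_SUFFIX.length), file);
  }
  for (const file of files) {
    if (!file.endsWith(MD_SUFFIX)) continue;
    const name = file.slice(0, -MD_SUFFIX.length);
    const existing = chosen.get(name);
    if (existing !== undefined && existing.endsWith(YAML_SUFFIX)) {
      warn(`Prompt "${name}" has both formats; using ${existing}`);
    } else {
      chosen.set(name, file);
    }
  }
  return chosen;
}

// Replace {{name}} placeholders with argument values; unknown names render as empty text.
function renderVariables(template: string, args: Record<string, string>): string {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (_match, key: string) => args[key] ?? "");
}
```

For example, renderVariables("Review this {{language}} snippet", { language: "go" }) produces "Review this go snippet".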

Quick Start

FTS-Only (Recommended)

No API keys required. Zero-config full-text search:

# --- build stage ---
FROM node:22-slim AS build
RUN npm install -g @speakeasy-api/docs-mcp-cli
ARG DOCS_DIR=docs
COPY ${DOCS_DIR} /corpus
RUN docs-mcp build --docs-dir /corpus --out /index --embedding-provider none

# --- runtime stage ---
FROM node:22-slim
RUN npm install -g @speakeasy-api/docs-mcp-server
COPY --from=build /index /index
EXPOSE 20310
CMD ["docs-mcp-server", "--index-dir", "/index", "--transport", "http", "--port", "20310"]

docker build --build-arg DOCS_DIR=./docs -t docs-mcp .
docker run -p 20310:20310 docs-mcp

With Embeddings (Optional)

For hybrid FTS + semantic search, add an OpenAI embedding provider. This improves recall on conceptual and paraphrased queries at the cost of higher latency and ~$1 one-time embedding cost per 28.8MB corpus.

# --- build stage ---
FROM node:22-slim AS build
RUN npm install -g @speakeasy-api/docs-mcp-cli
ARG DOCS_DIR=docs
COPY ${DOCS_DIR} /corpus
RUN --mount=type=secret,id=OPENAI_API_KEY \
    OPENAI_API_KEY=$(cat /run/secrets/OPENAI_API_KEY) \
    docs-mcp build --docs-dir /corpus --out /index --embedding-provider openai

# --- runtime stage ---
FROM node:22-slim
RUN npm install -g @speakeasy-api/docs-mcp-server
COPY --from=build /index /index
EXPOSE 20310
CMD ["docs-mcp-server", "--index-dir", "/index", "--transport", "http", "--port", "20310"]

docker build --secret id=OPENAI_API_KEY,env=OPENAI_API_KEY \
  --build-arg DOCS_DIR=./docs -t docs-mcp .
docker run -p 20310:20310 -e OPENAI_API_KEY docs-mcp

The build secret supplies the API key used to embed the corpus at build time; the runtime -e OPENAI_API_KEY lets the server embed search queries.

Docker Compose (Server + Playground)

To get a server and interactive playground running together:

# docker-compose.yml
services:
  server:
    build:
      context: .
      # Uses the Dockerfile above (FTS-only or embeddings variant)
    ports:
      - "20310:20310"
    # Uncomment for embeddings:
    # environment:
    #   - OPENAI_API_KEY=${OPENAI_API_KEY}

  playground:
    image: node:22-slim
    command: >
      sh -c "npm install -g @speakeasy-api/docs-mcp-playground &&
             npx @speakeasy-api/docs-mcp-playground"
    ports:
      - "3001:3001"
    environment:
      - MCP_TARGET=http://server:20310
    depends_on:
      - server

docker compose up

Open http://localhost:3001 to explore the index interactively.

Transport Options

The MCP server supports two transport modes:

Flag	Transport	Default	Use case
`--transport stdio`	Standard I/O	Yes (default)	Direct MCP client integration (e.g. Claude Desktop, Cursor)
`--transport http`	Streamable HTTP		Containerized deployments, playground, multi-client access

When using HTTP transport, the server listens on port 20310 by default (configurable with --port).

stdio example (the command an MCP client launches):

npx @speakeasy-api/docs-mcp-server --index-dir ./dist/.lancedb

HTTP example:

npx @speakeasy-api/docs-mcp-server --index-dir ./dist/.lancedb --transport http --port 20310

Usage & Deployment

1. Authoring (Local Dev)

The fix command scans all .md files in your docs directory, analyzes their heading structure (h1/h2/h3 frequency), and generates a .docs-mcp.json manifest with the best-fit chunking strategy per file. The most common strategy becomes the default; files that differ get pattern-based overrides.

npx @speakeasy-api/docs-mcp-cli fix --docs-dir ./docs

2. Indexing (CI Build Step)

Run the deterministic indexer against your corpus. The indexer reads manifests and frontmatter to chunk the docs, generates embeddings when an embedding provider is configured, and saves the local .lancedb directory. Cache the output directory across CI runs to make builds incremental: only changed chunks are re-embedded.

- uses: actions/cache@v4
  with:
    path: ./dist/.lancedb
    # Unique key saves the updated cache after each build
    key: docs-mcp-${{ github.run_id }}
    # Prefix match loads the most recent prior cache
    restore-keys: docs-mcp-

- run: npx @speakeasy-api/docs-mcp-cli build --docs-dir ./docs --out ./dist/.lancedb
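The reason caching makes builds incremental is that each chunk is fingerprinted and only unseen fingerprints need a new embedding. The sketch below illustrates that bookkeeping. It uses a 32-bit FNV-1a hash purely as a dependency-free stand-in (a real cache would more likely use a cryptographic hash and include the embedding model in the key), and every name in it is hypothetical.

```typescript
// FNV-1a over UTF-16 code units: a simple stand-in for a real content hash.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Split chunks into those whose embedding is already cached and those that must be embedded.
function planEmbeddings(
  chunks: string[],
  cache: Map<string, number[]>,
): { reused: string[]; toEmbed: string[] } {
  const reused: string[] = [];
  const toEmbed: string[] = [];
  for (const chunk of chunks) {
    if (cache.has(fingerprint(chunk))) reused.push(chunk);
    else toEmbed.push(chunk);
  }
  return { reused, toEmbed };
}
```

If a single paragraph changes between CI runs, only the chunk containing it gets a new fingerprint, so the embedding cost of a rebuild scales with the size of the change instead of the size of the corpus.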

3. Runtime (MCP Server)

The .lancedb directory is packaged with the MCP server. FTS search is fully local. If the index was built with embeddings, the server calls the embedding API at query time to embed the search query.

npx @speakeasy-api/docs-mcp-server --index-dir ./dist/.lancedb

4. Playground (Optional)

Explore the index interactively in a browser. The playground connects to a running HTTP server and provides a search UI.

npx @speakeasy-api/docs-mcp-playground

Open http://localhost:3001. Requires a running HTTP server (step 3 with --transport http).

Environment Variable	Description	Default
`MCP_TARGET`	HTTP endpoint of the MCP server	`http://localhost:20310`
`PORT`	Playground server port	`3001`
`SERVER_NAME`	Display name shown in the playground UI	`speakeasy-docs`
`PLAYGROUND_PASSWORD`	Password-protect the playground (hashed via SHA256)	(none — open access)
`GRAM_API_KEY`	Enables chat mode when set	(none — chat disabled)

Evaluation

Docs MCP includes a standalone evaluation harness with two modes:

Search-quality eval (run) — drives the MCP server directly via stdio JSON-RPC, measuring retrieval metrics (MRR, NDCG, precision, latency). See docs/eval.md.
Agent eval (agent-eval) — spawns a Claude agent with docs-mcp tools, runs it against a prompt, and evaluates assertions on the output. Validates the full stack end-to-end. See docs/agent-eval.md.

License

AGPL-3.0
