# Plan 1: llms-txt-mcp (Lean Documentation MCP)

*Fast, predictable, minimal-context documentation access for Claude Code via llms.txt, with URL-only config, human-readable IDs, and freshness under your control.*

## Problem Statement

- **Real Impact**
  - AI SDK exposes its docs via `llms.txt` using repeated YAML frontmatter blocks; many tools either miss sections or dump huge context, making Claude Code outputs noisy and slow. Example: AI SDK's `llms.txt` is 25K+ lines and >100 sections; naively listing it pollutes the context.
  - Third-party knowledge MCPs (e.g., broad, multi-source indices) sometimes mix outdated and fresh pages (e.g., AI SDK v4 vs v5 during migration), leading to inconsistent answers and broken code.
- **Root Cause**
  - Parsing assumptions (expecting plain markdown headings) don't fit AI SDK's YAML-block format (illustrated below); tools lack structured section boundaries.
  - Tool schemas return too much text (TOCs or full files) instead of small, surgical payloads.
  - No source allowlist or freshness controls (TTL + ETag) → randomness in versions.
- **Business Impact**
  - Developer time lost to context bloat and wrong or outdated guidance.
  - Higher token costs; slower iteration; reduced trust in assistant outputs.
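For context, the two input shapes the parser must handle look roughly like this. The snippets are illustrative only: the titles and links are hypothetical, and the AI SDK shape is inferred from the "repeated YAML frontmatter blocks" description above rather than copied from the real file. AI SDK style, with one frontmatter block per section:

```markdown
---
title: Generating Text
description: How to generate text with the AI SDK.
---
...section body...
---
title: Streaming Text
---
...section body...
```

versus the official llms.txt style, with plain markdown headings (H2 sections, H1 fallback):

```markdown
# Project Name
> One-line summary of the project.

## Docs
- [Getting Started](https://example.com/docs/start): Short description.

## Examples
- [Basic Usage](https://example.com/docs/basic): Short description.
```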
### Current State Analysis

- **✅ What's Working**
  - The need is clear: lean, deterministic doc access improves Claude Code productivity.
  - Chroma provides simple ephemeral/persistent modes; sentence-transformers provides fast, small embedding models.
- **🔧 What Needs Improvement**
  - A robust parser for both AI SDK's YAML-block llms.txt and the official llms.txt format.
  - A search-first tool flow with tiny results; byte-capped section retrieval; human-readable IDs.
  - A strict source allowlist, TTL + ETag/Last-Modified freshness, and a unified index with host filtering.
- **📊 Data/Metrics**
  - AI SDK `llms.txt` is >25K lines; target structure discovery <1s; search <150ms on ~1–5k chunks; `docs_get` payload capped at ~60KB.

## Core Objectives

- **Primary Goal**: Provide Claude Code with fast, minimal-context, deterministic documentation retrieval from user-specified `llms.txt` URLs.
- **Secondary Goals**:
  - Parse both the AI SDK YAML-frontmatter llms.txt and official llms.txt formats cleanly.
  - Unified index across multiple sources; filter by host when needed.
  - Human-readable IDs (canonical URLs + anchors) and small, predictable tool schemas.
  - Freshness you control: TTL + ETag/Last-Modified revalidation.
- **Success Metrics**:
  - Search p95 <150ms; single `docs_get` p95 <50ms (local, indexed).
  - `docs_get` responses respect the default 60KB cap and preserve order.
  - No full-TOC or full-file dumps in normal flows; tool schemas stay under ~250 tokens.

### MVP Decision Filter *(30 seconds max)*

- [x] **Serve users/revenue?** Yes — immediate productivity gains, lower token use.
- [x] **Build now or defer?** Build now — core to your daily workflow.
- [x] **Maintenance burden?** Low — small surface; modest deps; TTL/ETag reduce churn.

---

## Phase-Based Implementation

> Each phase ends in a working, committable state. Fewer phases where possible; no time estimates.

### Phase 1: Foundation/Core Implementation

**Goals:**

- Implement a lean MCP ("docs-mcp") that indexes user-provided `llms.txt` URLs and exposes four tiny tools: `docs_sources`, `docs_search`, `docs_get`, `docs_refresh`.
- Ensure a search-first UX and byte-capped retrieval with human-readable IDs.

**Phase Scope:**

- **Must Have**
  - URL-only config; strict allowlist; Chroma backend (ephemeral by default); `thenlper/gte-small` embeddings.
  - Parsers for AI SDK YAML-block llms.txt and official llms.txt (H2 sections, H1 fallback).
  - Unified collection `docs` with host metadata; `docs_search(q, hosts?, k)`; `docs_get(ids[], max_bytes, merge?)` (both sketched after the implementation block below).
  - Freshness: TTL + ETag/Last-Modified; `docs_refresh` to force reindex.
  - Streaming fetch + incremental parse for large files (sketched below).
- **Can Defer**
  - Disk persistence flag (`--store=disk`), stale doc cleanup on reindex (can land in P2 if needed), diversity (MMR).
- **Won't Do**
  - Name:url configs; backend switching flags; listing full TOCs; complex multi-page crawl.

**Implementation:**

```python
# CLI (positional URLs also accepted)
URL... --ttl 24h --timeout 30 --embed-model thenlper/gte-small \
  --store memory [--store-path PATH] --max-get-bytes 60000

# Or with flag form:
--sources URL... --ttl 24h --timeout 30 --embed-model thenlper/gte-small \
  --store memory [--store-path PATH] --max-get-bytes 60000

# Tools (FastMCP)
- docs_sources() -> [{ source_url, host, lastIndexed, docCount }]
- docs_search(q, hosts?: [str], k?: int=5) -> [{ id, source, title, snippet, score }]
- docs_get(ids: [str], max_bytes?: int=60000, merge?: bool=false, depth?: int=0)
- docs_refresh(source?: str) -> { refreshed, counts }

# IDs
id = canonical_url (+ '#anchor')

# Parser selection
try AI SDK YAML-frontmatter blocks -> blocks to sections
else parse official headings      -> H2 sections (H1 fallback)
```
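The `docs_get` byte cap is the trickiest contract above. A minimal sketch, assuming FastMCP from the `mcp` package; `Section` and `INDEX` are illustrative stand-ins for the Chroma-backed store, not the final implementation:

```python
from dataclasses import dataclass

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("llms-txt-mcp")


@dataclass
class Section:
    id: str  # canonical URL + optional '#anchor'
    title: str
    text: str


# Stand-in for the Chroma-backed index, keyed by human-readable ID.
INDEX: dict[str, Section] = {}


@mcp.tool()
def docs_get(ids: list[str], max_bytes: int = 60000, merge: bool = False) -> dict:
    """Return requested sections in order, truncated to a total byte budget."""
    out: list[dict] = []
    used = 0
    for section_id in ids:
        section = INDEX.get(section_id)
        if section is None or used >= max_bytes:
            continue
        # Truncate on the byte budget, then drop any split multibyte char.
        raw = section.text.encode("utf-8")[: max_bytes - used]
        text = raw.decode("utf-8", errors="ignore")
        used += len(text.encode("utf-8"))
        out.append({"id": section_id, "title": section.title, "content": text})
    if merge:
        return {"content": "\n\n".join(s["content"] for s in out),
                "ids": [s["id"] for s in out]}
    return {"sections": out}
```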
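A sketch of the unified-index search path, assuming `chromadb` and `sentence-transformers` as listed under Dependencies. The collection name `docs` and the `host` metadata field come from this plan; the cosine space setting and distance-to-score conversion are illustrative choices:

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("thenlper/gte-small")
client = chromadb.EphemeralClient()  # in-memory; PersistentClient lands in Phase 2
docs = client.get_or_create_collection("docs", metadata={"hnsw:space": "cosine"})


def search(q: str, hosts: list[str] | None = None, k: int = 5) -> list[dict]:
    """Vector search over the single `docs` collection, optionally host-filtered."""
    res = docs.query(
        query_embeddings=[model.encode(q).tolist()],
        n_results=min(k, 20),  # hard cap keeps responses tiny
        where={"host": {"$in": hosts}} if hosts else None,
    )
    return [
        {"id": id_, "title": meta.get("title"), "score": round(1 - dist, 4)}
        for id_, meta, dist in zip(
            res["ids"][0], res["metadatas"][0], res["distances"][0]
        )
    ]
```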
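A sketch of the freshness-aware streaming fetch, assuming `httpx`: a 304 means the cached parse is still valid, and on a 200 the body arrives in chunks so a 25K-line file can be fed to an incremental parser (here the chunks are just collected, for brevity):

```python
import httpx


async def fetch_if_changed(url: str, etag: str | None) -> tuple[str | None, str | None]:
    """Conditional GET: returns (body, new_etag), or (None, etag) on a 304."""
    headers = {"If-None-Match": etag} if etag else {}
    async with httpx.AsyncClient(timeout=30) as client:
        async with client.stream("GET", url, headers=headers) as resp:
            if resp.status_code == 304:
                return None, etag  # unchanged: keep the existing index
            resp.raise_for_status()
            chunks = [chunk async for chunk in resp.aiter_text()]
    return "".join(chunks), resp.headers.get("ETag")
```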
**Quick Validation:**

- [ ] `docs_search` returns tiny items (no content blobs), `k<=20`.
- [ ] `docs_get` respects `max_bytes` and preserves order.
- [ ] Indexing AI SDK `llms.txt` completes and `docCount>0`.

**Success Criteria:**

- [ ] `uv run llms-txt-mcp --sources https://ai-sdk.dev/llms.txt` produces working tools.
- [ ] Search p95 <150ms (local, indexed); `docs_get` p95 <50ms for a single section.
- [ ] Schemas stay small and deterministic.

---

### Phase 2: Enhancement/Extension

**Goals:**

- Improve reliability and extend storage options without expanding the tool surface.

**Phase Scope:**

- **Must Have**
  - Disk persistence: `--store=disk --store-path PATH` via Chroma PersistentClient.
  - Stale doc removal on reindex per source (remove IDs no longer present).
  - Background TTL refresh triggered by tool calls (non-blocking).
- **Can Defer**
  - Search diversity (MMR), per-host caps in results.
- **Won't Do**
  - Switching to alternate backends publicly; keep the abstraction internal only.

**Implementation:**

- Add PersistentClient wiring; guard writes; delete stale chunks by `source_url`.
- Schedule a refresh when `now - last_indexed > ttl` on `docs_search`/`docs_get` (both steps sketched below).
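A sketch of both Phase 2 steps, assuming the chromadb `docs` collection from Phase 1; `reindex()` and the `source` record are hypothetical helpers for re-fetching and re-embedding one source:

```python
import asyncio
import time


def remove_stale(docs, source_url: str, fresh_ids: set[str]) -> None:
    """Delete chunks for source_url whose IDs vanished from the latest parse."""
    existing = docs.get(where={"source_url": source_url})["ids"]
    stale = [id_ for id_ in existing if id_ not in fresh_ids]
    if stale:
        docs.delete(ids=stale)


async def reindex(source) -> None:
    """Placeholder: re-fetch, re-parse, re-embed, then clear source.refreshing."""
    ...


def maybe_schedule_refresh(source, ttl_seconds: float) -> None:
    """Called from docs_search/docs_get; fires and forgets, never blocks."""
    if time.time() - source.last_indexed > ttl_seconds and not source.refreshing:
        source.refreshing = True
        asyncio.get_running_loop().create_task(reindex(source))
```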
**Quick Validation:**

- [ ] Restart preserves the index when `--store=disk`.
- [ ] Reindex removes stale content; no orphaned hits.
- [ ] Tool latency is unaffected by background refresh.

**Success Criteria:**

- [ ] Persistent runs behave the same as ephemeral runs, with data retained.
- [ ] Stale removal verified by changing source content.
- [ ] Background refresh does not block tool responses.

---

### Phase 3: Polish/Integration (if needed)

**Goals:**

- Add optional capabilities and prepare for broader ingestion.

**Phase Scope:**

- **Must Have**
  - None, if P1–P2 already meet needs.
- **Can Defer**
  - Doc-site ingestion (`https://ai-sdk.dev/docs/...`) into the same chunk schema.
  - Diversity (MMR), per-host caps, version pinning.
- **Won't Do**
  - Over-optimization or large tool-surface expansion.

**Implementation:**

- Introduce a `site` ingest pipeline that crawls allowed roots and reuses the unified index.

**Quick Validation:**

- [ ] Site-ingested chunks interoperate seamlessly with `docs_search`/`docs_get`.
- [ ] No regressions to llms.txt flows.

**Success Criteria:**

- [ ] Cross-source answers (AI SDK + Tailwind + shadcn + Next.js) from one search.
- [ ] Payload caps and schemas remain stable.

---

## Key Technical Decisions

### Architecture Choices

- **Chroma (ephemeral by default)**: Zero-ops locally; optional persistence later. Simple metadata filtering and a single unified collection.
- **Human-readable IDs**: Canonical URLs (+ anchors) for transparent, copy-pasteable references.

### Implementation Patterns

- **Follow Existing Conventions**: Search-first → get flow; tiny tool schemas; deterministic shapes.
- **Error Handling**: Return `{ "error": "message" }` for tool errors; validate inputs; strict allowlist for URLs ending with `/llms.txt`.
- **Testing Strategy**: Unit tests for parsers; mocked HTTP (ETag/304); index/retrieval integration; performance smoke tests on large `llms.txt` files.

### Dependencies and Integrations

- **External**: `httpx`, `sentence-transformers` (default `thenlper/gte-small`), `chromadb`, `mcp`.
- **Database Changes**: None (Chroma collection only).
- **Frontend Impact**: None; tool consumers are MCP clients (e.g., Claude Code).

## Risk Mitigation

- **Rollback Plan**: Disable the server entry or revert CLI flags; ephemeral mode leaves no on-disk state.
- **Compatibility**: Small, stable schemas; keep the tool list minimal; avoid breaking param renames.
- **Testing**: Per-phase validation; CI for unit/integration tests; perf checks on large fixtures.
- **Monitoring**: Log fetch/index times, chunk counts, query latency, and response sizes.

## Success Metrics

**Technical:**

- [ ] Search p95 <150ms; get p95 <50ms (local, indexed)
- [ ] Payload cap enforced (default 60KB) and configurable
- [ ] Parser handles AI SDK YAML-block and official llms.txt accurately

**Business:**

- [ ] Reduced token usage vs prior tools (no TOC/full-file dumps)
- [ ] Faster doc lookup → improved developer throughput
- [ ] Deterministic, fresh results (TTL + ETag)

**Quality:**

- [ ] No regressions in tool outputs
- [ ] All tests pass (unit + integration)
- [ ] README and CLI help are up to date

---

## Implementation Notes

### Solo Dev Context (for future you)

- **Key files**: `src/llms_txt_mcp/server.py` (server + tools), `README.md`, `docs/implementation-plan.md`.
- **Gotchas**: AI SDK files are huge, so ensure streaming fetch; YAML variance; ensure slug stability; guard Chroma upsert vs add; embedding time dominates indexing.
- **Quick wins**: Host filter in search; byte cap in get; preindex on launch for an instant first query.
- **Debug shortcuts**: Log counts and durations; test against a locally cached `llms.txt` to iterate quickly.

### Before Starting

- [ ] Review MCP tool shapes for minimality
- [ ] Confirm CLI flags match the README
- [ ] Identify acceptance tests for AI SDK `llms.txt`

### During Implementation

- [ ] Commit at the end of each phase
- [ ] Run unit/integration tests locally
- [ ] Update the README and examples as you go
- [ ] Monitor logs for fetch/index timings

### Phase Completion Criteria

- ✅ Working end-to-end
- ✅ Tested (unit + integration)
- ✅ Documented
- ✅ Committable

### Quality Gates

- [ ] No breaking changes to existing functionality
- [ ] Lint + type checks pass
- [ ] Integration tests pass
- [ ] Performance targets maintained

---

## Template Usage Guidelines (kept for reference)

- Use multiple phases for complex/high-risk changes; otherwise prefer fewer phases with complete states.
- Each phase must add user value; avoid setup-only phases.
- Plan for early feedback; keep tool schemas tiny and stable.
