# Plan: llms.txt Parser with Format Detection

*TDD approach to build parsers for different llms.txt formats*

## Problem Statement

The current server can't accurately parse the different llms.txt formats. Each site structures its file differently, which leads to wrong article counts.

- **Real Impact**: Can't count articles correctly across different formats
- **Root Cause**: No format detection or specialized parsers
- **Business Impact**: Poor data extraction

## Core Objectives

- Detect llms.txt format automatically
- Parse each format correctly
- Return accurate article counts

---

## Phase 1: TDD Foundation

**Goals:**
- Write tests for format detection
- Write tests for article counting
- All tests should fail initially

**File Structure:**
- Tests: `tests/test_format_detection.py`, `tests/test_parsers.py`
- Parser code: `src/llms_txt_mcp/parsers/` (to be created in Phase 2)

**Test Requirements:**

Format detection tests:
- `zod-dev-llms-full.txt` → 'standard-full-llms-txt'
- `hono-dev-llms-full.txt` → 'standard-full-llms-txt'
- `orm-drizzle-team-llms-full.txt` → 'standard-full-llms-txt'
- `ai-sdk-dev-llms.txt` → 'yaml-frontmatter-full-llms-txt'
- `nextjs-org-docs-llms-full.txt` → 'yaml-frontmatter-full-llms-txt'
- `vercel-com-docs-llms-full.txt` → 'yaml-frontmatter-full-llms-txt'
- `docs-docker-com-llms.txt` → 'standard-llms-txt'

Article count tests:
- `docs-docker-com-llms.txt` → 721 articles
- `ai-sdk-dev-llms.txt` → 132 articles
- `hono-dev-llms-full.txt` → 88 articles
- `nextjs-org-docs-llms-full.txt` → 363 articles
- `orm-drizzle-team-llms-full.txt` → 140 articles
- `vercel-com-docs-llms-full.txt` → 640 articles
- `zod-dev-llms-full.txt` → 17 articles

**Format Detection Notes:**
- Check whether the URL ends with `/llms-full.txt`
- If `/llms.txt` is provided, check whether `/llms-full.txt` exists and prefer it
- YAML frontmatter detection: the document must have separators (3+ dashes minimum) and must contain both `title:` and `description:` fields

**Article Definition:**
- An article is defined by having at least a title

**Success Criteria:**
- [ ] Tests written for format detection
- [ ] Tests written for article counting
- [ ] All tests fail (no implementation yet)

---

## Phase 2: Implementation

**Goals:**
- Make all tests pass
- Keep it simple

**Success Criteria:**
- [ ] All format detection tests pass
- [ ] All article count tests pass
- [ ] Code is minimal and clean

---

## Phase 3: Integration

**Goals:**
- Integrate into existing server

**Success Criteria:**
- [ ] Parser integrated into server
- [ ] All functionality works
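
**Sketch: Format Detection**

The format detection notes in Phase 1 could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the names `has_yaml_frontmatter` and `detect_format` are hypothetical, and checking frontmatter before the URL suffix is an assumption, made because `ai-sdk-dev-llms.txt` is expected to map to the YAML format even though its name lacks `-full`. The network check that prefers `/llms-full.txt` when only `/llms.txt` is given is left out.

```python
import re

# A separator is a line made of three or more dashes (per the plan's notes).
_SEPARATOR = re.compile(r"^-{3,}\s*$")


def has_yaml_frontmatter(content: str) -> bool:
    """Return True if a block between two consecutive separator lines
    contains both a `title:` and a `description:` key."""
    lines = content.splitlines()
    sep_idx = [i for i, line in enumerate(lines) if _SEPARATOR.match(line)]
    # Examine each block enclosed by a pair of consecutive separators.
    for start, end in zip(sep_idx, sep_idx[1:]):
        block = lines[start + 1:end]
        keys = {line.split(":", 1)[0].strip() for line in block if ":" in line}
        if {"title", "description"} <= keys:
            return True
    return False


def detect_format(url: str, content: str) -> str:
    """Map a URL and its fetched content to one of the plan's format labels.

    Assumed rule order (not stated in the plan): frontmatter first,
    then the URL suffix, then the plain llms.txt fallback.
    """
    if has_yaml_frontmatter(content):
        return "yaml-frontmatter-full-llms-txt"
    if url.rstrip("/").endswith("/llms-full.txt"):
        return "standard-full-llms-txt"
    return "standard-llms-txt"
```

Under these assumptions, a Phase 1 test would load each fixture file, call `detect_format` with the fixture's source URL and contents, and compare the result against the labels listed in the test requirements.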
