docs-mcp-server

Overview Schema Related Servers Score Discussions

docs-mcp-server
openspec
changes
process-markdown-frontmatter

design.md•2.58 KiB

# Design: Markdown Frontmatter Processing ## Architecture ### 1. Metadata Extraction (`MarkdownMetadataExtractorMiddleware`) - **Current**: Extracts title from the first `# Heading`. - **New**: - Check for YAML frontmatter at the beginning of the content (lines between `---`). - Parse the YAML content. - If a `title` field exists, use it as the `context.title`. - Fallback to existing H1 extraction if no frontmatter title is found. - Store the raw frontmatter in the context if needed for downstream (though the splitter re-reads the content). ### 2. Document Splitting (`SemanticMarkdownSplitter`) - **Current**: Converts Markdown to HTML using `remark`, then splits based on DOM elements. - **New**: - **Preprocessing**: Detect and extract frontmatter *before* the `markdownToHtml` conversion. - **Splitting**: - Create a dedicated `Chunk` for the frontmatter. - Set its type to `frontmatter` (new `SectionContentType`). - The rest of the markdown (body) is passed to the existing `markdownToHtml` -> `splitIntoSections` pipeline. - The frontmatter chunk is prepended to the list of chunks. ## Dependencies - **`gray-matter`**: Adopt this library for robust frontmatter parsing. It handles edge cases (like `---` in code blocks) better than regex and is the industry standard. - It uses `js-yaml` internally (or similar) but abstracts the splitting logic safely. ## Data Structures ### `SectionContentType` Update `src/splitter/types.ts`: ```typescript export type SectionContentType = "text" | "code" | "table" | "heading" | "structural" | "frontmatter"; ``` ### `Chunk` No structural changes to `Chunk`, but: - `types` array can now contain `"frontmatter"`. - **Optimization**: We should consider storing the parsed frontmatter object in `chunk.metadata` (if `Chunk` interface supports it or if we extend it) to avoid downstream re-parsing. For now, we stick to the raw content in the chunk body, but ensure the extractor attaches the title to the context. ## Error Handling - **Malformed YAML**: Parsing MUST NOT crash the application. - **Middleware**: If parsing fails, log a warning and fallback to standard H1 title extraction. - **Splitter**: If parsing fails, treat the content as plain text (do not create a frontmatter chunk, or create a text chunk). ## Trade-offs - **Splitting Strategy**: Separating frontmatter before HTML conversion avoids `remark` messing it up (e.g. turning it into a `<hr>`). - **Chunk Size**: Frontmatter is usually small, but if it's huge, we might need to split it? Unlikely to exceed chunk limits in normal docs. We will treat it as a single chunk for now.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/arabold/docs-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

design.md•2.58 KiB