DevFlow MCP

MIT License

# Document Chunking & Structured Input

**Status**: ✅ Schema Implemented (Processing Implementation Pending)

Comprehensive documentation for ingesting large documents into the DevFlow MCP knowledge graph using a token-aware, hierarchical input schema.

## Overview

Instead of post-hoc chunking of raw text, DevFlow MCP uses a **pre-structured JSON-LD format** that:

- ✅ Enforces token limits via Zod validation (prevents API failures)
- ✅ Provides optimal chunk sizes (≤512 tokens per content block)
- ✅ Preserves semantic structure (hierarchical sections)
- ✅ Enables context reconstruction (parent-child relationships)
- ✅ Maps cleanly to the knowledge graph (sections → entities, content → observations)

## Quick Start

```typescript
import { StructuredDocumentSchema, type StructuredDocument } from '#types';

// Generate the section ID up front so its content blocks can reference it as parentId
const sectionId = crypto.randomUUID();

// Create a structured document
const doc: StructuredDocument = {
  "@context": "https://schema.org/",
  "@type": "StructuredDocument",
  metadata: {
    documentId: crypto.randomUUID(),
    title: "My Documentation",
    version: "1.0.0",
    createdAt: Date.now(),
    language: "en"
  },
  sections: [{
    id: sectionId,
    type: "section",
    metadata: {
      title: "Introduction",
      summary: "Overview of the system",
      entityType: "component",
      order: 1
    },
    content: [{
      id: crypto.randomUUID(),
      parentId: sectionId,
      type: "text",
      content: "This system provides...",
      metadata: {
        language: "en",
        order: 1
      }
    }]
  }]
};

// Validate
const result = StructuredDocumentSchema.safeParse(doc);
if (!result.success) {
  console.error(result.error.issues);
}
```

## Core Documents

### 📋 [Schema Summary](./SCHEMA_SUMMARY.md)

Quick reference for the entire schema design, usage patterns, and benefits.

### 📐 [Design Rationale](./INPUT_SCHEMA_DESIGN.md)

Deep dive into design decisions, token budgets, hierarchical structure, and knowledge graph mapping.

### 📝 [Complete Example](./EXAMPLE_DOCUMENT.md)

Full working example of an authentication system documentation showing all schema features.
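
## Illustrative Sketch: Character-Based Size Pre-Check

The Overview states that content blocks are limited to 512 tokens, and the Token Limits table pairs that limit with 2,048 characters. The sketch below shows how a caller might pre-check block sizes with that 4-characters-per-token ratio before running schema validation. It is a rough approximation under stated assumptions, not the project's implementation: `estimateTokens`, `findOversizedBlocks`, `CHARS_PER_TOKEN`, and the `BlockLike` shape are hypothetical names introduced here, and a real tokenizer will produce different counts for many inputs.

```typescript
// Hypothetical pre-check, not part of DevFlow MCP. It approximates tokens as
// characters / 4, the ratio implied by the 512-token / 2,048-character limit.
const CHARS_PER_TOKEN = 4; // assumption: real tokenizers vary by text
const MAX_BLOCK_TOKENS = 512; // ContentBlock limit stated in this README

// Reduced, illustrative shape: only the fields this check needs.
interface BlockLike {
  id: string;
  content: string;
}

interface OversizedBlock {
  id: string;
  estimatedTokens: number;
}

// Estimate the token count from the character length, rounding up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Return the blocks whose estimated size exceeds the per-block limit.
function findOversizedBlocks(blocks: BlockLike[]): OversizedBlock[] {
  return blocks
    .map((block) => ({
      id: block.id,
      estimatedTokens: estimateTokens(block.content),
    }))
    .filter((block) => block.estimatedTokens > MAX_BLOCK_TOKENS);
}
```

A check like this can flag blocks that are clearly too large and need splitting early, but it does not replace exact token counting.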

## Schema Structure

```
StructuredDocument (JSON-LD)
├─ metadata: DocumentMetadata
├─ sections: Section[]
│  ├─ metadata: SectionMetadata
│  ├─ subsections?: Subsection[]
│  │  ├─ metadata: SubsectionMetadata
│  │  └─ content: ContentBlock[]
│  │     ├─ id, parentId, previousId, nextId
│  │     ├─ type: "text" | "code" | "data" | "diagram"
│  │     ├─ content: string (≤512 tokens)
│  │     └─ metadata: ContentBlockMetadata
│  └─ content?: ContentBlock[] (if no subsections)
└─ crossReferences?: CrossReference[]
```

## Token Limits (Enforced by Zod)

| Element | Max Tokens | Max Characters | Maps To |
|---------|------------|----------------|---------|
| ContentBlock | 512 | 2,048 | Single embedding |
| Subsection | 4,096 | ~16,000 | Batch processing unit |
| Section | 8,192 | ~32,000 | Knowledge graph Entity |

## Knowledge Graph Mapping

```typescript
// Input: Section
{
  id: "uuid-123",
  metadata: {
    title: "Authentication System",
    summary: "JWT-based authentication",
    entityType: "component"
  },
  content: [
    { content: "The system uses JWT tokens..." },
    { content: "Token validation checks..." }
  ]
}

// Output: Entity
{
  name: "Authentication_System",
  entityType: "component",
  observations: [
    "JWT-based authentication",        // summary
    "The system uses JWT tokens...",   // content[0]
    "Token validation checks..."       // content[1]
  ]
}

// Output: Embeddings (batch generated)
[
  { embedding: [...], observation: "JWT-based authentication" },
  { embedding: [...], observation: "The system uses JWT tokens..." },
  { embedding: [...], observation: "Token validation checks..." }
]
```

## Implementation Status

### ✅ Completed

- [x] Zod schema definition (`src/types/document-input.ts`)
- [x] TypeScript type exports (`src/types/index.ts`)
- [x] Design documentation
- [x] Example document
- [x] Schema summary

### 🚧 Pending

- [ ] Token counting utility with `js-tiktoken`
- [ ] Document processor service
- [ ] MCP tool `process_document`
- [ ] MCP prompt `/document`
- [ ] Integration tests
- [ ] Batch embedding optimization

## Usage Patterns

### Pattern 1: LLM Generates Structure

```typescript
// User provides raw text
const rawText = userInput;

// LLM converts using structured output
const doc = await llm.generateContent({
  responseSchema: zodToJsonSchema(StructuredDocumentSchema),
  prompt: `Convert this to structured format: ${rawText}`
});

// Validate and process
const validated = StructuredDocumentSchema.parse(doc);
await documentProcessor.ingest(validated);
```

### Pattern 2: Direct JSON Input

```typescript
// Developer provides pre-structured JSON
const doc = loadDocumentFromFile('./docs/api.json');

// Validate
const result = StructuredDocumentSchema.safeParse(doc);

// Process
if (result.success) {
  await documentProcessor.ingest(result.data);
}
```

### Pattern 3: Programmatic Generation

```typescript
// Generate from codebase analysis
const sections = analyzeCodebase('./src');

const doc: StructuredDocument = {
  "@context": "https://schema.org/",
  "@type": "StructuredDocument",
  metadata: buildMetadata(),
  sections: sections.map(toSection)
};

await documentProcessor.ingest(doc);
```

## Benefits Over Traditional Chunking

| Traditional Approach | This Schema |
|---------------------|-------------|
| Post-hoc splitting of raw text | Pre-structured by LLM |
| Arbitrary chunk boundaries | Semantic boundaries defined |
| Token counting during processing | Token limits enforced by validation |
| Unpredictable chunk quality | Guaranteed optimal size |
| Lost context at boundaries | Full hierarchical context |
| Complex splitting algorithms | Simple: already chunked |
| API failures possible | Validated before API call |

## Next Steps

To complete the implementation:

1. **Add Token Counting**: Integrate `js-tiktoken` into Zod refinement
2. **Create Processor**: Service to convert StructuredDocument → KnowledgeGraph
3. **Add MCP Tool**: Tool for ingesting documents
4. **Create Prompt**: Prompt template with JSON Schema for LLM
5. **Write Tests**: Validation, processing, and integration tests

## Related Files

### Source Code

- `src/types/document-input.ts` - Complete Zod schema
- `src/types/validation.ts` - Knowledge graph schemas
- `src/types/index.ts` - Type exports

### Documentation

- `docs/chunking/SCHEMA_SUMMARY.md` - Quick reference
- `docs/chunking/INPUT_SCHEMA_DESIGN.md` - Design rationale
- `docs/chunking/EXAMPLE_DOCUMENT.md` - Working example

### Archive

- `docs/chunking/implementation-plan.md` - Old approach (archived)
- `docs/chunking/integration-points.md` - Old approach (archived)
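
## Illustrative Sketch: Section-to-Entity Mapping

Next Steps lists a processor that converts a StructuredDocument into the knowledge graph, and the Knowledge Graph Mapping example shows the intended result for one section. The sketch below reproduces that example as a function, purely for illustration. `SectionLike`, `EntityLike`, and `sectionToEntity` are hypothetical names with reduced shapes, the underscore naming rule is inferred from the single `"Authentication_System"` example, and subsections are not handled because this README does not show how they map.

```typescript
// Hypothetical, reduced shapes covering only the fields used in the
// Knowledge Graph Mapping example. The real schema is defined in
// src/types/document-input.ts and may differ.
interface SectionLike {
  metadata: { title: string; summary: string; entityType: string };
  content?: { content: string }[];
}

interface EntityLike {
  name: string;
  entityType: string;
  observations: string[];
}

// Convert one section into one entity: the summary becomes the first
// observation, followed by each content block in its original order.
function sectionToEntity(section: SectionLike): EntityLike {
  // Assumption: whitespace runs in the title become underscores, as in
  // "Authentication System" -> "Authentication_System".
  const name = section.metadata.title.trim().replace(/\s+/g, "_");

  const blocks = section.content || [];
  const observations = [section.metadata.summary];
  for (const block of blocks) {
    observations.push(block.content);
  }

  return {
    name,
    entityType: section.metadata.entityType,
    observations,
  };
}
```

Each string in `observations` would then be a candidate for one embedding, matching the one-embedding-per-observation layout in the mapping example.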
