documcp

by tosin2013
link-validation.md (7.87 kB)
# Link Validation in Knowledge Graph

## Overview

DocuMCP now includes automatic link validation for documentation content, integrated directly into the Knowledge Graph memory system. This feature validates external links, tracks their status over time, and surfaces broken links during repository analysis.

## Architecture

### Components

1. **kg-link-validator.ts** - Core link validation module
2. **kg-code-integration.ts** - Automatic validation during doc analysis
3. **Knowledge Graph** - Stores validation results as entities

### Entity Type: `link_validation`

```typescript
{
  totalLinks: number; // Total links found
  validLinks: number; // Links that returned HTTP 200
  brokenLinks: number; // Links that failed (404, timeout, etc.)
  warningLinks: number; // Links that were skipped
  unknownLinks: number; // Links that couldn't be validated
  healthScore: number; // 0-100 score based on valid/total
  lastValidated: string; // ISO 8601 timestamp
  brokenLinksList: string[]; // Array of broken link URLs
}
```

### Relationships

1. **has_link_validation**: `documentation_section` → `link_validation`
   - Connects docs to their validation results
2. **requires_fix**: `link_validation` → `documentation_section`
   - Created when broken links are detected
   - Properties:
     - `severity`: "high" (>5 broken) or "medium" (1-5 broken)
     - `brokenLinkCount`: Number of broken links
     - `detectedAt`: ISO timestamp

## How It Works

### 1. Automatic Validation During Analysis

When `analyze_repository` runs, it:

1. Extracts documentation content
2. Creates documentation entities in KG
3. **Automatically validates external links** (async, non-blocking)
4. Stores validation results in KG

```typescript
// In kg-code-integration.ts
for (const doc of extractedContent.existingDocs) {
  const docNode = createDocSectionEntity(
    projectId,
    doc.path,
    doc.title,
    doc.content,
  );
  kg.addNode(docNode);

  // Validate links in background
  validateAndStoreDocumentationLinks(docNode.id, doc.content).catch((error) =>
    console.warn(`Failed to validate links: ${error.message}`),
  );
}
```

### 2. Link Extraction

Supports both Markdown and HTML formats:

```markdown
<!-- Markdown links -->
[GitHub](https://github.com)

<!-- HTML links -->
<a href="https://example.com">Link</a>
```

### 3. Validation Strategy

Uses native Node.js `fetch` API with:

- **HTTP HEAD requests** (faster than GET)
- **5-second timeout** (configurable)
- **Retry logic** (2 retries by default)
- **Concurrent checking** (up to 10 simultaneous)

```typescript
const result = await validateExternalLinks(urls, {
  timeout: 5000, // 5 seconds
  retryCount: 2, // Retry failed links
  concurrency: 10, // Check 10 links at once
});
```

### 4. Storage in Knowledge Graph

Results are stored as entities and can be queried:

```typescript
// Get validation history for a doc section
const history = await getLinkValidationHistory(docSectionId);

// Latest validation
const latest = history[0];
console.log(`Health Score: ${latest.properties.healthScore}%`);
console.log(`Broken Links: ${latest.properties.brokenLinks}`);
```

## Usage Examples

### Query Broken Links

```typescript
import { getKnowledgeGraph } from "./memory/kg-integration.js";

const kg = await getKnowledgeGraph();

// Find all link validation entities with broken links
const allNodes = await kg.getAllNodes();
const validations = allNodes.filter(
  (n) => n.type === "link_validation" && n.properties.brokenLinks > 0,
);

validations.forEach((v) => {
  console.log(`Found ${v.properties.brokenLinks} broken links:`);
  v.properties.brokenLinksList.forEach((url) => console.log(`  - ${url}`));
});
```

### Get Documentation Health Report

```typescript
import { getKnowledgeGraph } from "./memory/kg-integration.js";

const kg = await getKnowledgeGraph();

// Find all documentation sections
const docSections = (await kg.getAllNodes()).filter(
  (n) => n.type === "documentation_section",
);

for (const doc of docSections) {
  // Get validation results
  const edges = await kg.findEdges({
    source: doc.id,
    type: "has_link_validation",
  });

  if (edges.length > 0) {
    const validationId = edges[0].target;
    const validation = (await kg.getAllNodes()).find(
      (n) => n.id === validationId,
    );

    if (validation) {
      console.log(`\n${doc.properties.filePath}:`);
      console.log(`  Health: ${validation.properties.healthScore}%`);
      console.log(`  Valid: ${validation.properties.validLinks}`);
      console.log(`  Broken: ${validation.properties.brokenLinks}`);
    }
  }
}
```

### Manual Validation

```typescript
import {
  validateExternalLinks,
  storeLinkValidationInKG,
} from "./memory/kg-link-validator.js";

// Validate specific URLs
const result = await validateExternalLinks([
  "https://github.com",
  "https://example.com/404",
]);

console.log(result);
// {
//   totalLinks: 2,
//   validLinks: 1,
//   brokenLinks: 1,
//   results: [...]
// }

// Store in KG
await storeLinkValidationInKG(docSectionId, result);
```

## Integration with analyze_repository

The `analyze_repository` tool now includes link validation data:

```json
{
  "success": true,
  "data": {
    "intelligentAnalysis": {
      "documentationHealth": {
        "outdatedCount": 2,
        "coveragePercent": 85,
        "totalCodeFiles": 20,
        "documentedFiles": 17,
        "linkHealth": {
          "totalLinks": 45,
          "brokenLinks": 3,
          "healthScore": 93
        }
      }
    }
  }
}
```

## Configuration

### Environment Variables

```bash
# Link validation timeout (milliseconds)
DOCUMCP_LINK_TIMEOUT=5000

# Maximum retries for failed links
DOCUMCP_LINK_RETRIES=2

# Concurrent link checks
DOCUMCP_LINK_CONCURRENCY=10
```

### Skip Link Validation

Link validation is non-blocking and runs in the background. If it fails, it logs a warning but doesn't stop the analysis.

## Performance Considerations

1. **Non-blocking**: Validation runs asynchronously after doc entities are created
2. **Cached Results**: Results stored in KG, no re-validation on subsequent reads
3. **Concurrent Checking**: Validates up to 10 links simultaneously
4. **Smart Timeouts**: 5-second timeout prevents hanging on slow servers

## Troubleshooting

### Links Not Being Validated

**Issue**: Documentation sections have no validation results

**Check**:

1. Are there external links in the content?
2. Check console for warnings: `Failed to validate links in...`
3. Verify network connectivity

### False Positives

**Issue**: Valid links marked as broken

**Solutions**:

1. Increase timeout: Some servers respond slowly
2. Check if server blocks HEAD requests (rare)
3. Verify URL is publicly accessible (not behind auth)

### Memory Storage Not Updated

**Issue**: KG doesn't show validation results

**Check**:

```typescript
import { saveKnowledgeGraph } from "./memory/kg-integration.js";

// Manually save KG
await saveKnowledgeGraph();
```

## Future Enhancements

1. **AST-based Internal Link Validation**
   - Verify internal file references exist
   - Check anchor links (`#section-id`)
2. **Link Health Trends**
   - Track link health over time
   - Alert on degrading link quality
3. **Batch Re-validation**
   - MCP tool to re-check all links
   - Scheduled validation for long-lived projects
4. **Link Recommendation**
   - Suggest fixing broken links
   - Recommend archive.org alternatives for dead links

## Dependencies

- **linkinator** (v6.1.4) - Link validation library (installed)
- **native fetch** - Node.js 20+ built-in HTTP client

## Related Documentation

- [Architecture Decision Records](../adrs/)
- [Phase 2: Intelligence & Learning System](../phase-2-intelligence.md)
- [Memory Workflows Tutorial](../tutorials/memory-workflows.md)
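
## Appendix: Extraction and Scoring Sketch

The Link Extraction, Entity Type, and Relationships sections describe how links are found, how `healthScore` relates to valid/total, and how `severity` depends on the broken link count. The sketch below is one way those rules might look in code. The function names (`extractExternalLinks`, `computeHealthScore`, `severityFor`), the regular expressions, the rounding, and the handling of documents with no links are all assumptions made for illustration; they are not taken from `kg-link-validator.ts`.

```typescript
// Hypothetical helpers: names and details are assumptions, not DocuMCP's API.
type Severity = "high" | "medium";

// Collect unique external http(s) URLs from Markdown links and HTML anchors.
function extractExternalLinks(content: string): string[] {
  const patterns: RegExp[] = [
    /\[[^\]]*\]\((https?:\/\/[^\s)]+)\)/g, // [text](https://...)
    /<a\s[^>]*href=["'](https?:\/\/[^"']+)["']/gi, // <a href="https://...">
  ];
  const urls: string[] = [];
  for (const pattern of patterns) {
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(content)) !== null) {
      if (urls.indexOf(match[1]) === -1) {
        urls.push(match[1]);
      }
    }
  }
  return urls;
}

// 0-100 score based on valid/total. Rounding to the nearest integer and
// returning 100 for a document without links are assumptions.
function computeHealthScore(validLinks: number, totalLinks: number): number {
  if (totalLinks === 0) {
    return 100;
  }
  return Math.round((validLinks / totalLinks) * 100);
}

// Severity rule from the Relationships section: "high" for more than 5 broken
// links, "medium" for 1-5. Returns null when nothing is broken, since no
// requires_fix edge would be created in that case.
function severityFor(brokenLinkCount: number): Severity | null {
  if (brokenLinkCount > 5) {
    return "high";
  }
  if (brokenLinkCount >= 1) {
    return "medium";
  }
  return null;
}

const sample =
  '[GitHub](https://github.com) and <a href="https://example.com">Link</a>';
console.log(extractExternalLinks(sample));
console.log(computeHealthScore(42, 45)); // 93
console.log(severityFor(3)); // "medium"
```

If the 3 broken links in the `analyze_repository` sample are the only links out of 45 that are not valid, this rounding gives the `healthScore` of 93 shown there, and those 3 broken links would map to a "medium" `requires_fix` edge.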
