---
id: llm-integration
title: LLM Integration for Semantic Code Analysis
sidebar_label: LLM Integration
sidebar_position: 10
---
# LLM Integration for Semantic Code Analysis
DocuMCP now includes an optional LLM integration layer that enables semantic analysis of code changes beyond AST-based syntax comparison. This feature supports the DocuMCP Orchestrator's requirements for intelligent documentation synchronization.
## Overview
The LLM integration provides:
- **Semantic code change analysis**: Detect behavioral changes within the same function signature
- **Code execution simulation**: Validate documentation examples without running code
- **Intelligent documentation suggestions**: Generate context-aware update recommendations
- **Multi-provider support**: DeepSeek, OpenAI, Anthropic, and Ollama
- **Graceful fallback**: Automatically falls back to AST-only analysis when the LLM is unavailable
## Configuration
### Environment Variables
Configure the LLM integration using environment variables:
```bash
# Required: API key for your chosen provider
export DOCUMCP_LLM_API_KEY="your-api-key-here"
# Optional: Provider selection (default: deepseek)
export DOCUMCP_LLM_PROVIDER="deepseek" # or "openai", "anthropic", "ollama"
# Optional: Model name (default: deepseek-chat)
export DOCUMCP_LLM_MODEL="deepseek-chat"
# Optional: Custom base URL (for self-hosted or alternative endpoints)
export DOCUMCP_LLM_BASE_URL="https://api.deepseek.com/v1"
```
### Supported Providers
#### DeepSeek (Default)
```bash
export DOCUMCP_LLM_PROVIDER="deepseek"
export DOCUMCP_LLM_API_KEY="sk-..."
export DOCUMCP_LLM_MODEL="deepseek-chat"
```
#### OpenAI
```bash
export DOCUMCP_LLM_PROVIDER="openai"
export DOCUMCP_LLM_API_KEY="sk-..."
export DOCUMCP_LLM_MODEL="gpt-4"
```
#### Anthropic
```bash
export DOCUMCP_LLM_PROVIDER="anthropic"
export DOCUMCP_LLM_API_KEY="sk-ant-..."
export DOCUMCP_LLM_MODEL="claude-3-opus-20240229"
```
#### Ollama (Local)
```bash
export DOCUMCP_LLM_PROVIDER="ollama"
export DOCUMCP_LLM_BASE_URL="http://localhost:11434/v1"
export DOCUMCP_LLM_MODEL="codellama"
# No API key needed for local Ollama
```
## Usage
### Semantic Code Analysis
```typescript
import { SemanticAnalyzer } from "./utils/semantic-analyzer.js";
// Create analyzer with default configuration
const analyzer = new SemanticAnalyzer();
await analyzer.initialize();
// Analyze semantic impact of code changes
const codeBefore = `
function multiply(a: number, b: number): number {
  return a * b;
}
`;
const codeAfter = `
function multiply(a: number, b: number): number {
  return a + b; // Bug: changed to addition!
}
`;
const analysis = await analyzer.analyzeSemanticImpact(
  codeBefore,
  codeAfter,
  "multiply",
);
console.log("Analysis mode:", analysis.analysisMode); // 'llm', 'ast', or 'hybrid'
console.log("Behavioral change:", analysis.hasBehavioralChange); // true
console.log("Breaking for examples:", analysis.breakingForExamples); // true
console.log("Description:", analysis.changeDescription);
console.log("Confidence:", analysis.confidence);
console.log("Affected sections:", analysis.affectedDocSections);
```
### Validating Documentation Examples
```typescript
// Validate that documentation examples work with current implementation
const examples = [
  "const result = multiply(6, 7); // Should return 42",
  "const doubled = multiply(21, 2); // Should return 42",
];
const implementation = `
function multiply(a: number, b: number): number {
  return a * b;
}
`;
const validation = await analyzer.validateExamples(examples, implementation);
console.log("Valid:", validation.isValid);
console.log("Confidence:", validation.overallConfidence);
console.log("Manual review needed:", validation.requiresManualReview);
// Check individual examples
validation.examples.forEach((ex, i) => {
  console.log(`Example ${i + 1}:`);
  console.log(" Valid:", ex.isValid);
  console.log(" Issues:", ex.issues);
});
```
### Batch Analysis
```typescript
const changes = [
  {
    before: "function add(x: number, y: number) { return x + y; }",
    after: "function add(x: number, y: number) { return x - y; }",
    name: "add",
  },
  {
    before: "function greet(name: string) { return `Hello ${name}`; }",
    after: "function greet(name: string) { return `Hi ${name}!`; }",
    name: "greet",
  },
];
const results = await analyzer.analyzeBatch(changes);
results.forEach((result, i) => {
  console.log(`Change ${i + 1}:`, result.changeDescription);
});
```
### Custom Configuration
```typescript
import {
  SemanticAnalyzer,
  createSemanticAnalyzer,
} from "./utils/semantic-analyzer.js";
// Disable LLM (AST-only mode)
const astOnlyAnalyzer = createSemanticAnalyzer({
  useLLM: false,
});
// Custom confidence threshold for hybrid mode
const strictAnalyzer = createSemanticAnalyzer({
  confidenceThreshold: 0.9, // Higher threshold = more likely to use hybrid mode
});
// Custom LLM configuration
const customAnalyzer = createSemanticAnalyzer({
  llmConfig: {
    provider: "openai",
    apiKey: "custom-key",
    model: "gpt-4",
  },
});
```
## Analysis Modes
The semantic analyzer operates in three modes:
### LLM Mode
- **When**: LLM is available and confidence is above threshold (default: 0.7)
- **Advantages**: Deep semantic understanding, detects behavioral changes
- **Use case**: Critical code changes affecting public APIs
### AST Mode
- **When**: LLM is unavailable or disabled
- **Advantages**: Fast, reliable, no external dependencies
- **Use case**: Quick syntax checks, CI/CD environments without LLM access
### Hybrid Mode
- **When**: LLM confidence is below threshold
- **Advantages**: Combines LLM insights with AST verification
- **Use case**: Complex changes requiring both semantic and structural analysis
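Each result reports which mode was actually used, so calling code can branch on it. A minimal, illustrative sketch using the `analysisMode` values documented above:
```typescript
// Illustrative only: branch on the reported analysis mode.
const analysis = await analyzer.analyzeSemanticImpact(
  codeBefore,
  codeAfter,
  "multiply",
);
switch (analysis.analysisMode) {
  case "llm":
    // Full semantic analysis; behavioral-change flags reflect LLM reasoning.
    break;
  case "hybrid":
    // LLM confidence fell below the threshold, so AST findings were merged in.
    break;
  case "ast":
    // LLM disabled or unreachable; only structural differences were detected.
    console.warn("Structural analysis only - semantic changes may be missed");
    break;
}
```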
## Rate Limiting
The LLM client includes built-in rate limiting to prevent API quota exhaustion:
- Default: 10 requests per minute
- Automatic backoff when limit is reached
- Configurable per-instance (see the sketch below)
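A minimal sketch of per-instance configuration, assuming the rate limit can be passed alongside the other LLM options; `maxRequestsPerMinute` is an assumed field name, so check the configuration types in your version for the exact option:
```typescript
// Sketch only: "maxRequestsPerMinute" is an assumed option name, not
// confirmed by this documentation. Lower the per-minute budget when
// several jobs share one API key.
const throttledAnalyzer = createSemanticAnalyzer({
  llmConfig: {
    provider: "deepseek",
    apiKey: process.env.DOCUMCP_LLM_API_KEY ?? "",
    model: "deepseek-chat",
    maxRequestsPerMinute: 5,
  },
});
```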
## Error Handling
The integration is designed to fail gracefully:
```typescript
// If LLM fails, analyzer falls back to AST mode
const analyzer = new SemanticAnalyzer();
const result = await analyzer.analyzeSemanticImpact(before, after);
// Check which mode was used
if (result.analysisMode === "ast" && !result.llmAvailable) {
  console.warn("LLM unavailable, using AST analysis only");
}
// Low confidence analysis
if (result.confidence < 0.5) {
  console.warn("Low confidence analysis - manual review recommended");
}
```
## Best Practices
### 1. Set Appropriate Thresholds
```typescript
// For critical code paths
const criticalAnalyzer = createSemanticAnalyzer({
  confidenceThreshold: 0.9, // High threshold
});
// For routine changes
const routineAnalyzer = createSemanticAnalyzer({
  confidenceThreshold: 0.6, // Lower threshold
});
```
### 2. Check Availability Before Relying on LLM
```typescript
if (!analyzer.isLLMAvailable()) {
  console.warn("LLM not configured - using AST analysis only");
}
```
### 3. Handle Low Confidence Results
```typescript
const result = await analyzer.analyzeSemanticImpact(before, after);
if (result.confidence < 0.7) {
  // Trigger manual review workflow
  console.log("Manual review required for:", result.changeDescription);
}
```
### 4. Use Batch Analysis for Multiple Changes
```typescript
// More efficient than individual calls
const results = await analyzer.analyzeBatch(changes);
```
### 5. Validate Examples Before Publishing
```typescript
const validation = await analyzer.validateExamples(examples, implementation);
if (!validation.isValid) {
  console.error("Some examples may be invalid:");
  validation.suggestions.forEach((s) => console.error(" -", s));
  // Don't publish until examples are fixed
  throw new Error("Invalid documentation examples detected");
}
```
## Integration with DocuMCP Orchestrator
This LLM integration layer is designed to support the following [DocuMCP Orchestrator](https://github.com/tosin2013/documcp-orchestrator) requirements:
- **ADR-009**: Content Accuracy Validation Framework
- **ADR-010**: LLM-Validated Documentation Examples
The orchestrator uses these capabilities to:
1. Detect when code changes require documentation updates
2. Validate that documentation examples match code behavior
3. Generate intelligent update suggestions
4. Maintain documentation accuracy over time
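As a rough illustration (not the orchestrator's actual implementation), the documented calls could be chained like this, where `docExamples` stands in for examples extracted from your documentation:
```typescript
// Illustrative workflow sketch using only the calls documented above.
const analysis = await analyzer.analyzeSemanticImpact(
  codeBefore,
  codeAfter,
  "multiply",
);
if (analysis.hasBehavioralChange && analysis.breakingForExamples) {
  // Re-check the documented examples against the new implementation.
  const validation = await analyzer.validateExamples(docExamples, codeAfter);
  if (!validation.isValid || validation.requiresManualReview) {
    console.log("Documentation update needed in:", analysis.affectedDocSections);
  }
}
```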
## Troubleshooting
### LLM Not Available
**Symptom**: `analyzer.isLLMAvailable()` returns `false`
**Solutions**:
- Check that `DOCUMCP_LLM_API_KEY` is set
- Verify API key is valid
- For Ollama: ensure server is running at specified base URL
### Low Confidence Results
**Symptom**: `result.confidence < 0.7`
**Solutions**:
- Review the change manually
- Raise the confidence threshold so the analyzer falls back to hybrid mode for borderline results
- Check if code change is particularly complex
### Rate Limit Errors
**Symptom**: Requests timing out or failing
**Solutions**:
- Reduce number of concurrent requests
- Increase rate limit window
- Use batch analysis for multiple changes
### Timeout Errors
**Symptom**: "LLM request timed out"
**Solutions**:
- Increase the timeout in the configuration (see the hypothetical sketch below)
- Check network connectivity to LLM provider
- Consider using a faster model
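A hypothetical sketch, assuming a timeout can be set alongside the other LLM options; both `baseUrl` and `timeoutMs` are assumed field names, so check the configuration types in your version for the actual options:
```typescript
// Hypothetical: "baseUrl" and "timeoutMs" are assumed option names.
const patientAnalyzer = createSemanticAnalyzer({
  llmConfig: {
    provider: "ollama",
    baseUrl: "http://localhost:11434/v1", // no API key needed for local Ollama
    model: "codellama",
    timeoutMs: 60_000, // give slower local models more time to respond
  },
});
```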
## Security Considerations
1. **API Keys**: Never commit API keys to version control
2. **Code Privacy**: Be aware that code is sent to external LLM providers
3. **Rate Limits**: Monitor API usage to avoid unexpected costs
4. **Fallback**: System works without LLM for sensitive environments
## Performance
- **LLM Analysis**: ~2-5 seconds per code change
- **AST Analysis**: ~50-100ms per code change
- **Hybrid Analysis**: ~2-5 seconds (LLM) + ~100ms (AST)
- **Rate Limit**: 10 requests/minute (default)
## Future Enhancements
Planned improvements:
- Caching of LLM responses for identical code changes
- Support for additional LLM providers
- Fine-tuned models for specific languages
- Streaming responses for large code bases
- Confidence calibration based on historical accuracy
## Related Documentation
- [AST-based Code Analysis](../reference/ast-analyzer.md)
- [Drift Detection](../reference/drift-detector.md)
- [DocuMCP Orchestrator](https://github.com/tosin2013/documcp-orchestrator)
- [ADR-009: Content Accuracy Validation](../adrs/adr-0009-content-accuracy-validation-framework.md)