n8n-MCP

MIT License

71,301

Overview InspectNew Endpoints Schema Related Servers Reviews Score

TEMPLATE_MINING_ANALYSIS.md•9.99 kB

# Template Mining Analysis - Alternative to P0-R3 **Date**: 2025-10-02 **Context**: Analyzing whether to fix `get_node_for_task` (28% failure rate) or replace it with template-based configuration extraction ## Executive Summary **RECOMMENDATION**: Replace `get_node_for_task` with template-based configuration extraction. The template database contains 2,646 real-world workflows with rich node configurations that far exceed the 31 hardcoded task templates. ## Key Findings ### 1. Template Database Coverage - **Total Templates**: 2,646 production workflows from n8n.io - **Unique Node Types**: 543 (covers 103% of our 525 core nodes) - **Metadata Coverage**: 100% (AI-generated structured metadata) ### 2. Node Type Coverage in Templates Top node types by template usage: ``` 3,820 templates: n8n-nodes-base.httpRequest (144% of total templates!) 3,678 templates: n8n-nodes-base.set 2,445 templates: n8n-nodes-base.code 1,700 templates: n8n-nodes-base.googleSheets 1,471 templates: @n8n/n8n-nodes-langchain.agent 1,269 templates: @n8n/n8n-nodes-langchain.lmChatOpenAi 792 templates: n8n-nodes-base.telegram 702 templates: n8n-nodes-base.httpRequestTool 596 templates: n8n-nodes-base.gmail 466 templates: n8n-nodes-base.webhook ``` **Comparison**: - Hardcoded task templates: 31 tasks covering 5.9% of nodes - Real templates: 2,646 templates with 2-3k examples for common nodes ### 3. Database Structure ```sql CREATE TABLE templates ( id INTEGER PRIMARY KEY, workflow_id INTEGER UNIQUE NOT NULL, name TEXT NOT NULL, description TEXT, -- Node information nodes_used TEXT, -- JSON array: ["n8n-nodes-base.httpRequest", ...] workflow_json_compressed TEXT, -- Base64 encoded gzip of full workflow -- Metadata (100% coverage) metadata_json TEXT, -- AI-generated structured metadata -- Stats views INTEGER DEFAULT 0, created_at DATETIME, -- ... ); ``` ### 4. Real Configuration Examples #### HTTP Request Node Configurations **Simple URL fetch**: ```json { "url": "https://api.example.com/data", "options": {} } ``` **With authentication**: ```json { "url": "=https://api.wavespeed.ai/api/v3/predictions/{{ $json.data.id }}/result", "options": {}, "authentication": "genericCredentialType", "genericAuthType": "httpHeaderAuth" } ``` **Complex expressions**: ```json { "url": "=https://image.pollinations.ai/prompt/{{$('Social Media Content Factory').item.json.output.description.replaceAll(' ','-').replaceAll(',','').replaceAll('.','') }}", "options": {} } ``` #### Webhook Node Configurations **Basic webhook**: ```json { "path": "ytube", "options": {}, "httpMethod": "POST", "responseMode": "responseNode" } ``` **With binary data**: ```json { "path": "your-endpoint", "options": { "binaryPropertyName": "data" }, "httpMethod": "POST" } ``` ### 5. AI-Generated Metadata Each template has structured metadata including: ```json { "categories": ["automation", "integration", "data processing"], "complexity": "medium", "use_cases": [ "Extract transaction data from Gmail", "Automate bookkeeping", "Expense tracking" ], "estimated_setup_minutes": 30, "required_services": ["Gmail", "Google Sheets", "Google Gemini"], "key_features": [ "Fetch emails by label", "Extract transaction data", "Use LLM for structured output" ], "target_audience": ["Accountants", "Small business owners"] } ``` ## Comparison: Task Templates vs Real Templates ### Current Approach (get_node_for_task) **Pros**: - Curated configurations with best practices - Predictable, stable responses - Fast lookup (no decompression needed) **Cons**: - Only 31 tasks (5.9% node coverage) - 28% failure rate (users can't find what they need) - Requires manual maintenance - Static configurations without real-world context - Usage ratio 22.5:1 (search_nodes is preferred) ### Template-Based Approach **Pros**: - 2,646 real workflows with 2-3k examples for common nodes - 100% metadata coverage for semantic matching - Real-world patterns and best practices - Covers 543 node types (103% coverage) - Self-updating (templates fetched from n8n.io) - Rich context (use cases, complexity, setup time) **Cons**: - Requires decompression for full workflow access - May contain template-specific context (but can be filtered) - Need ranking/filtering logic for best matches ## Proposed Implementation Strategy ### Phase 1: Extract Node Configurations from Templates Create a new service: `TemplateConfigExtractor` ```typescript interface ExtractedNodeConfig { nodeType: string; configuration: Record<string, any>; source: { templateId: number; templateName: string; templateViews: number; useCases: string[]; complexity: 'simple' | 'medium' | 'complex'; }; patterns: { hasAuthentication: boolean; hasExpressions: boolean; hasOptionalFields: boolean; }; } class TemplateConfigExtractor { async extractConfigsForNode( nodeType: string, options?: { complexity?: 'simple' | 'medium' | 'complex'; requiresAuth?: boolean; limit?: number; } ): Promise<ExtractedNodeConfig[]> { // 1. Query templates containing nodeType // 2. Decompress workflow_json_compressed // 3. Extract node configurations // 4. Rank by popularity + complexity match // 5. Return top N configurations } } ``` ### Phase 2: Integrate with Existing Tools **Option A**: Enhance `get_node_essentials` - Add `includeExamples: boolean` parameter - Return 2-3 real configurations from templates - Preserve existing compact format **Option B**: Enhance `get_node_info` - Add `examples` section with template-sourced configs - Include source attribution (template name, views) **Option C**: New tool `get_node_examples` - Dedicated tool for retrieving configuration examples - Query by node type, complexity, use case - Returns ranked list of real configurations ### Phase 3: Deprecate get_node_for_task - Mark as deprecated in tool documentation - Redirect to enhanced tools - Remove after 2-3 version cycles ## Performance Considerations ### Decompression Cost - Average compressed size: 6-12 KB - Decompression time: ~5-10ms per template - Caching strategy needed for frequently accessed templates ### Query Strategy ```sql -- Fast: Get templates for a node type (no decompression) SELECT id, name, views, metadata_json FROM templates WHERE nodes_used LIKE '%n8n-nodes-base.httpRequest%' ORDER BY views DESC LIMIT 10; -- Then decompress only top matches ``` ### Caching - Cache decompressed workflows for popular templates (top 100) - TTL: 1 hour - Estimated memory: 100 * 50KB = 5MB ## Impact on P0-R3 **Original P0-R3 Plan**: Expand task library from 31 to 100+ tasks using fuzzy matching **New Approach**: Mine 2,646 templates for real configurations **Impact Assessment**: | Metric | Original Plan | Template Mining | |--------|--------------|-----------------| | Configuration examples | 100 (estimated) | 2,646+ actual | | Node coverage | ~20% | 103% | | Maintenance | High (manual) | Low (auto-fetch) | | Accuracy | Curated | Production-tested | | Context richness | Limited | Rich metadata | | Development time | 2-3 weeks | 1 week | **Recommendation**: PIVOT to template mining approach for P0-R3 ## Implementation Estimate ### Week 1: Core Infrastructure - Day 1-2: Create `TemplateConfigExtractor` service - Day 3: Implement caching layer - Day 4-5: Testing and optimization ### Week 2: Integration - Day 1-2: Enhance `get_node_essentials` with examples - Day 3: Update tool documentation - Day 4-5: Integration testing **Total**: 2 weeks vs 3 weeks for original plan ## Validation Tests ```typescript // Test: Extract HTTP Request configs const configs = await extractor.extractConfigsForNode( 'n8n-nodes-base.httpRequest', { complexity: 'simple', limit: 5 } ); // Expected: 5 configs from top templates // - Simple URL fetch // - With authentication // - With custom headers // - With expressions // - With error handling // Test: Extract webhook configs const webhookConfigs = await extractor.extractConfigsForNode( 'n8n-nodes-base.webhook', { limit: 3 } ); // Expected: 3 configs showing different patterns // - Basic POST webhook // - With response node // - With binary data handling ``` ## Risks and Mitigation ### Risk 1: Template Quality Varies - **Mitigation**: Filter by views (popularity) and metadata complexity rating - Only use templates with >1000 views for examples ### Risk 2: Decompression Performance - **Mitigation**: Cache decompressed popular templates - Implement lazy loading (decompress on demand) ### Risk 3: Template-Specific Context - **Mitigation**: Extract only node configuration, strip workflow-specific context - Provide source attribution for context ### Risk 4: Breaking Changes in Template Structure - **Mitigation**: Robust error handling in decompression - Fallback to cached configs if template fetch fails ## Success Metrics **Before** (get_node_for_task): - 392 calls, 72% success rate - 28% failure rate - 31 task templates - 5.9% node coverage **Target** (template-based): - 90%+ success rate for configuration discovery - 100%+ node coverage - 2,646+ real-world examples - Self-updating from n8n.io ## Next Steps 1. ✅ Complete template database analysis 2. ⏳ Create `TemplateConfigExtractor` service 3. ⏳ Implement caching layer 4. ⏳ Enhance `get_node_essentials` with examples 5. ⏳ Update P0 implementation plan 6. ⏳ Begin implementation ## Conclusion The template database provides a vastly superior alternative to hardcoded task templates: - **2,646 templates** vs 31 tasks (85x more examples) - **103% node coverage** vs 5.9% coverage (17x improvement) - **Real-world configurations** vs synthetic examples - **Self-updating** vs manual maintenance - **Rich metadata** for semantic matching **Recommendation**: Pivot P0-R3 from "expand task library" to "mine template configurations"

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/88-888/n8n-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server