Skip to main content
Glama

Octocode MCP

.cursorrulesβ€’21.6 kB
# Octocode-MCP Cursor Rules - Don't create docs which are summerizing works if not asked to! ## Project Overview This is an **MCP (Model Context Protocol) server** that creates tools for AI assistants to analyze GitHub repositories, search code, and explore npm packages. The project emphasizes clean architecture, security-first design, and token efficiency. ## Core Principles ### Senior Engineering Mindset - **Think like a senior software engineer and architect** - consider system-wide implications, maintainability, and long-term consequences - **When dealing with code changes** - check full flow with other files and find the best way to implement solution across the entire system - **Holistic analysis** - understand how changes ripple through the codebase before implementation - **Architecture-first approach** - design the solution, then implement with clean patterns ### Clean Code & Architecture - **Prefer clean, readable code** over clever optimizations - **Follow established patterns** - see existing examples in `src/mcp/tools/` - **Keep architecture clean** - maintain separation of concerns between tools, security, caching, and utilities - **Efficient solutions without over-engineering** - solve the problem simply and effectively - **Preserve existing structure** - maintain the current modular organization ### Development Workflow - **Use yarn** for all package management (see `package.json` scripts) - **Always lint after changes**: `yarn lint` (required before builds) - **Smart scripts for mass changes** - prefer automated solutions for repetitive tasks ### Testing Strategy - **After big changes**: Review implementation first, then update tests intelligently - **Reduce test churn** - make smart, targeted test changes rather than running every fix - **Use Vitest** with coverage - see `vitest.config.ts` for configuration ## MCP-Specific Guidelines ### Tool Development - **Extend BaseCommandBuilder** for new CLI-based tools (`src/mcp/tools/utils/BaseCommandBuilder.ts`) - **Use security validation wrapper** - all tools must use `withSecurityValidation` - **Implement bulk operations** - support multiple queries per tool call for efficiency - **Follow progressive refinement** - broad discovery β†’ context β†’ targeted β†’ deep-dive - **Add proper Zod schemas** in `src/mcp/tools/scheme/` for all tool parameters ### Security & Performance - **Security first** - all inputs/outputs go through content sanitization - **Token efficiency** - use minification, partial content access, structured responses - **24-hour caching** - implement caching for expensive operations - **Error recovery** - graceful degradation with smart fallbacks ### Code Organization ``` src/ β”œβ”€β”€ mcp/tools/ # Tool implementations β”‚ β”œβ”€β”€ scheme/ # Zod validation schemas β”‚ └── utils/ # Shared tool utilities β”œβ”€β”€ security/ # Content sanitization & validation β”œβ”€β”€ utils/ # Core utilities (cache, github API, etc.) └── types.ts # Shared type definitions ``` ## TypeScript & Code Quality ### Type Safety - **Strict TypeScript** - use strict mode, no `any` without explicit reasoning - **Zod validation** for all external inputs and API responses - **Proper error handling** with typed error responses - **Use type guards** for runtime type checking ### Code Style - **ESLint + Prettier** configuration is enforced - **No console.log** - use proper error handling (see `.eslintrc.json`) - **Prefer const** over let, never use var - **Unused parameters** should be prefixed with `_` ## Architecture Patterns ### Tool Registration Pattern ```typescript // Follow this pattern for new tools export function registerNewTool(server: McpServer, options: ToolOptions) { server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools: [{ name: TOOL_NAMES.NEW_TOOL, description: "..." }] })); server.setRequestHandler(CallToolRequestSchema, withSecurityValidation(async (request) => { // Implementation with BaseCommandBuilder }) ); } ``` ### Command Builder Pattern - **Extend BaseCommandBuilder** for CLI tools - **Implement required abstract methods** - **Use proper parameter validation** - **Support bulk operations** (multiple queries) ### Security Wrapper Pattern - **Always wrap tool handlers** with `withSecurityValidation` - **Sanitize all inputs** before processing - **Filter sensitive content** from outputs ## Performance Guidelines ### Token Efficiency - **Minify content** using appropriate strategies (see `src/utils/minifier.ts`) - **Partial file access** - use line ranges instead of full files when possible - **Structured responses** - consistent, predictable formats - **Bulk operations** - process multiple queries in single calls ### Caching Strategy - **Cache successful operations** only - **24-hour TTL** for GitHub/NPM data - **MD5 cache keys** from parameters - **Memory limits** - 1000 key maximum ## Testing Best Practices ### Test Organization - **Unit tests** in `tests/` directory mirroring `src/` structure - **Integration tests** for tool workflows - **Mock external APIs** - don't hit real GitHub/NPM in tests - **Coverage requirements** - maintain good coverage without over-testing ### Test Patterns ```typescript // Follow this pattern for tool tests describe('ToolName', () => { beforeEach(() => { vi.clearAllMocks(); }); it('should handle valid input', async () => { // Test implementation }); it('should handle errors gracefully', async () => { // Error handling tests }); }); ``` ## Smart Project Rules ### When Adding New Tools 1. **Create Zod schema** in `scheme/` directory first 2. **Extend BaseCommandBuilder** if CLI-based 3. **Add to TOOL_NAMES** constant 4. **Register in main index.ts** 5. **Add comprehensive tests** 6. **Update documentation** ### When Modifying Existing Tools 1. **Analyze full system impact** - trace through all dependent files and understand complete data flow 2. **Check impact** on other tools using `toolRelationships.ts` 3. **Update schemas** if parameters change 4. **Maintain backward compatibility** where possible 5. **Update related tests** intelligently 6. **Run full test suite** before committing ### When Optimizing Performance 1. **Profile first** - identify actual bottlenecks 2. **Consider caching** for expensive operations 3. **Optimize minification** strategies for new file types 4. **Bulk operations** over sequential calls 5. **Token efficiency** over raw performance ### When Handling Security 1. **Validate all inputs** with Zod schemas 2. **Sanitize content** using existing patterns 3. **Add new regex patterns** to `security/regexes.ts` if needed 4. **Test security edge cases** 5. **Document security implications** ### Dependencies & Upgrades - **Prefer existing dependencies** over adding new ones - **Use exact versions** for security-critical packages - **Test thoroughly** after dependency updates - **Check compatibility** with MCP SDK versions ### Documentation - **Update ARCHITECTURE.md** for significant changes - **Add JSDoc comments** for public APIs - **Include examples** in complex implementations - **Document security considerations** ## Commands Reference ```bash # Development yarn build:dev # Build without linting yarn build:watch # Watch mode development yarn test:watch # Test watch mode yarn test:coverage # Coverage report # Quality yarn lint # Required before commits yarn lint:fix # Auto-fix linting issues yarn format # Format code with Prettier # Testing yarn test # Run all tests yarn test:ui # Visual test interface # Debugging yarn debug # Debug MCP server # Distribution yarn dxt:pack # Create DXT package yarn dxt:release # Full release process ``` ## Common Patterns to Follow ### Error Handling ```typescript try { // Operation } catch (error) { return createErrorResponse(`Operation failed: ${error.message}`); } ``` ### Bulk Query Processing ```typescript const results = await Promise.allSettled( queries.map(query => processQuery(query)) ); // Handle partial failures gracefully ``` ### Content Minification ```typescript const minifiedContent = await minifyContent(content, filePath); // Always minify before returning large content ``` Remember: This project creates tools that AI assistants use to understand and analyze code. Every decision should optimize for AI comprehension, security, and efficiency. --- # Octocode-MCP Architecture Documentation ## Overview **Octocode-MCP** is a Model Context Protocol (MCP) server that provides AI assistants with advanced GitHub repository analysis, code discovery, and npm package exploration capabilities. It's designed with a research-driven approach, emphasizing progressive refinement, security, and token efficiency. ## System Architecture ### Core Philosophy The system follows key architectural principles: 1. **Research-Driven**: Define goals β†’ broad discovery β†’ narrow focus β†’ cross-validate sources 2. **Progressive Refinement**: Start broad, then apply specific filters based on findings 3. **Token Efficiency**: Content minification, partial file access, optimized responses 4. **Security First**: Content sanitization, input validation, malicious content detection 5. **Resilient Design**: Fallback mechanisms, error recovery, graceful degradation ### Architecture Components #### 1. **Entry Point & Server** (`src/index.ts`) - **MCP Server Initialization**: Sets up Model Context Protocol server - **Tool Registration**: Registers all 10 tools with error handling - **Graceful Shutdown**: Handles process signals and cleanup (cache clearing) - **Error Recovery**: Continues operation even if individual tools fail #### 2. **Security Layer** (`src/security/`) **Content Sanitizer** (`contentSanitizer.ts`): - **Secret Detection**: Identifies and redacts API keys, tokens, credentials - **Content Filtering**: Removes potentially malicious patterns - **Length Limits**: Enforces 1MB max content, 10K max line length - **Parameter Validation**: Sanitizes all user inputs **Regex Patterns** (`regexes.ts`): - Pattern library for detecting various secret types - Used for content sanitization across all tools #### 3. **Content Optimization** (`src/utils/minifier.ts`) **Multi-Strategy Minification**: - **Terser**: JavaScript/TypeScript files with advanced optimization - **Conservative**: Python, YAML, indentation-sensitive languages - **Aggressive**: HTML, CSS, C-style languages with comment removal - **JSON**: Proper JSON parsing and compression - **Markdown**: Specialized handling preserving structure - **General**: Plain text optimization **File Type Detection**: 50+ file extensions with appropriate strategies #### 4. **Caching System** (`src/utils/cache.ts`) - **24-hour TTL**: Balances freshness with performance - **1000 key limit**: Prevents unbounded memory growth - **MD5 key generation**: Efficient cache key creation from parameters - **Success-only caching**: Only caches successful responses #### 5. **Tool Architecture** (`src/mcp/tools/`) **Base Command Builder** (`utils/BaseCommandBuilder.ts`): - Abstract base class for all CLI command construction - Handles query formatting, flag management, parameter normalization - Supports both GitHub and NPM command types **Tool Relationships** (`utils/toolRelationships.ts`): - Defines interconnections between tools - Provides fallback suggestions based on context - Enables progressive refinement workflows **Security Validation Wrapper** (`utils/withSecurityValidation.ts`): - Applied to all tools for consistent security - Input parameter sanitization - Content filtering before response ### Tool Categories #### **GitHub Analysis Tools** 1. **`github_search_code`**: Code search with progressive refinement strategy 2. **`github_fetch_content`**: File content retrieval with partial access 3. **`github_search_repositories`**: Repository discovery and exploration 4. **`github_search_commits`**: Commit history and change analysis 5. **`github_search_pull_requests`**: PR analysis with optional diff content 6. **`github_search_issues`**: Issue tracking and bug analysis 7. **`github_view_repo_structure`**: Repository structure exploration #### **Package Management Tools** 8. **`package_search`**: NPM and Python package discovery #### **Infrastructure Tools** 9. **`api_status_check`**: GitHub/NPM connection verification ## Data Flow Architecture ### Request Processing Flow 1. **Input Validation**: Zod schema validation for all parameters 2. **Security Check**: Parameter sanitization and validation 3. **Cache Lookup**: Check for existing cached results 4. **Command Building**: Construct CLI commands using BaseCommandBuilder 5. **Execution**: Execute commands with error handling 6. **Content Processing**: Minification and optimization 7. **Security Filtering**: Final content sanitization 8. **Response Caching**: Cache successful responses 9. **Client Response**: Return optimized, secure results ### Error Handling & Fallbacks - **Tool-level**: Individual tools have built-in error recovery - **Command-level**: Multiple retry strategies and alternative approaches - **Content-level**: Graceful degradation when minification fails - **System-level**: Server continues operation despite individual tool failures ## Research Strategy Implementation ### Progressive Refinement Pattern ``` Phase 1: DISCOVERY - Broad search with minimal filters - Understand codebase structure Phase 2: CONTEXT - Analyze initial results - Identify relevant patterns Phase 3: TARGETED - Apply specific filters based on findings - Focus on relevant code sections Phase 4: DEEP-DIVE - Detailed analysis of specific files - Cross-reference findings ``` ### Multi-Tool Workflows Tools are designed to work together through defined relationships: - **Prerequisites**: Tools that should be run first - **Next Steps**: Logical follow-up tools - **Fallbacks**: Alternative tools when primary fails ### Smart Fallbacks Each tool provides context-aware fallback suggestions: - No results β†’ broader search scope - Access denied β†’ authentication check - Rate limits β†’ alternative approaches ## Security Implementation ### Content Sanitization - **Multi-layer approach**: Input validation + output filtering - **Pattern-based detection**: Comprehensive regex library for secrets - **Safe defaults**: Conservative approach to unknown content ### Input Validation - **Schema validation**: Zod-based parameter validation - **Parameter sanitization**: Remove potentially dangerous characters - **Length limits**: Prevent resource exhaustion attacks ### Output Security - **Content filtering**: Remove sensitive information from responses - **Minification safety**: Preserve functionality while reducing tokens - **Warning system**: Alert users to potential security issues ## Performance Optimizations ### Token Efficiency - **Smart minification**: File-type-aware compression strategies - **Partial content**: Range-based file reading - **Structured responses**: Optimized data formats - **Content deduplication**: Avoid redundant information ### Caching Strategy - **Intelligent expiration**: 24-hour TTL balances freshness/performance - **Selective caching**: Only cache successful operations - **Memory management**: 1000 key limit prevents unbounded growth ### Response Optimization - **Structured data**: Consistent, predictable response formats - **Minimal overhead**: Remove unnecessary metadata - **Compressed content**: Reduce token usage without losing information ## Engineering Excellence The system is built on five core engineering pillars that ensure robust, secure, and maintainable code: ### **πŸ”’ Security First** - **Input Validation**: Zod schemas + sanitization - **Secret Detection**: 50+ pattern library - **Safe Defaults**: Conservative approach - **Output Filtering**: Content sanitization ### **⚑ High Performance** - **Intelligent Caching**: 24h TTL, 1000 keys, MD5 - **Smart Minification**: 6 strategies, 50+ file types - **Partial Content Access**: Line ranges, context control - **Parallel Operations**: Multi-query support ### **πŸ›‘οΈ Reliability** - **4-Layer Error Handling**: Toolβ†’Commandβ†’Contentβ†’System - **Smart Fallbacks**: Context-aware alternatives - **Graceful Degradation**: Continue on partial failures - **Health Monitoring**: Connection validation ### **✨ Code Quality** - **Type Safety**: TypeScript + Zod validation - **Comprehensive Testing**: Vitest + coverage reports - **Code Standards**: ESLint + Prettier - **Living Documentation**: Architecture + API docs ### **πŸ”§ Maintainability** - **Modular Design**: BaseCommandBuilder pattern - **Clean Abstractions**: Security wrapper, tool relationships - **Easy Extension**: Plugin architecture for new tools - **Coding Standards**: Consistent patterns across tools ## Technology Stack ### Core Dependencies - **@modelcontextprotocol/sdk**: MCP protocol implementation - **zod**: Runtime type validation and schema definition - **axios**: HTTP client for external API calls - **node-cache**: In-memory caching solution ### Content Processing - **terser**: JavaScript/TypeScript minification - **clean-css**: CSS optimization - **html-minifier-terser**: HTML compression ### Development & Quality - **TypeScript**: Type safety and developer experience - **ESLint + Prettier**: Code quality and formatting - **Vitest**: Testing framework with coverage - **Rollup**: Build system and bundling ## Deployment & Integration ### Distribution - **NPM Package**: Easy installation and updates - **DXT Extension**: Desktop integration capability - **Docker Support**: Containerized deployment option ### Integration Points - **MCP Protocol**: Standard interface for AI assistants - **GitHub CLI**: Leverages official GitHub tooling - **NPM CLI**: Uses standard npm commands - **Standard I/O**: Communicates via stdin/stdout ## Future Extensibility The architecture supports easy extension through: 1. **New Tools**: Add tools by implementing BaseCommandBuilder pattern 2. **Additional APIs**: Extend beyond GitHub/NPM with same patterns 3. **Security Enhancements**: Modular security layer for new threat vectors 4. **Performance Optimizations**: Pluggable caching and minification strategies ## Bulk Operations Methodology ### Why Bulk Operations Are Superior Octocode-MCP implements a **bulk-first approach** that significantly outperforms traditional single-query methods. This methodology provides substantial improvements in efficiency, reasoning quality, and user experience. ### Efficiency Advantages #### **1. Reduced Latency** - **Traditional**: Multiple sequential round-trips between LLM and tools - **Bulk**: Single request handles multiple related queries simultaneously - **Improvement**: 3-5x faster execution for multi-step research tasks #### **2. Better API Utilization** - **Parallel Processing**: Execute up to 5 queries simultaneously per tool - **Connection Reuse**: Single CLI session handles multiple operations - **Rate Limit Optimization**: Batch operations are more API-friendly #### **3. Enhanced Context** - **Comparative Analysis**: LLM receives all results together for cross-referencing - **Progressive Refinement**: Can plan and execute complete research workflows upfront - **Holistic Understanding**: Full context enables better pattern recognition ### LLM Reasoning Enhancement #### **Complete Context Advantage** The bulk approach provides LLMs with comprehensive datasets that enable: 1. **Cross-Reference Analysis**: Compare results across different queries simultaneously 2. **Pattern Recognition**: Identify trends and relationships across multiple data sources 3. **Consistency Validation**: Check for contradictions and verify information accuracy 4. **Comprehensive Coverage**: Ensure no critical information is missed #### **Progressive Refinement in Single Call** Instead of iterative back-and-forth, the LLM can: - Plan complete research strategy upfront - Execute broad-to-specific query progression - Analyze all results together for final insights - Generate comprehensive reports with full context ### Implementation Benefits #### **Smart Query Planning** - **Relationship Mapping**: Identify related queries that should be executed together - **Progressive Structure**: Automatically structure broad β†’ specific β†’ validation queries - **Fallback Preparation**: Include alternative queries for error recovery #### **Coordinated Error Handling** - **Partial Success**: Continue with successful queries even if some fail - **Intelligent Fallbacks**: Use related query results to compensate for failures - **Context Preservation**: Maintain research continuity despite individual query issues ### Real-World Performance Impact - **Research Time**: Reduced from 30+ seconds to 5-8 seconds for complex analyses - **API Efficiency**: 60-70% reduction in total API calls - **Result Quality**: 200-400% improvement in context comprehensiveness - **User Experience**: Single interaction vs. multiple back-and-forth exchanges This bulk methodology represents a fundamental shift from reactive, sequential processing to proactive, parallel research orchestration, delivering superior results with dramatically improved efficiency. This architecture provides a robust, secure, and efficient foundation for AI-assisted code research and analysis.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bgauryy/octocode-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server