# Architecture

> High-level architecture of @octocode/local - MCP server for local file system research

## Bird's Eye View

**What**: An MCP (Model Context Protocol) server that provides AI assistants with fast, secure local file system exploration using native Linux commands (`rg`, `ls`, `find`).

**How**: TypeScript + Node.js ≥18, using `@modelcontextprotocol/sdk` for MCP transport, Zod for schema validation, and `spawn()` for safe command execution. Exposes 4 tools: `local_ripgrep`, `local_view_structure`, `local_find_files`, `local_fetch_content`.

**Why**:
- **Security-first**: Whitelist-only command execution, path validation, symlink attack prevention
- **Token efficiency**: Structured pagination, minification, bulk operations (1-5 queries per call)
- **Progressive refinement**: Discovery → targeted → deep-dive workflow

## Entry Points

- **Main**: `src/index.ts:22` - `main()` function, creates McpServer, registers tools/prompts
- **Tool Registration**: `src/tools/toolsManager.ts:47` - `registerTools()` registers all 4 tools
- **Prompt Registration**: `src/tools/toolsManager.ts:116` - `registerPrompts()` registers AI guidance prompts
- **Path Security**: `src/security/pathValidator.ts:191` - Global `pathValidator` instance

**Start here**: Read `src/index.ts`, then explore `src/tools/toolsManager.ts` for tool registration pattern.
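
As a rough illustration of the whitelist-plus-`spawn()` idea from the Bird's Eye View, the sketch below checks a command against the allowed list and only then launches it without a shell. The names `isAllowedCommand` and `runAllowed` are hypothetical and are not the package's `safeExec()`; output-size limits and argument validation are left out for brevity.

```typescript
import { spawn } from "node:child_process";

// Same three commands the document lists as allowed.
const ALLOWED_COMMANDS = ["rg", "ls", "find"] as const;
type AllowedCommand = (typeof ALLOWED_COMMANDS)[number];

// Hypothetical helper: true only for whitelisted command names.
function isAllowedCommand(command: string): command is AllowedCommand {
  return (ALLOWED_COMMANDS as readonly string[]).includes(command);
}

// Hypothetical helper: rejects non-whitelisted commands before spawning anything.
function runAllowed(
  command: string,
  args: string[],
  timeoutMs = 30_000,
): Promise<{ code: number | null; stdout: string }> {
  return new Promise((resolve, reject) => {
    if (!isAllowedCommand(command)) {
      reject(new Error(`command not allowed: ${command}`));
      return;
    }
    // shell: false means args reach the process as-is, with no shell parsing.
    const child = spawn(command, args, { shell: false });
    let stdout = "";
    const timer = setTimeout(() => child.kill(), timeoutMs);
    child.stdout.on("data", (chunk: Buffer) => {
      stdout += chunk.toString("utf8");
    });
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      resolve({ code, stdout });
    });
  });
}
```

A call such as `runAllowed("ls", ["-la", "."])` would resolve with the exit code and captured output, while `runAllowed("curl", [...])` is rejected before any process starts.
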

## Code Map

### `/src/tools/`

**Purpose**: Tool implementations - each file is a complete tool handler

**Key files**:
- `toolsManager.ts:47-111` - Tool registration using `server.registerTool()` pattern
- `local_ripgrep.ts:21-231` - Pattern search using ripgrep, returns structured matches with byte offsets
- `local_view_structure.ts` - Directory listing using `ls`, supports pagination and filtering
- `local_find_files.ts` - File discovery using `find`, metadata-based filtering
- `local_fetch_content.ts` - File content extraction with matchString targeting
- `hints.ts` - Context-aware hints for each tool's result status

**Invariants**:
- All tools MUST use `validateToolPath()` before execution
- All tools MUST return `SearchContentResult | ViewStructureResult | FindFilesResult | FetchContentResult`
- All tools MUST support bulk operations via `executeBulkOperation()`

**API Boundary**: Tools are the public interface; everything else is internal.

### `/src/commands/`

**Purpose**: Safe command builders using the Builder pattern

**Key files**:
- `BaseCommandBuilder.ts:10-74` - Abstract base class with `addFlag()`, `addOption()`, `addEscapedArg()`, `build()`
- `RipgrepCommandBuilder.ts:5-288` - Builds `rg` commands from `RipgrepQuery`
- `FindCommandBuilder.ts` - Builds `find` commands from `FindFilesQuery`
- `LsCommandBuilder.ts` - Builds `ls` commands from `ViewStructureQuery`

**Invariants**:
- All user input MUST go through `addEscapedArg()` (uses `escapeShellArg()`)
- Command builders NEVER execute commands - they only build `{command, args}` tuples

### `/src/security/`

**Purpose**: Multi-layer security - path validation, command validation, execution context

**Key files**:
- `pathValidator.ts:41-191` - `PathValidator` class: validates paths against allowed roots, resolves symlinks
- `commandValidator.ts:12-181` - `validateCommand()`: whitelist check + position-aware arg validation
- `executionContextValidator.ts` - Validates cwd and execution environment
- `ignoredPathFilter.ts:11-82` - Filters sensitive paths (`.git`, `.env`, `.ssh`)
- `securityConstants.ts:12-43` - `ALLOWED_COMMANDS = ['rg', 'ls', 'find']`, `DANGEROUS_PATTERNS`
- `patternsConstants.ts` - Regex patterns for ignored files/paths

**Invariants**:
- ONLY commands in `ALLOWED_COMMANDS` can execute (`src/security/securityConstants.ts:12`)
- Symlinks are ALWAYS resolved before path validation (security requirement)
- Path must be within `allowedRoots` (set via `WORKSPACE_ROOT` env or `process.cwd()`)

### `/src/scheme/`

**Purpose**: Zod schemas for input validation and tool descriptions

**Key files**:
- `baseSchema.ts` - `BaseQuerySchema` with `researchGoal`, `reasoning` fields; `createBulkQuerySchema()`
- `local_ripgrep.ts:38-332` - `RipgrepQuerySchema` with 40+ options, `applyWorkflowMode()`, `validateRipgrepQuery()`
- `local_view_structure.ts` - `ViewStructureQuerySchema` with pagination options
- `local_find_files.ts` - `FindFilesQuerySchema` with metadata filters
- `local_fetch_content.ts` - `FetchContentQuerySchema` with matchString support
- `responsePriority.ts` - Key ordering for YAML response formatting

**Invariants**:
- All tool inputs MUST be validated against Zod schemas before processing
- Bulk schemas wrap single schemas with `queries: z.array(SingleSchema).min(1).max(5)`

### `/src/utils/`

**Purpose**: Shared utilities for execution, pagination, response formatting

**Key files**:
- `exec.ts:14-118` - `safeExec()`: spawns command with validation, timeout, output limits
- `bulkOperations.ts:76-211` - `executeBulkOperation()`: parallel query processing + response formatting
- `pagination.ts` - Character-based and entity-based pagination utilities
- `minifier.ts` - Content minification for token efficiency
- `responses.ts` - `createResponseFormat()` for YAML output
- `tokenValidation.ts` - Validates response size against MCP 25K token limit
- `toolHelpers.ts` - `validateToolPath()`, `createErrorResult()` helpers

**Invariants**:
- `safeExec()` MUST be the only way to execute shell commands
- Responses MUST be validated against token limit before returning

### `/src/errors/`

**Purpose**: Centralized error codes and typed error handling

**Key files**:
- `errorCodes.ts:15-351` - `ERROR_CODES` enum, `ToolError` class, factory functions `ToolErrors.*`

**Invariants**:
- All errors MUST use `ErrorCode` from `ERROR_CODES`
- Error results MUST include `errorCode` field for programmatic handling

### `/src/prompts/`

**Purpose**: AI guidance prompts for tool usage patterns

**Key files**:
- `research_local_explorer.md/ts` - Local exploration workflow

### `/src/constants.ts`

**Purpose**: Centralized configuration values

**Key exports** (`src/constants.ts:4-183`):
- `TOOL_NAMES` - Tool name constants
- `DEFAULTS` - Timeout (30s), max output (10MB), context lines (5)
- `RESOURCE_LIMITS` - Token limits (25K), pagination defaults, file size thresholds
- `SECURITY_DEFAULTS` - Symlink handling configuration

## System Boundaries & Layers

```
┌─────────────────────────────────────────────────────────────────┐
│                          MCP Transport                          │
│                     (StdioServerTransport)                      │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Tool Registration Layer                     │
│  toolsManager.ts: registerTools() + registerPrompts()           │
│  - Zod schema validation (BulkXxxQuerySchema.parse())           │
│  - Bulk operation wrapper (executeBulkOperation())              │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Tool Implementation                       │
│  local_ripgrep.ts, local_view_structure.ts, etc.                │
│  - Path validation (validateToolPath())                         │
│  - Command building (XxxCommandBuilder)                         │
│  - Result formatting (structured output + hints)                │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Command Execution                        │
│  exec.ts: safeExec() → spawn()                                  │
│  - Command validation (validateCommand())                       │
│  - Context validation (validateExecutionContext())              │
│  - Timeout + output size limits                                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Security Layer                          │
│  - pathValidator: allowed roots + symlink resolution            │
│  - commandValidator: whitelist + dangerous pattern detection    │
│  - ignoredPathFilter: sensitive file filtering                  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                   OS Commands (rg, ls, find)                    │
└─────────────────────────────────────────────────────────────────┘
```

**Rules**:
- Data flows DOWN only (no callbacks to upper layers)
- Security validation happens at EVERY layer boundary
- All external input (queries) validated by Zod before processing

**Dependency Direction**:
- `tools/` → `commands/`, `security/`, `utils/`, `scheme/`
- `commands/` → `security/` (for `escapeShellArg`)
- `utils/exec.ts` → `security/` (for validation)
- NEVER: `security/` → `tools/` (security is foundational)

## Key Abstractions & Types

- **`McpServer`** (`@modelcontextprotocol/sdk`) - MCP server instance, used for tool/prompt registration
  - **Used by**: `src/index.ts:23`, `src/tools/toolsManager.ts:47`
- **`BaseCommandBuilder`** (`src/commands/BaseCommandBuilder.ts:10`) - Abstract builder for safe command construction
  - **Used by**: `RipgrepCommandBuilder`, `FindCommandBuilder`, `LsCommandBuilder`
  - **Extends**: None (base class)
- **`PathValidator`** (`src/security/pathValidator.ts:41`) - Validates paths against allowed roots
  - **Used by**: `validateToolPath()` in all tools
  - **Singleton**: `pathValidator` exported at line 191
- **`RipgrepQuery`** (`src/scheme/local_ripgrep.ts:332`) - Typed query for ripgrep tool (40+ options)
  - **Used by**: `searchContentRipgrep()`, `RipgrepCommandBuilder`
- **`SearchContentResult`** (`src/types.ts:84`) - Structured result with `files[]`, `pagination`, `hints`
  - **Used by**: `local_ripgrep.ts` return type
- **`ToolError`** (`src/errors/errorCodes.ts:154`) - Typed error with `errorCode`, `category`, `recoverability`
  - **Used by**: Error handling throughout
- **`BulkQuery<T>`** / **`BulkResult<T>`** (`src/types.ts:359-370`) - Generic bulk operation wrappers
  - **Used by**: `executeBulkOperation()`

## Architectural Decisions

### ADR-001: Whitelist-Only Command Execution

**Date:** 2024 | **Status:** Accepted

**Context:** MCP servers execute on user machines with AI-generated inputs. Command injection is a critical risk.

**Alternatives:**
- Blacklist dangerous patterns (incomplete, bypassable)
- Sanitize all inputs (complex, error-prone)
- Whitelist allowed commands (restrictive but safe)

**Decision:** Only `rg`, `ls`, `find` commands are allowed (`src/security/securityConstants.ts:12`). All other commands fail validation.

**Consequences:**
- PROS: Eliminates command injection risk; simple to audit; easy to extend
- CONS: Cannot add new commands without code change; limits flexibility

### ADR-002: Bulk Operations (1-5 Queries Per Call)

**Date:** 2024 | **Status:** Accepted

**Context:** AI assistants often need multiple related queries. Single-query tools waste round trips.

**Alternatives:**
- Single query per call (simple but inefficient)
- Unlimited queries (risk of abuse/timeout)
- Fixed batch size 1-5 (balanced)

**Decision:** All tools accept `queries: T[]` array with 1-5 items, processed in parallel via `Promise.allSettled()`.
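
The decision above can be sketched in a few lines. This is a minimal illustration assuming a generic per-query handler; `runBulk` and `BulkOutcome` are hypothetical names and do not reflect the actual `executeBulkOperation()` signature or the `BulkResult<T>` type.

```typescript
// Hypothetical result shape: one entry per query, in input order.
type BulkOutcome<R> =
  | { status: "ok"; index: number; result: R }
  | { status: "error"; index: number; message: string };

// Hypothetical helper: runs 1-5 queries in parallel and never fails as a whole
// just because one query failed.
async function runBulk<Q, R>(
  queries: Q[],
  handler: (query: Q) => Promise<R>,
): Promise<BulkOutcome<R>[]> {
  if (queries.length < 1 || queries.length > 5) {
    throw new RangeError("expected 1-5 queries per call");
  }
  // allSettled waits for every query and reports each outcome separately.
  const settled = await Promise.allSettled(queries.map((q) => handler(q)));
  return settled.map((s, index): BulkOutcome<R> =>
    s.status === "fulfilled"
      ? { status: "ok", index, result: s.value }
      : {
          status: "error",
          index,
          message: s.reason instanceof Error ? s.reason.message : String(s.reason),
        },
  );
}
```

Because each outcome carries its original index, a caller can match failures back to the query that caused them even when other queries in the same batch succeeded.
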

**Consequences:**
- PROS: Up to 5x fewer round trips; parallel execution; graceful partial failures
- CONS: More complex response structure; harder to debug individual queries

### ADR-003: Symlink Resolution for Security

**Date:** 2024 | **Status:** Accepted

**Context:** Symlinks can point outside workspace, enabling path traversal attacks.

**Alternatives:**
- Block all symlinks (breaks legitimate use cases)
- Follow symlinks blindly (security hole)
- Resolve and validate targets (secure + functional)

**Decision:** `PathValidator.validate()` ALWAYS resolves symlinks via `fs.realpathSync()` before checking against allowed roots (`src/security/pathValidator.ts:111-136`).

**Consequences:**
- PROS: Prevents symlink-based attacks; transparent to users; allows legitimate symlinks within workspace
- CONS: Slight performance overhead; may confuse users when symlink target is rejected

### ADR-004: Spawn Over Shell Execution

**Date:** 2024 | **Status:** Accepted

**Context:** Node.js `exec()` uses shell, enabling injection via shell metacharacters.

**Alternatives:**
- `exec()` with shell (convenient but dangerous)
- `spawn()` without shell (safe but requires array args)
- `execFile()` (similar to spawn)

**Decision:** Use `spawn()` with args array (`src/utils/exec.ts:37`). Arguments passed directly to process, no shell interpretation.

**Consequences:**
- PROS: Shell metacharacters are harmless; no escaping needed for most args
- CONS: Cannot use shell features (pipes, redirects); must build args array manually

### ADR-005: YAML Response Format

**Date:** 2024 | **Status:** Accepted

**Context:** MCP responses need to be readable by both AI and humans.

**Alternatives:**
- JSON (verbose, hard to read)
- Plain text (unstructured)
- YAML (readable, structured)

**Decision:** Use YAML-like format via `createResponseFormat()` with key priority ordering (`src/scheme/responsePriority.ts`).
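
To make "key priority ordering" concrete, here is a small sketch of a YAML-like serializer that emits prioritized keys first. `KEY_PRIORITY`, `sortKeys`, and `toYamlLike` are hypothetical names with an invented priority list; the real ordering lives in `src/scheme/responsePriority.ts` and the real formatter is `createResponseFormat()`, neither of which is reproduced here.

```typescript
// Hypothetical priority list, for illustration only.
const KEY_PRIORITY: string[] = ["status", "path", "files", "pagination", "hints"];

type Value = string | number | boolean | null | Value[] | { [key: string]: Value };

// Known keys sort by their position in `priority`; unknown keys follow, alphabetically.
function sortKeys(keys: string[], priority: string[]): string[] {
  const rank = (key: string): number => {
    const index = priority.indexOf(key);
    return index === -1 ? priority.length : index;
  };
  return [...keys].sort((a, b) => rank(a) - rank(b) || a.localeCompare(b));
}

// Scalars and empty containers fit on a single line.
function isInline(value: Value): boolean {
  if (value === null || typeof value !== "object") return true;
  return Array.isArray(value) ? value.length === 0 : Object.keys(value).length === 0;
}

function formatInline(value: Value): string {
  if (Array.isArray(value)) return "[]";
  if (value !== null && typeof value === "object") return "{}";
  return typeof value === "string" ? JSON.stringify(value) : String(value);
}

// Serializes nested data with two-space indentation and prioritized key order.
function toYamlLike(value: Value, priority: string[] = KEY_PRIORITY, depth = 0): string {
  const pad = "  ".repeat(depth);
  if (isInline(value)) return pad + formatInline(value);
  if (Array.isArray(value)) {
    return value
      .map((item) =>
        isInline(item)
          ? `${pad}- ${formatInline(item)}`
          : `${pad}-\n${toYamlLike(item, priority, depth + 1)}`,
      )
      .join("\n");
  }
  const record = value as { [key: string]: Value };
  return sortKeys(Object.keys(record), priority)
    .map((key) =>
      isInline(record[key])
        ? `${pad}${key}: ${formatInline(record[key])}`
        : `${pad}${key}:\n${toYamlLike(record[key], priority, depth + 1)}`,
    )
    .join("\n");
}
```

With this ordering, a result object written as `{ hints, pagination, status }` is still emitted with `status` first, so the most important fields appear at the top of the response regardless of how the object was built.
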

**Consequences:**
- PROS: Human-readable; preserves structure; token-efficient vs JSON
- CONS: Custom format (not standard YAML); requires priority configuration

## Cross-Cutting Concerns

### Error Handling

**Strategy**: Typed errors with `ErrorCode` enum; graceful degradation in bulk operations.

**Invariants**:
- All errors MUST include `errorCode` from `ERROR_CODES` (`src/errors/errorCodes.ts:15-35`)
- Bulk operations MUST NOT fail entirely if one query fails (use `Promise.allSettled`)
- User-facing errors MUST include actionable hints

**Examples**: `src/errors/errorCodes.ts:258-351` - `ToolErrors` factory functions

### Testing

**Framework**: Vitest | **Location**: `tests/` (mirrors `src/` structure) | **Run**: `yarn test`

**Philosophy**:
- Unit tests for commands, utils, security (`tests/commands/`, `tests/utils/`, `tests/security/`)
- Integration tests for full tool flows (`tests/integration/`)
- Security-focused tests: penetration, symlink attacks, race conditions (`tests/security/`)
- Mock filesystem operations where possible

**Coverage**: 23 test files covering all major components

### Configuration

**Files**:
- `package.json` - Dependencies, scripts, metadata
- `vitest.config.ts` - Test configuration
- `rollup.config.js` - Build configuration

**Env Vars**:
- `WORKSPACE_ROOT` - Override default workspace root (defaults to `process.cwd()`)

**Precedence**: Env vars > defaults in `src/constants.ts`

### Security

**Auth**: N/A (local tool, no network auth)

**Path Security** (`src/security/pathValidator.ts`):
- Paths validated against `allowedRoots` (workspace boundaries)
- Symlinks resolved to real paths before validation
- Ignored patterns block `.git`, `.env`, `.ssh`, etc.
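
The resolve-then-check order described above can be sketched as follows. This is an illustration only: `isInsideRoot` and `validatePath` are hypothetical names rather than the `PathValidator` API, and the ignored-pattern filtering is omitted.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: true when realPath is realRoot itself or lies beneath it.
function isInsideRoot(realPath: string, realRoot: string): boolean {
  const rel = path.relative(realRoot, realPath);
  if (rel === "") return true;
  return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
}

type PathCheck = { ok: true; resolved: string } | { ok: false; reason: string };

// Hypothetical helper: resolves symlinks first, then compares against allowed roots.
function validatePath(candidate: string, allowedRoots: string[]): PathCheck {
  let resolved: string;
  try {
    // realpathSync follows symlinks, so a link pointing outside the workspace
    // is judged by its target rather than by where the link itself lives.
    resolved = fs.realpathSync(path.resolve(candidate));
  } catch {
    return { ok: false, reason: "path does not exist or cannot be resolved" };
  }
  const inside = allowedRoots.some((root) => {
    try {
      return isInsideRoot(resolved, fs.realpathSync(root));
    } catch {
      return false; // an unresolvable root never matches
    }
  });
  return inside
    ? { ok: true, resolved }
    : { ok: false, reason: "path is outside the allowed roots" };
}
```

Comparing with `path.relative()` instead of a plain string prefix avoids accepting sibling directories such as `/ws-other` when the root is `/ws`.
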

**Command Security** (`src/security/commandValidator.ts`):
- Whitelist: only `rg`, `ls`, `find` allowed
- Position-aware validation: patterns vs paths treated differently
- Dangerous patterns blocked: `;&|$(){}[]<>`

**Execution Security** (`src/utils/exec.ts`):
- `spawn()` without shell (no injection via metacharacters)
- Timeout: 30s default
- Output limit: 10MB
- Context validation before execution

### Observability

**Logging**: Minimal (errors only via stderr)

**Debug**: Tools support `debug: true` option to enable verbose output

**Metrics**: `SearchStats` returned by ripgrep (`src/types.ts:112-119`): matchCount, filesSearched, bytesSearched, searchTime

## Dependencies & Build

### Key Dependencies

- **`@modelcontextprotocol/sdk`**: ^1.18.1 - MCP server/transport implementation
- **`zod`**: ^3.25.26 - Runtime schema validation
- **`octocode-utils`**: ^5.0.0 - Shared utilities (minifier, YAML formatter)

### Build Commands

```bash
yarn install      # Install dependencies
yarn build        # Lint + clean + rollup bundle
yarn build:dev    # Clean + rollup (skip lint)
yarn test         # Run all tests
yarn test:watch   # TDD mode
yarn lint         # ESLint check
yarn lint:fix     # Auto-fix lint issues
```

### Output

- `dist/index.js` - Single bundled file (Rollup)
- Executable via `node dist/index.js` or as MCP server

## Design Patterns & Constraints

### Patterns Used

- **Builder Pattern**: Command construction (`src/commands/BaseCommandBuilder.ts:10`)
  - Fluent API: `builder.addFlag('-l').addOption('-A', 3).build()`
- **Singleton**: `pathValidator` instance (`src/security/pathValidator.ts:191`)
- **Factory**: `ToolErrors.*` functions (`src/errors/errorCodes.ts:258`)
- **Strategy**: Workflow modes in ripgrep (`src/scheme/local_ripgrep.ts:339-374`)

### Anti-Patterns to Avoid

- **AVOID direct `exec()` calls**: Always use `safeExec()` wrapper
- **AVOID shell strings**: Use `spawn()` with args array, never `exec('cmd ' + userInput)`
- **AVOID bypassing PathValidator**: All paths MUST be validated before use
- **AVOID hardcoded limits**: Use `RESOURCE_LIMITS` constants

### Assumptions

- Node.js ≥18.0.0 available
- `rg` (ripgrep), `ls`, `find` commands available on system PATH
- Workspace root is a trusted directory
- AI assistant handles response pagination

### Constraints

**Technical**:
- MCP response limit: 25K tokens (~100K chars)
- Command timeout: 30 seconds
- Output size: 10MB max
- Bulk queries: 1-5 per call

**Security**:
- Commands restricted to whitelist
- Paths restricted to workspace root
- Symlinks resolved before validation

## Contributors Guide

### Bug Fixes

1. **Locate**: Most bugs in `src/tools/` (tool logic) or `src/security/` (validation)
2. **Fix**: Edit source file, add regression test in `tests/`
3. **Test**: `yarn test` (full) or `npx vitest run tests/path/to/file.test.ts` (single)
4. **Verify**: `yarn lint && yarn build`

### New Features

1. **Design**: Review this doc, identify affected layers
2. **Schema**: Add/modify Zod schema in `src/scheme/`
3. **Implement**: Follow pattern from existing tools (e.g., `local_ripgrep.ts`)
4. **Security**: If new command needed, add to `ALLOWED_COMMANDS` and document
5. **Test**: Add tests in `tests/tools/` and `tests/security/`
6. **Document**: Update this file if architecture changes

### Adding a New Tool

1. Create schema in `src/scheme/new_tool.ts` (extend `BaseQuerySchema`)
2. Create command builder in `src/commands/NewToolCommandBuilder.ts` (extend `BaseCommandBuilder`)
3. Create tool implementation in `src/tools/new_tool.ts`
4. Register in `src/tools/toolsManager.ts:registerTools()`
5. Add tool name to `src/constants.ts:TOOL_NAMES`
6. If new OS command needed:
   - Add to `ALLOWED_COMMANDS` in `src/security/securityConstants.ts`
   - Add position-aware validation in `src/security/commandValidator.ts:getPatternArgPositions()`
7. Add comprehensive tests in `tests/tools/new_tool.test.ts`

### Navigation Tips

- **Find tool implementation**: Look in `src/tools/local_*.ts`
- **Understand query options**: Read Zod schema in `src/scheme/local_*.ts`
- **Debug command building**: Check `src/commands/*CommandBuilder.ts`
- **Security concerns**: Start with `src/security/pathValidator.ts`
- **Add new constant**: Edit `src/constants.ts` (RESOURCE_LIMITS or DEFAULTS)

---

**Last Updated:** November 2025

**Package Version:** 1.1.6

**Questions:** https://github.com/bgauryy/octocode-mcp/issues
