Local Explorer MCP

MIT License

# Architecture

> High-level architecture of local-explorer-mcp: An MCP server for intelligent local file system exploration

## Bird's Eye View

local-explorer-mcp is a Model Context Protocol (MCP) server that provides AI agents with four specialized tools for exploring local file systems efficiently and securely. Built with TypeScript and Node.js, it wraps native Unix/Linux commands (ripgrep, find, ls) with multiple security layers, intelligent pagination, and token optimization to make codebase research faster and safer.

The server follows a clean layered architecture: **MCP Interface → Tools → Command Builders → Security Validators → Native Commands**. Every file system operation passes through multiple security checkpoints (path validation, command validation, execution context validation) before executing, and all responses are automatically paginated and token-optimized for AI consumption.

## Entry Points

- **Main Server**: `src/index.ts:main()` - Initializes MCP server, registers tools and prompts, configures security
- **Tool Registration**: `src/tools/toolsManager.ts:registerTools()` - Registers all 4 tools with the MCP server
- **Configuration**: `src/constants.ts` - All limits, defaults, and resource constraints
- **Security Root**: `src/security/pathValidator.ts:PathValidator` - Global path validator protecting against traversal attacks

**Start Here**: Read `src/index.ts` to understand server initialization, then explore `src/tools/toolsManager.ts` to see how tools are wired up, and `src/constants.ts` for all configurable limits.
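The **Security Root** entry above describes a validator that keeps every path under an allowed root. A minimal sketch of that containment check follows. `isInsideRoot` is a hypothetical helper written for illustration, not the project's `PathValidator`. It assumes a purely lexical comparison and does not resolve symlinks, which the real validator is described as doing, so it illustrates the idea rather than providing a secure implementation.

```typescript
import * as path from "node:path";

// Hypothetical helper (not part of local-explorer-mcp): reports whether
// `candidate`, resolved against `root`, stays inside `root`.
// Assumption: lexical check only. Symlinks are NOT resolved here.
export function isInsideRoot(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);
  const rel = path.relative(resolvedRoot, resolved);

  // An empty relative path means the candidate is the root itself.
  if (rel === "") return true;

  // A relative path that is ".." or starts with "../" climbs out of the root.
  // An absolute result (for example, another drive on Windows) is also outside.
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return !escapes;
}
```

Comparing with `path.relative` instead of a plain string prefix avoids accepting a sibling directory such as `/workspace-evil` when the root is `/workspace`.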
## Code Map

### `/src/index.ts`

**Purpose**: Server entry point and lifecycle management

**Key Exports**: `main()` - Initializes MCP server, registers tools/prompts, sets up workspace root

**Invariant**: ALWAYS sets `WORKSPACE_ROOT` from environment or `process.cwd()` as the allowed root path

**Startup Flow**: Create server → Add workspace root → Register tools → Register prompts → Connect transport

### `/src/tools/` - Tool Implementations (Public API)

**Purpose**: Core MCP tool implementations exposed to AI agents

**Key Files**:

- `toolsManager.ts` - Central registration hub, wires tools to MCP server
- `local_ripgrep.ts` - Pattern search using ripgrep (fastest for code search)
- `local_view_structure.ts` - Directory exploration using ls/readdir
- `local_find_files.ts` - File discovery by metadata (name, size, time)
- `local_fetch_content.ts` - Smart file reading with pattern extraction
- `hints.ts` - Tool usage hints and workflow recommendations
- `connections.ts` - Tool connection graph (which tool to use next)

**Invariant**: All tools MUST use `executeBulkOperation()` for consistent error handling and formatting

**API Boundary**: PUBLIC - These are the tools exposed via MCP protocol

### `/src/commands/` - Command Builder Pattern

**Purpose**: Type-safe, validated shell command construction

**Key Files**:

- `BaseCommandBuilder.ts` - Abstract base with safe arg handling (escaping, validation)
- `RipgrepCommandBuilder.ts` - Builds `rg` commands with all flags
- `FindCommandBuilder.ts` - Builds `find` commands with predicates
- `LsCommandBuilder.ts` - Builds `ls` commands (currently minimal, mostly uses Node.js fs)

**Invariant**: Command builders NEVER execute commands directly - they only build `{command, args}` objects

**Pattern**: `new XCommandBuilder().addFlag().addOption().build()` → `{command, args}`

**Security**: All user input MUST pass through `escapeShellArg()` before being added to commands

### `/src/security/` - Multi-Layer Security System

**Purpose**: Prevent path traversal, command injection, and unauthorized file access

**Key Files**:

- `pathValidator.ts` - Path traversal prevention, symlink resolution, allowed root enforcement
- `commandValidator.ts` - Shell argument escaping, dangerous pattern detection
- `executionContextValidator.ts` - Working directory validation before command execution
- `ignoredPathFilter.ts` - Blocks sensitive files (.env, credentials, node_modules, etc.)
- `patternsConstants.ts` - Centralized lists of dangerous patterns and file extensions
- `securityConstants.ts` - Security configuration defaults

**Invariant (CRITICAL)**:

1. **Path Security**: ALL paths MUST be validated via `pathValidator.validate()` - symlinks ALWAYS resolved to real paths
2. **Command Security**: ALL shell arguments MUST pass through `validateCommand()` before execution
3. **Context Security**: ALL commands MUST validate their working directory via `validateExecutionContext()`
4. **No Bypass**: Security checks CANNOT be disabled or skipped - they're hardcoded

**Symlink Behavior**:

- **Path validation**: ALWAYS resolves symlinks (security requirement, prevents attacks)
- **Tool traversal**: By default does NOT follow symlinks (performance), user can opt-in via `followSymlinks` option

**Flow**: `pathValidator.validate(path)` → `validateCommand(cmd, args)` → `validateExecutionContext(cwd)` → Execute

### `/src/scheme/` - Zod Validation Schemas

**Purpose**: Runtime type validation and schema definitions for tool inputs

**Key Files**:

- `baseSchema.ts` - Shared base query schema, bulk query wrapper factory
- `local_ripgrep.ts` - Ripgrep query schema, workflow modes
- `local_view_structure.ts` - Directory listing query schema
- `local_find_files.ts` - File search query schema
- `local_fetch_content.ts` - Content reading query schema

**Pattern**: Each schema exports `{QueryName}Schema` (single query) and `Bulk{QueryName}Schema` (array wrapper)

**Invariant**: ALL tool inputs MUST be validated via Zod schemas - invalid inputs throw parse errors

**Usage**: `const parsed = BulkRipgrepQuerySchema.parse(args);` - throws if validation fails

### `/src/utils/` - Cross-Cutting Utilities

**Purpose**: Reusable utilities for bulk operations, caching, pagination, memory, tokens

**Key Files**:

- `bulkOperations.ts` - **CRITICAL**: Parallel query processing, error isolation, response formatting
- `exec.ts` - Safe command execution with timeout, output limits, memory tracking
- `memoryManager.ts` - Global memory reservation system (100MB total, 10MB per operation)
- `pagination.ts` - Universal pagination for files, entries, matches
- `cache.ts` - Command output caching with TTL (15 min default)
- `tokenValidation.ts` - Estimates tokens, enforces 25K MCP response limit
- `responses.ts` - Standardized response formatting with hints
- `errors.ts` - Custom error classes and error handling utilities
- `minifier.ts` - Code minification for token efficiency (removes whitespace)
- `performanceMetrics.ts` - Execution time tracking and metrics
- `toolHelpers.ts` - Common tool helper functions
- `promiseUtils.ts` - Promise utilities for parallel execution

**Critical Function**: `executeBulkOperation<TQuery, TResult>(queries, processor, config)` - ALL tools use this

**Flow**: Parse queries → Reserve memory → Execute in parallel → Format response → Release memory

**Invariant**: `executeBulkOperation()` ALWAYS catches errors and converts them to error results (never throws)

### `/src/types.ts` - Type Definitions

**Purpose**: Shared TypeScript interfaces and types

**Key Exports**: `ExecResult`, `ExecOptions`, `ValidationResult`, query/result types for each tool

**Pattern**: Each tool has `{Tool}Query` and `{Tool}Result` interfaces

### `/src/prompts/` - MCP Prompts

**Purpose**: Pre-defined prompts for common workflows

**Key Files**:

- `architecture.ts` - Registers architecture documentation generation prompt
- `architecture.md` - The prompt template itself (system instructions)

**Pattern**:
`registerArchitecturePrompt(server)` - prompts guide agents through complex workflows

### `/tests/` - Test Suite

**Purpose**: Comprehensive testing with focus on security and integration

**Structure**:

- `tests/tools/` - Tool functionality tests (ripgrep, find, ls, fetch)
- `tests/commands/` - Command builder tests
- `tests/security/` - **CRITICAL**: Security penetration tests, symlink attacks, race conditions
- `tests/integration/` - End-to-end tests with real file systems
- `tests/utils/` - Utility function tests

**Test Framework**: Vitest

**Run Tests**: `yarn test` (all), `yarn test:watch` (watch mode), `yarn test:coverage` (coverage report)

**Philosophy**: Security tests are most important - they verify protections against real attack vectors

## Architectural Invariants

1. **Security-First**: ALL file system operations MUST pass through 3 security layers (path → command → context)
2. **No Direct Execution**: Commands MUST be built via builders, validated, then executed via `safeExec()`
3. **Bulk-Only Interface**: ALL tools MUST accept query arrays and use `executeBulkOperation()` for consistency
4. **Token Bounded**: ALL responses MUST respect the 25K token MCP limit with automatic pagination
5. **Memory Bounded**: Global 100MB limit, 10MB per operation, enforced via memory manager reservations
6. **Symlink Resolution**: Path validation ALWAYS resolves symlinks for security (cannot be disabled)
7. **Error Isolation**: Failed queries in a bulk operation MUST NOT crash other queries
8. **Idempotent Validation**: Running validation twice on same input MUST produce same result
9. **Cache Transparency**: Caching is transparent - tools don't know if results are cached
10. **No Side Effects**: Tools MUST be read-only (no file writes, moves, deletes)

## System Boundaries & Layers

```
┌─────────────────────────────────────────────────────────────────┐
│ MCP Protocol Interface (External Boundary)                      │
│ - StdioServerTransport (stdin/stdout communication)             │
│ - CallToolResult responses                                      │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│ Tool Layer (Public API)                                         │
│ - local_ripgrep, local_view_structure                           │
│ - local_find_files, local_fetch_content                         │
│ - Input: Zod-validated query objects                            │
│ - Output: Standardized results with hints                       │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│ Bulk Operations & Orchestration (Internal)                      │
│ - executeBulkOperation() - parallel processing                  │
│ - Error isolation, response formatting                          │
│ - Token validation, memory reservation                          │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│ Command Builders (Internal)                                     │
│ - Type-safe command construction                                │
│ - Argument escaping and formatting                              │
│ - Output: {command, args} objects                               │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│ Security Validation Layer (Critical - Cannot Be Bypassed)       │
│ 1. pathValidator.validate(path) - traversal prevention          │
│ 2. validateCommand(cmd, args) - injection prevention            │
│ 3. validateExecutionContext(cwd) - context validation           │
│ 4. shouldIgnore(path) - sensitive file filtering                │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│ Command Execution (Internal)                                    │
│ - safeExec() - spawn with timeout & output limits               │
│ - Memory tracking, error handling                               │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│ Native Commands (External)                                      │
│ - ripgrep (rg), find, ls, Node.js fs                            │
│ - File system operations                                        │
└─────────────────────────────────────────────────────────────────┘
```

**Rules**:

- Downward dependencies only - upper layers can call lower layers, never reverse
- Security layer CANNOT be bypassed - all execution flows through it
- MCP interface is the only external boundary - everything else is internal
- Tools are stateless - each request is independent

## Key Abstractions & Types

### Core Abstractions

- **`BaseCommandBuilder`** (`src/commands/BaseCommandBuilder.ts:13`) - Abstract command builder with type-safe argument handling
- **`PathValidator`** (`src/security/pathValidator.ts:55`) - Centralized path security enforcer
- **`MemoryManager`** (`src/utils/memoryManager.ts:42`) - Global memory reservation system (singleton)
- **`BulkOperation`** (`src/utils/bulkOperations.ts:77`) - Parallel query processor with error isolation

### Type Contracts

- **`ValidationResult`** (`src/types.ts:30`) - Standard return type for all validators: `{isValid, sanitizedPath?, error?}`
- **`ExecResult`** (`src/types.ts:8`) - Standard command execution result: `{code, stdout, stderr, success}`
- **`BaseResult`** - All tool results extend this: `{status: 'hasResults' | 'empty' | 'error', error?, hints?}`

### Query/Result Pattern

Each tool follows this pattern:

```typescript
// Query: Input validated by Zod schema
interface {Tool}Query extends BaseQuery {
  path: string;           // Required for most tools
  researchGoal?: string;  // Helps track research sessions
  reasoning?: string;     // Why this query matters
  // ... tool-specific options
}

// Result: Output with status indicator
interface {Tool}Result extends BaseResult {
  status: 'hasResults' | 'empty' | 'error';
  // ... tool-specific data
  hints?: string[];       // Next-step recommendations
}
```

**Example**: `RipgrepQuery` → `executeBulkOperation()` → `SearchContentResult[]`

## Cross-Cutting Concerns

### Testing

**Framework**: Vitest 3.2.4

**Location**: `tests/` directory mirroring `src/` structure

**Run**: `yarn test` (all), `yarn test:watch` (watch), `yarn test:coverage` (coverage)

**Philosophy**: Security tests are paramount - they verify protections against real attack vectors (symlink attacks, path traversal, command injection, race conditions)

**Coverage Areas**:

- Unit tests for all tools, builders, utilities
- Integration tests with real file system operations
- Security penetration tests for all attack vectors
- Performance tests for large directories

### Error Handling

**Strategy**: Fail-safe with error isolation in bulk operations

**Pattern**:

1. Validation errors throw immediately (input validation via Zod)
2. Execution errors are caught and converted to error results
3. Bulk operations never throw - they return error results for failed queries
4. Security violations always reject (path traversal, command injection)

**Invariant**: A failed query MUST NOT crash other queries in the same bulk operation

**Custom Errors**: `src/utils/errors.ts` - Domain-specific error classes

### Configuration

**Files**:

- `src/constants.ts` - All limits, defaults, resource constraints
- `manifest.json` - MCP server metadata
- `package.json` - Dependencies, scripts, Node version requirement (>=18.0.0)
- Environment: `WORKSPACE_ROOT` - Sets allowed root directory (defaults to `process.cwd()`)

**Key Limits** (from `RESOURCE_LIMITS`):

- MCP response: 25,000 tokens max
- Command timeout: 30 seconds
- Output size: 10MB per command
- Memory: 100MB global, 10MB per operation
- Large file threshold: 100KB (requires pagination)
- Cache TTL: 15 minutes

### Performance

**Critical Optimizations**:

1. **Bulk Operations**: Parallel query processing with `Promise.all()`
2. **Pagination**: Automatic for large results (files, entries, matches)
3. **Caching**: Command output cached with TTL (15 min) - `node-cache`
4. **Discovery Mode**: `filesOnly=true` for ripgrep (25x faster, no match content)
5. **Minification**: Removes whitespace from code (30-60% token savings)
6. **Memory Reservations**: Prevents OOM by tracking and limiting concurrent operations
7. **Streaming**: Large output streams are chunked to avoid buffering entire results

**Hot Paths**:

- `executeBulkOperation()` - Used by all tools, optimized for parallel execution
- `pathValidator.validate()` - Called for every file system operation, uses caching
- `safeExec()` - Command execution with timeout and output limits

### Observability

**Logging**: Currently minimal (process exits on uncaught errors)

**Metrics**: `performanceMetrics.ts` tracks execution times

**Debug**: `yarn debug` - Runs MCP inspector for debugging tool calls

**Token Tracking**: `tokenValidation.ts` estimates and warns about high token usage

**Memory Tracking**: `memoryManager.getCurrentUsage()` - Real-time memory usage stats

**Debug Tips**:

- Set `NODE_ENV=development` for verbose error messages
- Use `yarn test:ui` for visual test debugging
- Check command builder output with `.getArgs()` method
- Use MCP inspector to see raw request/response: `yarn debug`

## Dependencies & Build

### Runtime Dependencies

- **`@modelcontextprotocol/sdk`** (^1.18.1) - MCP protocol implementation
- **`zod`** (^3.25.26) - Runtime schema validation for all tool inputs
- **`node-cache`** (^5.1.2) - In-memory caching with TTL
- **`octocode-utils`** (^5.0.0) - Shared utilities from octocode-mcp

### External System Dependencies

- **`ripgrep`** - Required for `local_ripgrep` tool, must be in PATH
- **`find`** - Unix find command (available on all *nix systems)
- **`ls`** - Unix ls command (though mostly using Node.js `fs` now)

### Build System

**Bundler**: Rollup 4.46.2

**Config**: `rollup.config.js` - Bundles to single `dist/index.js` with CommonJS plugins

**TypeScript**: 5.9.2 with strict mode

**Target**: Node.js >=18.0.0 (ES modules)

**Build Commands**:

```bash
yarn build        # Lint + clean + build
yarn build:dev    # Build without linting
yarn build:watch  # Watch mode for development
yarn clean        # Remove dist/
```

**Output**: Single bundled file `dist/index.js` (ES module with shebang for CLI execution)

## Architectural Decision Records (ADRs)

### Security Layer is Mandatory

**Status**: Accepted

**Context**: File system operations can expose sensitive data or enable attacks

**Decision**: ALL operations pass through 3 security layers (path, command, context) that CANNOT be bypassed

**Consequences**:

- ✅ Strong security guarantees against known attack vectors
- ✅ Symlinks always resolved (prevents traversal)
- ✅ Sensitive files automatically blocked (.env, credentials, etc.)
- ⚠️ Performance overhead (~1-2ms per validation)
- ⚠️ Some legitimate use cases may be blocked (edge cases)

### Bulk-Only Tool Interface

**Status**: Accepted

**Context**: AI agents benefit from sending multiple queries in one request

**Decision**: ALL tools accept arrays of queries and use `executeBulkOperation()`

**Consequences**:

- ✅ Parallel execution for better performance
- ✅ Error isolation (one failed query doesn't crash others)
- ✅ Consistent response format across all tools
- ⚠️ Single queries still wrapped in arrays (slight overhead)

### Token-Based Pagination

**Status**: Accepted

**Context**: MCP protocol has 25K token response limit, large results must be split

**Decision**: Automatic pagination based on estimated token count, not just item count

**Consequences**:

- ✅ Prevents exceeding MCP limits (avoids protocol errors)
- ✅ Optimizes token usage for AI context windows
- ⚠️ Token estimation is approximate (uses 4 chars/token heuristic)
- ⚠️ Pagination API exposed to users (adds complexity)

### Memory Reservation System

**Status**: Accepted

**Context**: Concurrent operations can cause OOM crashes

**Decision**: Global memory manager tracks reservations, rejects operations when limit exceeded

**Consequences**:

- ✅ Prevents OOM crashes under heavy load
- ✅ Fair resource allocation across concurrent requests
- ⚠️ Operations can be rejected if memory is exhausted
- ⚠️ Requires manual `reserve()`/`release()` pattern (not automatic)

### Command Builders Over String Templates

**Status**:
Accepted

**Context**: String concatenation for shell commands is injection-prone

**Decision**: Type-safe command builder classes that auto-escape arguments

**Consequences**:

- ✅ Strong typing prevents many bugs at compile time
- ✅ Automatic escaping reduces injection risk
- ✅ Easier to test (can inspect args before execution)
- ⚠️ More verbose than string templates
- ⚠️ Each command needs its own builder class

## Code Style & Patterns

### Naming Conventions

- **Interfaces**: PascalCase with descriptive nouns (`PathValidator`, `ExecResult`)
- **Functions**: camelCase with verb prefixes (`validatePath`, `executeCommand`)
- **Constants**: SCREAMING_SNAKE_CASE (`TOOL_NAMES`, `RESOURCE_LIMITS`)
- **Files**: camelCase matching primary export (`pathValidator.ts`, `bulkOperations.ts`)
- **Types**: PascalCase, often with suffix (`ValidationResult`, `RipgrepQuery`)

### Common Patterns

**Validation Pattern**:

```typescript
const validation = validator.validate(input);
if (!validation.isValid) {
  throw new Error(validation.error);
}
// Use validation.sanitizedPath
```

**Builder Pattern**:

```typescript
const {command, args} = new RipgrepCommandBuilder()
  .addPattern(pattern)
  .addPath(path)
  .addFlag('--json')
  .build();
```

**Bulk Operation Pattern**:

```typescript
return executeBulkOperation(
  queries,
  processQuery,
  {toolName: TOOL_NAMES.LOCAL_RIPGREP}
);
```

**Memory Management Pattern**:

```typescript
const id = memoryManager.reserve(size, toolName);
if (!id) throw new Error('Insufficient memory');
try {
  // ... operation
} finally {
  memoryManager.release(id);
}
```

### Avoid

- **String concatenation for commands** - Use builders
- **Synchronous fs operations** - Use async/await with promises
- **Throwing in bulk processors** - Return error results instead
- **Bypassing security validation** - All paths MUST be validated
- **Unbounded operations** - Always set timeout and size limits
- **Mutation of shared state** - Keep operations pure and stateless

## For Contributors

### To Fix a Bug

1. **Identify the layer**: Tool → Builder → Security → Execution?
2. **Write a failing test** in appropriate `tests/` subdirectory
3. **Fix the code** in `src/`
4. **Verify security**: If touching paths/commands, run `yarn test tests/security/`
5. **Check linting**: `yarn lint` (auto-fix with `yarn lint:fix`)

### To Add a Feature

1. **For new tool**: Copy pattern from `src/tools/local_ripgrep.ts`
   - Create schema in `src/scheme/`
   - Implement tool in `src/tools/`
   - Register in `src/tools/toolsManager.ts`
   - Add tests in `tests/tools/`
2. **For new option**: Add to Zod schema first, then implement
3. **For new command**: Create new builder extending `BaseCommandBuilder`

### To Find Something

- **Tool implementation**: `src/tools/local_{toolname}.ts`
- **Input validation**: `src/scheme/local_{toolname}.ts`
- **Command building**: `src/commands/{Command}Builder.ts`
- **Security logic**: `src/security/{validator|filter}.ts`
- **Response formatting**: `src/utils/responses.ts`
- **Resource limits**: `src/constants.ts → RESOURCE_LIMITS`

### To Add a Security Feature

1. **Identify attack vector**: What are you preventing?
2. **Add to security layer**: `src/security/` - path/command/context validator
3. **Write penetration test**: `tests/security/` - Verify the attack is blocked
4. **Document invariant**: Update this file's "Architectural Invariants"
5. See `tests/security/` for examples: symlink attacks, path traversal, race conditions

## API Stability

**Public (Stable)**:

- MCP tool interfaces (local_ripgrep, local_view_structure, local_find_files, local_fetch_content)
- Tool query/result types and Zod schemas
- Constants in `src/constants.ts` (especially `RESOURCE_LIMITS`)

**Internal (May Change)**:

- Command builder implementations
- Utility function signatures
- Internal type definitions
- Response formatting details (though structure stays consistent)

**Experimental**:

- Prompt templates in `src/prompts/`
- Hints and workflow recommendations
- Token estimation heuristics

**Breaking Changes Policy**: Major version bump for public API changes, minor version for internal refactoring

---

**Updated**: November 4, 2025

**By**: Octocode Team

**Questions**: Open an issue on GitHub or check existing docs in `README.md`

**Related Docs**: See `README.md` for user guide, `RIPGREP_IMPROVEMENTS.md` for optimization notes
