# Architecture
> High-level architecture of local-explorer-mcp: An MCP server for intelligent local file system exploration
## Bird's Eye View
local-explorer-mcp is a Model Context Protocol (MCP) server that provides AI agents with four specialized tools for exploring local file systems efficiently and securely. Built with TypeScript and Node.js, it wraps native Unix/Linux commands (ripgrep, find, ls) with multiple security layers, intelligent pagination, and token optimization to make codebase research faster and safer.
The server follows a clean layered architecture: **MCP Interface → Tools → Command Builders → Security Validators → Native Commands**. Every file system operation passes through multiple security checkpoints (path validation, command validation, execution context validation) before executing, and all responses are automatically paginated and token-optimized for AI consumption.
## Entry Points
- **Main Server**: `src/index.ts:main()` - Initializes MCP server, registers tools and prompts, configures security
- **Tool Registration**: `src/tools/toolsManager.ts:registerTools()` - Registers all 4 tools with the MCP server
- **Configuration**: `src/constants.ts` - All limits, defaults, and resource constraints
- **Security Root**: `src/security/pathValidator.ts:PathValidator` - Global path validator protecting against traversal attacks
**Start Here**: Read `src/index.ts` to understand server initialization, then explore `src/tools/toolsManager.ts` to see how tools are wired up, and `src/constants.ts` for all configurable limits.
## Code Map
### `/src/index.ts`
**Purpose**: Server entry point and lifecycle management
**Key Exports**: `main()` - Initializes MCP server, registers tools/prompts, sets up workspace root
**Invariant**: ALWAYS sets `WORKSPACE_ROOT` from environment or `process.cwd()` as the allowed root path
**Startup Flow**: Create server → Add workspace root → Register tools → Register prompts → Connect transport
### `/src/tools/` - Tool Implementations (Public API)
**Purpose**: Core MCP tool implementations exposed to AI agents
**Key Files**:
- `toolsManager.ts` - Central registration hub, wires tools to MCP server
- `local_ripgrep.ts` - Pattern search using ripgrep (fastest for code search)
- `local_view_structure.ts` - Directory exploration using ls/readdir
- `local_find_files.ts` - File discovery by metadata (name, size, time)
- `local_fetch_content.ts` - Smart file reading with pattern extraction
- `hints.ts` - Tool usage hints and workflow recommendations
- `connections.ts` - Tool connection graph (which tool to use next)
**Invariant**: All tools MUST use `executeBulkOperation()` for consistent error handling and formatting
**API Boundary**: PUBLIC - These are the tools exposed via MCP protocol
### `/src/commands/` - Command Builder Pattern
**Purpose**: Type-safe, validated shell command construction
**Key Files**:
- `BaseCommandBuilder.ts` - Abstract base with safe arg handling (escaping, validation)
- `RipgrepCommandBuilder.ts` - Builds `rg` commands with all flags
- `FindCommandBuilder.ts` - Builds `find` commands with predicates
- `LsCommandBuilder.ts` - Builds `ls` commands (currently minimal, mostly uses Node.js fs)
**Invariant**: Command builders NEVER execute commands directly - they only build `{command, args}` objects
**Pattern**: `new XCommandBuilder().addFlag().addOption().build()` → `{command, args}`
**Security**: All user input MUST pass through `escapeShellArg()` before being added to commands
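The builder invariant above can be illustrated with a minimal sketch. The class names `SketchCommandBuilder` and `SketchRipgrepBuilder` are hypothetical, and the `escapeShellArg()` body shown here (POSIX single-quote quoting) is an assumption standing in for the project's real implementation, which may differ.

```typescript
interface BuiltCommand {
  command: string;
  args: string[];
}

// Assumed stand-in for the project's escapeShellArg(): POSIX single-quote quoting.
function escapeShellArg(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Hypothetical base class: it only accumulates arguments and never executes anything.
abstract class SketchCommandBuilder {
  protected readonly args: string[] = [];

  protected constructor(private readonly command: string) {}

  // Flags are fixed strings chosen by the tool author, not user input.
  addFlag(flag: string): this {
    this.args.push(flag);
    return this;
  }

  // Option values may carry user input, so they are escaped before being stored.
  addOption(name: string, value: string): this {
    this.args.push(name, escapeShellArg(value));
    return this;
  }

  // Returns a copy so callers cannot mutate the builder's internal state.
  build(): BuiltCommand {
    return { command: this.command, args: [...this.args] };
  }
}

class SketchRipgrepBuilder extends SketchCommandBuilder {
  constructor() {
    super("rg");
  }
}

const built = new SketchRipgrepBuilder()
  .addFlag("--json")
  .addOption("--glob", "*.ts")
  .build();
```

Because `build()` only returns data, the resulting `{command, args}` object can be inspected in tests before any validation or execution step runs.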
### `/src/security/` - Multi-Layer Security System
**Purpose**: Prevent path traversal, command injection, and unauthorized file access
**Key Files**:
- `pathValidator.ts` - Path traversal prevention, symlink resolution, allowed root enforcement
- `commandValidator.ts` - Shell argument escaping, dangerous pattern detection
- `executionContextValidator.ts` - Working directory validation before command execution
- `ignoredPathFilter.ts` - Blocks sensitive files (.env, credentials, node_modules, etc.)
- `patternsConstants.ts` - Centralized lists of dangerous patterns and file extensions
- `securityConstants.ts` - Security configuration defaults
**Invariant (CRITICAL)**:
1. **Path Security**: ALL paths MUST be validated via `pathValidator.validate()` - symlinks ALWAYS resolved to real paths
2. **Command Security**: ALL shell arguments MUST pass through `validateCommand()` before execution
3. **Context Security**: ALL commands MUST validate their working directory via `validateExecutionContext()`
4. **No Bypass**: Security checks CANNOT be disabled or skipped - they're hardcoded
**Symlink Behavior**:
- **Path validation**: ALWAYS resolves symlinks (security requirement, prevents attacks)
- **Tool traversal**: By default does NOT follow symlinks (performance), user can opt-in via `followSymlinks` option
**Flow**: `pathValidator.validate(path)` → `validateCommand(cmd, args)` → `validateExecutionContext(cwd)` → Execute
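As a rough illustration of the first step in this flow, the sketch below performs a lexical containment check against an allowed root. The function name `validateWithinRoot` is hypothetical, and the sketch deliberately omits the symlink resolution that the real `PathValidator` is described as always performing, so treat it as a simplified model only.

```typescript
import * as path from "node:path";

interface ValidationResult {
  isValid: boolean;
  sanitizedPath?: string;
  error?: string;
}

// Hypothetical helper: lexical check only, no symlink resolution.
function validateWithinRoot(root: string, candidate: string): ValidationResult {
  const normalizedRoot = path.posix.resolve(root);
  // resolve() collapses "." and ".." segments and handles absolute candidates.
  const resolved = path.posix.resolve(normalizedRoot, candidate);
  // Compare against "root/" so that "/workspace-evil" does not match "/workspace".
  const prefix = normalizedRoot.endsWith("/") ? normalizedRoot : normalizedRoot + "/";
  const inside = resolved === normalizedRoot || resolved.startsWith(prefix);
  return inside
    ? { isValid: true, sanitizedPath: resolved }
    : { isValid: false, error: `Path escapes allowed root: ${candidate}` };
}
```

Comparing against the root with a trailing separator avoids the classic prefix bug where a sibling directory with a similar name passes the check.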
### `/src/scheme/` - Zod Validation Schemas
**Purpose**: Runtime type validation and schema definitions for tool inputs
**Key Files**:
- `baseSchema.ts` - Shared base query schema, bulk query wrapper factory
- `local_ripgrep.ts` - Ripgrep query schema, workflow modes
- `local_view_structure.ts` - Directory listing query schema
- `local_find_files.ts` - File search query schema
- `local_fetch_content.ts` - Content reading query schema
**Pattern**: Each schema exports `{QueryName}Schema` (single query) and `Bulk{QueryName}Schema` (array wrapper)
**Invariant**: ALL tool inputs MUST be validated via Zod schemas - invalid inputs throw parse errors
**Usage**: `const parsed = BulkRipgrepQuerySchema.parse(args);` - throws if validation fails
### `/src/utils/` - Cross-Cutting Utilities
**Purpose**: Reusable utilities for bulk operations, caching, pagination, memory, tokens
**Key Files**:
- `bulkOperations.ts` - **CRITICAL**: Parallel query processing, error isolation, response formatting
- `exec.ts` - Safe command execution with timeout, output limits, memory tracking
- `memoryManager.ts` - Global memory reservation system (100MB total, 10MB per operation)
- `pagination.ts` - Universal pagination for files, entries, matches
- `cache.ts` - Command output caching with TTL (15 min default)
- `tokenValidation.ts` - Estimates tokens, enforces 25K MCP response limit
- `responses.ts` - Standardized response formatting with hints
- `errors.ts` - Custom error classes and error handling utilities
- `minifier.ts` - Code minification for token efficiency (removes whitespace)
- `performanceMetrics.ts` - Execution time tracking and metrics
- `toolHelpers.ts` - Common tool helper functions
- `promiseUtils.ts` - Promise utilities for parallel execution
**Critical Function**: `executeBulkOperation<TQuery, TResult>(queries, processor, config)` - ALL tools use this
**Flow**: Parse queries → Reserve memory → Execute in parallel → Format response → Release memory
**Invariant**: `executeBulkOperation()` ALWAYS catches errors and converts them to error results (never throws)
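The error-isolation invariant can be sketched as follows. This is a simplified model, not the project's `executeBulkOperation()`: the name `executeBulkSketch` and the `QueryOutcome` type are hypothetical, and memory reservation and response formatting are left out.

```typescript
type QueryOutcome<TResult> =
  | { status: "hasResults"; data: TResult }
  | { status: "error"; error: string };

// Hypothetical sketch: each query is wrapped in its own try/catch, so one
// failure becomes an error result instead of rejecting the whole batch.
async function executeBulkSketch<TQuery, TResult>(
  queries: TQuery[],
  processor: (query: TQuery) => Promise<TResult>,
): Promise<QueryOutcome<TResult>[]> {
  return Promise.all(
    queries.map(async (query): Promise<QueryOutcome<TResult>> => {
      try {
        const data = await processor(query);
        return { status: "hasResults", data };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { status: "error", error: message };
      }
    }),
  );
}
```

Since every mapped promise resolves (never rejects), `Promise.all()` returns one outcome per query in input order, including the failed ones.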
### `/src/types.ts` - Type Definitions
**Purpose**: Shared TypeScript interfaces and types
**Key Exports**: `ExecResult`, `ExecOptions`, `ValidationResult`, query/result types for each tool
**Pattern**: Each tool has `{Tool}Query` and `{Tool}Result` interfaces
### `/src/prompts/` - MCP Prompts
**Purpose**: Pre-defined prompts for common workflows
**Key Files**:
- `architecture.ts` - Registers architecture documentation generation prompt
- `architecture.md` - The prompt template itself (system instructions)
**Pattern**: `registerArchitecturePrompt(server)` - prompts guide agents through complex workflows
### `/tests/` - Test Suite
**Purpose**: Comprehensive testing with focus on security and integration
**Structure**:
- `tests/tools/` - Tool functionality tests (ripgrep, find, ls, fetch)
- `tests/commands/` - Command builder tests
- `tests/security/` - **CRITICAL**: Security penetration tests, symlink attacks, race conditions
- `tests/integration/` - End-to-end tests with real file systems
- `tests/utils/` - Utility function tests
**Test Framework**: Vitest
**Run Tests**: `yarn test` (all), `yarn test:watch` (watch mode), `yarn test:coverage` (coverage report)
**Philosophy**: Security tests are most important - they verify protections against real attack vectors
## Architectural Invariants
1. **Security-First**: ALL file system operations MUST pass through 3 security layers (path → command → context)
2. **No Direct Execution**: Commands MUST be built via builders, validated, then executed via `safeExec()`
3. **Bulk-Only Interface**: ALL tools MUST accept query arrays and use `executeBulkOperation()` for consistency
4. **Token Bounded**: ALL responses MUST respect the 25K token MCP limit with automatic pagination
5. **Memory Bounded**: Global 100MB limit, 10MB per operation, enforced via memory manager reservations
6. **Symlink Resolution**: Path validation ALWAYS resolves symlinks for security (cannot be disabled)
7. **Error Isolation**: Failed queries in a bulk operation MUST NOT crash other queries
8. **Idempotent Validation**: Running validation twice on same input MUST produce same result
9. **Cache Transparency**: Caching is transparent - tools don't know if results are cached
10. **No Side Effects**: Tools MUST be read-only (no file writes, moves, deletes)
## System Boundaries & Layers
```
┌───────────────────────────────────────────────────────────────┐
│ MCP Protocol Interface (External Boundary)                    │
│  - StdioServerTransport (stdin/stdout communication)          │
│  - CallToolResult responses                                   │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│ Tool Layer (Public API)                                       │
│  - local_ripgrep, local_view_structure                        │
│  - local_find_files, local_fetch_content                      │
│  - Input: Zod-validated query objects                         │
│  - Output: Standardized results with hints                    │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│ Bulk Operations & Orchestration (Internal)                    │
│  - executeBulkOperation() - parallel processing               │
│  - Error isolation, response formatting                       │
│  - Token validation, memory reservation                       │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│ Command Builders (Internal)                                   │
│  - Type-safe command construction                             │
│  - Argument escaping and formatting                           │
│  - Output: {command, args} objects                            │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│ Security Validation Layer (Critical - Cannot Be Bypassed)     │
│  1. pathValidator.validate(path) - traversal prevention       │
│  2. validateCommand(cmd, args) - injection prevention         │
│  3. validateExecutionContext(cwd) - context validation        │
│  4. shouldIgnore(path) - sensitive file filtering             │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│ Command Execution (Internal)                                  │
│  - safeExec() - spawn with timeout & output limits            │
│  - Memory tracking, error handling                            │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│ Native Commands (External)                                    │
│  - ripgrep (rg), find, ls, Node.js fs                         │
│  - File system operations                                     │
└───────────────────────────────────────────────────────────────┘
```
**Rules**:
- Downward dependencies only - upper layers can call lower layers, never reverse
- Security layer CANNOT be bypassed - all execution flows through it
- MCP interface is the only external boundary - everything else is internal
- Tools are stateless - each request is independent
## Key Abstractions & Types
### Core Abstractions
- **`BaseCommandBuilder`** (`src/commands/BaseCommandBuilder.ts:13`) - Abstract command builder with type-safe argument handling
- **`PathValidator`** (`src/security/pathValidator.ts:55`) - Centralized path security enforcer
- **`MemoryManager`** (`src/utils/memoryManager.ts:42`) - Global memory reservation system (singleton)
- **`BulkOperation`** (`src/utils/bulkOperations.ts:77`) - Parallel query processor with error isolation
### Type Contracts
- **`ValidationResult`** (`src/types.ts:30`) - Standard return type for all validators: `{isValid, sanitizedPath?, error?}`
- **`ExecResult`** (`src/types.ts:8`) - Standard command execution result: `{code, stdout, stderr, success}`
- **`BaseResult`** - All tool results extend this: `{status: 'hasResults' | 'empty' | 'error', error?, hints?}`
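The three contracts above could be written out roughly as follows. The field names come from the list above, but the field types are assumptions, and `toBaseResult()` is a hypothetical helper added only to show how an execution result might map onto a status.

```typescript
// Field names are taken from the contracts above; the types are assumed.
interface ValidationResult {
  isValid: boolean;
  sanitizedPath?: string;
  error?: string;
}

interface ExecResult {
  code: number | null; // exit code; null is assumed for a killed process
  stdout: string;
  stderr: string;
  success: boolean;
}

interface BaseResult {
  status: "hasResults" | "empty" | "error";
  error?: string;
  hints?: string[];
}

// Hypothetical mapping from a command result to a tool result status.
function toBaseResult(exec: ExecResult): BaseResult {
  if (!exec.success) {
    return {
      status: "error",
      error: exec.stderr || `Command exited with code ${exec.code}`,
    };
  }
  return { status: exec.stdout.trim() === "" ? "empty" : "hasResults" };
}

// Example of a rejected path as a validator might report it.
const rejectedPath: ValidationResult = {
  isValid: false,
  error: "Path is outside the allowed root",
};
```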
### Query/Result Pattern
Each tool follows this pattern:
```typescript
// `{Tool}` is a placeholder for the tool name (e.g. `RipgrepQuery`);
// this is a template, not compilable as written.

// Query: Input validated by Zod schema
interface {Tool}Query extends BaseQuery {
  path: string;            // Required for most tools
  researchGoal?: string;   // Helps track research sessions
  reasoning?: string;      // Why this query matters
  // ... tool-specific options
}

// Result: Output with status indicator
interface {Tool}Result extends BaseResult {
  status: 'hasResults' | 'empty' | 'error';
  // ... tool-specific data
  hints?: string[];        // Next-step recommendations
}
```
**Example**: `RipgrepQuery` → `executeBulkOperation()` → `SearchContentResult[]`
## Cross-Cutting Concerns
### Testing
**Framework**: Vitest 3.2.4
**Location**: `tests/` directory mirroring `src/` structure
**Run**: `yarn test` (all), `yarn test:watch` (watch), `yarn test:coverage` (coverage)
**Philosophy**: Security tests are paramount - they verify protections against real attack vectors (symlink attacks, path traversal, command injection, race conditions)
**Coverage Areas**:
- Unit tests for all tools, builders, utilities
- Integration tests with real file system operations
- Security penetration tests for all attack vectors
- Performance tests for large directories
### Error Handling
**Strategy**: Fail-safe with error isolation in bulk operations
**Pattern**:
1. Validation errors throw immediately (input validation via Zod)
2. Execution errors are caught and converted to error results
3. Bulk operations never throw - they return error results for failed queries
4. Security violations always reject (path traversal, command injection)
**Invariant**: A failed query MUST NOT crash other queries in the same bulk operation
**Custom Errors**: `src/utils/errors.ts` - Domain-specific error classes
### Configuration
**Files**:
- `src/constants.ts` - All limits, defaults, resource constraints
- `manifest.json` - MCP server metadata
- `package.json` - Dependencies, scripts, Node version requirement (>=18.0.0)
- Environment: `WORKSPACE_ROOT` - Sets allowed root directory (defaults to `process.cwd()`)
**Key Limits** (from `RESOURCE_LIMITS`):
- MCP response: 25,000 tokens max
- Command timeout: 30 seconds
- Output size: 10MB per command
- Memory: 100MB global, 10MB per operation
- Large file threshold: 100KB (requires pagination)
- Cache TTL: 15 minutes
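For orientation, the limits above might be expressed as a constants object along these lines. The key names, the binary interpretation of MB/KB (1024-based), and the `requiresPagination()` helper are all assumptions; check `src/constants.ts` for the actual names and values.

```typescript
// Hypothetical shape; the real RESOURCE_LIMITS in src/constants.ts may use other keys.
const RESOURCE_LIMITS_SKETCH = {
  MAX_RESPONSE_TOKENS: 25000,
  COMMAND_TIMEOUT_MS: 30 * 1000,
  MAX_OUTPUT_BYTES: 10 * 1024 * 1024,
  MAX_TOTAL_MEMORY_BYTES: 100 * 1024 * 1024,
  MAX_MEMORY_PER_OPERATION_BYTES: 10 * 1024 * 1024,
  LARGE_FILE_THRESHOLD_BYTES: 100 * 1024,
  CACHE_TTL_SECONDS: 15 * 60,
} as const;

// Hypothetical helper: files above the threshold are read page by page.
function requiresPagination(fileSizeBytes: number): boolean {
  return fileSizeBytes > RESOURCE_LIMITS_SKETCH.LARGE_FILE_THRESHOLD_BYTES;
}
```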
### Performance
**Critical Optimizations**:
1. **Bulk Operations**: Parallel query processing with `Promise.all()`
2. **Pagination**: Automatic for large results (files, entries, matches)
3. **Caching**: Command output cached with TTL (15 min) - `node-cache`
4. **Discovery Mode**: `filesOnly=true` for ripgrep (25x faster, no match content)
5. **Minification**: Removes whitespace from code (30-60% token savings)
6. **Memory Reservations**: Prevents OOM by tracking and limiting concurrent operations
7. **Streaming**: Large output streams are chunked to avoid buffering entire results
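The TTL caching idea in item 3 can be modelled with a small in-memory map. The project itself uses `node-cache`; the `TtlCacheSketch` class below is a hypothetical, dependency-free stand-in meant only to show the expiry behaviour, with an injectable clock so it can be tested deterministically.

```typescript
// Hypothetical stand-in for a TTL cache; not the node-cache API.
class TtlCacheSketch<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns undefined for missing or expired keys; expired entries are evicted lazily.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

A cache key would typically be derived from the command and its arguments, so that identical commands within the TTL window reuse the earlier output.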
**Hot Paths**:
- `executeBulkOperation()` - Used by all tools, optimized for parallel execution
- `pathValidator.validate()` - Called for every file system operation, uses caching
- `safeExec()` - Command execution with timeout and output limits
### Observability
**Logging**: Currently minimal (process exits on uncaught errors)
**Metrics**: `performanceMetrics.ts` tracks execution times
**Debug**: `yarn debug` - Runs MCP inspector for debugging tool calls
**Token Tracking**: `tokenValidation.ts` estimates and warns about high token usage
**Memory Tracking**: `memoryManager.getCurrentUsage()` - Real-time memory usage stats
**Debug Tips**:
- Set `NODE_ENV=development` for verbose error messages
- Use `yarn test:ui` for visual test debugging
- Check command builder output with `.getArgs()` method
- Use MCP inspector to see raw request/response: `yarn debug`
## Dependencies & Build
### Runtime Dependencies
- **`@modelcontextprotocol/sdk`** (^1.18.1) - MCP protocol implementation
- **`zod`** (^3.25.26) - Runtime schema validation for all tool inputs
- **`node-cache`** (^5.1.2) - In-memory caching with TTL
- **`octocode-utils`** (^5.0.0) - Shared utilities from octocode-mcp
### External System Dependencies
- **`ripgrep`** - Required for `local_ripgrep` tool, must be in PATH
- **`find`** - Unix find command (available on all *nix systems)
- **`ls`** - Unix ls command (though mostly using Node.js `fs` now)
### Build System
**Bundler**: Rollup 4.46.2
**Config**: `rollup.config.js` - Bundles to single `dist/index.js` with CommonJS plugins
**TypeScript**: 5.9.2 with strict mode
**Target**: Node.js >=18.0.0 (ES modules)
**Build Commands**:
```bash
yarn build # Lint + clean + build
yarn build:dev # Build without linting
yarn build:watch # Watch mode for development
yarn clean # Remove dist/
```
**Output**: Single bundled file `dist/index.js` (ES module with shebang for CLI execution)
## Architectural Decision Records (ADRs)
### Security Layer is Mandatory
**Status**: Accepted
**Context**: File system operations can expose sensitive data or enable attacks
**Decision**: ALL operations pass through 3 security layers (path, command, context) that CANNOT be bypassed
**Consequences**:
- ✅ Strong security guarantees against known attack vectors
- ✅ Symlinks always resolved (prevents traversal)
- ✅ Sensitive files automatically blocked (.env, credentials, etc.)
- ⚠️ Performance overhead (~1-2ms per validation)
- ⚠️ Some legitimate use cases may be blocked (edge cases)
### Bulk-Only Tool Interface
**Status**: Accepted
**Context**: AI agents benefit from sending multiple queries in one request
**Decision**: ALL tools accept arrays of queries and use `executeBulkOperation()`
**Consequences**:
- ✅ Parallel execution for better performance
- ✅ Error isolation (one failed query doesn't crash others)
- ✅ Consistent response format across all tools
- ⚠️ Single queries still wrapped in arrays (slight overhead)
### Token-Based Pagination
**Status**: Accepted
**Context**: MCP responses have a 25K token limit, so large results must be split
**Decision**: Automatic pagination based on estimated token count, not just item count
**Consequences**:
- ✅ Prevents exceeding MCP limits (avoids protocol errors)
- ✅ Optimizes token usage for AI context windows
- ⚠️ Token estimation is approximate (uses 4 chars/token heuristic)
- ⚠️ Pagination API exposed to users (adds complexity)
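A minimal sketch of token-based pagination using the 4 chars/token heuristic mentioned above. The function names `estimateTokens` and `paginateByTokens` are hypothetical, and the real implementation in `tokenValidation.ts` and `pagination.ts` may split results differently.

```typescript
const CHARS_PER_TOKEN = 4; // heuristic stated in the ADR above

// Approximate token count; rounds up so estimates err on the high side.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical helper: groups items into pages whose estimated token cost
// stays within the budget. An item larger than the budget gets its own page.
function paginateByTokens(items: string[], maxTokensPerPage: number): string[][] {
  const pages: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const item of items) {
    const cost = estimateTokens(item);
    if (current.length > 0 && used + cost > maxTokensPerPage) {
      pages.push(current);
      current = [];
      used = 0;
    }
    current.push(item);
    used += cost;
  }
  if (current.length > 0) pages.push(current);
  return pages;
}
```

Paging on estimated tokens rather than item count means a page of a few long matches and a page of many short ones both stay near the same budget.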
### Memory Reservation System
**Status**: Accepted
**Context**: Concurrent operations can cause OOM crashes
**Decision**: Global memory manager tracks reservations, rejects operations when limit exceeded
**Consequences**:
- ✅ Prevents OOM crashes under heavy load
- ✅ Fair resource allocation across concurrent requests
- ⚠️ Operations can be rejected if memory is exhausted
- ⚠️ Requires manual `reserve()`/`release()` pattern (not automatic)
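The reservation idea can be sketched with a small bookkeeping class. `MemoryManagerSketch` is hypothetical: the method names mirror `reserve()`, `release()` and `getCurrentUsage()` as mentioned in this document, but the signatures, the ID format, and the return shapes are assumptions.

```typescript
// Hypothetical model of a reservation-based memory limiter.
class MemoryManagerSketch {
  private readonly reservations = new Map<string, number>();
  private used = 0;
  private nextId = 0;

  constructor(
    private readonly globalLimitBytes: number,
    private readonly perOperationLimitBytes: number,
  ) {}

  // Returns a reservation ID, or null when either limit would be exceeded.
  reserve(sizeBytes: number, toolName: string): string | null {
    if (sizeBytes > this.perOperationLimitBytes) return null;
    if (this.used + sizeBytes > this.globalLimitBytes) return null;
    const id = `${toolName}-${this.nextId++}`;
    this.reservations.set(id, sizeBytes);
    this.used += sizeBytes;
    return id;
  }

  // Releasing an unknown or already released ID is a no-op.
  release(id: string): void {
    const size = this.reservations.get(id);
    if (size === undefined) return;
    this.reservations.delete(id);
    this.used -= size;
  }

  getCurrentUsage(): number {
    return this.used;
  }
}
```

Callers are expected to pair every successful `reserve()` with a `release()` in a `finally` block, which is the manual pattern the trade-off above refers to.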
### Command Builders Over String Templates
**Status**: Accepted
**Context**: String concatenation for shell commands is injection-prone
**Decision**: Type-safe command builder classes that auto-escape arguments
**Consequences**:
- ✅ Strong typing prevents many bugs at compile time
- ✅ Automatic escaping reduces injection risk
- ✅ Easier to test (can inspect args before execution)
- ⚠️ More verbose than string templates
- ⚠️ Each command needs its own builder class
## Code Style & Patterns
### Naming Conventions
- **Interfaces and classes**: PascalCase with descriptive nouns (`PathValidator`, `ExecResult`)
- **Functions**: camelCase with verb prefixes (`validatePath`, `executeCommand`)
- **Constants**: SCREAMING_SNAKE_CASE (`TOOL_NAMES`, `RESOURCE_LIMITS`)
- **Files**: camelCase matching primary export (`pathValidator.ts`, `bulkOperations.ts`)
- **Types**: PascalCase, often with suffix (`ValidationResult`, `RipgrepQuery`)
### Common Patterns
**Validation Pattern**:
```typescript
const validation = validator.validate(input);
if (!validation.isValid) {
  throw new Error(validation.error);
}
// Use validation.sanitizedPath
```
**Builder Pattern**:
```typescript
const {command, args} = new RipgrepCommandBuilder()
  .addPattern(pattern)
  .addPath(path)
  .addFlag('--json')
  .build();
```
**Bulk Operation Pattern**:
```typescript
return executeBulkOperation(
  queries,
  processQuery,
  {toolName: TOOL_NAMES.LOCAL_RIPGREP}
);
```
**Memory Management Pattern**:
```typescript
const id = memoryManager.reserve(size, toolName);
if (!id) throw new Error('Insufficient memory');
try {
  // ... operation
} finally {
  memoryManager.release(id);
}
```
### Avoid
- **String concatenation for commands** - Use builders
- **Synchronous fs operations** - Use async/await with promises
- **Throwing in bulk processors** - Return error results instead
- **Bypassing security validation** - All paths MUST be validated
- **Unbounded operations** - Always set timeout and size limits
- **Mutation of shared state** - Keep operations pure and stateless
## For Contributors
### To Fix a Bug
1. **Identify the layer**: Tool → Builder → Security → Execution?
2. **Write a failing test** in appropriate `tests/` subdirectory
3. **Fix the code** in `src/`
4. **Verify security**: If touching paths/commands, run `yarn test tests/security/`
5. **Check linting**: `yarn lint` (auto-fix with `yarn lint:fix`)
### To Add a Feature
1. **For new tool**: Copy pattern from `src/tools/local_ripgrep.ts`
- Create schema in `src/scheme/`
- Implement tool in `src/tools/`
- Register in `src/tools/toolsManager.ts`
- Add tests in `tests/tools/`
2. **For new option**: Add to Zod schema first, then implement
3. **For new command**: Create new builder extending `BaseCommandBuilder`
### To Find Something
- **Tool implementation**: `src/tools/local_{toolname}.ts`
- **Input validation**: `src/scheme/local_{toolname}.ts`
- **Command building**: `src/commands/{Command}Builder.ts`
- **Security logic**: `src/security/{validator|filter}.ts`
- **Response formatting**: `src/utils/responses.ts`
- **Resource limits**: `src/constants.ts` → `RESOURCE_LIMITS`
### To Add a Security Feature
1. **Identify attack vector**: What are you preventing?
2. **Add to security layer**: `src/security/` - path/command/context validator
3. **Write penetration test**: `tests/security/` - Verify the attack is blocked
4. **Document invariant**: Update this file's "Architectural Invariants"
5. See `tests/security/` for examples: symlink attacks, path traversal, race conditions
## API Stability
**Public (Stable)**:
- MCP tool interfaces (local_ripgrep, local_view_structure, local_find_files, local_fetch_content)
- Tool query/result types and Zod schemas
- Constants in `src/constants.ts` (especially `RESOURCE_LIMITS`)
**Internal (May Change)**:
- Command builder implementations
- Utility function signatures
- Internal type definitions
- Response formatting details (though structure stays consistent)
**Experimental**:
- Prompt templates in `src/prompts/`
- Hints and workflow recommendations
- Token estimation heuristics
**Breaking Changes Policy**: Major version bump for public API changes, minor version for internal refactoring
---
**Updated**: November 4, 2025
**By**: Octocode Team
**Questions**: Open an issue on GitHub or check existing docs in `README.md`
**Related Docs**: See `README.md` for user guide, `RIPGREP_IMPROVEMENTS.md` for optimization notes