# IntelliDiff MCP Server
An intelligent file and folder comparison MCP server with advanced text normalization and duplicate detection capabilities.
## Features
- **File Comparison**: CRC32-based exact comparison and smart text comparison with normalization
- **Folder Comparison**: Recursive directory comparison with orphan detection
- **Duplicate Detection**: Find identical files within directories
- **Text Normalization**: Handle case, whitespace, tabs, line endings, and Unicode differences
- **Line-Level Analysis**: Detailed diff output with line ranges and targeted file reading
- **Clean Output**: Markdown-formatted text responses instead of JSON bloat
- **Security**: Workspace root validation prevents path traversal attacks
- **Performance**: Streaming for large files, configurable limits, symlink loop prevention
## Installation
```bash
# Clone or download the project
cd intellidiff-mcp
# Install with uv
uv init --python 3.12
uv add "fastmcp>=2.11"
# Run the server
uv run python intellidiff_server.py /path/to/workspace/root
```
## Project Structure
The server is built with a clean modular architecture:
- **`intellidiff_server.py`** (52 lines) - Main server entry point and tool registration
- **`workspace_security.py`** - Path validation and workspace boundary enforcement
- **`file_operations.py`** - Core file utilities (CRC32, text detection, normalization)
- **`tools.py`** - Individual MCP tool implementations
- **`folder_operations.py`** - Folder comparison and duplicate detection logic
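For orientation, here is a minimal sketch of how the entry point might wire these modules together with FastMCP. The module split mirrors the list above, but `register_tools` and the other details in this sketch are illustrative, not the actual code.
```python
# Illustrative wiring sketch only; the real intellidiff_server.py registers all six tools.
import sys
from fastmcp import FastMCP

from tools import register_tools  # hypothetical helper wrapping the tool implementations

mcp = FastMCP("intellidiff")

if __name__ == "__main__":
    workspace_root = sys.argv[1]          # /path/to/workspace/root passed on the command line
    register_tools(mcp, workspace_root)   # tool logic lives in tools.py / folder_operations.py
    mcp.run()                             # stdio transport by default
```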
## MCP Configuration
### Local/stdio Configuration
```json
{
  "mcpServers": {
    "intellidiff": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--directory", "/path/to/intellidiff-mcp", "python", "intellidiff_server.py", "/workspace/root"]
    }
  }
}
```
### Local/stdio Configuration with Environment Variables
```json
{
  "mcpServers": {
    "intellidiff": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--directory", "/path/to/intellidiff-mcp", "python", "intellidiff_server.py", "/workspace/root"],
      "env": {
        "INTELLIDIFF_MAX_TEXT_SIZE": "5242880",
        "INTELLIDIFF_MAX_BINARY_SIZE": "1073741824",
        "INTELLIDIFF_MAX_DEPTH": "15",
        "INTELLIDIFF_CHUNK_SIZE": "32768"
      }
    }
  }
}
```
### Remote/HTTP Configuration
```json
{
  "mcpServers": {
    "intellidiff": {
      "type": "http",
      "url": "http://localhost:8000/mcp/"
    }
  }
}
```
Place this configuration in:
- VS Code: `.vscode/mcp.json` (project) or user settings
- Claude Desktop: `claude_desktop_config.json`
- Cursor: `.cursor/mcp.json` (project) or `~/.cursor/mcp.json` (user)
- LM Studio: `~/.lmstudio/mcp.json`
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `INTELLIDIFF_MAX_TEXT_SIZE` | 10485760 (10MB) | Maximum size for text file comparison |
| `INTELLIDIFF_MAX_BINARY_SIZE` | 1073741824 (1GB) | Maximum size for binary file CRC32 |
| `INTELLIDIFF_MAX_DEPTH` | 10 | Maximum directory recursion depth |
| `INTELLIDIFF_CHUNK_SIZE` | 65536 (64KB) | File reading chunk size |
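These limits are read from the environment at startup. A minimal sketch of how such defaults can be resolved (the constant names here are illustrative, not the actual ones in the server code):
```python
import os

# Defaults match the table above; the real constant names in the server may differ.
MAX_TEXT_SIZE   = int(os.environ.get("INTELLIDIFF_MAX_TEXT_SIZE", 10 * 1024 * 1024))  # 10 MB
MAX_BINARY_SIZE = int(os.environ.get("INTELLIDIFF_MAX_BINARY_SIZE", 1024 ** 3))       # 1 GB
MAX_DEPTH       = int(os.environ.get("INTELLIDIFF_MAX_DEPTH", 10))
CHUNK_SIZE      = int(os.environ.get("INTELLIDIFF_CHUNK_SIZE", 64 * 1024))            # 64 KB
```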
## Tools
### `validate_workspace_path`
Validate that a path is within the workspace root.
**Parameters:**
- `path` (string): Path to validate
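Called the same way as the other tools through the FastMCP client used in the usage examples below; the commented output is illustrative.
```python
# Hypothetical call; the exact response text depends on the server's formatting.
result = await client.call_tool("validate_workspace_path", {
    "path": "../outside/secret.txt"
})
print(result.content[0].text)  # e.g. reports that the path escapes the workspace root
```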
### `get_file_hash`
Get CRC32 hash and basic information about a file.
**Parameters:**
- `file_path` (string): Path to the file
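A quick call using the same client pattern; the described output fields are illustrative.
```python
# Hypothetical call; the field names in the response are illustrative.
result = await client.call_tool("get_file_hash", {
    "file_path": "file1.txt"
})
print(result.content[0].text)  # e.g. CRC32, size, and text/binary detection for file1.txt
```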
### `compare_files`
Compare two files with various modes and options.
**Parameters:**
- `left_path` (string): Path to first file
- `right_path` (string): Path to second file
- `mode` (string): Comparison mode - "exact", "smart_text", or "binary"
- `ignore_blank_lines` (boolean): Skip empty lines during comparison
- `ignore_newline_differences` (boolean): Normalize line endings
- `ignore_whitespace` (boolean): Ignore leading/trailing whitespace
- `ignore_case` (boolean): Case-insensitive comparison
- `normalize_tabs` (boolean): Convert tabs to spaces
- `unicode_normalize` (boolean): Apply Unicode NFKC normalization
### `compare_folders`
Compare two folder structures recursively.
**Parameters:**
- `left_path` (string): Path to first folder
- `right_path` (string): Path to second folder
- `max_depth` (integer): Maximum recursion depth (default: from env var)
- `include_binary` (boolean): Include binary files in comparison
- `comparison_mode` (string): "exact" or "smart_text"
### `find_identical_files`
Find files with identical content within a folder.
**Parameters:**
- `folder_path` (string): Path to folder to scan
- `max_depth` (integer): Maximum recursion depth (default: from env var)
### `read_file_lines`
Read specific line ranges from a text file with optional context.
**Parameters:**
- `file_path` (string): Path to the text file
- `start_line` (integer): Starting line number (1-based, default: 1)
- `end_line` (integer): Ending line number (1-based, default: end of file)
- `context_lines` (integer): Additional context lines before/after range (default: 0)
## Usage Examples
### Compare Two Files
```python
# Exact comparison - clean markdown output
result = await client.call_tool("compare_files", {
    "left_path": "file1.txt",
    "right_path": "file2.txt",
    "mode": "exact"
})
print(result.content[0].text)
# Output: ✅ **Exact Comparison**
# 📁 Left: file1.txt (CRC32: abc123)
# 📁 Right: file2.txt (CRC32: abc123)
# 🔍 Result: Identical
# Smart text comparison with normalization
result = await client.call_tool("compare_files", {
    "left_path": "file1.txt",
    "right_path": "file2.txt",
    "mode": "smart_text",
    "ignore_case": True,
    "ignore_whitespace": True,
    "normalize_tabs": True
})
print(result.content[0].text)
# Output: ✅ **Smart Text Comparison - Identical**
# 📁 Left: file1.txt (1.2KB)
# 📁 Right: file2.txt (1.3KB)
# 🔍 Result: Identical (normalized: case, whitespace, tabs)
```
### Compare Folders
```python
result = await client.call_tool("compare_folders", {
    "left_path": "folder_a",
    "right_path": "folder_b",
    "max_depth": 5
})
# Folder comparison returns structured data for programmatic access
summary = result.data["summary"]
orphans = result.data["orphans"]
identical_files = result.data["identical_files"]
```
### Find Duplicates
```python
result = await client.call_tool("find_identical_files", {
    "folder_path": "my_folder",
    "max_depth": 10
})
# Duplicate detection returns structured data for analysis
duplicates = result.data["duplicates"]
wasted_bytes = result.data["summary"]["total_wasted_bytes"]
```
### Read Specific Lines
```python
# Read lines 10-20 with 2 lines of context
result = await client.call_tool("read_file_lines", {
    "file_path": "my_file.txt",
    "start_line": 10,
    "end_line": 20,
    "context_lines": 2
})
# Clean line-numbered output with >>> markers for requested range
print(result.content[0].text)
# Output: 8| function setup() {
# 9| console.log("Starting...");
# >>> 10| const data = loadData();
# >>> 11| if (!data) {
# >>> 12| throw new Error("No data");
# >>> 13| }
# 14| }
```
### Working with Diff Results
```python
# Compare files and get detailed diff information
result = await client.call_tool("compare_files", {
    "left_path": "file1.txt",
    "right_path": "file2.txt",
    "mode": "smart_text"
})
# Access structured diff data
if not result.structured_content["identical"]:
    change_summary = result.structured_content["change_summary"]

    # Get affected line ranges
    left_ranges = change_summary["line_ranges"]["left_affected"]
    right_ranges = change_summary["line_ranges"]["right_affected"]

    # Read specific sections that changed
    for range_info in left_ranges:
        lines_result = await client.call_tool("read_file_lines", {
            "file_path": "file1.txt",
            "start_line": range_info["start"],
            "end_line": range_info["end"],
            "context_lines": 3
        })
        print(f"Changed section: {lines_result.content[0].text}")
```
## Security
- All file paths are validated against the workspace root
- Path traversal attacks are prevented through path resolution
- Symlink loops are detected and avoided
- File size limits prevent memory exhaustion
- All operations are read-only
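Conceptually, the boundary check resolves the candidate path (following symlinks and `..` segments) and requires the result to stay under the workspace root. A minimal sketch of that idea, not the exact code in `workspace_security.py`:
```python
from pathlib import Path

def is_within_workspace(candidate: str, workspace_root: Path) -> bool:
    """Resolve symlinks and '..' segments, then require the path to stay under the root."""
    resolved = Path(candidate).resolve()
    try:
        resolved.relative_to(workspace_root.resolve())
        return True
    except ValueError:
        return False
```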
## Performance
- Streaming I/O for large files
- Early exit on size mismatches
- CRC32 caching for repeated operations
- Configurable chunk sizes and limits
- Progress reporting for large operations
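The streaming behaviour amounts to hashing files in fixed-size chunks rather than loading them whole. A minimal sketch of chunked CRC32 using the configurable chunk size, not the exact code in `file_operations.py`:
```python
import zlib
from pathlib import Path

def crc32_of_file(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Compute CRC32 incrementally so large files never sit in memory all at once."""
    crc = 0
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08x}"
```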
## License
MIT License - see LICENSE file for details.