MCP File Operations Server


# Technical Summary: MCP File Operations Server

## **Project Overview**

A Model Context Protocol (MCP) server implementation that provides Claude Desktop AI with secure file system access capabilities, enabling AI-powered document management, PDF processing, and file operations within a sandboxed environment.

## **Architecture & Design**

### **Core Components**

- **FastMCP Server**: Built using the `mcp.server.FastMCP` class for streamlined MCP protocol implementation
- **Async Architecture**: Fully asynchronous design using Python's `asyncio` for concurrent request handling
- **Modular Tool System**: 8 distinct tools registered with the MCP server for different file operations

### **Server Structure**

```python
class FileMCPServer:
    def __init__(self):
        self.server = FastMCP("file-operations")
        self.setup_tools()
```

## **Implemented Tools & Functionality**

### **1. Basic File Operations**

- **`read_file`**: UTF-8 encoded file reading with path validation
- **`write_file`**: File creation/overwrite with append mode support
- **`list_files`**: Directory listing with glob pattern matching and recursive search
- **`delete_file`**: Single file deletion with existence validation

### **2. Directory Management**

- **`create_directory`**: Nested directory creation with `mkdir(parents=True, exist_ok=True)`
- **`delete_directory`**: Recursive directory deletion with Windows-specific retry logic

### **3. PDF Processing Engine**

- **`read_pdf`**: Text extraction using `pdfplumber` with page-by-page processing
- **`get_pdf_info`**: Metadata extraction using `PyPDF2` and `pdfplumber` for file size, page count, title, author, dates

## **Technical Implementation Details**

### **Path Validation & Security**

```python
def _validate_path(self, file_path: str) -> pathlib.Path:
    full_path = (DOCUMENTS_DIR / file_path).resolve()
    documents_abs = DOCUMENTS_DIR.resolve()
    if not str(full_path).startswith(str(documents_abs)):
        raise ValueError(f"Path {file_path} is outside the allowed documents directory")
    return full_path
```

**Security Features:**

- Directory traversal protection via path resolution
- Sandboxed access limited to `documents/` folder
- Critical directory protection (prevents deletion of root documents folder)
- File type validation for PDF operations

### **Error Handling & Resilience**

- **Retry Logic**: Exponential backoff for Windows file locking issues
- **File Lock Detection**: Identifies files in use by other processes
- **Comprehensive Logging**: Dual output to file and console with timestamps
- **Graceful Degradation**: Server continues operating despite individual operation failures

### **PDF Processing Implementation**

```python
# Text extraction with pdfplumber
with pdfplumber.open(full_path) as pdf:
    for page_num, page in enumerate(pdf.pages, 1):
        page_text = page.extract_text()
        if page_text:
            extracted_text.append(f"--- Page {page_num} ---\n{page_text}")
```

**PDF Capabilities:**

- Multi-page text extraction with page numbering
- Metadata parsing (title, author, creation/modification dates)
- File size and page count information

## **Testing & Quality Assurance**

### **Test Coverage**

- **41% code coverage** using `pytest` and `pytest-asyncio`
- **10 test cases** covering core functionality of all 8 tools
- **Path validation testing** with malicious input scenarios
- **Async operation testing** for concurrent request handling
- **Error condition testing** for file locks, permissions, and invalid inputs
- **Areas for improvement**: Additional edge case testing, error handling scenarios, and PDF processing edge cases

### **Test Structure**

```python
class TestFileMCPServer:
    def test_validate_path_within_documents(self, server, tmp_path): ...
    def test_validate_path_outside_documents(self, server, tmp_path): ...


@pytest.mark.asyncio
class TestAsyncFileOperations:
    async def test_write_and_read_file(self, server, tmp_path): ...
    async def test_pdf_operations(self, server, tmp_path): ...
```

## **Dependencies & Environment**

### **Core Dependencies**

- **`mcp>=1.0.0`**: MCP protocol implementation
- **`pydantic>=2.0.0`**: Data validation and serialization
- **`PyPDF2>=3.0.1`**: PDF metadata extraction
- **`pdfplumber>=0.11.7`**: PDF text extraction

### **Development Tools**

- **`pytest`**: Testing framework
- **`pytest-asyncio`**: Async testing support
- **`pytest-cov`**: Coverage reporting
- **`black`**: Code formatting
- **`isort`**: Import sorting
- **`flake8`**: Linting

## **Integration & Deployment**

### **Claude Desktop Integration**

- **JSON-RPC Protocol**: Standard MCP communication
- **Configuration**: Direct Python execution via Claude Desktop config
- **Working Directory**: Proper path resolution for cross-platform compatibility

### **Logging System**

```python
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler(log_filename, encoding="utf-8"),
        logging.StreamHandler(),
    ],
)
```

**Log Features:**

- Timestamped entries with log levels
- Dual output (file + console)
- Rotating log files with date stamps
- Detailed error tracking and debugging information

## **Performance Characteristics**

### **Concurrent Operations**

- Async/await pattern for non-blocking I/O
- Support for multiple simultaneous file operations
- Efficient path resolution and validation

### **Memory Management**

- Context managers for proper file handle cleanup
- Streaming PDF processing to handle large files
- Efficient string operations for text processing

## **Cross-Platform Compatibility**

### **Windows-Specific Features**

- Retry logic for Windows file locking issues
- Path normalization for Windows backslash handling
- Permission error handling for Windows security restrictions

### **Unix/Linux Support**

- Forward slash path handling
- Unix-style file permissions
- Cross-platform pathlib usage

## **Extensibility & Future Enhancements**

### **Modular Design**

- Clear separation between tool registration and implementation
- Easy addition of new file type processors
- Pluggable architecture for additional AI platforms

### **Potential Enhancements**

- Image processing capabilities (OCR, image metadata)
- Document format conversion (DOCX, RTF, etc.)
- Advanced search algorithms (fuzzy matching, semantic search)
- Batch processing capabilities
- Real-time file monitoring and notifications

## **Technical Challenges Solved**

1. **Windows File Locking**: Implemented retry logic with exponential backoff
2. **Path Security**: Created robust path validation preventing directory traversal
3. **PDF Processing**: Integrated multiple PDF libraries for comprehensive text extraction
4. **Async Operations**: Designed concurrent file operations without blocking
5. **Error Recovery**: Built graceful error handling for various failure scenarios
6. **Cross-Platform Compatibility**: Ensured consistent behavior across Windows and Unix systems

## **Recent Changes**

- **Removed PDF Search Functionality**: The `search_pdf` tool was removed due to implementation issues
- **Updated Tool Count**: Reduced from 9 to 8 tools
- **Improved Test Coverage**: Coverage increased from 39% to 41% after removing problematic code
- **Streamlined PDF Processing**: Focused on core PDF reading and metadata extraction

This project demonstrates advanced Python development skills, including async programming, file system operations, PDF processing, API design, testing, and integration with AI platforms.
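
## **Note on Path Validation**

The `_validate_path` snippet under Path Validation & Security compares resolved paths with a string prefix check. A prefix check also accepts a sibling directory whose name merely begins with the sandbox name (for example `documents_evil/`), so a component-wise comparison may be a safer variant. The following is a minimal sketch under stated assumptions, not the project's actual code: the standalone function `validate_path`, its `root` parameter, and the `DOCUMENTS_DIR` value are hypothetical stand-ins for illustration.

```python
from pathlib import Path

# Hypothetical sandbox root; the summary only states that access is
# limited to a `documents/` folder.
DOCUMENTS_DIR = Path("documents")


def validate_path(file_path: str, root: Path = DOCUMENTS_DIR) -> Path:
    """Resolve file_path under root and reject anything that escapes it."""
    root_abs = root.resolve()
    full_path = (root_abs / file_path).resolve()
    try:
        # relative_to compares whole path components, so a sibling such as
        # "documents_evil" does not count as being inside "documents".
        full_path.relative_to(root_abs)
    except ValueError:
        raise ValueError(
            f"Path {file_path} is outside the allowed documents directory"
        ) from None
    return full_path
```

With this variant, `validate_path("notes/a.txt")` returns the resolved path inside the sandbox, while `validate_path("../documents_evil/x.txt")` and an absolute path such as `/etc/passwd` raise `ValueError`.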
