Uses python-dotenv for environment management, allowing users to configure server settings through environment variables for security, encoding, and AI embeddings.
Leverages LangChain for AI integration, vector search, and semantic understanding capabilities to enable intelligent document search across multiple file formats.
Automatically converts various document formats to Markdown for consistent representation and includes support for reading and processing Markdown files.
Uses OpenAI's embedding service for generating vector representations of documents, enabling semantic search across files with configurable API endpoints.
Requires Python 3.12+ as the runtime environment, with specific installation instructions for setting up the server with Python dependencies.
FS-MCP: Universal File Reader & Intelligent Search MCP Server
A powerful MCP (Model Context Protocol) server that provides intelligent file reading and semantic search capabilities
English
🚀 Features
- 🧠 Intelligent Text Detection: Automatically identifies text files without relying on file extensions
- 📄 Multi-Format Support: Handles text files and document formats (Word, Excel, PDF, etc.)
- 🔒 Security First: Restricted access to configured safe directories only
- 📏 Range Reading: Supports reading specific line ranges for large files
- 🔄 Document Conversion: Automatic conversion of documents to Markdown with caching
- 🔍 Vector Search: Semantic search powered by AI embeddings
- ⚡ High Performance: Batch processing and intelligent caching support
- 🌐 Multi-language: Supports both English and Chinese content
📋 Table of Contents
- Quick Start
- Installation
- Configuration
- MCP Tools
- Vector Search
- Supported Formats
- Security Features
- Integration
- Development
- Contributing
- License
🚀 Quick Start
1. Clone and Install
Using uv (Recommended):
Using pip:
2. Environment Configuration
Create a .env file in the project root:
3. Start the Server
The server will start on http://localhost:3002 and automatically build the vector index.
🛠️ Installation
System Requirements
- Python: 3.12 or higher
- OS: Windows, macOS, Linux
- Memory: 4GB+ recommended for vector search
- Storage: 1GB+ for caching and indexes
Dependencies
Core dependencies are managed in pyproject.toml:
- fastmcp>=2.0.0 - MCP server framework
- langchain>=0.3.0 - AI and vector search
- python-dotenv>=1.1.0 - Environment management
- Document processing libraries (pandas, openpyxl, python-docx, etc.)
⚙️ Configuration
Environment Variables
Variable | Default | Description |
---|---|---|
SAFE_DIRECTORY | . | Root directory for file access |
MAX_FILE_SIZE_MB | 100 | Maximum file size limit |
DEFAULT_ENCODING | utf-8 | Default file encoding |
OPENAI_EMBEDDINGS_API_KEY | - | API key for embedding service |
OPENAI_EMBEDDINGS_BASE_URL | - | Embedding service URL |
EMBEDDING_MODEL_NAME | BAAI/bge-m3 | AI model for embeddings |
EMBEDDING_CHUNK_SIZE | 1000 | Text chunk size for processing |
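The table above can be read as a small typed configuration object. The following is a minimal sketch using only the standard library; `ServerConfig` and `load_config` are hypothetical names for illustration, not the server's actual code. In the real project, calling python-dotenv's `load_dotenv()` first would populate `os.environ` from the .env file.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical container mirroring the documented environment variables."""
    safe_directory: Path
    max_file_size_mb: int
    default_encoding: str
    embeddings_api_key: Optional[str]
    embeddings_base_url: Optional[str]
    embedding_model_name: str
    embedding_chunk_size: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Build a config from environment variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServerConfig(
        safe_directory=Path(env.get("SAFE_DIRECTORY", ".")).resolve(),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "100")),
        default_encoding=env.get("DEFAULT_ENCODING", "utf-8"),
        # The table lists no default for these two, so a missing value becomes None
        embeddings_api_key=env.get("OPENAI_EMBEDDINGS_API_KEY") or None,
        embeddings_base_url=env.get("OPENAI_EMBEDDINGS_BASE_URL") or None,
        embedding_model_name=env.get("EMBEDDING_MODEL_NAME", "BAAI/bge-m3"),
        embedding_chunk_size=int(env.get("EMBEDDING_CHUNK_SIZE", "1000")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test in isolation.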
Advanced Configuration
For production deployments, consider:
- Setting up rate limiting
- Configuring log rotation
- Using external vector databases
- Setting up monitoring
🔧 MCP Tools
1. view_directory_tree
Purpose: Display directory structure in tree format
2. read_file_content
Purpose: Read file content with line range support
3. search_documents
Purpose: Intelligent semantic search across documents
4. rebuild_document_index
Purpose: Rebuild vector index for search
5. get_document_stats
Purpose: Get index statistics and system status
6. list_files
Purpose: List files in directory with pattern matching
7. preview_file
Purpose: Quick preview of file content
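To illustrate the line range support mentioned for read_file_content, here is a minimal sketch of streaming a range of lines from a large file. The function name `read_line_range` and its parameters are assumptions made for this example; the tool's real signature is not documented here.

```python
from itertools import islice
from pathlib import Path


def read_line_range(path, start_line=1, end_line=None, encoding="utf-8"):
    """Return lines start_line..end_line (1-indexed, inclusive) as a list of strings."""
    if start_line < 1:
        raise ValueError("start_line must be >= 1")
    if end_line is not None and end_line < start_line:
        raise ValueError("end_line must be >= start_line")
    with Path(path).open("r", encoding=encoding, errors="replace") as handle:
        # islice iterates lazily and stops after end_line, so the rest of the file is never read
        selected = islice(handle, start_line - 1, end_line)
        return [line.rstrip("\n") for line in selected]
```

Leaving `end_line` as `None` reads from `start_line` to the end of the file.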
🔍 Vector Search
Capabilities
- Semantic Understanding: Searching for "user authentication" finds "login verification" code
- Synonym Recognition: Searching for "database" finds "数据库" (Chinese) content
- Multi-language Support: Handles English, Chinese, and mixed content
- Context Awareness: Understands code semantics and relationships
Search Types
- Semantic Search (semantic): AI-powered understanding
- Filename Search (filename): Fast filename matching
- Extension Search (extension): Filter by file type
- Hybrid Search (hybrid): Combines semantic + filename
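One way to picture the hybrid search type is as a weighted blend of a semantic similarity score and a filename match. The sketch below is an assumption-laden illustration, not the server's ranking logic: `hybrid_rank`, the 0.7 weight, and the substring-based filename match are all hypothetical, and the vectors are assumed to have been computed elsewhere by the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query_text, query_vector, documents, semantic_weight=0.7):
    """Rank (filename, vector) pairs best-first by a blended score.

    semantic_weight is an illustrative value, not a documented setting.
    """
    scored = []
    for filename, vector in documents:
        semantic = cosine_similarity(query_vector, vector)
        # Filename component: 1.0 when the query text appears in the filename
        filename_hit = 1.0 if query_text.lower() in filename.lower() else 0.0
        score = semantic_weight * semantic + (1 - semantic_weight) * filename_hit
        scored.append((score, filename))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [filename for _, filename in scored]
```

Setting `semantic_weight` to 1.0 reduces this to pure semantic ranking, and 0.0 to pure filename matching.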
Technical Stack
- Embedding Model: BAAI/bge-m3 (1024-dimensional vectors)
- Vector Database: ChromaDB
- Text Splitting: Intelligent semantic chunking
- Incremental Updates: Hash-based change detection
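Hash-based change detection can be sketched as follows: store a content digest per file, then re-embed only files whose digest differs. This is a generic illustration assuming SHA-256 and an in-memory dictionary; the function names are hypothetical, and the project's actual hash function and storage are not specified here.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """SHA-256 of a file's contents, read in blocks to keep memory use flat."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def detect_changes(root, known_hashes):
    """Compare files under root with known_hashes ({relative_path: digest}).

    Returns (changed_or_new, deleted, current_hashes).
    """
    root = Path(root)
    current = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            current[path.relative_to(root).as_posix()] = file_digest(path)
    # A file needs re-indexing when it is new or its digest no longer matches
    changed = [p for p, h in current.items() if known_hashes.get(p) != h]
    deleted = [p for p in known_hashes if p not in current]
    return changed, deleted, current
```

Persisting `current_hashes` between runs lets the next index build skip unchanged files.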
📁 Supported Formats
Auto-detected Text Files
- Programming languages: .py, .js, .ts, .java, .cpp, .c, .go, .rs, etc.
- Config files: .json, .yaml, .toml, .ini, .xml, .env
- Documentation: .md, .txt, .rst
- Web files: .html, .css, .scss
- Data files: .csv, .tsv
- Files without extensions (auto-detected)
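Detecting text files without relying on extensions is usually done by sampling file content. The sketch below shows one common heuristic (no NUL bytes, and the sample decodes as UTF-8); it is an assumption about how such detection could work, not the server's actual algorithm, and `looks_like_text` is a hypothetical name.

```python
def looks_like_text(path, sample_size=8192):
    """Heuristic: sample the start of the file and check whether it decodes as UTF-8."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if not sample:
        return True  # treat empty files as text
    if b"\x00" in sample:
        return False  # NUL bytes are a strong hint of binary data
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError as exc:
        # A multi-byte character may be cut off at the sample boundary; tolerate only that case
        return exc.reason == "unexpected end of data" and exc.start >= len(sample) - 3
    return True
```

A known limitation of this heuristic is that UTF-16 text contains NUL bytes and would be classified as binary, so a real implementation would likely try additional encodings.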
Document Formats (Auto-converted to Markdown)
- Microsoft Office: .docx, .xlsx, .pptx
- OpenDocument: .odt, .ods, .odp
- PDF: .pdf (text extraction)
- Legacy formats: .doc, .xls (limited support)
🔒 Security Features
Access Control
- Directory Restriction: Access limited to SAFE_DIRECTORY and subdirectories
- Path Traversal Protection: Automatic prevention of ../ attacks
- Symlink Control: Configurable symbolic link access
- File Size Limits: Prevents reading oversized files
Validation
- Path Sanitization: Automatic path cleaning and validation
- Permission Checks: Verify read permissions before access
- Error Handling: Graceful failure with informative messages
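The access-control and validation rules listed above can be combined into a single path check. The following is a minimal sketch under stated assumptions: `resolve_safe_path`, `AccessDenied`, and the `allow_symlinks` flag are hypothetical names, and the server's real validation may differ in detail.

```python
from pathlib import Path


class AccessDenied(Exception):
    """Raised when a requested path violates the access rules."""


def resolve_safe_path(requested, safe_directory, max_file_size_mb=100, allow_symlinks=False):
    """Resolve requested against safe_directory and enforce the documented restrictions."""
    root = Path(safe_directory).resolve()
    candidate = root / requested
    # Symlink control: this checks the final path component only
    if not allow_symlinks and candidate.is_symlink():
        raise AccessDenied("symbolic links are not allowed")
    # resolve() collapses ../ segments and follows links, so the containment
    # check below sees the real target location
    resolved = candidate.resolve()
    try:
        resolved.relative_to(root)
    except ValueError:
        raise AccessDenied("path escapes the safe directory") from None
    # File size limit, expressed in megabytes as in MAX_FILE_SIZE_MB
    if resolved.is_file() and resolved.stat().st_size > max_file_size_mb * 1024 * 1024:
        raise AccessDenied("file exceeds the size limit")
    return resolved
```

Checking containment after resolution, rather than scanning the string for `../`, also covers absolute paths and links that point outside the safe directory.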
🔗 Integration
Claude Desktop
Add to your Claude Desktop MCP configuration:
Other MCP Clients
Connect to http://localhost:3002 using the Server-Sent Events (SSE) protocol.
API Integration
The server exposes standard MCP endpoints that can be integrated with any MCP-compatible client.
🏗️ Project Structure
💻 Development
Setting Up Development Environment
Running Tests
Code Quality
Debugging
Monitor logs in real-time:
🤝 Contributing
We welcome contributions! Here's how to get started:
1. Fork and Clone
2. Create Feature Branch
3. Make Changes
- Follow the existing code style
- Add tests for new functionality
- Update documentation as needed
4. Test Your Changes
5. Submit Pull Request
- Describe your changes clearly
- Reference any related issues
- Ensure all tests pass
Development Guidelines
- Code Style: Follow PEP 8, use Black for formatting
- Testing: Maintain test coverage above 80%
- Documentation: Update README and docstrings
- Commits: Use conventional commit messages
- Security: Follow security best practices
📋 Roadmap
- Enhanced PDF Processing: Better table and image extraction
- More Embedding Models: Support for local models
- Real-time Indexing: File system watchers
- Advanced Search: Regex, proximity, faceted search
- Performance Optimization: Async processing, caching improvements
- Web Interface: Optional web UI for management
- Plugin System: Custom file type handlers
- Enterprise Features: Authentication, rate limiting, monitoring
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- FastMCP - MCP server framework
- LangChain - AI integration
- ChromaDB - Vector database
- BGE-M3 - Embedding model
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Check the docs/ folder (when available)
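Chinese
🚀 Features
- 🧠 Intelligent Text Detection: Automatically identifies text files without relying on file extensions
- 📄 Multi-Format Support: Supports text files and document formats (Word, Excel, PDF, etc.)
- 🔒 Security Validation: Only files in the configured safe directory can be read
- 📏 Line-Based Reading: Supports reading a specified line range, which helps with large files
- 🔄 Document Conversion: Automatically converts document formats to Markdown and caches the result
- 🔍 Vector Search: Semantic search based on AI embeddings
- ⚡ High Performance: Supports batch file processing and intelligent caching
- 🌐 Multi-language: Supports processing of Chinese and English content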
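🚀 Quick Start
1. Clone and Install
2. Environment Configuration
Create a .env file:
3. Start the Server
The server will start on http://localhost:3002 and automatically build the vector index.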
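🛠️ MCP Tools
For detailed tool usage, see the MCP Tools section in the English part above.
🔍 Vector Search
- Concept Matching: Searching for "user authentication" finds code related to "login verification"
- Synonym Understanding: Searching for "database" finds content related to "数据库"
- Multi-language Support: Understands code and comments in both Chinese and English
- Context Understanding: Understands the semantics and context of code
📁 Supported Formats
For detailed format support, see the Supported Formats section in the English part above.
🔒 Security Features
- Path Validation: Access is allowed only to the configured safe directory and its subdirectories
- File Size Limits: Prevents reading oversized files
- Path Traversal Protection: Automatically prevents path traversal attacks such as ../
- Symlink Control: Symbolic link access can be enabled or disabled by configuration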
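🔗 Integration
Claude Desktop Integration
Add to your Claude Desktop MCP configuration:
💻 Development
Setting Up Development Environment
Running Tests
🤝 Contributing
Contributions are welcome! See the Contributing section in the English part above for details.
📄 License
This project is licensed under the MIT License; see the LICENSE file for details.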
Made with ❤️ for the AI community