# AnyDocs MCP Server
> Transform any website's documentation section into an MCP-compatible server using the Python MCP SDK.
## Overview
AnyDocs MCP Server is a comprehensive solution that turns **any website's documentation** into an interactive, AI-accessible knowledge base through the Model Context Protocol (MCP). It can scrape, index, and serve documentation from any website - from modern API docs to legacy documentation portals.
### Key Features
- **Universal Website Scraping**: Turn ANY website's documentation into an interactive knowledge base
- **Universal Adapter System**: Support for GitBook, Notion, Confluence, and custom documentation platforms
- **Advanced Search**: Full-text search with SQLite FTS and semantic search capabilities
- **Robust Authentication**: API Key, OAuth2, and JWT-based authentication
- **High Performance**: Async/await architecture with caching and rate limiting
- **Web Management Interface**: FastAPI-based admin panel for configuration and monitoring
- **Real-time Monitoring**: Health checks, metrics, and logging
- **Docker Ready**: Complete containerization with development and production configurations
- **Auto-sync**: Automatic content synchronization with source documentation
## Requirements
- Python 3.11+ (recommended: 3.11 or 3.12)
- SQLite 3.35+ (for FTS5 support)
- Optional: Redis (for caching)
- Optional: PostgreSQL/MySQL (for production)
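To confirm the first two requirements before installing, a small check like the one below can help. It is only a sketch: `check_requirements` is a hypothetical helper, not part of AnyDocs, and it probes FTS5 by creating a throwaway virtual table in an in-memory database.

```python
import sqlite3
import sys
from typing import Dict, Tuple


def check_requirements(
    min_python: Tuple[int, int] = (3, 11),
    min_sqlite: Tuple[int, int, int] = (3, 35, 0),
) -> Dict[str, bool]:
    """Report whether the interpreter and its bundled SQLite meet the minimums."""
    python_ok = sys.version_info[:2] >= min_python
    sqlite_ok = sqlite3.sqlite_version_info >= min_sqlite

    # FTS5 is a compile-time option, so probe it directly instead of guessing.
    fts5_ok = True
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE probe USING fts5(body)")
    except sqlite3.OperationalError:
        fts5_ok = False
    finally:
        conn.close()

    return {"python": python_ok, "sqlite": sqlite_ok, "fts5": fts5_ok}


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Note that this checks the SQLite library linked into Python, which may differ from the `sqlite3` command-line tool on the same machine.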
## Installation
### Using uvx (Recommended)
The easiest way to run AnyDocs MCP Server is using `uvx`, which automatically manages dependencies and virtual environments:
```bash
# Run directly with uvx (no installation needed)
uvx anydocs-mcp-server
# Run with custom configuration
uvx anydocs-mcp-server --config config.yaml
# Run in debug mode
uvx anydocs-mcp-server --debug
# Install as a persistent tool for repeated use
uv tool install anydocs-mcp-server
# Then run anytime with:
anydocs-mcp-server --config config.yaml
```
### Quick Start
```bash
# Clone the repository
git clone https://github.com/funky1688/anydocs-mcp.git
cd anydocs-mcp
# Install uv, create a virtual environment, and install dependencies
pip install uv
uv venv
uv pip install -e .
# Copy environment configuration
cp .env.example .env
# Copy and customize configuration
cp config.yaml my-config.yaml
# Edit configuration (add your API keys and settings)
nano .env
# Start the server (hybrid mode - MCP + Web interface)
uv run python start.py
```
### Manual Installation
```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
uv pip install -e . # for production
# OR for development:
uv pip install -e ".[dev]"
# Copy and configure environment
cp .env.example .env
# Edit configuration files
nano .env
# Initialize and start
uv run python start.py --mode hybrid --debug
```
### Running as a Python Module
After installation, you can also run the server as a Python module:
```bash
# Run as module
python -m anydocs_mcp
# With configuration
python -m anydocs_mcp --config config.yaml
# Debug mode
python -m anydocs_mcp --debug
```
### Docker Installation
```bash
# Development environment
docker-compose -f docker-compose.dev.yml up -d
# Production environment
docker-compose up -d
```
## Usage
### Starting the Server
AnyDocs MCP Server supports 3 startup modes:
#### 1. Hybrid Mode (Default - Recommended)
Starts both MCP server and web management interface simultaneously:
```bash
uv run python start.py
# or explicitly:
uv run python start.py --mode hybrid
```
- **MCP Server**: Available at `http://localhost:8000` (handles MCP protocol communication)
- **Web Interface**: Available at `http://localhost:8080` for management
- **Best for**: Most users who want both MCP functionality and web management
> **Important**: Always use `uv run` to ensure the correct virtual environment is used.
#### 2. MCP Server Only
Starts only the MCP server without web interface:
```bash
uv run python start.py --mode mcp
```
- **Use case**: Production deployments where only MCP protocol is needed
- **Lighter resource usage**: No web interface overhead
- **Best for**: Headless servers, CI/CD environments
#### 3. Web Interface Only
Starts only the web management interface:
```bash
uv run python start.py --mode web
```
- **Use case**: Administrative tasks, configuration management
- **Web Interface**: Available at `http://localhost:8080`
- **Best for**: Configuration, monitoring, and testing without MCP protocol
#### Additional Options
```bash
# Debug mode with auto-reload
uv run python start.py --debug
# Custom configuration file
uv run python start.py --config custom-config.yaml
# Skip dependency check (faster startup)
uv run python start.py --no-deps-check
# Kill occupied ports before starting
uv run python start.py --kill-ports
# Skip database initialization
uv run python start.py --no-db-init
```
### Command Line Options
```bash
uv run python start.py --help
# Available options:
# --mode {mcp,web,hybrid} Startup mode (default: hybrid)
# --config CONFIG Configuration file path (default: config.yaml)
# --debug Enable debug mode
# --no-deps-check Skip dependency check
# --no-db-init Skip database initialization
# --kill-ports Kill occupied ports before starting
```
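For readers who want to script around these flags, the option table above maps naturally onto `argparse`. The parser below is a reconstruction from the help text only; the actual `start.py` may define its options differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented start.py options (a sketch, not the real parser)."""
    parser = argparse.ArgumentParser(prog="start.py")
    parser.add_argument(
        "--mode",
        choices=["mcp", "web", "hybrid"],
        default="hybrid",
        help="Startup mode (default: hybrid)",
    )
    parser.add_argument(
        "--config",
        default="config.yaml",
        help="Configuration file path (default: config.yaml)",
    )
    parser.add_argument("--debug", action="store_true", help="Enable debug mode")
    parser.add_argument(
        "--no-deps-check", action="store_true", help="Skip dependency check"
    )
    parser.add_argument(
        "--no-db-init", action="store_true", help="Skip database initialization"
    )
    parser.add_argument(
        "--kill-ports",
        action="store_true",
        help="Kill occupied ports before starting",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--mode", "mcp", "--debug"])
    print(args.mode, args.debug, args.config)
```

`argparse` turns dashes into underscores, so `--no-deps-check` is read back as `args.no_deps_check`.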
### Web Management Interface
Access the web interface at `http://localhost:8080` to:
- Configure document sources
- Manage users and API keys
- Monitor system health
- View logs and metrics
- Test MCP endpoints
### MCP Client Integration
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the AnyDocs MCP server as a subprocess and talk to it over stdio
    server_params = StdioServerParameters(command="python", args=["main.py"])
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the connection
            await session.initialize()
            # List available tools
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # Search documents
            result = await session.call_tool(
                "search_documents",
                arguments={"query": "authentication", "limit": 10},
            )
            print(result.content)


if __name__ == "__main__":
    asyncio.run(main())
```
## Documentation Adapters
### Supported Platforms
| Platform | Status | Features |
|----------|--------|----------|
| **Any Website** | ✅ | **Universal scraper for any documentation site** |
| GitBook | ✅ | Full API integration, real-time sync |
| Notion | ✅ | Database and page content, webhooks |
| Confluence | ✅ | Space and page management, attachments |
| GitHub | ✅ | Repository documentation, wikis |
| GitLab | ✅ | Project documentation, wikis |
| SharePoint | ✅ | Document libraries, lists |
| Slack | ✅ | Channel messages, knowledge base |
| File System | ✅ | Local markdown files, watch mode |
| Custom | 🚧 | Extensible adapter framework |
### Adding a New Adapter
```python
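from __future__ import annotations

from typing import List

from anydocs_mcp.adapters.base import BaseDocumentAdapter

# `Document` must also be imported from wherever the package defines it;
# that module path is not shown in this README.


class CustomAdapter(BaseDocumentAdapter):
    """Custom documentation adapter implementation."""

    async def fetch_documents(self) -> List[Document]:
        """Fetch documents from your platform."""
        raise NotImplementedError  # Implementation here

    async def get_document_content(self, doc_id: str) -> str:
        """Get specific document content."""
        raise NotImplementedError  # Implementation here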
```
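To make the two-method contract concrete, here is a self-contained sketch that runs without AnyDocs installed. `Document` and `BaseDocumentAdapter` below are local stand-ins that mirror only the method names shown above; their fields and the real base class's constructor are assumptions. `InMemoryAdapter` is a hypothetical adapter that serves pages from a dictionary.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Document:
    """Stand-in for the project's document model (fields are assumed)."""

    doc_id: str
    title: str


class BaseDocumentAdapter(ABC):
    """Stand-in exposing the two methods from the adapter template."""

    @abstractmethod
    async def fetch_documents(self) -> List[Document]: ...

    @abstractmethod
    async def get_document_content(self, doc_id: str) -> str: ...


class InMemoryAdapter(BaseDocumentAdapter):
    """Hypothetical adapter serving pages from a dict of id -> Markdown text."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self._pages = pages

    async def fetch_documents(self) -> List[Document]:
        documents = []
        for doc_id, text in self._pages.items():
            # Use the first line, minus Markdown heading marks, as the title.
            first_line = (text.splitlines() or [doc_id])[0]
            documents.append(Document(doc_id, first_line.lstrip("# ").strip()))
        return documents

    async def get_document_content(self, doc_id: str) -> str:
        if doc_id not in self._pages:
            raise LookupError(f"unknown document: {doc_id}")
        return self._pages[doc_id]


async def demo() -> List[str]:
    adapter = InMemoryAdapter({"intro": "# Introduction\nWelcome."})
    return [doc.title for doc in await adapter.fetch_documents()]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A real adapter would replace the dictionary lookups with calls to the platform's API and would likely need pagination and error handling.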
## Configuration
### Troubleshooting
#### Common Installation Issues
**Python-Jose Import Error**: If you encounter a `No module named 'jose'` error:
```bash
# Always use uv run to ensure correct virtual environment
uv run python start.py
# If the issue persists, reinstall python-jose with cryptography extras
uv pip uninstall python-jose
uv pip install "python-jose[cryptography]"
```
**Dependency Check Failures**: If you see errors about missing `pyyaml` or `beautifulsoup4`:
```bash
# The dependency check has been fixed to use correct import names
# Ensure you're using the latest version
git pull origin main
uv pip install -e .
```
**Configuration Attribute Errors**: If you see `'AppConfig' object has no attribute 'server_host'`:
```bash
# This is fixed in the current version - ensure you have the latest code
git pull origin main
```
**Virtual Environment Issues**: If packages seem installed but imports fail:
```bash
# Always use 'uv run' to ensure correct environment
uv run python start.py
# Check if you're in the right environment
which python # Should point to your project's Python
uv pip list # Should show installed packages
```
**Setup.py Conflicts**: If you encounter conflicts with multiple setup files:
```bash
# The redundant root setup.py has been removed
# Use pyproject.toml for package management
uv pip install -e .
```
#### Environment Setup Best Practices
1. **Always use `uv run`** for executing Python scripts to ensure correct environment
2. **Use `uv pip install`** instead of `uv install` for package installation
3. **Check virtual environment** with `uv pip list` if imports fail
4. **Pull latest changes** if you encounter configuration issues
#### Port Conflicts
If ports 8000 or 8080 are occupied:
```bash
# Kill processes on required ports (Windows)
netstat -ano | findstr :8000
taskkill /F /PID <PID>
# Or use the built-in option
uv run python start.py --kill-ports
```
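The commands above are Windows-specific. As a cross-platform alternative for the detection step only, a short Python probe can report whether the two default ports are taken. `is_port_in_use` is a hypothetical helper, not part of AnyDocs, and it only detects listeners that accept TCP connections on the given host.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (8000, 8080):
        state = "in use" if is_port_in_use(port) else "free"
        print(f"port {port}: {state}")
```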
### Environment Variables
```bash
# Server Configuration
ANYDOCS_HOST=localhost
ANYDOCS_PORT=8000
ANYDOCS_WEB_PORT=8080
ANYDOCS_DEBUG=false
# Database
DATABASE_URL=sqlite:///data/anydocs.db
# DATABASE_URL=postgresql://user:pass@localhost/anydocs
# Authentication
JWT_SECRET_KEY=your-secret-key
API_KEY_PREFIX=anydocs_
# Document Adapters
GITBOOK_API_TOKEN=your-gitbook-token
NOTION_API_TOKEN=your-notion-token
CONFLUENCE_API_TOKEN=your-confluence-token
# Cache (Optional)
REDIS_URL=redis://localhost:6379/0
# Monitoring
ENABLE_METRICS=true
LOG_LEVEL=INFO
```
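As an illustration of how these variables might be consumed, the sketch below reads a subset of them into a typed settings object. The class and function names are hypothetical, and the fallback defaults simply repeat the values listed above; the project's own configuration loader may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int
    web_port: int
    debug: bool
    database_url: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Build settings from environment variables, falling back to the defaults."""
    return ServerSettings(
        host=env.get("ANYDOCS_HOST", "localhost"),
        port=int(env.get("ANYDOCS_PORT", "8000")),
        web_port=int(env.get("ANYDOCS_WEB_PORT", "8080")),
        # Environment values are strings, so booleans need explicit parsing.
        debug=env.get("ANYDOCS_DEBUG", "false").strip().lower() in _TRUE_VALUES,
        database_url=env.get("DATABASE_URL", "sqlite:///data/anydocs.db"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the mapping as an argument keeps the function easy to test without touching the real environment.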
### YAML Configuration
```yaml
# config.yaml
server:
  host: localhost
  port: 8000
  web_port: 8080
  debug: false
database:
  url: sqlite:///data/anydocs.db
  pool_size: 10
  echo: false
auth:
  jwt_secret: ${JWT_SECRET_KEY}
  token_expire_minutes: 1440
  api_key:
    prefix: anydocs_
    length: 32
adapters:
  gitbook:
    api_token: ${GITBOOK_API_TOKEN}
    base_url: https://api.gitbook.com
    rate_limit: 100
  notion:
    api_token: ${NOTION_API_TOKEN}
    version: "2022-06-28"
    rate_limit: 3
```
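The `${NAME}` placeholders in this file refer to environment variables. How AnyDocs resolves them internally is not documented here; the sketch below shows one plausible approach, applied to an already-parsed configuration dictionary. `expand_env` is a hypothetical helper, and raising on an unset variable is a design choice made for this example.

```python
import os
import re
from typing import Any, Mapping

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${NAME} in strings; raise KeyError if NAME is unset."""
    if isinstance(value, str):

        def substitute(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]

        return _PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value


if __name__ == "__main__":
    config = {"auth": {"jwt_secret": "${JWT_SECRET_KEY}", "token_expire_minutes": 1440}}
    print(expand_env(config, {"JWT_SECRET_KEY": "example-secret"}))
```

Failing loudly on a missing variable avoids starting the server with a literal `${JWT_SECRET_KEY}` string as its secret.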
## MCP Tools
AnyDocs MCP Server provides the following tools:
### Core Tools
- **search_documents** - Search documents with full-text and semantic search
- **get_document** - Retrieve a specific document by ID
- **list_sources** - List all configured document sources
- **summarize_content** - Summarize document content
- **ask_question** - Ask questions about document content
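To give a feel for the full-text half of `search_documents`, here is a minimal SQLite FTS5 sketch. It is not the server's implementation: the `docs` table, its columns, and both function names are invented for illustration, and semantic search is out of scope.

```python
import sqlite3
from typing import List, Tuple


def build_index(rows: List[Tuple[str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index from (doc_id, title, body) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5(doc_id UNINDEXED, title, body)"
    )
    conn.executemany(
        "INSERT INTO docs (doc_id, title, body) VALUES (?, ?, ?)", rows
    )
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> List[str]:
    """Return matching doc_ids, best match first."""
    if not query.strip():
        return []
    # Quote the input so FTS5 reads it as a phrase rather than query syntax.
    phrase = '"' + query.replace('"', '""') + '"'
    cursor = conn.execute(
        "SELECT doc_id FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (phrase, limit),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    index = build_index(
        [
            ("a1", "Auth guide", "API key authentication and JWT"),
            ("a2", "Install", "Install with uv"),
            ("a3", "OAuth", "OAuth2 authentication flow"),
        ]
    )
    print(search(index, "authentication"))
```

Both the phrase and the limit are bound as parameters, so user input never becomes part of the SQL text.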
### AI-Powered Tools
- **generate_documentation** - AI-assisted documentation generation
- **translate_content** - Multi-language content translation
- **extract_insights** - Extract insights and analytics from documentation
- **suggest_improvements** - AI-powered content enhancement suggestions
## Development
### Setup Development Environment
```bash
# Install development dependencies
uv pip install -e ".[dev]"
# Setup pre-commit hooks
uv run pre-commit install
# Run tests
uv run pytest
# Run with coverage
uv run pytest --cov=src/anydocs_mcp
# Code formatting
uv run black src/ tests/
uv run isort src/ tests/
# Type checking
uv run mypy src/
# Security checks
uv run bandit -r src/
```
### Project Structure
```
anydocs-mcp/
├── src/anydocs_mcp/        # Main package
│   ├── adapters/           # Document adapters
│   ├── auth/               # Authentication
│   ├── config/             # Configuration management
│   ├── content/            # Content processing
│   ├── database/           # Database models and operations
│   ├── utils/              # Utilities and helpers
│   ├── web/                # Web interface
│   └── server.py           # MCP server implementation
├── tests/                  # Test suite
├── docs/                   # Documentation
├── scripts/                # Utility scripts
│   └── setup.py            # Development environment setup
├── pyproject.toml          # Package configuration (modern Python packaging)
├── start.py                # Main startup script
└── main.py                 # MCP server entry point
```
> **Note**: The project uses `pyproject.toml` for package configuration following modern Python packaging standards. The redundant root `setup.py` has been removed to avoid conflicts.
### Running Tests
```bash
# All tests
uv run pytest
# Unit tests only
uv run pytest tests/unit/
# Integration tests only
uv run pytest tests/integration/
# With coverage
uv run pytest --cov=src/anydocs_mcp --cov-report=html
# Performance tests
uv run pytest tests/performance/
```
## Monitoring & Observability
### Health Checks
```bash
# Check service health
curl http://localhost:8080/health
# Detailed health check
curl http://localhost:8080/health/detailed
```
### Metrics
Metrics are exposed at the `/metrics` endpoint in Prometheus format:
- Request count and duration
- Database connection pool status
- Document sync statistics
- Error rates and types
- Cache hit/miss ratios
### Logging
Structured logging with configurable levels:
```bash
# Application logs
tail -f anydocs_mcp.log
# Check logs in real-time
uv run python start.py --debug
```
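"Structured logging" usually means one machine-readable record per line. The project's actual log format is not specified here, so the following is only a sketch of the idea using the standard library; the field names and the `configure_logging` helper are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: str = "INFO") -> logging.Logger:
    """Attach a JSON stream handler to a logger named after the package."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("anydocs_mcp")
    logger.handlers = [handler]
    logger.setLevel(level)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = configure_logging("DEBUG")
    log.info("synced %d documents", 3)
```

The `level` argument accepts the same names as the `LOG_LEVEL` setting, such as `INFO` or `DEBUG`.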
## Docker Deployment
### Development
```bash
# Start development environment
docker-compose -f docker-compose.dev.yml up -d
# View logs
docker-compose -f docker-compose.dev.yml logs -f anydocs-mcp-dev
# Access shell
docker-compose -f docker-compose.dev.yml exec anydocs-mcp-dev bash
```
### Production
```bash
# Build and start production environment
docker-compose up -d
# Scale services
docker-compose up -d --scale anydocs-mcp=3
# Update services
docker-compose pull && docker-compose up -d
```
### Monitoring Stack
```bash
# Start with monitoring
docker-compose --profile monitoring up -d
# Access services
# Grafana: http://localhost:3001 (admin/admin)
# Prometheus: http://localhost:9090
```
## Security
### Authentication Methods
1. **API Keys**: Simple token-based authentication
2. **JWT Tokens**: Stateless authentication with expiration
3. **OAuth2**: Integration with external providers
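For the API key method, a common pattern is to issue a random key with a recognizable prefix and store only its hash. The sketch below follows that pattern with the `anydocs_` prefix and length 32 from the sample configuration. Reading `length` as the size of the random part is an assumption, the function names are hypothetical, and the server's real key format may differ.

```python
import hashlib
import hmac
import secrets
import string

_ALPHABET = string.ascii_letters + string.digits


def generate_api_key(prefix: str = "anydocs_", length: int = 32) -> str:
    """Return prefix + `length` random alphanumeric characters."""
    body = "".join(secrets.choice(_ALPHABET) for _ in range(length))
    return prefix + body


def hash_api_key(key: str) -> str:
    """Hash a key for storage so the plaintext never reaches the database."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def verify_api_key(candidate: str, stored_hash: str) -> bool:
    """Compare in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(hash_api_key(candidate), stored_hash)


if __name__ == "__main__":
    key = generate_api_key()
    stored = hash_api_key(key)
    print(key, verify_api_key(key, stored))
```

A plain SHA-256 hash is reasonable here because the keys are long and random; it would not be suitable for user-chosen passwords.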
### Security Best Practices
- All API endpoints require authentication
- Rate limiting on all endpoints
- Input validation and sanitization
- SQL injection prevention
- CORS configuration
- Security headers
- Audit logging
### Security Scanning
```bash
# Static analysis (SAST) of the source tree
uv run bandit -r src/
# Dependency vulnerability scan
uv run safety check
```
## Performance
### Optimization Features
- **Async/Await**: Non-blocking I/O operations
- **Connection Pooling**: Efficient database connections
- **Caching**: Redis-based caching with TTL
- **Rate Limiting**: Prevent API abuse
- **Batch Processing**: Efficient bulk operations
- **Lazy Loading**: On-demand content loading
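Rate limiting is often implemented as a token bucket. The sketch below is a generic version of that technique, not the limiter AnyDocs ships; the class name is hypothetical, and treating a setting such as `rate_limit: 3` as requests per second is an assumption.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available and report whether the call may proceed."""
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        # Refill for the time that passed, never exceeding the bucket size.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=3, capacity=3)
    print([bucket.allow() for _ in range(5)])
```

The injectable `clock` makes the limiter deterministic in tests; production code can keep the default `time.monotonic`.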
### Performance Monitoring
```bash
# Performance testing
uv run pytest tests/performance/
# Memory profiling
uv run python -m memory_profiler start.py
```
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Workflow
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Run the test suite
6. Submit a pull request
### Code Standards
- Follow PEP 8 style guide
- Use type hints
- Write comprehensive tests
- Document public APIs
- Use meaningful commit messages
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- [Model Context Protocol](https://github.com/modelcontextprotocol) for the MCP specification
- [Python MCP SDK](https://github.com/modelcontextprotocol/python-sdk) for the SDK implementation
- All contributors and maintainers
## Support
- Email: team@anydocs-mcp.com
- Issues: [GitHub Issues](https://github.com/funky1688/anydocs-mcp/issues)
- Documentation: [Full Documentation](https://docs.anydocs-mcp.com)
---
**Made with ❤️ by funky1688**