---
sidebar_position: 4
title: Architecture
---

## Architecture Overview

DataBeak is built as a Model Context Protocol (MCP) server that provides AI assistants with comprehensive CSV data manipulation capabilities. This document explains the technical architecture and design decisions.

## Technology Stack

- **Framework**: FastMCP 2.11.3+ (Model Context Protocol)
- **Data Processing**: Pandas 2.2.3+, NumPy 2.1.3+
- **Package Manager**: uv (ultra-fast Python package management)
- **Build System**: Hatchling
- **Code Quality**: Ruff (linting and formatting), MyPy (type checking)
- **Configuration**: Pydantic Settings for environment management

## Core Components

```text
src/databeak/
├── server.py                     # FastMCP server composition & routing
├── core/                         # Core infrastructure
│   ├── session.py                # Session management with lifecycle
│   ├── settings.py               # Environment-based configuration
│   └── json_schema_validate.py   # Custom JSON schema validation
├── models/                       # Data models and type definitions
│   ├── data_models.py            # Core data types & enums
│   ├── data_session.py           # DataFrame session wrapper
│   ├── typed_dicts.py            # TypedDict definitions for type safety
│   ├── tool_responses.py         # Pydantic response models
│   ├── statistics_models.py      # Statistics response types
│   └── expression_models.py      # Secure expression types
├── servers/                      # Specialized MCP servers (server composition)
│   ├── io_server.py              # Load/export operations
│   ├── transformation_server.py  # Data transformation
│   ├── statistics_server.py      # Statistical analysis
│   ├── discovery_server.py       # Data profiling & discovery
│   ├── validation_server.py      # Schema validation & quality
│   ├── column_server.py          # Column operations
│   ├── column_text_server.py     # Text manipulation
│   ├── row_operations_server.py  # Row-level operations
│   └── system_server.py          # Health & system info
├── services/                     # Backend operation implementations
│   └── data_operations.py        # Data preview and transformation utilities
├── utils/                        # Utility functions
│   ├── validators.py             # Input validation
│   ├── secure_evaluator.py       # Safe expression evaluation
│   ├── logging_config.py         # Logging configuration
│   └── pydantic_validators.py    # Custom Pydantic validators
├── exceptions.py                 # Custom error handling
└── _version.py                   # Dynamic version loading
```
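`server.py` composes the specialized servers under `servers/` into a single MCP endpoint. The snippet below is a minimal sketch of this server composition pattern using FastMCP's `mount()`; the server names and wiring are illustrative assumptions, not DataBeak's actual code:

```python
# Minimal sketch of FastMCP server composition (illustrative, not DataBeak's
# actual wiring). Each domain server registers its own tools; the top-level
# server mounts them so clients see one combined MCP endpoint.
from fastmcp import FastMCP

statistics_server = FastMCP("DataBeak-Statistics")


@statistics_server.tool()
def describe_domain() -> str:
    """Describe this server's domain (placeholder tool)."""
    return "statistical analysis"


app = FastMCP("DataBeak")
app.mount(statistics_server)  # repeated for io_server, discovery_server, ...

if __name__ == "__main__":
    app.run()  # stdio transport by default
```

Composition keeps each domain server small and independently testable while exposing a single flat tool list to clients.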
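Individual tools combine `Field`-described parameters with typed Pydantic response models (see `models/tool_responses.py` in the tree above). Here is a hedged sketch of what one such tool could look like; the tool and model names are hypothetical:

```python
# Hypothetical tool shape: Field-described parameters and a typed Pydantic
# response model. Illustrative only; not one of DataBeak's real tools.
from typing import Annotated

from fastmcp import FastMCP
from pydantic import BaseModel, Field

discovery_server = FastMCP("DataBeak-Discovery")


class ColumnProfile(BaseModel):
    """Profile of a single column."""

    name: str = Field(description="Column name as it appears in the data")
    dtype: str = Field(description="Pandas dtype of the column")
    null_count: int = Field(description="Number of missing values", ge=0)


@discovery_server.tool()
def profile_column(
    column: Annotated[str, Field(description="Name of the column to profile")],
) -> ColumnProfile:
    """Profile a single column of the loaded data."""
    return ColumnProfile(name=column, dtype="float64", null_count=0)  # placeholder
```

Parameter documentation lives in `Field` descriptions rather than docstring Args sections, which the custom MCP checkers described under Quality Standards enforce.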
## Key Features

### Session Management

- **Multi-session support** with automatic cleanup
- **Configurable timeouts** and resource limits
- **Session isolation** for concurrent users

### Data Operations

- **40+ tools** covering I/O, manipulation, analysis, and validation
- **Multiple format support**: CSV, JSON, Excel, Parquet, HTML, Markdown
- **Streaming processing** for large files
- **Type-safe operations** with Pydantic validation

### Auto-Save & History

- **Automatic saving** after each operation
- **Undo/redo functionality** with operation tracking
- **Persistent history** with JSON storage
- **Configurable strategies**: overwrite, backup, versioned

### Configuration Management

- **Environment-based settings** using Pydantic Settings
- **Centralized configuration** in the `DataBeakSettings` class
- **Runtime version detection** via `importlib.metadata`

### Code Quality & Architecture

- **Zero static analysis violations** - Clean Ruff compliance across all categories
- **Strong type safety** - 100% MyPy compliance with minimal `Any` usage
- **High test coverage** - 880+ unit tests + 43 integration tests with 88%+ coverage
- **Server composition pattern** - Modular FastMCP servers for different domains
- **Context-based logging** - MCP-integrated logging for better traceability
- **Clear API design** - Keyword-only boolean parameters, no boolean traps
- **Defensive practices** - Proper exception handling, input validation

## Environment Variables

All configuration uses the `DATABEAK_` prefix:

| Variable                             | Default | Purpose                                 |
| ------------------------------------ | ------- | --------------------------------------- |
| `DATABEAK_MAX_FILE_SIZE_MB`          | 1024    | Maximum file size limit (MB)            |
| `DATABEAK_SESSION_TIMEOUT`           | 3600    | Session timeout (seconds)               |
| `DATABEAK_CHUNK_SIZE`                | 10000   | Processing chunk size                   |
| `DATABEAK_MEMORY_THRESHOLD_MB`       | 2048    | Memory threshold for health monitoring  |
| `DATABEAK_MAX_VALIDATION_VIOLATIONS` | 1000    | Max validation violations to report     |
| `DATABEAK_MAX_ANOMALY_SAMPLE_SIZE`   | 10000   | Max sample size for anomaly detection   |
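Configuration is centralized in the `DataBeakSettings` class built on Pydantic Settings. Below is a minimal sketch of how such a class could map the variables above; the field names and types are inferred from the table, not taken from the actual implementation:

```python
# Illustrative sketch of a Pydantic Settings class using the DATABEAK_ prefix.
# Field names and types are assumptions inferred from the table above.
from pydantic_settings import BaseSettings, SettingsConfigDict


class DataBeakSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="DATABEAK_")

    max_file_size_mb: int = 1024           # DATABEAK_MAX_FILE_SIZE_MB
    session_timeout: int = 3600            # DATABEAK_SESSION_TIMEOUT (seconds)
    chunk_size: int = 10000                # DATABEAK_CHUNK_SIZE
    memory_threshold_mb: int = 2048        # DATABEAK_MEMORY_THRESHOLD_MB
    max_validation_violations: int = 1000  # DATABEAK_MAX_VALIDATION_VIOLATIONS
    max_anomaly_sample_size: int = 10000   # DATABEAK_MAX_ANOMALY_SAMPLE_SIZE
```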
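With a class like that, an override set in the deployment environment flows through when the settings object is created:

```python
import os

# Simulate a deployment-level override; pydantic-settings reads the
# environment at instantiation time.
os.environ["DATABEAK_SESSION_TIMEOUT"] = "7200"

settings = DataBeakSettings()  # hypothetical class from the sketch above
assert settings.session_timeout == 7200
```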
## MCP Integration

The server implements the Model Context Protocol standard:

- **Tools**: 40+ data manipulation functions
- **Resources**: Session and data access
- **Prompts**: Data analysis templates
- **Error Handling**: Structured error responses

### Tool Categories

1. **I/O Operations** - Load/export data in multiple formats
1. **Data Manipulation** - Transform, filter, sort, and modify data
1. **Data Analysis** - Statistics, correlations, outliers, profiling
1. **Data Validation** - Schema validation, quality checking, anomaly detection
1. **Session Management** - Stateless data processing with external context management
1. **System Tools** - Health monitoring and server information

## Design Principles

1. **Type Safety**: Full type annotations with Pydantic validation
1. **Modularity**: Clear separation of concerns across modules
1. **Performance**: Streaming operations for large datasets
1. **Reliability**: Comprehensive error handling and logging
1. **Usability**: Simple installation and configuration
1. **Maintainability**: Modern tooling and clear documentation

## Quality Standards

DataBeak maintains strict code quality standards with automated enforcement:

### Code Quality Metrics

- **Zero Ruff violations** - Perfect linting compliance across 46 rules
- **100% MyPy compliance** - Complete type safety with minimal `Any` usage
- **Perfect MCP documentation** - Comprehensive Field descriptions, no Args sections
- **High test coverage** - 983 unit tests + 43 integration tests validating all functionality
- **Clean architecture** - Stateless MCP design with eliminated complexity

### Quality Enforcement Tools

- **Ruff** - Comprehensive linting and formatting (46 rules enabled)
- **MyPy** - Static type checking with strict configuration
- **Pre-commit hooks** - Automated quality gates preventing regressions
- **Custom MCP checkers** - Specialized tools for MCP documentation standards:
  - `check_docstring_args.py` - Ensures no Args sections in MCP tool docstrings
  - `check_mcp_field_descriptions.py` - Validates comprehensive Field descriptions

### Quality Commands

```bash
# Run all quality checks
uv run pre-commit run --all-files

# Individual checks
uv run ruff check src/ tests/            # Linting
uv run mypy src/databeak/                # Type checking
uv run pytest tests/unit/                # Unit tests
scripts/check_docstring_args.py          # MCP Args compliance
scripts/check_mcp_field_descriptions.py  # MCP Field compliance
```

## Development Workflow

### Package Management

```bash
uv sync           # Install dependencies
uv run databeak   # Run server
uv run -m pytest  # Run tests

uv run ruff check && uv run ruff format --check && uv run mypy src/databeak/ && uv run -m pytest
```

### Version Management

- **Single source of truth**: `pyproject.toml`
- **Automatic synchronization**: `uv run sync-versions`
- **Dynamic loading**: via `importlib.metadata`

### Quality Assurance

- **Linting**: Ruff with comprehensive rule set
- **Formatting**: Ruff with 100-character lines
- **Type checking**: MyPy with strict configuration
- **Testing**: Three-tier testing strategy
  - **Unit tests** (`tests/unit/`): Fast, isolated module testing
  - **Integration tests** (`tests/integration/`): Component interaction validation
  - **E2E tests** (`tests/e2e/`): Complete workflow verification
  - **Coverage target**: 80%+ with pytest-cov
  - See [Testing Guide](testing.md) for best practices

## Future Considerations

- **Advanced transformation interface** for complex operations
- **Real-time collaboration** features
- **Machine learning integrations** for data insights
- **Cloud storage support** for remote data sources
- **Advanced visualization tools** for data exploration

______________________________________________________________________

**For implementation details and contributing guidelines, see [CONTRIBUTING.md](https://github.com/jonpspri/databeak/blob/main/CONTRIBUTING.md)**
