---
sidebar_position: 1
title: API Overview
---

## API Reference Overview

DataBeak provides 40+ tools for CSV manipulation through the Model Context Protocol (MCP). All tools return structured responses and report errors in a consistent format.

## Tool Categories

### 📁 I/O Operations

Tools for loading CSV data from a URL or from string content, plus the session tools used alongside them:

- **`load_csv_from_url`** - Load CSV from HTTP/HTTPS URL
- **`load_csv_from_content`** - Load CSV from string content
- **`get_session_info`** - Get current session details and statistics
- **`list_sessions`** - List all active sessions
- **`close_session`** - Close and clean up a session

### 🔧 Data Manipulation

Tools for transforming and modifying CSV data:

- **`filter_rows`** - Filter rows with complex conditions (AND/OR logic)
- **`sort_data`** - Sort by single or multiple columns
- **`select_columns`** - Select specific columns by name or pattern
- **`rename_columns`** - Rename columns with mapping
- **`add_column`** - Add computed columns with formulas
- **`remove_columns`** - Remove unwanted columns
- **`update_column`** - Update column values with transformations
- **`change_column_type`** - Convert column data types
- **`fill_missing_values`** - Handle null/NaN values with strategies
- **`remove_duplicates`** - Remove duplicate rows with optional key columns

### 📊 Data Analysis

Tools for statistical analysis and insights:

- **`get_statistics`** - Descriptive statistics for numeric columns
- **`get_column_statistics`** - Detailed stats for specific columns
- **`get_correlation_matrix`** - Pearson, Spearman, and Kendall correlations
- **`group_by_aggregate`** - Group data with aggregation functions
- **`get_value_counts`** - Frequency counts for categorical data
- **`detect_outliers`** - Find outliers using IQR, Z-score, or custom methods
- **`profile_data`** - Comprehensive data profiling report

### ✅ Data Validation

Tools for schema validation and quality checking:

- **`validate_schema`** - Validate data against schema definitions
- **`check_data_quality`** - Overall data quality scoring
- **`find_anomalies`** - Detect statistical and pattern anomalies

### 🔄 Session Management

Tools for managing data sessions:

- **`list_sessions`** - List all active sessions
- **`close_session`** - Close and clean up a session
- **`get_session_info`** - Get session metadata and statistics

### ⚙️ System Tools

System information and health monitoring:

- **`health_check`** - Server health and status
- **`get_server_info`** - Server capabilities and configuration

## Common Patterns

### Error Handling

All tools return a consistent response format. Successful responses:

```json
{
  "success": true,
  "data": {...},
  "session_id": "uuid-here"
}
```

Error responses:

```json
{
  "success": false,
  "error": "Error description",
  "session_id": "uuid-here"
}
```

### Session Management

Most tools require a `session_id` parameter. Sessions are automatically created and managed with configurable timeouts.

### Data Types

DataBeak supports rich data types including:

- **Strings**, **Numbers**, **Booleans**
- **Dates** and **DateTime** objects
- **Null values** (JSON `null` → Python `None` → pandas `NaN`)

### Filtering Conditions

Filter operations support complex conditions. Each condition names a `column`, an `operator`, and a `value`; the `logic` field combines the conditions and is either `"AND"` or `"OR"`:

```json
{
  "conditions": [
    {"column": "age", "operator": ">", "value": 18},
    {"column": "status", "operator": "==", "value": "active"}
  ],
  "logic": "AND"
}
```

### Environment Configuration

All tools respect these environment variables (all use the `DATABEAK_` prefix):

| Variable                              | Default   | Purpose                          |
| ------------------------------------- | --------- | -------------------------------- |
| `DATABEAK_SESSION_TIMEOUT`            | 3600      | Session timeout (seconds)        |
| `DATABEAK_MAX_DOWNLOAD_SIZE_MB`       | 100       | Maximum URL download size (MB)   |
| `DATABEAK_MAX_MEMORY_USAGE_MB`        | 1000      | Max DataFrame memory (MB)        |
| `DATABEAK_MAX_ROWS`                   | 1,000,000 | Max DataFrame rows               |
| `DATABEAK_URL_TIMEOUT_SECONDS`        | 30        | URL download timeout (seconds)   |
| `DATABEAK_HEALTH_MEMORY_THRESHOLD_MB` | 2048      | Health monitoring threshold (MB) |

See [DatabeakSettings](https://github.com/jonpspri/databeak/blob/main/src/databeak/core/settings.py) for all configuration options.

## Advanced Features

### Null Value Support

Null values are supported across all operations:

- JSON `null` values are preserved and handled correctly
- Python `None` and pandas `NaN` compatibility
- Filtering and other operations work with null values

### Stateless Architecture

Clean MCP server design:

- **Session-based processing** - Data operations without internal state
- **External persistence** - Context handles data persistence as needed
- **Resource efficient** - No overhead from history or auto-save tracking
- **MCP-aligned** - Follows Model Context Protocol server patterns

______________________________________________________________________

**For detailed examples and tutorials, see the [Quick Start Guide](../tutorials/quickstart.md)**
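
As a rough client-side illustration of the response format and filtering conditions described under Common Patterns, the sketch below unwraps a tool response and assembles a filter payload. The names `ToolError`, `unwrap`, and `build_filter` are hypothetical helpers written for this example and are not part of DataBeak; the sketch only assumes the `success` / `data` / `error` fields and the `conditions` / `logic` shape shown above.

```python
from typing import Any

# Hypothetical helpers for illustration only; none of these names come from DataBeak.
REQUIRED_KEYS = {"column", "operator", "value"}
ALLOWED_LOGIC = {"AND", "OR"}


class ToolError(Exception):
    """Raised when a tool response has "success": false."""


def unwrap(response: dict[str, Any]) -> Any:
    """Return the "data" payload, or raise ToolError with the reported message."""
    if response.get("success"):
        return response.get("data")
    raise ToolError(response.get("error", "unknown error"))


def build_filter(conditions: list[dict[str, Any]], logic: str = "AND") -> dict[str, Any]:
    """Assemble a payload in the shape shown under Filtering Conditions."""
    logic = logic.upper()
    if logic not in ALLOWED_LOGIC:
        raise ValueError(f"logic must be one of {sorted(ALLOWED_LOGIC)}, got {logic!r}")
    for cond in conditions:
        missing = REQUIRED_KEYS - set(cond)
        if missing:
            raise ValueError(f"condition {cond!r} is missing keys: {sorted(missing)}")
    return {"conditions": conditions, "logic": logic}


# Usage with the example values from this page.
payload = build_filter(
    [
        {"column": "age", "operator": ">", "value": 18},
        {"column": "status", "operator": "==", "value": "active"},
    ],
    logic="AND",
)
```

Raising an exception on `"success": false` keeps calling code from silently treating an error envelope as data; whether that suits a given MCP client is a design choice, not something the source prescribes.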
