
Server Configuration

Describes the environment variables used to configure the server. All variables are optional; defaults are shown below.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| DATABEAK_MAX_ROWS | No | Max DataFrame rows | 1000000 |
| DATABEAK_SESSION_TIMEOUT | No | Session timeout (seconds) | 3600 |
| DATABEAK_MAX_MEMORY_USAGE_MB | No | Max DataFrame memory (MB) | 1000 |
| DATABEAK_URL_TIMEOUT_SECONDS | No | URL download timeout | 30 |
| DATABEAK_MAX_DOWNLOAD_SIZE_MB | No | Maximum URL download size (MB) | 100 |
| DATABEAK_HEALTH_MEMORY_THRESHOLD_MB | No | Health monitoring memory threshold | 2048 |
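
A minimal sketch of overriding these defaults via the environment before the server process starts (values and the launch mechanism are illustrative; how DataBeak is actually launched depends on your MCP client):

import os

# Illustrative overrides of the defaults listed above
os.environ["DATABEAK_MAX_ROWS"] = "500000"           # cap DataFrame size
os.environ["DATABEAK_SESSION_TIMEOUT"] = "7200"      # two-hour session timeout
os.environ["DATABEAK_MAX_MEMORY_USAGE_MB"] = "2048"  # allow larger DataFrames in memory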

Schema

Prompts

Interactive templates invoked by user choice

| Name | Description |
| --- | --- |
| analyze_csv_prompt | Generate a prompt to analyze CSV data. |
| data_cleaning_prompt | Generate a prompt for data cleaning suggestions. |

Resources

Contextual data attached and managed by the client

No resources

Tools

Functions exposed to the LLM to take actions

health_check

Check DataBeak server health and availability with memory monitoring.

Returns server status, session capacity, memory usage, and version information. Use before large operations to verify system readiness and resource availability.
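
A minimal illustrative call, assuming the ctx-based convention used by the examples later in this document:

# Confirm memory headroom and session capacity before a large load
status = await health_check(ctx)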

get_server_info

Get DataBeak server capabilities and supported operations.

Returns server version, available tools, supported file formats, and resource limits. Use to discover what operations are available before planning workflows.
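
Illustrative usage (signature assumed):

# Discover supported formats and resource limits before planning a workflow
info = await get_server_info(ctx)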

load_csv_from_url

Load CSV file from URL into DataBeak session.

Downloads and parses CSV data with security validation. Returns session ID and data preview for further operations.
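
A sketch of a typical call; the url parameter name is an assumption:

# Load a remote CSV; the returned session ID is used by subsequent tools
result = await load_csv_from_url(ctx, url="https://example.com/sales.csv")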

load_csv_from_content

Load CSV data from string content into DataBeak session.

Parses CSV data directly from string with validation. Returns session ID and data preview for further operations.
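
A sketch assuming a content parameter carrying the raw CSV text:

csv_text = "name,age\nAlice,30\nBob,25"
result = await load_csv_from_content(ctx, content=csv_text)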

get_session_info

Get comprehensive information about a specific session.

Returns session metadata, data status, and configuration. Essential for session management and workflow coordination.
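
Illustrative usage (whether the session is taken from ctx or passed as an explicit session ID is an assumption):

info = await get_session_info(ctx)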

get_cell_value

Get value of specific cell with coordinate targeting.

Supports column name or index targeting. Returns value with coordinates and data type information.
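
For example (row and column parameter names are assumptions):

# Read the cell at row 5 in the "price" column
value = await get_cell_value(ctx, row=5, column="price")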

set_cell_value

Set value of specific cell with coordinate targeting.

Supports column name or index, tracks old and new values. Returns operation result with coordinates and data type.
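
For example (parameter names are assumptions):

# Overwrite a single cell; the old value is tracked alongside the new one
result = await set_cell_value(ctx, row=5, column="price", value=19.99)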

get_row_data

Get data from specific row with optional column filtering.

Returns complete row data or filtered by column list. Converts pandas types for JSON serialization.
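
Illustrative calls (parameter names are assumptions):

# Full row
row = await get_row_data(ctx, row=10)
# Only selected columns
row = await get_row_data(ctx, row=10, columns=["name", "email"])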

get_column_data

Get data from specific column with optional row range slicing.

Supports row range filtering for focused analysis. Returns column values with range metadata.
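
Illustrative calls (the range parameter names are assumptions):

# Entire column
col = await get_column_data(ctx, column="price")
# First 100 rows only
col = await get_column_data(ctx, column="price", start_row=0, end_row=100)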

insert_row

Insert new row at specified index with multiple data formats.

Supports dict, list, and JSON string input with null value handling. Returns insertion result with before/after statistics.
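
For example (parameter names are assumptions; the dict and list forms mirror the description above):

# Insert at the top using a dict keyed by column name
result = await insert_row(ctx, row_index=0, data={"name": "Alice", "age": 30})
# Insert using a positional list of values
result = await insert_row(ctx, row_index=5, data=["Bob", 25])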

delete_row

Delete row at specified index with comprehensive tracking.

Captures deleted data for undo operations. Returns operation result with before/after statistics.
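
For example (parameter name is an assumption):

# Remove row 3; the deleted values are captured for undo
result = await delete_row(ctx, row_index=3)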

update_row

Update specific columns in row with selective updates.

Supports partial column updates with change tracking. Returns old/new values for updated columns.
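
For example (parameter names are assumptions):

# Update only the "status" and "priority" columns of row 3
result = await update_row(ctx, row_index=3, data={"status": "active", "priority": "high"})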

get_statistics

Get comprehensive statistical summary of numerical columns.

Computes descriptive statistics for all or specified numerical columns including count, mean, standard deviation, min/max values, and percentiles. Optimized for AI workflows with clear statistical insights and data understanding.

Returns: Comprehensive statistical analysis with per-column summaries

Statistical Metrics:
📊 Count: Number of non-null values
📈 Mean: Average value
📉 Std: Standard deviation (measure of spread)
🔢 Min/Max: Minimum and maximum values
📊 Percentiles: 25th, 50th (median), 75th quartiles

Examples:
# Get statistics for all numeric columns
stats = await get_statistics("session_123")
# Analyze specific columns only
stats = await get_statistics("session_123", columns=["price", "quantity"])
# Analyze all numeric columns (percentiles always included)
stats = await get_statistics("session_123")

AI Workflow Integration:
1. Essential for data understanding and quality assessment
2. Identifies data distribution and potential issues
3. Guides feature engineering and analysis decisions
4. Provides context for outlier detection thresholds

get_column_statistics

Get detailed statistical analysis for a single column.

Provides focused statistical analysis for a specific column including data type information, null value handling, and comprehensive numerical statistics when applicable.

Returns: Detailed statistical analysis for the specified column

Column Analysis:
🔍 Data Type: Detected pandas data type
📊 Statistics: Complete statistical summary for numeric columns
🔢 Non-null Count: Number of valid (non-null) values
📈 Distribution: Statistical distribution characteristics

Examples:
# Analyze a price column
stats = await get_column_statistics(ctx, "price")
# Analyze a categorical column
stats = await get_column_statistics(ctx, "category")

AI Workflow Integration:
1. Deep dive analysis for specific columns of interest
2. Data quality assessment for individual features
3. Understanding column characteristics for modeling
4. Validation of data transformations

get_correlation_matrix

Calculate correlation matrix for numerical columns.

Computes pairwise correlations between numerical columns using various correlation methods. Essential for understanding relationships between variables and feature selection in analytical workflows.

Returns: Correlation matrix with pairwise correlation coefficients

Correlation Methods:
📊 Pearson: Linear relationships (default, assumes normality)
📈 Spearman: Monotonic relationships (rank-based, non-parametric)
🔄 Kendall: Concordant/discordant pairs (robust, small samples)

Examples:
# Basic correlation analysis
corr = await get_correlation_matrix(ctx)
# Analyze specific columns with Spearman correlation
corr = await get_correlation_matrix(ctx, columns=["price", "rating", "sales"], method="spearman")
# Filter correlations above threshold
corr = await get_correlation_matrix(ctx, min_correlation=0.5)

AI Workflow Integration:
1. Feature selection and dimensionality reduction
2. Multicollinearity detection before modeling
3. Understanding variable relationships
4. Data validation and quality assessment

get_value_counts

Get frequency distribution of values in a column.

Analyzes the distribution of values in a specified column, providing counts and optionally percentages for each unique value. Essential for understanding categorical data and identifying common patterns.

Returns: Frequency distribution with counts/percentages for each unique value

Analysis Features:
🔢 Frequency Counts: Raw counts for each unique value
📊 Percentage Mode: Normalized frequencies as percentages
🎯 Top Values: Configurable limit for most frequent values
📈 Summary Stats: Total values, unique count, distribution insights

Examples:
# Basic value counts
counts = await get_value_counts(ctx, "category")
# Get percentages for top 10 values
counts = await get_value_counts(ctx, "status", normalize=True, top_n=10)
# Sort in ascending order
counts = await get_value_counts(ctx, "grade", ascending=True)

AI Workflow Integration:
1. Categorical data analysis and encoding decisions
2. Data quality assessment (identifying rare values)
3. Understanding distribution for sampling strategies
4. Feature engineering insights for categorical variables

detect_outliers

Detect outliers in numerical columns using various algorithms.

Identifies data points that deviate significantly from the normal pattern using statistical and machine learning methods. Essential for data quality assessment and anomaly detection in analytical workflows.

Returns: Detailed outlier analysis with locations and severity scores

Detection Methods:
📊 Z-Score: Statistical method based on standard deviations
📈 IQR: Interquartile range method (robust to distribution)
🤖 Isolation Forest: ML-based method for high-dimensional data

Examples:
# Basic outlier detection
outliers = await detect_outliers(ctx, ["price", "quantity"])
# Use IQR method with custom threshold
outliers = await detect_outliers(ctx, ["sales"], method="iqr", threshold=2.5)

AI Workflow Integration:
1. Data quality assessment and cleaning
2. Anomaly detection for fraud/error identification
3. Data preprocessing for machine learning
4. Understanding data distribution characteristics

profile_data

Generate comprehensive data profile with statistical insights.

Creates a complete analytical profile of the dataset including column characteristics, data types, null patterns, and statistical summaries. Provides holistic data understanding for analytical workflows.

Returns: Comprehensive data profile with multi-dimensional analysis

Profile Components:
📊 Column Profiles: Data types, null patterns, uniqueness
📈 Statistical Summaries: Numerical column characteristics
🔗 Correlations: Inter-variable relationships (optional)
🎯 Outliers: Anomaly detection across columns (optional)
💾 Memory Usage: Resource consumption analysis

Examples:
# Full data profile
profile = await profile_data(ctx)
# Quick profile without expensive computations
profile = await profile_data(ctx, include_correlations=False, include_outliers=False)

AI Workflow Integration:
1. Initial data exploration and understanding
2. Automated data quality reporting
3. Feature engineering guidance
4. Data preprocessing strategy development

group_by_aggregate

Group data and compute aggregations for analytical insights.

Performs GROUP BY operations with multiple aggregation functions per column. Essential for segmentation analysis and understanding patterns across different data groups.

Returns: Grouped aggregation results with statistics per group

Aggregation Functions:
📊 count, mean, median, sum, min, max
📈 std, var (statistical measures)
🎯 first, last (positional)
📋 nunique (unique count)

Examples:
# Sales analysis by region
result = await group_by_aggregate(ctx, group_by=["region"], aggregations={"sales": ["sum", "mean", "count"]})
# Multi-dimensional grouping
result = await group_by_aggregate(ctx, group_by=["category", "region"], aggregations={"price": ["mean", "std"], "quantity": ["sum", "count"]})

AI Workflow Integration:
1. Segmentation analysis and market research
2. Feature engineering for categorical interactions
3. Data summarization for reporting and insights
4. Understanding group-based patterns and trends

find_cells_with_value

Find all cells containing a specific value for data discovery.

Searches through the dataset to locate all occurrences of a specific value, providing coordinates and context. Essential for data validation, quality checking, and understanding data patterns.

Returns: Locations of all matching cells with coordinates and context

Search Features:
🎯 Exact Match: Precise value matching with type consideration
🔍 Substring Search: Flexible text-based search for string columns
📍 Coordinates: Row and column positions for each match
📊 Summary Stats: Total matches, columns searched, search parameters

Examples:
# Find all cells with value "ERROR"
results = await find_cells_with_value(ctx, "ERROR")
# Substring search in specific columns
results = await find_cells_with_value(ctx, "john", columns=["name", "email"], exact_match=False)

AI Workflow Integration:
1. Data quality assessment and error detection
2. Pattern identification and data validation
3. Reference data location and verification
4. Data cleaning and preprocessing guidance

get_data_summary

Get comprehensive data overview and structural summary.

Provides high-level overview of dataset structure, dimensions, data types, and memory usage. Essential first step in data exploration and analysis planning workflows.

Returns: Comprehensive data overview with structural information

Summary Components:
📏 Dimensions: Rows, columns, shape information
🔢 Data Types: Column type distribution and analysis
💾 Memory Usage: Resource consumption breakdown
👀 Preview: Sample rows for quick data understanding (optional)
📊 Overview: High-level dataset characteristics

Examples:
# Full data summary with preview
summary = await get_data_summary(ctx)
# Structure summary without preview data
summary = await get_data_summary(ctx, include_preview=False)

AI Workflow Integration:
1. Initial data exploration and understanding
2. Planning analytical approaches based on data structure
3. Resource planning for large dataset processing
4. Data quality initial assessment

inspect_data_around

Inspect data around a specific coordinate for contextual analysis.

Examines the data surrounding a specific cell to understand context, patterns, and relationships. Useful for data validation, error investigation, and understanding local data patterns.

Returns: Contextual view of data around the specified coordinates

Inspection Features:
📍 Center Point: Specified cell as reference point
🔍 Radius View: Configurable area around center cell
📊 Data Context: Surrounding values for pattern analysis
🎯 Coordinates: Clear row/column reference system

Examples:
# Inspect around a specific data point
context = await inspect_data_around(ctx, row=50, column_name="price", radius=3)
# Minimal context view
context = await inspect_data_around(ctx, row=10, column_name="status", radius=1)

AI Workflow Integration:
1. Error investigation and data quality assessment
2. Pattern recognition in local data areas
3. Understanding data relationships and context
4. Validation of data transformations and corrections

validate_schema

Validate data against a schema definition using Pandera validation framework.

This function leverages Pandera's comprehensive validation capabilities to provide robust data validation. The schema is dynamically converted to Pandera format and applied to the DataFrame for maximum validation coverage and reliability.

For more information on Pandera's validation capabilities, see the Pandera documentation.

Returns: ValidateSchemaResult with validation status and detailed error information
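
A hypothetical sketch only; the actual schema format is defined by the tool's parameters, and the rule shown here is invented for illustration:

# Hypothetical schema mapping a column name to expected type/constraints
schema = {"price": {"type": "float", "nullable": False}}
result = await validate_schema(ctx, schema=schema)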

check_data_quality

Check data quality based on predefined or custom rules.

Returns: DataQualityResult with comprehensive quality assessment results
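
A minimal illustrative call, assuming the predefined rules apply when no custom rules are supplied:

result = await check_data_quality(ctx)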

find_anomalies

Find anomalies in the data using multiple detection methods.

Returns: FindAnomaliesResult with comprehensive anomaly detection results
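
A minimal illustrative call, assuming default detection methods when none are specified:

result = await find_anomalies(ctx)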

filter_rows

Filter rows using flexible conditions, with comprehensive null-value and text-matching support.

Provides powerful filtering capabilities optimized for AI-driven data analysis. Supports multiple operators, logical combinations, and comprehensive null value handling.

Examples:
# Numeric filtering
filter_rows(ctx, [{"column": "age", "operator": ">", "value": 25}])
# Text filtering with null handling
filter_rows(ctx, [{"column": "name", "operator": "contains", "value": "Smith"}, {"column": "email", "operator": "is_not_null"}], mode="and")
# Multiple conditions with OR logic
filter_rows(ctx, [{"column": "status", "operator": "==", "value": "active"}, {"column": "priority", "operator": "==", "value": "high"}], mode="or")

sort_data

Sort data by one or more columns with comprehensive error handling.

Provides flexible sorting capabilities with support for multiple columns and sort directions. Handles mixed data types appropriately and maintains data integrity throughout the sorting process.

Examples:
# Simple single column sort
sort_data(ctx, ["age"])
# Multi-column sort with different directions
sort_data(ctx, [{"column": "department", "ascending": True}, {"column": "salary", "ascending": False}])
# Using SortColumn objects for type safety
sort_data(ctx, [SortColumn(column="name", ascending=True), SortColumn(column="age", ascending=False)])

remove_duplicates

Remove duplicate rows from the dataframe with comprehensive validation.

Provides flexible duplicate removal with options for column subset selection and different keep strategies. Handles edge cases and provides detailed statistics about the deduplication process.

Examples:
# Remove exact duplicate rows
remove_duplicates(ctx)
# Remove duplicates based on specific columns
remove_duplicates(ctx, subset=["email", "name"])
# Keep last occurrence instead of first
remove_duplicates(ctx, subset=["id"], keep="last")
# Remove all duplicates (keep none)
remove_duplicates(ctx, subset=["email"], keep="none")

fill_missing_values

Fill or remove missing values with comprehensive strategy support.

Provides multiple strategies for handling missing data, including statistical imputation methods. Handles different data types appropriately and validates strategy compatibility with column types.

Examples:
# Drop rows with any missing values
fill_missing_values(ctx, strategy="drop")
# Fill missing values with 0
fill_missing_values(ctx, strategy="fill", value=0)
# Forward fill specific columns
fill_missing_values(ctx, strategy="forward", columns=["price", "quantity"])
# Fill with column mean for numeric columns
fill_missing_values(ctx, strategy="mean", columns=["age", "salary"])

select_columns

Select specific columns from dataframe, removing all others.

Validates column existence and reorders by selection order. Returns selection details with before/after column counts.
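
For example (following the calling convention of the neighboring column tools):

# Keep only these columns, in this order
select_columns(ctx, ["name", "email", "age"])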

rename_columns

Rename columns in the dataframe.

Returns: Dict with rename details

Examples:
# Using dictionary mapping
rename_columns(ctx, {"old_col1": "new_col1", "old_col2": "new_col2"})
# Rename multiple columns
rename_columns(ctx, {"FirstName": "first_name", "LastName": "last_name", "EmailAddress": "email"})

add_column

Add a new column to the dataframe.

Returns: ColumnOperationResult with operation details

Examples:
# Add column with constant value
add_column(ctx, "status", "active")
# Add column with list of values
add_column(ctx, "scores", [85, 90, 78, 92, 88])
# Add computed column
add_column(ctx, "total", formula="price * quantity")
# Add column with complex formula
add_column(ctx, "full_name", formula="first_name + ' ' + last_name")

remove_columns

Remove columns from the dataframe.

Returns: ColumnOperationResult with removal details

Examples:
# Remove single column
remove_columns(ctx, ["temp_column"])
# Remove multiple columns
remove_columns(ctx, ["col1", "col2", "col3"])
# Clean up after analysis
remove_columns(ctx, ["_temp", "_backup", "old_value"])

change_column_type

Change the data type of a column.

Returns: ColumnOperationResult with conversion details

Examples:
# Convert string numbers to integers
change_column_type(ctx, "age", "int")
# Convert to float, replacing errors with NaN
change_column_type(ctx, "price", "float", errors="coerce")
# Convert to datetime
change_column_type(ctx, "date", "datetime")
# Convert to boolean
change_column_type(ctx, "is_active", "bool")

update_column

Update values in a column using various operations with discriminated unions.

Returns: ColumnOperationResult with update details

Examples:
# Using discriminated union - Replace operation
update_column(ctx, "status", {"type": "replace", "pattern": "N/A", "replacement": "Unknown"})
# Using discriminated union - Map operation
update_column(ctx, "code", {"type": "map", "mapping": {"A": "Alpha", "B": "Beta"}})
# Using discriminated union - Fill operation
update_column(ctx, "score", {"type": "fillna", "value": 0})
# Legacy format still supported
update_column(ctx, "score", {"operation": "fillna", "value": 0})

replace_in_column

Replace patterns in a column with replacement text.

Returns: ColumnOperationResult with replacement details

Examples:
# Replace a literal "Mr." prefix via regex (dot escaped so it matches literally)
replace_in_column(ctx, "name", r"Mr\.", "Mister")
# Remove non-digits from phone numbers
replace_in_column(ctx, "phone", r"\D", "", regex=True)
# Simple string replacement
replace_in_column(ctx, "status", "N/A", "Unknown", regex=False)
# Replace multiple spaces with single space
replace_in_column(ctx, "description", r"\s+", " ")

extract_from_column

Extract patterns from a column using regex with capturing groups.

Returns: ColumnOperationResult with extraction details

Examples:
# Extract email parts
extract_from_column(ctx, "email", r"(.+)@(.+)")
# Extract code components
extract_from_column(ctx, "product_code", r"([A-Z]{2})-(\d+)")
# Extract and expand into multiple columns
extract_from_column(ctx, "full_name", r"(\w+)\s+(\w+)", expand=True)
# Extract year from date string
extract_from_column(ctx, "date", r"\d{4}")

split_column

Split column values by delimiter.

Returns: ColumnOperationResult with split details

Examples:
# Keep first part of split
split_column(ctx, "full_name", " ", part_index=0)
# Keep last part
split_column(ctx, "email", "@", part_index=1)
# Expand into multiple columns
split_column(ctx, "address", ",", expand_to_columns=True)
# Expand with custom column names
split_column(ctx, "name", " ", expand_to_columns=True, new_columns=["first_name", "last_name"])

transform_column_case

Transform the case of text in a column.

Returns: ColumnOperationResult with transformation details

Examples:
# Convert to uppercase
transform_column_case(ctx, "code", "upper")
# Convert names to title case
transform_column_case(ctx, "name", "title")
# Convert to lowercase for comparison
transform_column_case(ctx, "email", "lower")
# Capitalize sentences
transform_column_case(ctx, "description", "capitalize")

strip_column

Strip whitespace or specified characters from column values.

Returns: ColumnOperationResult with strip details

Examples:
# Remove leading/trailing whitespace
strip_column(ctx, "name")
# Remove specific characters
strip_column(ctx, "phone", "()")
# Clean currency values
strip_column(ctx, "price", "$,")
# Remove quotes
strip_column(ctx, "quoted_text", "'\"")

fill_column_nulls

Fill null/NaN values in a specific column with a specified value.

Returns: ColumnOperationResult with fill details

Examples:
# Fill missing names with "Unknown"
fill_column_nulls(ctx, "name", "Unknown")
# Fill missing ages with 0
fill_column_nulls(ctx, "age", 0)
# Fill missing status with default
fill_column_nulls(ctx, "status", "pending")
# Fill missing scores with -1
fill_column_nulls(ctx, "score", -1)

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jonpspri/databeak'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.