# Claude Development Notes
This file contains important information and to-do items for Claude when working on this MCP server.
## Configuration
- The Bigeye MCP server uses environment variables for configuration
- Workspace ID is automatically retrieved from `BIGEYE_WORKSPACE_ID` environment variable
- API credentials are passed via `BIGEYE_API_KEY` and `BIGEYE_API_URL`
- Docker image must be tagged with both names: `bigeye-mcp-server:latest` and `bigeye-mcp-ephemeral:latest`
## Workflow Guidelines
### When Users Ask About Tables/Columns
1. **ALWAYS search first** using `search_tables()` or `search_columns()` tools
2. Present search results as a numbered list
3. Ask user to confirm which specific object they mean
4. Only then proceed with analysis/health checks
This workflow is enforced in tool descriptions with "ALWAYS USE THIS TOOL FIRST" instructions.
### When Users Ask About Issues or Incidents ✅ IMPLEMENTED
**CRITICAL UNDERSTANDING - Issue ID vs Name:**
- Issues have an `id` field (internal database ID like 12345) - users typically DON'T know this
- Issues have a `name` field (display reference like "10921") - this is what users see and reference
- When users say "incident 10921" or "issue 10921", they mean the `name` field, NOT the `id` field
**Workflow:**
1. **ALWAYS use `search_issues_by_name()` first** when users reference an issue/incident by number or name
2. Present the search results with issue name, status, description, and affected tables
3. Use the returned `id` field for subsequent operations (get_related_issues, update_issue, merge_issues, etc.)
**Example:**
```
User: "Show me incident 10921"
❌ WRONG: get_related_issues(starting_issue_id=10921)
✓ CORRECT: search_issues_by_name(name_query="10921")
→ Then use the returned 'id' field for other operations
```
**Implementation Details:**
- Added `search_issues_by_name()` method to BigeyeAPIClient (bigeye_api.py:263-321)
- Added `search_issues_by_name()` tool to server.py (server.py:574-644)
- Supports both exact and partial matching (case-insensitive)
- Performs client-side filtering since the Bigeye API doesn't support name filtering
- Updated system instructions to clarify the id vs name distinction (server.py:83-119)
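Since the API can't filter by name, the client-side matching boils down to a case-insensitive comparison on the `name` field. A minimal sketch of that logic (the helper name and the flat issue dicts are illustrative, not the actual `bigeye_api.py` code):

```python
def filter_issues_by_name(issues, name_query, exact=False):
    """Case-insensitive filter on the user-facing 'name' field.

    Returns matching issues; callers should then use the internal
    'id' field for follow-up operations like get_related_issues().
    """
    query = name_query.lower()
    if exact:
        return [i for i in issues if i.get("name", "").lower() == query]
    return [i for i in issues if query in i.get("name", "").lower()]


issues = [
    {"id": 12345, "name": "10921"},
    {"id": 67890, "name": "10922 - freshness"},
]
matches = filter_issues_by_name(issues, "10921")
# matches[0]["id"] (12345) is the internal ID to pass to other operations
```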
## Known Issues & To-Do Items
### High Priority
#### 1. ~~Fix Get Issues Response Size~~ ✅ COMPLETED (v3)
**Issue**: The `get_issues` tool's responses were too large: the API returned not only the issues but also all associated run history, which could be extremely long and overwhelm the context.
**FIXED (Original)**: Added response optimization in `fetch_issues()` that:
- Strips out historical metric runs, keeping only essential metadata
- Limits events to just the most recent one
- Removes large fields like `metricRunHistory`, `detailedHistory`, `allEvents`
- Added `include_full_history` parameter (defaults to False) for when full history is needed
- Set default page_size to 20 to limit results
**FIXED (v2 - Additional Improvements)**:
Added smart pagination with a compact mode and a separate issue-details endpoint:
1. **Compact Mode** (`compact=True`, default):
- Returns only minimal fields: id, name, status, priority, table, schema, warehouse, isIncident
- Much smaller response footprint for listing issues
- Includes hint to use `get_issue_details()` for full information
2. **Response Size Limiting** (`max_issues=15`, default):
- Limits number of issues returned to prevent context overload
- Response includes `truncated`, `totalAvailable`, `returnedCount` metadata
- Can be set to `None` to return all issues (use with caution)
3. **New `get_issue_details(issue_id)` Tool**:
- Fetches complete details for a single issue by ID
- Use after identifying issues from `get_issues()` or `search_issues_by_name()`
- Returns full event history, metric details, all metadata
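Together, the compact projection and size-limit metadata might look like the following sketch (field names come from the compact-mode list above; everything else is an assumption):

```python
COMPACT_FIELDS = ("id", "name", "status", "priority",
                  "table", "schema", "warehouse", "isIncident")

def compact_response(issues, max_issues=15):
    """Project issues to minimal fields and attach truncation metadata."""
    total = len(issues)
    if max_issues is not None:
        issues = issues[:max_issues]
    return {
        "issues": [{k: i.get(k) for k in COMPACT_FIELDS} for i in issues],
        "truncated": total > len(issues),
        "totalAvailable": total,
        "returnedCount": len(issues),
        "hint": "Use get_issue_details(issue_id) for full information",
    }
```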
**FIXED (v3 - API Response Format Fix)**:
Fixed an issue where compact mode wasn't working because:
- The Bigeye API returns `issue` (singular) as key, not `issues` (plural)
- Issue fields like tableName/schemaName are nested in `metricMetadata`, not at top level
- Large fields like `metricConfiguration` and full `events` arrays were still being returned
Changes:
- Handle both `issue` and `issues` response keys
- Extract table/column/schema/warehouse from `metricMetadata` when not at top level
- Properly strip `metricConfiguration`, full `events` array, and other large nested objects
- Added `summary` and `alertCount` to compact response for better context
- Normalize output to always use `issues` key
**Recommended Workflow**:
```
1. get_issues(compact=True) → lightweight list of issues
2. User identifies issue of interest (e.g., id=12345)
3. get_issue_details(issue_id=12345) → full details for that issue
```
**Implementation Details**:
- `bigeye_api.py`: Added `compact` and `max_issues` parameters to `fetch_issues()`
- `bigeye_api.py`: Added `fetch_single_issue(issue_id)` method
- `bigeye_api.py`: Fixed response parsing to handle `issue` vs `issues` keys and extract from `metricMetadata`
- `server.py`: Updated `get_issues()` tool with new parameters (compact=True, max_issues=15 defaults)
- `server.py`: Added `get_issue_details()` tool
### Future Improvements
- Add more granular filtering options for search results
- Implement caching for frequently accessed data
- Add support for bulk operations on issues
- ~~Consider adding a tool to get issue details separately from the list~~ ✅ DONE
- **Search improvements needed**:
- Add automatic space-to-underscore conversion in table/column searches
- Implement fuzzy matching or wildcards for more flexible searches
- Add search result caching to avoid repeated API calls
- Consider parallel searching across Bigeye and Atlan
- **Cross-system integration**:
- Add mapping between Bigeye and Atlan naming conventions
- Help correlate the same assets across both systems
- **Error handling**:
- Fix JSON parsing exceptions in Bigeye responses
- Add retry logic for recoverable errors
- Better error messages when searches return no results
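The retry logic flagged above could start from a sketch like this (the set of recoverable HTTP status codes and the `(status, body)` call shape are assumptions):

```python
import time

RECOVERABLE = {429, 500, 502, 503, 504}  # assumed recoverable statuses

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a callable returning (status, body) on recoverable
    HTTP errors, with exponential backoff between attempts."""
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RECOVERABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    return status, body  # last recoverable error after exhausting retries
```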
### New Tools to Implement and Test
#### Core Health & Issue Tools
1. **get_active_issues** (enhancement of existing `get_issues`)
- Params: `severity_filter`, `schema_filter`, `owner_filter`, `time_window`
- Returns: List of current data quality issues with details
- Purpose: More focused than current get_issues with better filtering
- Note: Current `get_issues` exists but lacks advanced filtering
2. **get_issue_details**
- Params: `issue_id`
- Returns: Full issue context including metric history, root cause suggestions
- Purpose: Deep dive into a specific issue (separate from list view)
- Note: `get_issue_resolution_steps` exists but this would be more comprehensive
#### Metric Management Tools
3. **get_metric_coverage**
- Params: `table_identifier`
- Returns: What metrics are configured, gaps in coverage
- Purpose: Identify monitoring blind spots
4. **create_metric**
- Params: `table_identifier`, `metric_type`, `configuration`
- Returns: Created metric details
- Purpose: Programmatically add monitoring
5. **get_metric_history**
- Params: `metric_id`, `time_range`
- Returns: Historical metric values and anomalies
- Purpose: Trend analysis and pattern detection
#### Incident Management Tools
6. **get_sla_compliance**
- Params: `table_identifier`, `time_period`
- Returns: Freshness/quality SLA adherence
- Purpose: Track service level compliance
#### Analytics & Reporting Tools
7. **generate_quality_report**
- Params: `scope` (schema/owner/tag), `time_period`, `format`
- Returns: Comprehensive quality metrics and trends
- Purpose: Executive reporting and trending
8. **get_anomaly_patterns**
- Params: `table_identifier`, `lookback_period`
- Returns: Recurring issues, seasonality patterns
- Purpose: Identify systemic problems
#### Integration Tools (Bigeye + Atlan)
9. **validate_catalog_coverage**
- Params: `atlan_catalog_filter`
- Returns: Which catalog assets have/lack monitoring
- Purpose: Ensure comprehensive monitoring coverage
10. **enrich_issue_context**
- Params: `issue_id`
- Returns: Issue details enriched with Atlan metadata (owners, documentation, tags)
- Purpose: Provide full context by combining both systems
**Implementation Notes:**
- Start with core health tools as they provide immediate value
- Lineage tools require both systems to be properly integrated
- Consider rate limiting and caching for analytics tools
- Test each tool with realistic data volumes
- Ensure proper error handling for cross-system tools
### MCP Resources to Implement
MCP Resources provide read-only access to data that can be referenced by Claude. Unlike tools, which perform actions, resources are for retrieving and displaying information.
#### Data Quality Resources
1. **resource://issues/active**
- Description: List of currently active data quality issues
- URI Parameters: `?schema={schema_name}&severity={level}&limit={n}`
- Returns: JSON list of active issues with status, severity, affected tables
- Update Frequency: Every 5 minutes
- Use Case: Quick overview of current data quality state
2. **resource://issues/recent**
- Description: Recently resolved or updated issues
- URI Parameters: `?days={n}&status={status}`
- Returns: JSON list of issues with resolution details
- Update Frequency: Every 15 minutes
- Use Case: Track issue resolution patterns
#### Data Catalog Resources
3. **resource://datasources**
- Description: List of all configured data sources/warehouses
- URI Parameters: None
- Returns: JSON list of data sources with connection status, schemas count
- Update Frequency: On connection changes
- Use Case: Understanding available data sources
4. **resource://schemas/{datasource_id}**
- Description: List of schemas in a specific data source
- URI Parameters: `datasource_id` (required)
- Returns: JSON list of schemas with table counts, monitoring status
- Update Frequency: Daily
- Use Case: Navigate data hierarchy
5. **resource://tables/{schema_id}**
- Description: List of tables in a specific schema
- URI Parameters: `schema_id` (required), `?monitored_only={bool}`
- Returns: JSON list of tables with row counts, column counts, monitoring status
- Update Frequency: Hourly
- Use Case: Browse schema contents
6. **resource://columns/{table_id}**
- Description: List of columns in a specific table
- URI Parameters: `table_id` (required)
- Returns: JSON list of columns with data types, nullability, metric coverage
- Update Frequency: On schema changes
- Use Case: Understand table structure
#### Monitoring Configuration Resources
7. **resource://metrics/available**
- Description: List of all available metric types
- URI Parameters: `?category={category}`
- Returns: JSON list of metric types with descriptions, applicable data types
- Update Frequency: On platform updates
- Use Case: Understand monitoring capabilities
8. **resource://metrics/configured/{table_id}**
- Description: List of configured metrics for a specific table
- URI Parameters: `table_id` (required)
- Returns: JSON list of active metrics with thresholds, schedules
- Update Frequency: On configuration changes
- Use Case: Review table monitoring setup
9. **resource://deltas**
- Description: List of configured data deltas (comparisons)
- URI Parameters: `?source_table={id}&target_table={id}`
- Returns: JSON list of delta configurations with comparison rules
- Update Frequency: On configuration changes
- Use Case: Understand data reconciliation setup
#### Data Quality Dimensions Resources
10. **resource://dimensions**
- Description: Data quality dimensions and their definitions
- URI Parameters: None
- Returns: JSON list of DQ dimensions (accuracy, completeness, timeliness, etc.)
- Update Frequency: Static
- Use Case: Educational reference for data quality concepts
11. **resource://dimensions/scores/{table_id}**
- Description: Current data quality scores by dimension for a table
- URI Parameters: `table_id` (required), `?period={7d|30d|90d}`
- Returns: JSON with scores for each dimension, trends
- Update Frequency: Daily
- Use Case: Holistic quality assessment
#### Operational Resources
12. **resource://sla/definitions**
- Description: List of defined SLAs for data freshness and quality
- URI Parameters: `?schema={name}&critical_only={bool}`
- Returns: JSON list of SLA definitions with thresholds
- Update Frequency: On SLA changes
- Use Case: Understand data contracts
13. **resource://lineage/graph/{node_id}**
- Description: Lineage graph for a specific node
- URI Parameters: `node_id` (required), `?depth={n}&direction={upstream|downstream}`
- Returns: JSON graph structure with nodes and edges
- Update Frequency: On lineage changes
- Use Case: Visualize data dependencies
14. **resource://notifications/rules**
- Description: List of configured notification rules
- URI Parameters: `?channel={email|slack|pagerduty}`
- Returns: JSON list of notification rules with conditions
- Update Frequency: On configuration changes
- Use Case: Understand alerting setup
15. **resource://glossary/terms**
- Description: Business glossary terms related to data quality
- URI Parameters: `?category={category}`
- Returns: JSON list of terms with definitions, related metrics
- Update Frequency: On glossary updates
- Use Case: Business context for technical metrics
**Resource Implementation Guidelines:**
- Resources should be read-only and idempotent
- Use caching to minimize API calls (respect update frequencies)
- Provide clear URI parameter documentation
- Include metadata like last_updated, total_count
- Consider pagination for large result sets
- Resources should complement tools, not duplicate them
- Prefer structured JSON responses for easy parsing
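The caching guideline above (respect each resource's update frequency) could be implemented with a per-URI TTL cache along these lines (a sketch; the fetcher callable and URI scheme are assumptions):

```python
import time

class TTLCache:
    """Cache resource payloads, refetching only after the resource's
    documented update frequency (TTL in seconds) has elapsed."""

    def __init__(self):
        self._store = {}  # uri -> (expires_at, payload)

    def get(self, uri, fetch, ttl):
        entry = self._store.get(uri)
        now = time.monotonic()
        if entry and entry[0] > now:
            return entry[1]          # still fresh, skip the API call
        payload = fetch()
        self._store[uri] = (now + ttl, payload)
        return payload
```

For example, `resource://issues/active` (update frequency: every 5 minutes) would use `ttl=300`, while `resource://schemas/{datasource_id}` (daily) would use `ttl=86400`.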
## Testing
When making changes:
1. Always rebuild the Docker image with both tags:
```bash
docker build -t bigeye-mcp-server:latest -t bigeye-mcp-ephemeral:latest .
```
2. Test with Claude Desktop after rebuilding
3. Commit changes with descriptive messages
## API Quirks
- ~~The `/api/v1/search` endpoint requires `workspaceId` as a query parameter, not in the request body~~ **UPDATE**: The `/api/v1/search` endpoint doesn't work with workspace ID at all. We now use separate `/api/v1/tables`, `/api/v1/columns`, and `/api/v1/schemas` endpoints instead.
- Workspace IDs must be integers, not strings
- Some endpoints use camelCase while others use snake_case - be careful with parameter names
- Search endpoints require exact matches with underscores (e.g., "sales_dashboard" not "sales dashboard")
- The `/api/v1/tables`, `/api/v1/columns`, `/api/v1/schemas` endpoints properly accept `workspaceId` as a query parameter
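Putting those quirks together, a request to the working endpoints might be built like this (the base URL and the `tableName` parameter are illustrative assumptions; only `workspaceId` and the endpoint paths come from the notes above):

```python
from urllib.parse import urlencode

def build_search_url(base_url, endpoint, workspace_id, **params):
    """workspaceId goes in the query string as an integer, never the body."""
    query = {"workspaceId": int(workspace_id), **params}
    return f"{base_url}/api/v1/{endpoint}?{urlencode(query)}"

url = build_search_url("https://app.bigeye.com", "tables", "123",
                       tableName="sales_dashboard")  # underscores, not spaces
```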