Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@MCP Data Catalog find all users in the users dataset where the role is admin".
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
MCP Data Catalog
A Model Context Protocol (MCP) server that provides AI assistants with structured access to tabular datasets from CSV files. Query, filter, and retrieve data through a clean, type-safe interface.
Features (MVP)
✅ 4 MCP Tools
list_datasets - List all available datasets
describe_dataset - Get schema and field information
query_dataset - Query with filters, projections, and limits
get_by_id - Retrieve a specific row by lookup key
✅ Type-Safe Schema
String, number, boolean, and enum field types
Field validation and type checking
Required field enforcement
✅ Filtering (MVP)
eq (equals)
contains (case-insensitive substring)
and (multiple conditions)
✅ Smart Limits
Per-dataset row limits
Truncation indicators
Configurable defaults
✅ Hot Reload
Config changes apply automatically (1-3ms)
No server restart needed
Invalid configs are rejected safely
✅ Stable MVP
Hexagonal architecture
Comprehensive test coverage
Type-safe implementation
Production-quality error handling
Quick Start
1. Install
2. Configure
Copy an example configuration:
Or create your own:
3. Run
The server starts on stdio and exposes 4 MCP tools.
Usage
MCP Client Configuration
Add to your MCP client config (e.g., Claude Desktop):
Note: Replace /path/to/... with your actual local file paths. The MCP server runs as a Node.js process and reads the CONFIG_PATH environment variable at startup.
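A minimal sketch of such an entry, assuming the built server entry point is dist/index.js and the config lives at config/datasets.json (the server name and paths below are placeholders, not fixed values):

```json
{
  "mcpServers": {
    "data-catalog": {
      "command": "node",
      "args": ["/path/to/mcp-data-catalog/dist/index.js"],
      "env": {
        "CONFIG_PATH": "/path/to/mcp-data-catalog/config/datasets.json"
      }
    }
  }
}
```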
Available Tools
1. list_datasets
List all configured datasets.
Request:
Response:
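As an illustrative sketch, the call carries no arguments:

```json
{ "name": "list_datasets", "arguments": {} }
```

and the response summarizes each configured dataset. The exact result field names are an assumption, not the server's guaranteed output:

```json
{
  "datasets": [
    { "id": "users", "name": "Users", "description": "Sample user records" },
    { "id": "products", "name": "Products", "description": "Sample product catalog" }
  ]
}
```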
2. describe_dataset
Get detailed schema information for a dataset.
Request:
Response:
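A hedged example, assuming a datasetId argument:

```json
{ "name": "describe_dataset", "arguments": { "datasetId": "users" } }
```

The response shape below is illustrative, but mirrors the schema concepts used in this README (fields, lookupKey, visibleFields, limits):

```json
{
  "id": "users",
  "lookupKey": "id",
  "fields": [
    { "name": "id", "type": "number", "required": true },
    { "name": "name", "type": "string", "required": true },
    { "name": "role", "type": "enum", "values": ["admin", "user"] },
    { "name": "active", "type": "boolean" }
  ],
  "visibleFields": ["id", "name", "role", "active"],
  "limits": { "defaultRows": 10, "maxRows": 100 }
}
```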
3. query_dataset
Query a dataset with optional filters, field projection, and limits.
Simple Query:
With Filter:
With Multiple Filters:
With Field Projection:
Response:
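Combining the variants above into one hedged sketch (argument names such as datasetId, filter, fields, and limit are assumptions), a filtered and projected query could look like:

```json
{
  "name": "query_dataset",
  "arguments": {
    "datasetId": "users",
    "filter": {
      "op": "and",
      "conditions": [
        { "op": "eq", "field": "role", "value": "admin" },
        { "op": "contains", "field": "name", "value": "ali" }
      ]
    },
    "fields": ["id", "name", "role"],
    "limit": 5
  }
}
```

with a response along these lines, including a truncation indicator when the limit cuts off results:

```json
{
  "rows": [
    { "id": 1, "name": "Alice", "role": "admin" }
  ],
  "rowCount": 1,
  "truncated": false
}
```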
4. get_by_id
Retrieve a single row by its lookup key.
Request:
Response:
If not found:
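A hedged sketch (argument and result names are assumptions); the lookup value is matched against the dataset's configured lookupKey:

```json
{ "name": "get_by_id", "arguments": { "datasetId": "users", "id": 1 } }
```

A hit might come back as:

```json
{ "row": { "id": 1, "name": "Alice", "role": "admin", "active": true } }
```

while a miss might be reported as an explicit empty result, for example:

```json
{ "row": null }
```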
Filter Operators
The MVP supports three operators:
eq (equals)
Exact match (case-sensitive for strings).
contains (substring)
Case-insensitive substring search.
and (conjunction)
All conditions must be true.
Filter Schema
All filter expressions follow this canonical JSON structure:
Simple filter:
Compound filter (and):
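For instance (the key names op, field, value, and conditions are assumptions about the canonical structure, used consistently in the sketches above), a compound filter nests simple conditions:

```json
{
  "op": "and",
  "conditions": [
    { "op": "eq", "field": "role", "value": "admin" },
    { "op": "contains", "field": "name", "value": "smith" }
  ]
}
```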
Post-MVP: additional operators (ne, gt, gte, lt, lte, in, or) are planned and will follow the same structure.
Configuration
Dataset Structure
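A hedged sketch of one dataset entry, reusing keys that appear elsewhere in this README (lookupKey, values, visibleFields, defaultRows, maxRows); the top-level layout and keys such as csvPath are assumptions:

```json
{
  "datasets": [
    {
      "id": "users",
      "name": "Users",
      "csvPath": "examples/datasets/sample-users.csv",
      "lookupKey": "id",
      "fields": [
        { "name": "id", "type": "number", "required": true },
        { "name": "name", "type": "string", "required": true },
        { "name": "role", "type": "enum", "values": ["admin", "user"] },
        { "name": "active", "type": "boolean" }
      ],
      "visibleFields": ["id", "name", "role", "active"],
      "limits": { "defaultRows": 10, "maxRows": 100 }
    }
  ]
}
```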
Field Types
Type | Description | Example Values
---- | ----------- | --------------
string | Text data | "Alice", "Widget A"
number | Numeric data | 42, 19.99
boolean | True/false | true, false
enum | Predefined values | one of the configured values, e.g. "admin", "user"
Configuration Format
Configuration uses JSON format. This is the primary and recommended format for the MVP.
Note: YAML support may be added in future versions, but JSON remains the canonical format.
Configuration Validation
The server validates configuration on startup and rejects invalid configs:
✅ Checks performed:
All required fields present
Field types are valid
Enum fields have non-empty values arrays
visibleFields reference existing fields
lookupKey references an existing field
Dataset IDs are unique
Limits are valid (positive, maxRows ≥ defaultRows)
CSV files exist and are readable
Fail-fast: an invalid configuration prevents the server from starting, and the problems are reported with clear error messages.
CSV File Format
Requirements
Header row with column names (first row)
Column names must match field definitions (case-sensitive)
Data types must match field types
UTF-8 encoding
Standard CSV format (comma-delimited)
Example
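For example, a small users file that satisfies these requirements might look like this (contents are illustrative):

```csv
id,name,role,active
1,Alice,admin,true
2,Bob,user,false
3,Carol,user,true
```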
Type Formatting
Boolean: Must be true or false (lowercase)
Number: Integers or decimals
Enum: Must match one of the configured values
Hot Reload
Configuration changes are detected automatically:
Edit config/datasets.json
Save the file
Changes apply in 1-3ms (catalog swap only)
Invalid changes are rejected (keeps current config)
Watch the logs:
How it works:
Config file is watched for changes
On change: validates new config
If valid: atomically swaps to new catalog
If invalid: preserves current state, logs error
No server restart needed!
AI Usage Guidelines
This server is optimized for local, schema-aware access to CSV-backed reference data – the kind of data I use for project design, exploration, documentation aggregation, and hobby systems. For high-volume or mission-critical production workloads, you would typically pair LLMs with a dedicated database-backed MCP server and keep this catalog focused on lightweight, structured datasets close to the agent.
When designing datasets:
Dataset Design for AI
Keep datasets focused:
Small, single-purpose tables work better than large multi-purpose sheets
Separate reference data (IDs, names, codes) from descriptive content
Break complex domains into multiple related datasets
Optimize for token efficiency:
Use visibleFields to expose only necessary columns
Keep field names short but meaningful
Prefer IDs and codes over long text fields for filtering
Design for stable querying:
Use consistent, stable identifiers (numeric IDs, SKUs, codes)
Avoid relying on free-text names as lookup keys
Normalize categorical data (use enums, not free text)
Structure for filtering:
Tag-based fields enable flexible queries (status, category, type)
Use enums for controlled vocabularies
Boolean flags for common filters (active, published, available)
Example patterns:
Index dataset: IDs, names, tags, status (small, frequently queried)
Detail dataset: Full records with all fields (queried by ID)
Reference dataset: Lookup tables, enums, measurement scales (small, stable)
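As a sketch of the index/detail split, two abbreviated dataset entries (IDs, field names, and keys follow the hypothetical config format shown earlier; full field definitions and file paths are omitted for brevity):

```json
[
  {
    "id": "products_index",
    "lookupKey": "id",
    "visibleFields": ["id", "name", "category", "status"]
  },
  {
    "id": "products_detail",
    "lookupKey": "id",
    "visibleFields": ["id", "name", "category", "status", "price", "description"]
  }
]
```

The index entry stays small for broad queries, while the detail entry is fetched with get_by_id once a row of interest is found.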
Architecture
This project follows Hexagonal Architecture for clean separation of concerns:
Key Principles:
Domain layer has zero external dependencies
Dependencies point inward (adapters → domain)
Business logic is isolated and testable
Easy to extend with new adapters
See docs/dev/mcp-data-catalog.md for detailed architecture documentation.
Examples
The examples/ directory contains:
Configurations
minimal.json - Single dataset, basic features
typical.json - Multiple datasets, common patterns
advanced.json - Complex scenarios with many features
Datasets
minimal.csv - 5 rows, 2 columns
sample-users.csv - 10 users with roles
sample-products.csv - 15 products with categories
employees.csv - 15 employees with departments
inventory.csv - 20 inventory items
orders.csv - 20 customer orders
Try them:
See examples/README.md for detailed documentation.
Development
Setup
Run in Development Mode
Run Tests
Build for Production
Output in dist/ directory.
Testing
Test Coverage:
Comprehensive test suite with high coverage
Unit tests for domain logic and use cases
Integration tests for MCP tools and hot reload
Both statement and branch coverage tracked
Test Structure:
Performance
Characteristics:
Config reload: 1-3ms (catalog swap only)
CSV load: 5-10ms per file (varies with size)
Query execution: 1-2ms for in-memory operations
Memory: O(n) where n = dataset size
CSV Loading Behavior:
CSV files are read on-demand for each query
No in-memory caching of CSV data
This keeps memory usage low but includes file I/O in query latency
Config catalog is cached; only CSV data is loaded per-query
Scalability:
Suitable for datasets up to ~100K rows
Query latency includes file read time (~5-10ms per CSV)
For high-performance or large datasets, use database backends (post-MVP)
Consider dataset design: multiple small CSVs better than one large CSV
Roadmap
✅ MVP Complete (All 6 Phases)
Hexagonal architecture
4 MCP tools (list, describe, query, get_by_id)
MVP filter operators (eq, contains, and)
Type validation (string, number, boolean, enum)
Hot reload support
Comprehensive test coverage
Error handling and logging
Complete documentation (README, dev guide, examples)
🚀 Post-MVP Features
Additional filter operators (ne, gt, gte, lt, lte, in, or)
Sorting (ORDER BY)
Complex types (arrays, nested objects)
Multiple data sources (PostgreSQL, SQLite, JSON)
Aggregations (COUNT, SUM, AVG, etc.)
Full-text search
Caching layer for performance
GraphQL-style query language
See docs/project-plans/project-plan-v1.final.md for details.
Documentation
Project Overview - Development approach and engineering narrative
Developer Guide - Architecture and internals
Configuration Reference - Complete config documentation
Examples - Sample datasets and configs
Project Plan - MVP scope and roadmap
Phase Execution - Implementation tracking
Contributing
Contributions welcome! This project follows:
Hexagonal architecture - Keep domain pure
Test-driven development - Write tests first
Type safety - Leverage TypeScript
Clean code - Follow existing patterns
See .clinerules/core-rules.md for architectural guidelines.
Troubleshooting
Server won't start
Check configuration:
Common issues:
CSV file path incorrect
Field type mismatch
Missing required fields
Duplicate dataset IDs
Queries return no results
Verify dataset:
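For example (argument names as assumed earlier), inspect the dataset's schema to confirm field names and types before retrying the query:

```json
{ "name": "describe_dataset", "arguments": { "datasetId": "users" } }
```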
Check:
Dataset ID is correct
CSV file has data
Filters match field types
Field names are in visibleFields
Hot reload not working
Verify file watching:
Config file path is correct
File system permissions allow reading
Check server logs for reload confirmation
License
MIT
Credits
Built with:
Project Status
Version: 1.0.0-mvp
Status: Stable MVP
Last Updated: 2025-11-30
Note: Test and performance numbers in badges reflect the state at release. Designed for production-style workloads, but validate performance and fit for your specific environment.
All 6 phases complete:
✅ Phase 1: Skeleton & Config
✅ Phase 2: Core Use Cases
✅ Phase 3: Hot Reload
✅ Phase 4: MCP Adapter
✅ Phase 5: Hardening & Testing
✅ Phase 6: Documentation
See docs/execution/master-checklist.md for detailed progress.