Tabular Document Retriever MCP
1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type `@` followed by the MCP server name and your instructions, e.g., `@Tabular Document Retriever MCP Find rows in the spreadsheet that match 'marketing expenses in Q3'`.

That's it! The server will respond to your query, and you can continue using it as needed.
This is a Model Context Protocol (MCP) server that transforms tabular data (CSV/Excel) into Markdown key-value pairs, embeds them into a local vector database (ChromaDB), and provides retrieval tools to contextually answer queries.
It uses the `uv` Python package manager, the `mcp` SDK, and FastAPI to optionally expose the server over Server-Sent Events (SSE).
Vibe-Coded
The entire codebase was generated by Antigravity with the help of Gemini 3.1 Pro (High and Fast) as a quick proof of concept.
Features
- **Ingestion**: Parses `.csv` and `.xlsx` files and upserts Markdown-formatted strings into ChromaDB.
- **Retrieval Engine**: Uses `sentence-transformers/all-MiniLM-L6-v2` locally for semantic search.
- **MCP Server**: Provides three tools exposed over an SSE endpoint: `retrieve_batch`, `retrieve_single`, and `retrieve_by_query`.
- **Dockerization**: Quick spin-up of the database and the MCP server together without exposing the raw database to the host machine.
Prerequisites
🚀 Running the Stack
To start the server and the ChromaDB vector database locally:
```shell
docker-compose up -d --build
```

This will launch:

- ChromaDB internally on `chroma-db:8000`.
- The MCP server, accessible externally at `http://localhost:8000/sse`.
💾 Ingesting User Data
Before you can search, you need to ingest tabular data into the running ChromaDB instance. There are two ways to ingest data: using the command-line ingestor script or via the HTTP API.
1. CLI Ingestor (src/ingestor.py)
You can run the CLI ingestor directly from your host machine. Make sure to set environment variables appropriately to reach your local stack, or run it via Docker Compose.
To run the ingestor against a locally running ChromaDB (or inside the container):
```shell
# First, ensure dependencies are synced
uv sync

# Run the ingestor (assuming there is a file `data/my_table.csv`)
# When targeting the dockerized ChromaDB, temporarily expose port 8000 for
# chroma-db, or simply run ingestion locally with local persistence.
uv run python -m src.ingestor data/my_table.csv
```

**Filtering Embedded Columns**: You can optionally specify which columns from your file should be converted into the embedded document text. By default, all columns are used.

```shell
# Embed only the 'title' and 'description' columns
uv run python -m src.ingestor data/my_table.csv -c title -c description
```

2. HTTP Ingestion API (src/api.py)
The application also exposes an ingestion endpoint at POST /ingest/file. You can upload .csv or .xlsx files directly to the server:
```shell
curl -X POST -F "file=@data/my_table.csv" http://localhost:8000/ingest/file
```

**Filtering Embedded Columns via API**: Just like the CLI, you can filter which columns are embedded by appending the `embed_columns` query parameter:

```shell
curl -X POST -F "file=@data/my_table.csv" "http://localhost:8000/ingest/file?embed_columns=title&embed_columns=description"
```

Note: Since the Docker stack keeps ChromaDB private, you can either temporarily map a port for `chroma-db` in `docker-compose.yml`, or run a one-off task with Docker Compose:

```shell
docker-compose exec mcp-app python src/ingestor.py /path/to/mounted/data.csv
```

🔐 Authentication
ChromaDB authentication can now be configured using environment variables. Both the server and the MCP clients (src/api.py, src/ingestor.py, and src/database.py) respect the following variables:
- `CHROMA_CLIENT_AUTH_PROVIDER`: the authentication provider class (e.g., `chromadb.auth.token_authn.TokenAuthClientProvider`)
- `CHROMA_CLIENT_AUTH_CREDENTIALS`: the credential/token string
When running ChromaDB in a secure environment, ensure you also configure the server (e.g., via CHROMA_SERVER_AUTH_PROVIDER set to chromadb.auth.token_authn.TokenAuthenticationServerProvider).
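A minimal sketch of how a client can pick these variables up, assuming the `chromadb` Python client whose `Settings` fields mirror the (lowercased) env var names; `chroma_auth_kwargs` is a hypothetical helper, and the actual wiring in `src/database.py` may differ:

```python
# Collect the CHROMA_CLIENT_AUTH_* variables described above into kwargs
# for chromadb's Settings. Verify the field names against your chromadb
# version before relying on this.
import os

def chroma_auth_kwargs(env=os.environ) -> dict:
    """Build ChromaDB client auth settings from the environment."""
    kwargs = {}
    if provider := env.get("CHROMA_CLIENT_AUTH_PROVIDER"):
        kwargs["chroma_client_auth_provider"] = provider
    if credentials := env.get("CHROMA_CLIENT_AUTH_CREDENTIALS"):
        kwargs["chroma_client_auth_credentials"] = credentials
    return kwargs

# Usage (requires chromadb installed and a reachable server):
# import chromadb
# from chromadb.config import Settings
# client = chromadb.HttpClient(host="localhost", port=8000,
#                              settings=Settings(**chroma_auth_kwargs()))
```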
🚢 Kubernetes Deployment
You can deploy the full stack to a Kubernetes cluster using the provided k8s-deployment.yaml file. This file includes definitions for both the chroma-db deployment and the matching mcp-app server.
```shell
# Deploy to your Kubernetes cluster
kubectl apply -f k8s-deployment.yaml

# Check the status of the deployments
kubectl get pods
```

Ensure you configure a valid Kubernetes Secret named `chroma-auth-secret` containing your auth token if you wish to use token authentication out of the box.
🛠️ MCP Tools
Once running, any MCP client can connect to http://localhost:8000/sse via Server-Sent Events (SSE).
Available tools (all accept an optional where parameter for ChromaDB metadata filtering):
- `retrieve_single(row, top_k=5, where=None)`: Top-K search using a single row's Markdown string.
- `retrieve_batch(rows, top_k=5, where=None)`: Batch retrieval over a list of Markdown row strings.
- `retrieve_by_query(query, top_k=5, where=None)`: Free-text query mapped directly to ChromaDB's search.
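Since all three tools accept a `where` clause, composing ChromaDB metadata filters is worth a quick sketch. `eq_filter` is a hypothetical helper built on ChromaDB's standard `$eq`/`$and` operator syntax:

```python
# Build a ChromaDB `where` clause from equality conditions.
# ChromaDB expects a single {field: {"$eq": value}} clause, or several
# clauses combined under "$and"; this helper picks the right shape.
def eq_filter(**conditions) -> dict:
    clauses = [{field: {"$eq": value}} for field, value in conditions.items()]
    if not clauses:
        return {}
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```

For example, passing `eq_filter(source="my_table.csv")` as the `where` argument of `retrieve_by_query` would restrict results to records carrying that metadata value.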
💻 Local Testing Example
You can test the running MCP server locally using the official Python SDK. First, ensure you have the mcp package installed in your environment (uv pip install mcp or uv add mcp).
Run the example with:
```shell
uv run python tests/test_client.py
```

(Note: the test client also exercises explicit `where` metadata queries against ChromaDB.)
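Such a client can be sketched with the official `mcp` Python SDK. The SSE URL and tool name follow this README, but exact SDK call shapes may vary between versions, so treat this as an assumption to verify:

```python
# Connect to the server over SSE and call one retrieval tool.
# Requires `mcp` installed and the stack from `docker-compose up` running.
import asyncio

async def query_server(url: str, query: str, top_k: int = 5):
    from mcp import ClientSession
    from mcp.client.sse import sse_client

    async with sse_client(url) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.call_tool(
                "retrieve_by_query", {"query": query, "top_k": top_k}
            )

# Usage:
# result = asyncio.run(query_server("http://localhost:8000/sse",
#                                   "marketing expenses in Q3"))
```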
🗄️ Checking ChromaDB Records
You can easily dump the ingested records directly from your local container exposed on port 8001. A utility script is provided to connect to the database and retrieve all content from the tabular_data collection.
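The dump presumably amounts to something like the sketch below (the collection name and port follow this README; `format_records` and `dump_all` are hypothetical names, not the script's actual API):

```python
# Fetch every record from the `tabular_data` collection and format it.
def format_records(ids, documents) -> str:
    """Pair record ids with their Markdown documents for a readable dump."""
    return "\n\n".join(f"[{rid}]\n{doc}" for rid, doc in zip(ids, documents))

def dump_all(host: str = "localhost", port: int = 8001,
             collection: str = "tabular_data") -> str:
    import chromadb  # requires chromadb installed and the DB port exposed
    client = chromadb.HttpClient(host=host, port=port)
    records = client.get_collection(collection).get()
    return format_records(records["ids"], records["documents"])
```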
Run the script using:
```shell
uv run python tests/dump_records.py
```

Alternatively, you can query the ChromaDB REST API directly with curl to list the collections and check the status of your data:

```shell
# List all collections
curl http://localhost:8001/api/v1/collections
```

⚙️ Configuration
The MCP Server configuration is defined in src/server.py (around line 12). If you need to customize the server name, host address, or port, you can directly modify the NorthMCPServer initialization:
# src/server.py
```python
# src/server.py
mcp = NorthMCPServer("tabular-document-retriever", host="0.0.0.0", port=8000)
```

- Name (`"tabular-document-retriever"`): The identifier for the server. Change this if you want the server to be recognized under a different name by your MCP clients.
- Host (`"0.0.0.0"`): The network interface the server binds to. `0.0.0.0` allows external connections (necessary when running in Docker). You can change it to `127.0.0.1` or `localhost` to restrict connections to the local machine only.
- Port (`8000`): The port the server listens on. If you change this, ensure you also update any corresponding port mapping in your `docker-compose.yml` or client configuration.