# Biomart MCP
by jzinno
### An MCP server to interface with Biomart
[Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP), developed by [Anthropic](https://www.anthropic.com/), is an open protocol that standardizes how applications provide context to LLMs. Here we use the [MCP python-sdk](https://github.com/modelcontextprotocol/python-sdk) to create an MCP server that interfaces with Biomart via the [pybiomart](https://github.com/jrderuiter/pybiomart) package.

There is a short [demo video](assets/mcp-demo.mp4) showing the MCP server in action on Claude Desktop.
## Installation
### Clone the repository
```bash
git clone https://github.com/jzinno/biomart-mcp.git
cd biomart-mcp
```
### Claude Desktop
```bash
uv run --with "mcp[cli]" mcp install --with pybiomart biomart-mcp.py
```
### Cursor
Through Cursor's agent mode, other models, such as those from OpenAI or DeepSeek, can also use MCP servers. Click the Cursor settings cogwheel and navigate to `Features` -> `MCP Servers` -> `Add new MCP Server`. Set the name to `biomart` (or whatever you like) and `Type` to `command`.
Set the command to:
```bash
uv run --with mcp[cli] --with pybiomart mcp run /your/path/to/biomart-mcp.py
```
### Development
```bash
# Create a virtual environment
uv venv
# macOS/Linux
source .venv/bin/activate
# Windows
.venv\Scripts\activate
uv sync  # or: uv add "mcp[cli]" pybiomart
# Run the server in dev mode
mcp dev biomart-mcp.py
```
## Features
Biomart-MCP provides several tools to interact with Biomart databases:
- **Mart and Dataset Discovery**: List available marts and datasets to explore the Biomart database structure
- **Attribute and Filter Exploration**: View common or all available attributes and filters for specific datasets
- **Data Retrieval**: Query Biomart with specific attributes and filters to get biological data
- **ID Translation**: Convert between different biological identifiers (e.g., gene symbols to Ensembl IDs)
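The ID translation feature essentially queries two attributes at once and groups the results. The sketch below is a hypothetical, offline illustration: `translate_ids` and the injected `query` callable are not part of Biomart-MCP or pybiomart. In the real server the rows would come from a Biomart query (for example through pybiomart's `Dataset.query`). The sketch keeps every match, since one identifier can map to several others.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical signature: given attribute names and a filter dict, return rows
# as (from_id, to_id) tuples. A real implementation would call Biomart here.
QueryFn = Callable[[List[str], Dict[str, List[str]]], Iterable[Tuple[str, str]]]


def translate_ids(
    ids: List[str], from_attr: str, to_attr: str, query: QueryFn
) -> Dict[str, List[str]]:
    """Map each input ID to every matching target ID (possibly none)."""
    # Request both attributes, restricted to the input IDs.
    # Assumption: the filter name equals the attribute name, which is not
    # always the case in Biomart.
    rows = query([from_attr, to_attr], {from_attr: ids})
    matches: Dict[str, List[str]] = defaultdict(list)
    for source, target in rows:
        # Skip empty targets and duplicate pairs.
        if target and target not in matches[source]:
            matches[source].append(target)
    # Preserve input order and report unmatched IDs as empty lists.
    return {i: matches.get(i, []) for i in ids}


if __name__ == "__main__":
    def fake_query(attributes, filters):
        # Placeholder rows standing in for a Biomart response.
        return [("GENE_A", "ID_1"), ("GENE_A", "ID_2"), ("GENE_B", "ID_3")]

    print(translate_ids(["GENE_A", "GENE_C"], "external_gene_name",
                        "ensembl_gene_id", fake_query))
```

Injecting the query function keeps the grouping logic testable without network access, and returning an empty list for unmatched IDs tells the model explicitly that a lookup failed rather than silently dropping it.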
## Contributing
Pull requests are welcome! Some small notes on development:
- We use only `@mcp.tool()` here by design, to maximize compatibility with MCP clients: tools are the most widely supported MCP feature, as seen in the [docs](https://modelcontextprotocol.io/clients).
- We are using `@lru_cache` to cache results of functions that are computationally expensive or make external API calls.
- Be careful not to overflow the model's context window. For example, you'll see `df.to_csv(index=False).replace("\r", "")` in many places: this CSV-style output is much more token-efficient than something like `df.to_string()`, where most of the tokens are whitespace. Also keep in mind that pulling all genes on a chromosome, or any similarly large request, will still be too large for the context window.
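The caching and compact-output conventions above can be combined as in the following sketch. It assumes a hypothetical `fetch_rows` backend standing in for a Biomart call; `query_dataset`, `to_compact_csv`, and `MAX_ROWS` are likewise illustrative names, not the project's code. Two details matter: `functools.lru_cache` only accepts hashable arguments, so attribute lists are passed as tuples; and the standard library's `csv.writer` ends rows with `\r\n` by default, so the `\r` is stripped, serving the same purpose as the `.replace("\r", "")` after `df.to_csv`. The row cap is one possible guard against chromosome-wide results.

```python
import csv
import io
from functools import lru_cache
from typing import List, Tuple

MAX_ROWS = 500  # hypothetical cap to protect the model's context window


def fetch_rows(dataset: str, attributes: Tuple[str, ...]) -> List[Tuple[str, ...]]:
    """Hypothetical stand-in for an expensive Biomart query."""
    return [tuple(f"{a}_{i}" for a in attributes) for i in range(3)]


def to_compact_csv(
    header: Tuple[str, ...], rows: List[Tuple[str, ...]], max_rows: int = MAX_ROWS
) -> str:
    """Render rows as unpadded CSV, truncating long results."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows[:max_rows])
    text = buf.getvalue().replace("\r", "")  # csv.writer ends rows with \r\n
    if len(rows) > max_rows:
        # Tell the model the result was cut, instead of silently dropping rows.
        text += f"... truncated {len(rows) - max_rows} of {len(rows)} rows\n"
    return text


@lru_cache(maxsize=128)
def query_dataset(dataset: str, attributes: Tuple[str, ...]) -> str:
    """Cached query: arguments must be hashable, hence the tuple."""
    return to_compact_csv(attributes, fetch_rows(dataset, attributes))


if __name__ == "__main__":
    print(query_dataset("hsapiens_gene_ensembl", ("ensembl_gene_id", "external_gene_name")))
    print(query_dataset.cache_info())  # a repeated call would count as a hit
```

Caching the rendered string rather than the raw rows means a repeated tool call returns the exact same text without re-querying or re-serializing; passing a list instead of a tuple would raise `TypeError: unhashable type`.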
## Potential Future Features
There are, of course, many more features that could be added, some perhaps beyond the scope of the name `biomart-mcp`. Here are some ideas:
- Add web scraping of resource sites with `bs4`. For example, after retrieving the Ensembl gene ID for NOTCH1, it could sometimes be useful to grab the collated `Comments and Description Text from UniProtKB` section from [its page on UCSC](https://genome.ucsc.edu/cgi-bin/hgGene?db=hg38&hgg_chrom=chr9&hgg_gene=ENST00000651671.1&hgg_start=136494433&hgg_end=136546048&hgg_type=knownGene)
- $...$
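As a possible first step toward the web-scraping idea, the page URL can be built from coordinates a Biomart query already returns. This sketch is hypothetical: `ucsc_gene_url` is not part of Biomart-MCP, and it only reuses the query parameters visible in the NOTCH1 link above (`db`, `hgg_chrom`, `hgg_gene`, `hgg_start`, `hgg_end`, `hgg_type`). Fetching and parsing the page with `bs4` would be a separate step.

```python
from urllib.parse import urlencode

UCSC_HGGENE = "https://genome.ucsc.edu/cgi-bin/hgGene"


def ucsc_gene_url(
    transcript_id: str, chrom: str, start: int, end: int, db: str = "hg38"
) -> str:
    """Build a UCSC hgGene URL from Biomart-style transcript coordinates."""
    # Ensembl/Biomart chromosome names lack the "chr" prefix that UCSC uses.
    if not chrom.startswith("chr"):
        chrom = f"chr{chrom}"
    params = {
        "db": db,
        "hgg_chrom": chrom,
        "hgg_gene": transcript_id,
        # Assumption: coordinates are passed through unchanged; whether UCSC
        # expects 0- or 1-based starts for this page is not checked here.
        "hgg_start": start,
        "hgg_end": end,
        "hgg_type": "knownGene",
    }
    # urlencode keeps insertion order and converts the integers to strings.
    return f"{UCSC_HGGENE}?{urlencode(params)}"


if __name__ == "__main__":
    print(ucsc_gene_url("ENST00000651671.1", "9", 136494433, 136546048))
```

With the values from the NOTCH1 link, this reproduces that URL exactly.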