---
title: Aurora-MCP
emoji: 🌿
colorFrom: green
colorTo: indigo
sdk: docker
pinned: false
---
# 🌿 Aurora-MCP
**Model Context Protocol (MCP) server providing access to datasets of natural
and synthetic small molecules, with a focus on identifying potential
mitochondrial Complex I inhibitors that may occur in plant species.**
---
## 🔍 Overview
**Aurora-MCP** is a Model Context Protocol (MCP) server and data integration layer that connects natural-product, biodiversity, and mitochondrial-inhibitor datasets.
It lets LLMs and users query relationships between plant species, small molecules, and mitochondrial Complex I inhibition, bridging COCONUT, Laji.fi, GBIF,
and AI-derived PubMed data through structured joins and metadata schemas.
It is packaged as a lightweight **MCP server + Hugging Face Space** that bridges two complementary knowledge sources:
1. 🌿 **[Aurora](https://github.com/ndaniel/aurora)** — natural-product and plant biodiversity data, mapping compounds to genera and species found in Nordic ecosystems.
2. 🧬 **[Aurora-Mito-ETL](https://github.com/ndaniel/aurora-mito-etl)** — curated PubMed-derived corpus of small-molecule inhibitors of mitochondrial Complex I (NADH dehydrogenase).
Together they form a dataset that **ChatGPT** (or any MCP-compatible LLM) can query conversationally: it can reason over structured biological data
and run targeted searches on small molecules, plants, and the mechanistic links between them.
---
## 🧠 Concept
**Goal:** allow scientific dialogue with an LLM grounded in domain data, for example:
> *“Show me plant-derived compounds that inhibit mitochondrial Complex I.”*
> *“Find PubMed evidence for arctigenin as a Complex I inhibitor.”*
> *“List Nordic plants whose metabolites overlap with known ETC inhibitors.”*
Aurora-MCP turns your static text/TSV data into an **interactive semantic backend**, exposing programmatic tools for searching, linking, and reasoning.
It does so through a FastAPI-based MCP endpoint (`/mcp`) that ChatGPT (or any MCP-aware client)
can connect to. It also provides `/healthz` for status checks and simple debug HTTP routes
for local testing.
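
As a rough illustration of what turning TSV data into a queryable backend could involve, the sketch below joins two small tab-separated tables on a shared compound name. The column names (`compound`, `genus`, `pmid`) and every row are hypothetical placeholders, not the actual Aurora or Aurora-Mito-ETL schema, and the helper names are invented for this example. It uses only the standard library, although the project lists pandas in its requirements.

```python
import csv
import io

# Hypothetical placeholder tables; real Aurora files will have different columns.
PLANT_TSV = "compound\tgenus\ncompound_a\tGenus_x\ncompound_b\tGenus_y\n"
INHIBITOR_TSV = "compound\tpmid\ncompound_a\tPMID_PLACEHOLDER\n"


def read_tsv(text: str) -> list[dict[str, str]]:
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def plants_with_inhibitors(plants, inhibitors):
    """Inner-join plant rows to inhibitor rows on the compound name (case-insensitive)."""
    by_compound = {}
    for row in inhibitors:
        by_compound.setdefault(row["compound"].lower(), []).append(row)
    joined = []
    for plant in plants:
        for hit in by_compound.get(plant["compound"].lower(), []):
            joined.append({**plant, "pmid": hit["pmid"]})
    return joined


matches = plants_with_inhibitors(read_tsv(PLANT_TSV), read_tsv(INHIBITOR_TSV))
print(matches)
```

A function of this shape could later be exposed as an MCP tool alongside the file tools, so that an LLM can ask for the joined rows instead of reading raw files.
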
---
## 🚀 Quick start (local)
```bash
# 1. Create a clean environment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# 2. Run the MCP HTTP server
uvicorn mcp_server.server:app --host 0.0.0.0 --port 7860
# 3. Check health
curl -s http://127.0.0.1:7860/healthz | jq
# 4. Optional: test the debug routes
curl -s 'http://127.0.0.1:7860/debug/list_files?path=data' | jq
```
The health check in step 3 should return something like:
```json
{
  "ok": true,
  "mcp": "mounted at /mcp",
  "tools": ["list_files", "read_text"]
}
```
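
The same check can be scripted. The following is a minimal sketch using only the standard library; `fetch_health` and `missing_tools` are hypothetical helper names, and the payload shape is assumed to match the sample response above.

```python
import json
import urllib.request


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/healthz and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/healthz"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def missing_tools(payload: dict, required: list[str]) -> list[str]:
    """Return the required tool names that the health payload does not advertise."""
    if not payload.get("ok"):
        raise RuntimeError("server reported ok=false")
    advertised = set(payload.get("tools", []))
    return [name for name in required if name not in advertised]


# Offline demonstration with the sample payload from the quick start.
sample = json.loads(
    '{"ok": true, "mcp": "mounted at /mcp", "tools": ["list_files", "read_text"]}'
)
print(missing_tools(sample, ["list_files", "read_text"]))
```

Against a running server, `missing_tools(fetch_health("http://127.0.0.1:7860"), ["list_files"])` performs the live check.
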
---
## 🧠 Using with ChatGPT (MCP)
1. Deploy this repository to a **Hugging Face Space** (Docker SDK).
2. Wait until the Space is running and `/healthz` returns 200 OK at the Space's direct URL (the `huggingface.co/spaces/...` page only embeds the app):
`https://<you>-<space>.hf.space/healthz`
3. In ChatGPT → **Settings → Connectors / MCP → Add Server**
- **Server URL:** `https://<you>-<space>.hf.space/mcp`
4. Open a new chat and try for example:
- `list_files(path="data")`
- `read_text(path="README.md")`
- (Aurora domain tools can be added similarly.)
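
Under the hood, an MCP client turns a tool invocation like the ones above into a JSON-RPC 2.0 message. The sketch below builds the `tools/call` request body for `list_files`. The helper name `tool_call_request` is hypothetical, and a real client would first perform the MCP `initialize` handshake and handle the HTTP transport, both of which are omitted here.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request in a session needs a unique id


def tool_call_request(name: str, arguments: dict) -> dict:
    """Build the JSON-RPC 2.0 body for an MCP `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call_request("list_files", {"path": "data"})
print(json.dumps(request, indent=2))
```

In normal use ChatGPT (or another MCP client) builds and sends these messages for you; the sketch is only meant to show what travels to `/mcp`.
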
---
## 🐳 Docker (for Hugging Face Spaces)
```dockerfile
FROM python:3.12-slim
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PIP_NO_CACHE_DIR=1
RUN apt-get update && apt-get install -y --no-install-recommends build-essential curl ca-certificates && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --upgrade pip && pip install -r requirements.txt
COPY . .
EXPOSE 7860
ENV PORT=7860
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=5 CMD curl -fsS http://127.0.0.1:${PORT}/healthz || exit 1
CMD ["uvicorn","mcp_server.server:app","--host","0.0.0.0","--port","7860"]
```
---
## 🧩 Architecture overview
| Component | Description |
|------------|-------------|
| **FastAPI app** | Hosts the `/mcp` streaming endpoint and `/healthz` check |
| **FastMCP** | MCP server layer that exposes Python functions as MCP tools |
| **Tools** | Simple functions (`list_files`, `read_text`, etc.) that can be called by MCP clients |
| **Aurora domain** | (Future) plant‑compound and inhibitor analytics from your Aurora ETL data |
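
To make the "Tools" row concrete, here is a minimal sketch of what two such functions might look like in plain Python. It is not the code in `mcp_server/tools/files.py`: the `FileTools` class, the root-containment check, and the `max_chars` cap are assumptions added for illustration. With the MCP Python SDK, functions like these are typically registered through FastMCP's `@mcp.tool()` decorator, which is left out so the example has no third-party dependencies.

```python
from pathlib import Path


class FileTools:
    """Hypothetical file helpers confined to a single root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def _resolve(self, path: str) -> Path:
        """Resolve a user-supplied path and refuse anything outside the root."""
        target = (self.root / path).resolve()
        if target != self.root and self.root not in target.parents:
            raise ValueError(f"path escapes the allowed root: {path}")
        return target

    def list_files(self, path: str = ".") -> list[str]:
        """Return the sorted entry names of a directory under the root."""
        return sorted(entry.name for entry in self._resolve(path).iterdir())

    def read_text(self, path: str, max_chars: int = 20_000) -> str:
        """Return at most max_chars characters of a UTF-8 text file under the root."""
        text = self._resolve(path).read_text(encoding="utf-8", errors="replace")
        return text[:max_chars]
```

The containment check matters because an LLM-supplied path such as `../../etc/passwd` would otherwise be read like any other file.
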
---
## 📂 Project layout
```
aurora-mcp/
├── mcp_server/
│ ├── server.py # FastAPI + FastMCP entrypoint
│ ├── tools/
│ │ └── files.py # Example tools (list_files, read_text)
│ └── __init__.py
├── data/ # Local data (ignored by git)
├── requirements.txt
├── Dockerfile
├── huggingface.yaml
└── README.md
```
---
## ✅ Health & debug routes
| Endpoint | Purpose |
|-----------|----------|
| `/healthz` | lightweight JSON health check |
| `/debug/list_files` | list directory contents (no MCP) |
| `/debug/read_text` | read a file as plain text |
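
When calling the debug routes from a script, the query string should be URL-encoded. The sketch below builds such URLs with the standard library. `debug_url` is a hypothetical helper, and the assumption that `/debug/read_text` takes a `path` query parameter is made by analogy with the `/debug/list_files?path=data` call in the quick start.

```python
from urllib.parse import urlencode


def debug_url(base_url: str, route: str, **params: str) -> str:
    """Build a /debug/<route> URL with properly encoded query parameters."""
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/debug/{route}"
    return f"{url}?{query}" if query else url


print(debug_url("http://127.0.0.1:7860", "list_files", path="data"))
# Assumed parameter name for read_text; spaces and slashes get encoded.
print(debug_url("http://127.0.0.1:7860", "read_text", path="docs/my notes.txt"))
```
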
---
## ⚙️ Requirements
```
fastapi>=0.119
uvicorn>=0.37
mcp>=0.17.0
pydantic>=2.11.9
pandas>=2.3.3
```
Install with:
```bash
pip install -r requirements.txt
```
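
If an import fails after installation, it can help to see which of the listed packages are actually present in the active environment. The following is a minimal sketch using `importlib.metadata`. The helper names are hypothetical, it only understands the simple `name>=version` form used above, and it reports installed versions without comparing them against the minimums.

```python
import re
from importlib import metadata

REQUIREMENTS = [
    "fastapi>=0.119",
    "uvicorn>=0.37",
    "mcp>=0.17.0",
    "pydantic>=2.11.9",
    "pandas>=2.3.3",
]


def parse_requirement(line: str) -> tuple[str, str]:
    """Split a 'name>=version' pin into (name, minimum_version)."""
    match = re.fullmatch(r"\s*([A-Za-z0-9_.-]+)\s*>=\s*([0-9][0-9A-Za-z.]*)\s*", line)
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    return match.group(1), match.group(2)


def installed_versions(lines: list[str]) -> dict:
    """Map each required package to its installed version, or None if absent."""
    report = {}
    for line in lines:
        name, _minimum = parse_requirement(line)
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


for package, version in installed_versions(REQUIREMENTS).items():
    print(f"{package}: {version or 'not installed'}")
```
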
---
## 🧱 Hugging Face Space metadata
```yaml
# huggingface.yaml
title: Aurora-MCP
sdk: docker
emoji: 🌿
colorFrom: green
colorTo: indigo
pinned: false
```
Note that Hugging Face reads a Space's configuration from the YAML front matter at the top of `README.md`, which carries the same keys.
---
## 🔍 Troubleshooting
| Symptom | Cause / Fix |
|----------|--------------|
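| `GET /mcp` → 307/500 | Expected when probing with a browser or plain `curl`; the endpoint only speaks the MCP protocol, so connect with an MCP client |
| `Task group is not initialized` | Raised when the MCP session manager has not been started; fixed by the FastMCP startup hook |
| `ModuleNotFoundError: mcp.server.fastapi` | Install correct SDK: `pip install mcp fastapi uvicorn` |
| `/healthz` returns nothing | Send the request to `127.0.0.1`; `0.0.0.0` is a bind address, not a destination |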
---
**Author:** Daniel Nicorici · University of Helsinki
**License:** GNU GPL v3
**URL:** https://github.com/ndaniel/aurora-mcp