Aurora-MCP
Model Context Protocol (MCP) server providing access to datasets of natural and synthetic small molecules, with a focus on identifying potential mitochondrial Complex I inhibitors that may occur in plant species.
Overview
Aurora-MCP is a Model Context Protocol (MCP) server and data integration layer that connects natural-product, biodiversity, and mitochondrial-inhibitor datasets. It enables LLMs and users to query relationships between plant species, small molecules, and mitochondrial Complex I inhibition, bridging COCONUT, Laji.fi, GBIF, and AI-derived PubMed data through structured joins and metadata schemas.
In practice, Aurora-MCP is a lightweight MCP server plus a Hugging Face Space designed to bridge two complementary knowledge sources:
Aurora: natural-product and plant biodiversity data, mapping compounds to the genera and species found in Nordic ecosystems.
Aurora-Mito-ETL: a curated, PubMed-derived corpus of small-molecule inhibitors of mitochondrial Complex I (NADH dehydrogenase).
Together they form a conversational dataset in which ChatGPT (or any MCP-compatible LLM) can reason over structured biological data, answer questions, and run targeted searches on small compounds, plants, and the mechanistic links between them.
Concept
Goal: allow scientific dialogue with an LLM grounded in domain data, for example:
"Show me plant-derived compounds that inhibit mitochondrial Complex I."
"Find PubMed evidence for arctigenin as a Complex I inhibitor."
"List Nordic plants whose metabolites overlap with known ETC inhibitors."
Aurora-MCP turns your static text/TSV data into an interactive semantic backend, exposing programmatic tools for searching, linking, and reasoning.
Key Features
| Capability | Description |
|---|---|
| File introspection | List and read data files under (merged from Aurora + Aurora-Mito-ETL). |
| Regex & keyword search | Query files using regex (e.g. inhibit.*complex I, NADH oxidoreductase). |
| PubMed integration | Auto-generate PubMed URLs from PMIDs for fast evidence lookup. |
| MeSH & compound tag extraction | Parse MeSH IDs, chemical codes, or compound names from TSV files. |
| Plant-compound linkage | Bridge plant genera/species (Aurora) with small-molecule inhibitors (Aurora-Mito-ETL). |
| Hugging Face Space | Streamlit demo UI to browse data and test queries visually. |
| MCP tools for ChatGPT | Connect directly to ChatGPT via MCP for grounded conversational access. |
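The file-introspection and regex-search capabilities can be pictured as one small helper. This is a sketch under assumptions, not the repository's actual tool: the name search_files, the .txt/.tsv filter, and the max_hits cap are hypothetical. Plain keyword search works the same way if the keyword is passed through re.escape first.

```python
import re
from pathlib import Path


def search_files(root, pattern, suffixes=(".txt", ".tsv"), max_hits=50):
    """Case-insensitive regex search over text/TSV files below `root`.

    Hypothetical helper: returns at most `max_hits` matches, each as
    {"file": path relative to root, "line": 1-based line number, "text": matched line}.
    """
    base = Path(root)
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(base.rglob("*")):
        # Skip directories and file types outside the assumed data formats.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append({
                        "file": str(path.relative_to(base)),
                        "line": lineno,
                        "text": line.rstrip("\n"),
                    })
                    # Stop early so the MCP response stays small.
                    if len(hits) >= max_hits:
                        return hits
    return hits
```

For example, search_files(data_dir, r"inhibit.*complex I") would return matching lines from every TSV or text file under data_dir (a placeholder for the actual data directory). Returning structured hits rather than whole files keeps responses compact for the LLM.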
Architecture
Quick Start (local)
Add to ChatGPT (MCP)
In ChatGPT, open Settings → Model Context Protocol → Add server.
Command:
python -m mcp_server.server
Working directory: this repository root (aurora-mcp)
Start chatting about compounds, plants, or PubMed evidence, all grounded in your data!
Hugging Face Space
Option 1 (recommended): Python Space
Push this repo to a new Space.
Spaces runs app/app.py (Streamlit UI).
Use the console or the HTTP /mcp endpoint for MCP access.
Option 2: Docker Space
Build via the included Dockerfile. The container exposes:
Streamlit UI on :7860
MCP HTTP gateway on :8000/mcp
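Both options expose an HTTP /mcp endpoint. MCP messages are JSON-RPC 2.0, and invoking a tool uses the tools/call method with the tool's name and an arguments object. The sketch below only builds such a request with the standard library and does not send it; the host, port, and the tool name pubmed_url are assumptions, since the actual tool names are not listed here. A real client must first complete the initialize handshake (and, over the Streamable HTTP transport, send back any Mcp-Session-Id header the server returned) before calling tools.

```python
import json
import urllib.request


def build_tool_call(tool_name, arguments, request_id=1):
    """Return a JSON-RPC 2.0 `tools/call` request body as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def make_http_request(base_url, payload):
    """Wrap the payload in a POST to the /mcp endpoint (constructed, not sent)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and SSE responses.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


if __name__ == "__main__":
    payload = build_tool_call("pubmed_url", {"pmid": "22095235"})  # hypothetical tool name
    request = make_http_request("http://localhost:8000", payload)
    print(request.full_url)  # http://localhost:8000/mcp
```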
Example MCP Tools
| Tool | Purpose |
|---|---|
| | List domain files under . |
| | Read the contents of a text file. |
| | Regex search within selected files. |
| | Build a PubMed link for a given PMID. |
| | Extract MeSH or chemical IDs. |
| | List unique values from a TSV column. |
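The last three tools are simple enough to sketch with the standard library. The function names below are hypothetical because the original tool names are not shown, and the MeSH pattern assumes unique IDs of the form D (descriptor) or C (supplementary concept) followed by 6 or 9 digits; chemical codes in other formats would need their own patterns. The PubMed URL format matches the one in the example conversation.

```python
import csv
import io
import re

# Assumed MeSH unique-ID shape: "D" or "C" followed by 6 or 9 digits.
MESH_ID = re.compile(r"\b[DC]\d{6}(?:\d{3})?\b")


def pubmed_url(pmid):
    """Build a PubMed link for a PMID given as str or int."""
    pmid = str(pmid).strip()
    if not re.fullmatch(r"[0-9]+", pmid):
        raise ValueError(f"not a PMID: {pmid!r}")
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def extract_mesh_ids(text):
    """Return MeSH-style IDs found in `text`, de-duplicated in first-seen order."""
    return list(dict.fromkeys(MESH_ID.findall(text)))


def unique_column_values(tsv_text, column):
    """Return the sorted distinct non-empty values of one TSV column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column not found: {column}")
    values = set()
    for row in reader:
        # Short rows yield None for missing fields; treat them as empty.
        value = (row.get(column) or "").strip()
        if value:
            values.add(value)
    return sorted(values)


if __name__ == "__main__":
    print(pubmed_url(22095235))  # https://pubmed.ncbi.nlm.nih.gov/22095235/
```

Taking TSV text rather than a path keeps the helper easy to test; a tool wrapper would pass in Path(...).read_text(encoding="utf-8").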
You can easily extend these to include:
Compound-plant cross-references
Frequency summaries
Co-occurrence matrices
Filtered outputs for downstream ML models
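As an illustration of the first two extensions, a compound-plant cross-reference can be built by joining the two sources on compound name and then summarizing per genus. This sketch assumes both datasets have already been reduced to (plant, compound) and (compound, PMID) pairs and that names match exactly after case and whitespace normalization; real data would likely need synonym or structure-based (e.g. InChIKey) matching. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def normalize(name):
    """Case- and whitespace-insensitive key for compound names."""
    return " ".join(name.casefold().split())


def cross_reference(plant_compounds, inhibitors):
    """Link plants to Complex I inhibitor evidence through shared compound names.

    plant_compounds: iterable of (plant species, compound) pairs (Aurora side).
    inhibitors: iterable of (compound, PMID) pairs (Aurora-Mito-ETL side).
    Returns {plant: {compound: sorted PMIDs}}; plants without a match are omitted.
    """
    evidence = defaultdict(set)
    for compound, pmid in inhibitors:
        evidence[normalize(compound)].add(str(pmid))

    linked = defaultdict(dict)
    for plant, compound in plant_compounds:
        pmids = evidence.get(normalize(compound))
        if pmids:
            linked[plant][compound] = sorted(pmids)
    return dict(linked)


def inhibitors_per_genus(linked):
    """Frequency summary: number of distinct linked compounds per genus."""
    per_genus = defaultdict(set)
    for plant, compounds in linked.items():
        genus = plant.split()[0]  # assumes "Genus species" naming
        per_genus[genus].update(normalize(c) for c in compounds)
    return Counter({genus: len(names) for genus, names in per_genus.items()})


if __name__ == "__main__":
    # Toy input based on the example conversation in this README.
    plants = [("Arctium lappa", "arctigenin"), ("Arctium lappa", "matairesinol")]
    inhibitors = [("Arctigenin", "22095235")]
    linked = cross_reference(plants, inhibitors)
    print(linked)  # {'Arctium lappa': {'arctigenin': ['22095235']}}
    print(inhibitors_per_genus(linked))  # Counter({'Arctium': 1})
```

Indexing the inhibitor side first makes the join a single pass over the plant data; a co-occurrence matrix would follow the same pattern with (genus, compound) counts.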
Example Conversations (MCP-ChatGPT)
User:
List natural compounds found in Arctium lappa that have PubMed evidence of Complex I inhibition.
Aurora-MCP:
Arctigenin: PubMed ID 22095235
https://pubmed.ncbi.nlm.nih.gov/22095235/
Described as an AMPK activator via inhibition of mitochondrial Complex I.
Related lignans: arctiin, matairesinol.
Development
Extend tools in mcp_server/tools/.
Keep MCP responses lightweight (avoid full-file dumps).
Test new tools locally:
pytest -v
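A new tool that follows the "keep MCP responses lightweight" rule might cap what it returns and report whether it truncated. The sketch below is hypothetical (the name read_file_preview and the 2000-character default are assumptions). In the repository the function would sit in mcp_server/tools/ and the tests in a separate test module that imports it; both are combined here so the example is self-contained. pytest discovers the test_* functions and supplies tmp_path, its built-in temporary-directory fixture.

```python
from pathlib import Path


def read_file_preview(path, max_chars=2000):
    """Return at most `max_chars` characters of a text file plus a truncation flag."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return {
        "path": str(path),
        "chars_total": len(text),
        "truncated": len(text) > max_chars,
        "content": text[:max_chars],
    }


def test_preview_is_truncated(tmp_path):
    f = tmp_path / "big.tsv"
    f.write_text("x" * 5000, encoding="utf-8")
    result = read_file_preview(f, max_chars=100)
    assert result["truncated"] is True
    assert len(result["content"]) == 100
    assert result["chars_total"] == 5000


def test_small_file_is_returned_whole(tmp_path):
    f = tmp_path / "small.txt"
    f.write_text("arctigenin\n", encoding="utf-8")
    result = read_file_preview(f)
    assert result == {
        "path": str(f),
        "chars_total": 11,
        "truncated": False,
        "content": "arctigenin\n",
    }
```

Reporting chars_total alongside the truncated content lets the LLM decide whether to request a narrower search instead of the full file.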
License
GNU General Public License v3.0 (GPL-3.0)
See LICENSE for details.
Author
Daniel N.
University of Helsinki, Department of Computer Science / Precision Medicine
Bioinformatics • Machine Learning • Mitochondrial Metabolism • Natural-Product Discovery