Provides natural language search over open-access datasets through the EOSC Data Commons OpenSearch service, including tools to search datasets, retrieve dataset metadata, and discover files within datasets.
🔭 EOSC Data Commons Search server
A server for the EOSC Data Commons project MatchMaker service, providing natural language search over open-access datasets. It exposes an HTTP POST endpoint and supports the Model Context Protocol (MCP) to help users discover datasets and tools via a Large Language Model–assisted search.
🧩 Endpoints
The HTTP API comprises 2 main endpoints:
/mcp: MCP server that searches for relevant data to answer a user question using the EOSC Data Commons OpenSearch serviceUses Streamable HTTP transport
Available tools:
Search datasets
Get metadata for the files in a dataset (name, description, type of files)
Search tools
Search citations related to datasets or tools
/chat: HTTP POST endpoint (JSON) for chatting with the MCP server tools via an LLM provider (API key provided through env variable at deployment)Streams Server-Sent Events (SSE) response complying with the AG-UI protocol.
It can also be used just as a MCP server through the pip package.
🔌 Connect client to MCP server
The system can be used directly as a MCP server using either STDIO, or Streamable HTTP transport.
You will need access to a pre-indexed OpenSearch instance for the MCP server to work.
Follow the instructions of your client, and use the /mcp URL of your deployed server (e.g. http://localhost:8000/mcp)
To add a new MCP server to VSCode GitHub Copilot:
Open the Command Palette (
ctrl+shift+porcmd+shift+p)Search for
MCP: Add Server...Choose
HTTP, and provide the MCP server URL http://localhost:8000/mcp
Your VSCode mcp.json should look like:
Or with STDIO transport:
Or using local folder for development:
🛠️ Development
Requirements:
uv, to easily handle scripts and virtual environmentsdocker, to deploy the OpenSearch service (or just access to a running instance)
API key for a LLM provider: e-infra CZ, Mistral.ai, or OpenRouter
📥 Install dev dependencies
Install pre-commit hooks:
Create a keys.env file with your LLM provider API key(s):
⚡️ Start dev server
Start the server in dev at http://localhost:8000, with MCP endpoint at http://localhost:8000/mcp
Default
OPENSEARCH_URL=http://localhost:9200
Customize server configuration through environment variables:
Example curl request:
Recommended model per supported provider:
einfracz/qwen3-coderoreinfracz/gpt-oss-120b(smaller, faster)mistralai/mistral-medium-latest(large is older, and not as good with tool calls)groq/moonshotai/kimi-k2-instructopenai/gpt-4.1
To build and integrate the frontend web app to the server, from the frontend folder run:
📦 Build for production
Build binary in dist/
🐳 Deploy with Docker
Create a keys.env file with the API keys:
SEARCH_API_KEY can be used to add a layer of protection against bots that might spam the LLM, if not provided no API key will be needed to query the API.
You can use the prebuilt docker image ghcr.io/eosc-data-commons/data-commons-search:main
Example compose.yml:
Build and deploy the service:
Current deployment to staging server is done automatically through GitHub Actions at each push to the main branch.
When a push is made the workflow will:
Pull the
mainbranch from the frontend repositoryBuild the frontend, and add it to
src/data_commons_search/webappBuild the docker image for the server
Publish the docker image as
main/latestThe staging infrastructure then automatically pull the
latestversion of the image and deploys it.
✅ Run tests
You need to first start the server on port 8001 (see start dev server section)
To display all logs when debugging:
🧹 Format code and type check
♻️ Reset the environment
Upgrade uv:
Clean uv cache:
🏷️ Release process
Get a PyPI API token at pypi.org/manage/account.
Run the release script providing the version bump: fix, minor, or major
Add your PyPI token to your environment, e.g. in ~/.zshrc or ~/.bashrc:
🤝 Acknowledments
The LLM provider einfracz is a service provided by e-INFRA CZ and operated by CERIT-SC Masaryk University
Computational resources were provided by the e-INFRA CZ project (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.
This server cannot be installed