DataCite Librarian MCP
DataCite Librarian MCP (mcp-test)
Local Model Context Protocol server for the DataCite community: repository QA, funder compliance, CSV index analytics, search/facets, and exports over DataCite monthly and public datafiles you host on disk.
Built with FastMCP · uv · Python 3.12+ · MIT
This repository does not ship production datafiles. You must download them yourself and keep them outside git (see below). CI runs only against a small mock corpus.
Strict requirements (read before running)
You must obtain datafiles from DataCite, not from this repo.
Never commit real part_*.jsonl.gz, YYYY-MM.csv.gz, TAR archives, or monthly/public extracts. They are gitignored under data/local/ and at the repo root.
Set DATACITE_DATA_DIR to your extracted data directory. If the variable is set but the path does not exist, the MCP errors (it does not silently use mock data).
Mock data is for demos/tests only (or when DATACITE_DATA_DIR is unset / DATACITE_USE_MOCK=1). It is not a substitute for real monthly/public files.
Aggregate tools scan a limited number of records by default (DATACITE_MAX_RECORDS, default 10000). Always inspect truncated, scan_limit, and (for indexes) coverage_pct so you do not treat a sample as the full corpus.
Metadata vs index: a YYYY-MM.csv.gz index can list hundreds of thousands of DOIs; you need the matching part_*.jsonl.gz files for full QA/search. Use coverage_report to measure the gap.
Respect DataCite access terms: the public annual file is openly documented; the monthly file is for DataCite Members and Consortium Organizations (authenticated S3 access). See official docs linked below.
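The strict handling of DATACITE_DATA_DIR described above can be sketched roughly as follows. This is an illustration only, not the server's actual code: the function name resolve_data_dir and the mock_dir argument are hypothetical, and the sketch assumes DATACITE_USE_MOCK=1 takes precedence over a configured data directory.

```python
from pathlib import Path


def resolve_data_dir(env, mock_dir):
    """Return (path, is_mock) following the rules listed above.

    Hypothetical helper: `env` is a mapping such as os.environ and
    `mock_dir` is wherever the mock corpus lives.
    """
    use_mock = env.get("DATACITE_USE_MOCK") == "1"  # assumed to win over DATACITE_DATA_DIR
    configured = env.get("DATACITE_DATA_DIR")
    if use_mock or not configured:
        # Unset variable or explicit mock flag: use the mock corpus.
        return Path(mock_dir), True
    path = Path(configured)
    if not path.is_dir():
        # A configured but missing path is an error, never a silent fallback.
        raise FileNotFoundError(f"DATACITE_DATA_DIR does not exist: {path}")
    return path, False
```

Calling resolve_data_dir(os.environ, "path/to/mock") at startup would make a mistyped path fail immediately instead of quietly serving demo data.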
Obtain DataCite datafiles (official documentation)
Follow only DataCite’s documentation and portals. Do not rely on third-party mirrors unless you trust them and accept their terms.
Resource | URL |
Data files portal | |
Public data file (annual, public DOIs; documented for open use) | |
Monthly data file (members/consortium; S3 + credentials) | |
XML ↔ JSON mapping (record shape) | |
Metadata schema |
High-level download steps (summary only — details are on Support)
Public annual file
Open datafiles.datacite.org and locate the latest public release (e.g. public-2025).
Download the TAR (or equivalent) per the Public Data File page.
Extract locally to a directory you control (recommended: this repo’s data/local/, which is gitignored).
Confirm you see something like dois/updated_YYYY-MM/part_*.jsonl.gz and/or monthly YYYY-MM.csv.gz indexes inside the extract.
Monthly file (members)
Confirm your organization is a DataCite Member or Consortium participant.
Follow Monthly Data File for temporary AWS credentials and S3 access.
Sync/extract to data/local/ (or another path outside git).
Point DATACITE_DATA_DIR at that root.
After download — required layout for this MCP
Preferred (matches DataCite releases):
data/local/                  # or any path you pass as DATACITE_DATA_DIR
  STATUS.json                # optional
  MANIFEST.json              # optional
  dois/
    updated_2026-06/
      2026-06.csv.gz         # index: doi, state, client_id, updated
      part_0000.jsonl.gz     # full metadata (~10k records/part typical)
      part_0001.jsonl.gz
      …
Also supported (experimental/flat):
data/local/
  2026-06.csv.gz
  part_0000.jsonl.gz
See data/local/README.md and docs/DATAFILE_SCHEMA.md.
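Before starting the server, it can help to confirm that an extract matches one of the layouts above. The following is a minimal sketch under stated assumptions: describe_layout is a hypothetical helper (not part of this project), and it only counts files by name without opening them.

```python
import re
from pathlib import Path

# Monthly index files are named like 2026-06.csv.gz.
INDEX_NAME = re.compile(r"^\d{4}-\d{2}\.csv\.gz$")


def describe_layout(root):
    """Count index and metadata part files under a data directory."""
    root = Path(root)
    # Preferred layout: dois/updated_YYYY-MM/...; otherwise treat root as flat.
    folders = sorted(p for p in root.glob("dois/updated_*") if p.is_dir()) or [root]
    report = {}
    for folder in folders:
        indexes = sorted(p.name for p in folder.iterdir() if INDEX_NAME.match(p.name))
        parts = list(folder.glob("part_*.jsonl.gz"))
        report[folder.relative_to(root).as_posix()] = {
            "indexes": indexes,
            "parts": len(parts),
        }
    return report
```

A folder that reports an index but zero parts is a hint that only the CSV index tools will be usable for that month.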
Quick start (development)
Prerequisites
Python 3.12+
Install
git clone <this-repo-url> mcp-test
cd mcp-test
uv sync --all-groups
Run MCP (stdio, for Cursor / Claude Desktop / other MCP hosts)
# Demo/mock only (no real datafiles)
uv run datacite-librarian-mcp
# Production/local datafiles (STRICT: path must exist)
export DATACITE_DATA_DIR="/absolute/path/to/data/local"
export DATACITE_MAX_RECORDS=20000 # optional; raise for fuller scans
uv run datacite-librarian-mcp
Example MCP host config (mcp.json pattern):
{
  "mcpServers": {
    "datacite-librarian": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/mcp-test",
        "datacite-librarian-mcp"
      ],
      "env": {
        "DATACITE_DATA_DIR": "/absolute/path/to/mcp-test/data/local",
        "DATACITE_MAX_RECORDS": "20000"
      }
    }
  }
}
Run MCP (HTTP, local testing)
export DATACITE_DATA_DIR="/absolute/path/to/data/local"
uv run python -c "from datacite_librarian_mcp.server import mcp; mcp.run(transport='http', host='127.0.0.1', port=8765)"
Natural-language REPL (maps questions → tools; not a full LLM)
export DATACITE_DATA_DIR="/absolute/path/to/data/local"
uv run datacite-librarian-chat
# or: uv run python scripts/interactive_client.py
Examples: how many DOIs?, how many funders?, repository health for zenodo, funder compliance for European Commission.
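When the server runs over the HTTP transport shown earlier, a quick connectivity check from Python might look like the sketch below. It uses the FastMCP client; the /mcp path in the URL is an assumption about the default HTTP endpoint and may differ for your FastMCP version, and list_tool_names is a hypothetical helper name.

```python
import asyncio


async def list_tool_names(url="http://127.0.0.1:8765/mcp"):
    """Connect to a running server and return the sorted names of its tools."""
    # Imported lazily so this module loads even where fastmcp is not installed.
    from fastmcp import Client

    async with Client(url) as client:
        tools = await client.list_tools()
        return sorted(tool.name for tool in tools)


# Usage, with the HTTP server already running in another terminal:
#   print(asyncio.run(list_tool_names()))
```

If the call succeeds, the returned names should include the tools listed in the summary further down, such as server_info and corpus_status.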
Who this is for
Audience | Start with |
Librarians / RDM |
|
Research offices |
|
Repository operators |
|
Bibliometrics / policy |
|
Developers |
|
Teachers |
|
Call community_guide from any MCP client for persona-oriented workflows.
Tools (summary)
Discovery: community_guide, server_info, corpus_status, corpus_inventory, diff_partitions_summary
Metadata QA / compliance (needs part_*.jsonl.gz): repository_health, funder_compliance, search_dois, get_doi, check_doi_qa, list_clients, list_funders
Analytics: facets, top_subjects
CSV index only (no JSONL required): index_summary, index_client, coverage_report
Exports (writes under exports/ or DATACITE_EXPORT_DIR): export_health_issues, export_funder_issues, export_search_results
Ops: regenerate_mock_data
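To make the difference between index-only and metadata tools concrete, here is a rough sketch of what an index summary and a coverage figure compute. It is not the implementation behind index_summary or coverage_report. It assumes the CSV index has a header row with the columns doi, state, client_id, updated, and that each JSONL record carries its DOI in a top-level "id" field; check docs/DATAFILE_SCHEMA.md before relying on either assumption.

```python
import csv
import gzip
import json
from collections import Counter


def index_counts(index_path):
    """Count DOIs per client_id in a YYYY-MM.csv.gz index (header row assumed)."""
    counts = Counter()
    with gzip.open(index_path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            counts[row["client_id"]] += 1
    return counts


def coverage_pct(index_path, part_paths):
    """Percentage of indexed DOIs that also appear in the given JSONL parts."""
    with gzip.open(index_path, "rt", encoding="utf-8", newline="") as handle:
        indexed = {row["doi"].lower() for row in csv.DictReader(handle)}
    found = set()
    for part in part_paths:
        with gzip.open(part, "rt", encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    # Assumption: the DOI is stored in the record's "id" field.
                    found.add(str(json.loads(line).get("id", "")).lower())
    if not indexed:
        return 0.0
    return round(100 * len(indexed & found) / len(indexed), 2)
```

A low percentage means the index lists many DOIs whose full metadata has not been downloaded yet, so QA and search results cover only part of the month.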
Configuration
Variable | Purpose |
| Corpus root (must exist if set) |
|
|
| Override mock write/read location |
| Aggregate scan ceiling (default |
|
|
| Export output directory |
Development & CI
uv sync --all-groups
uv run pytest
uv run ruff check src tests
GitHub Actions (.github/workflows/ci.yml) runs ruff + pytest on Python 3.12 and 3.13 with DATACITE_USE_MOCK=1 only; no real datafiles are used in CI.
Project docs:
docs/DATAFILE_SCHEMA.md — file/record schema notes
docs/DEVELOPMENT.md — contributor workflow
data/local/README.md — where to put downloads
Design principles
Local-first: organizations keep datafiles; this project never distributes bulk DOI corpora.
Stream-read: gzip JSONL/CSV line-by-line; suitable for large files without a database.
Light dependencies: fastmcp + pydantic (+ stdlib).
Honest limits: truncated, scan_limit, coverage_pct on tool outputs.
Index without metadata: CSV tools help before all part_*.jsonl.gz are downloaded.
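The stream-read and honest-limits principles can be illustrated together. The sketch below reads a gzip JSONL part line by line and stops at a ceiling, reporting whether it stopped early. The function scan_records and its return shape are hypothetical; only the field names truncated and scan_limit and the DATACITE_MAX_RECORDS default of 10000 come from this README.

```python
import gzip
import json
import os


def scan_records(part_path, scan_limit=None):
    """Stream a part_*.jsonl.gz file and stop after scan_limit records."""
    if scan_limit is None:
        scan_limit = int(os.environ.get("DATACITE_MAX_RECORDS", "10000"))
    records, truncated = [], False
    with gzip.open(part_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            if len(records) >= scan_limit:
                # At least one more record exists beyond the ceiling.
                truncated = True
                break
            records.append(json.loads(line))
    return {"records": records, "truncated": truncated, "scan_limit": scan_limit}
```

When truncated is true, any count or percentage derived from the records describes a sample of at most scan_limit records, not the full corpus.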
License
MIT — see LICENSE.
DataCite bulk metadata licensing and access are governed by DataCite (public file documentation typically describes CC0 for metadata; confirm on Support). Member monthly access may be restricted. This software does not redistribute production datafiles.
Links