Provides tools for searching, retrieving, and QA-checking DOIs (Digital Object Identifiers) from DataCite datafiles. Allows querying repository health and funder compliance for repositories such as Zenodo using DataCite metadata.

DataCite Librarian MCP

by kaysiz

Python

Local

DataCite Librarian MCP (`mcp-test`)

Local Model Context Protocol server for the DataCite community: repository QA, funder compliance, CSV index analytics, search/facets, and exports over DataCite monthly and public datafiles you host on disk.

Built with FastMCP · uv · Python 3.12+ · MIT

This repository does not ship production datafiles. You must download them yourself and keep them outside git (see below). CI runs only against a small mock corpus.

Strict requirements

You must obtain datafiles from DataCite, not from this repo.
Never commit real part_*.jsonl.gz, YYYY-MM.csv.gz, TAR archives, or monthly/public extracts. They are gitignored under data/local/ and at the repo root.
Set DATACITE_DATA_DIR to your extracted data directory. If the variable is set but the path does not exist, the server raises an error (it does not silently fall back to mock data).
Mock data is used when DATACITE_DATA_DIR is unset or DATACITE_USE_MOCK=1, and is intended for demos and tests only. It is not a substitute for real monthly/public files.
Aggregate tools scan a limited number of records by default (DATACITE_MAX_RECORDS, default 10000). Always inspect the truncated, scan_limit, and (for index tools) coverage_pct fields in tool output so you do not mistake a sample for the full corpus.
Metadata vs index: a YYYY-MM.csv.gz index can list hundreds of thousands of DOIs; you need the matching part_*.jsonl.gz files for full QA/search. Use coverage_report to measure the gap.
Respect DataCite access terms: the public annual file is openly documented; the monthly file is for DataCite Members and Consortium Organizations (authenticated S3 access). See official docs linked below.
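The gap that coverage_report measures can be sketched as the share of DOIs listed in a YYYY-MM.csv.gz index that also have a record in the local part_*.jsonl.gz files. This is a hypothetical helper, not the server's implementation: it assumes the index has a header row with a `doi` column and that each JSONL line is an object whose `id` key holds the DOI.

```python
import csv
import gzip
import json
from pathlib import Path


def coverage_pct(index_path: Path, part_paths: list[Path]) -> float:
    """Percentage of index DOIs that have a metadata record in the JSONL parts.

    Hypothetical helper: assumes a CSV header row with a "doi" column and
    JSONL records whose "id" key is the DOI.
    """
    # Stream the gzip CSV index and collect its DOIs (lowercased; DOIs are case-insensitive).
    with gzip.open(index_path, "rt", newline="", encoding="utf-8") as fh:
        index_dois = {row["doi"].lower() for row in csv.DictReader(fh)}
    if not index_dois:
        return 0.0

    # Stream every JSONL part line by line and record which index DOIs have metadata.
    covered: set[str] = set()
    for part in part_paths:
        with gzip.open(part, "rt", encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    doi = str(json.loads(line).get("id", "")).lower()
                    if doi in index_dois:
                        covered.add(doi)
    return round(100.0 * len(covered) / len(index_dois), 2)
```

With only one of four indexed DOIs present in the parts, the result is 25.0, which is the signal to download the remaining parts before trusting QA or search results.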

Obtain DataCite datafiles (official documentation)

Follow only DataCite’s documentation and portals. Do not rely on third-party mirrors unless you trust them and accept their terms.

Resource	Where to find it
Data files portal	https://datafiles.datacite.org
Public data file (annual, public DOIs; documented for open use)	DataCite Support — Public Data File
Monthly data file (members/consortium; S3 + credentials)	DataCite Support — Monthly Data File
XML ↔ JSON mapping (record shape)	DataCite Support — XML to JSON
Metadata schema	https://schema.datacite.org
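As a rough illustration of the kind of check that check_doi_qa and repository_health perform, the sketch below flags records missing the DataCite schema's mandatory properties (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType). It assumes the DataCite JSON record shape, with these under `attributes` as `doi`, `creators`, `titles`, `publisher`, `publicationYear`, and `types.resourceTypeGeneral`; the helper `missing_mandatory` is hypothetical, and the server's own QA rules may be stricter or different.

```python
from typing import Any

# Mandatory DataCite properties mapped to their assumed JSON keys.
MANDATORY_FIELDS = {
    "Identifier": "doi",
    "Creator": "creators",
    "Title": "titles",
    "Publisher": "publisher",
    "PublicationYear": "publicationYear",
    "ResourceType": "types",
}


def missing_mandatory(record: dict[str, Any]) -> list[str]:
    """Return the names of mandatory properties that are absent or empty.

    Hypothetical QA helper; accepts either a full record with an "attributes"
    object or a bare attributes dict.
    """
    attrs = record.get("attributes", record)
    missing = []
    for prop, key in MANDATORY_FIELDS.items():
        value = attrs.get(key)
        if key == "types":
            # ResourceType counts as present only if resourceTypeGeneral is set.
            value = (value or {}).get("resourceTypeGeneral")
        if not value:
            missing.append(prop)
    return missing
```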

High-level download steps (summary only — details are on Support)

Public annual file

Open datafiles.datacite.org and locate the latest public release (e.g. public-2025).
Download the TAR archive (or equivalent) as described on the Public Data File page.
Extract locally to a directory you control (recommended: this repo’s data/local/, which is gitignored).
Confirm you see something like dois/updated_YYYY-MM/part_*.jsonl.gz and/or monthly YYYY-MM.csv.gz indexes inside the extract.

Monthly file (members)

Confirm your organization is a DataCite Member or Consortium participant.
Follow the Monthly Data File page to obtain temporary AWS credentials and S3 access.
Sync/extract to data/local/ (or another path outside git).
Point DATACITE_DATA_DIR at that root.

After download — required layout for this MCP

Preferred (matches DataCite releases):

data/local/                    # or any path you pass as DATACITE_DATA_DIR
  STATUS.json                  # optional
  MANIFEST.json                # optional
  dois/
    updated_2026-06/
      2026-06.csv.gz           # index: doi, state, client_id, updated
      part_0000.jsonl.gz       # full metadata (~10k records/part typical)
      part_0001.jsonl.gz
      …

Also supported (experimental/flat):

data/local/
  2026-06.csv.gz
  part_0000.jsonl.gz

See data/local/README.md and docs/DATAFILE_SCHEMA.md.
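A minimal sketch of how the two layouts above could be discovered under DATACITE_DATA_DIR, grouping files by their YYYY-MM partition. `find_partitions` is hypothetical; the server's actual discovery may differ (for example, this sketch ignores STATUS.json and MANIFEST.json, and flat-layout parts are attached to the single flat index only when there is exactly one).

```python
import re
from pathlib import Path

# Index files are named YYYY-MM.csv.gz (e.g. 2026-06.csv.gz).
INDEX_RE = re.compile(r"^(\d{4}-\d{2})\.csv\.gz$")


def find_partitions(root: Path) -> dict[str, dict[str, list[Path]]]:
    """Map "YYYY-MM" -> {"index": [...], "parts": [...]} for both documented layouts."""
    partitions: dict[str, dict[str, list[Path]]] = {}

    def add(month: str, kind: str, path: Path) -> None:
        partitions.setdefault(month, {"index": [], "parts": []})[kind].append(path)

    # Preferred layout: dois/updated_YYYY-MM/ holds the index and its JSONL parts.
    for folder in sorted((root / "dois").glob("updated_*")):
        if not folder.is_dir():
            continue
        month = folder.name.removeprefix("updated_")
        for f in sorted(folder.glob("*.csv.gz")):
            if INDEX_RE.match(f.name):
                add(month, "index", f)
        for f in sorted(folder.glob("part_*.jsonl.gz")):
            add(month, "parts", f)

    # Flat layout (experimental): files sit directly under root.
    flat_indexes = [f for f in sorted(root.glob("*.csv.gz")) if INDEX_RE.match(f.name)]
    for f in flat_indexes:
        add(INDEX_RE.match(f.name).group(1), "index", f)
    # Flat parts carry no month, so attach them to the sole flat index if there is exactly one.
    flat_parts = sorted(root.glob("part_*.jsonl.gz"))
    if flat_parts:
        month = INDEX_RE.match(flat_indexes[0].name).group(1) if len(flat_indexes) == 1 else "unknown"
        for f in flat_parts:
            add(month, "parts", f)
    return partitions
```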

Quick start (development)

Prerequisites

Python 3.12+
uv

Install

git clone <this-repo-url> mcp-test
cd mcp-test
uv sync --all-groups

Run MCP (stdio — for Cursor / Claude Desktop / other MCP hosts)

# Demo/mock only (no real datafiles)
uv run datacite-librarian-mcp

# Production/local datafiles (STRICT: path must exist)
export DATACITE_DATA_DIR="/absolute/path/to/data/local"
export DATACITE_MAX_RECORDS=20000   # optional; raise for fuller scans
uv run datacite-librarian-mcp

Example MCP host config (mcp.json pattern):

{
  "mcpServers": {
    "datacite-librarian": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/mcp-test",
        "datacite-librarian-mcp"
      ],
      "env": {
        "DATACITE_DATA_DIR": "/absolute/path/to/mcp-test/data/local",
        "DATACITE_MAX_RECORDS": "20000"
      }
    }
  }
}

Run MCP (HTTP, local testing)

export DATACITE_DATA_DIR="/absolute/path/to/data/local"
uv run python -c "from datacite_librarian_mcp.server import mcp; mcp.run(transport='http', host='127.0.0.1', port=8765)"

Natural-language REPL (maps questions → tools; not a full LLM)

export DATACITE_DATA_DIR="/absolute/path/to/data/local"
uv run datacite-librarian-chat
# or: uv run python scripts/interactive_client.py

Examples: "how many DOIs?", "how many funders?", "repository health for zenodo", "funder compliance for European Commission".

Who this is for

Audience	Start with
Librarians / RDM	`community_guide`, `repository_health`, `export_health_issues`
Research offices	`funder_compliance`, `export_funder_issues`
Repository operators	`index_summary`, `index_client`, `coverage_report`
Bibliometrics / policy	`facets`, `top_subjects`, `index_summary` (report `truncated`)
Developers	`server_info`, mock corpus, tests
Teachers	`datacite-librarian-chat`, mock data

Call community_guide from any MCP client for persona-oriented workflows.

Tools (summary)

Discovery: community_guide, server_info, corpus_status, corpus_inventory, diff_partitions_summary

Metadata QA / compliance (needs part_*.jsonl.gz): repository_health, funder_compliance, search_dois, get_doi, check_doi_qa, list_clients, list_funders

Analytics: facets, top_subjects

CSV index only (no JSONL required): index_summary, index_client, coverage_report

Exports (writes under exports/ or DATACITE_EXPORT_DIR): export_health_issues, export_funder_issues, export_search_results

Ops: regenerate_mock_data

Configuration

Variable	Purpose
`DATACITE_DATA_DIR`	Corpus root (must exist if set)
`DATACITE_USE_MOCK`	`1` / `true` forces mock corpus
`DATACITE_MOCK_DIR`	Override mock write/read location
`DATACITE_MAX_RECORDS`	Aggregate scan ceiling (default `10000`)
`DATACITE_DOI_LOOKUP_MAX_SCAN`	`get_doi` ceiling; `0` = full local scan
`DATACITE_EXPORT_DIR`	Export output directory

Development & CI

uv sync --all-groups
uv run pytest
uv run ruff check src tests

GitHub Actions (.github/workflows/ci.yml) runs ruff and pytest on Python 3.12 and 3.13 with DATACITE_USE_MOCK=1 only; no real datafiles are used in CI.

Project docs:

docs/DATAFILE_SCHEMA.md — file/record schema notes
docs/DEVELOPMENT.md — contributor workflow
data/local/README.md — where to put downloads

Design principles

Local-first — organizations keep datafiles; this project never distributes bulk DOI corpora.
Stream-read — gzip JSONL/CSV line-by-line; suitable for large files without a database.
Light dependencies — fastmcp + pydantic (+ stdlib).
Honest limits — truncated, scan_limit, coverage_pct on tool outputs.
Index without metadata — CSV tools help before all part_*.jsonl.gz are downloaded.
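The stream-read and honest-limits principles fit together naturally: read gzip JSONL one line at a time, stop at the scan ceiling, and report whether anything was left unread. A sketch with output keys modeled on the truncated and scan_limit fields named above; `scan_parts` and its predicate argument are hypothetical.

```python
import gzip
import json
from collections.abc import Callable, Iterable
from pathlib import Path
from typing import Any


def scan_parts(
    parts: Iterable[Path],
    predicate: Callable[[dict[str, Any]], bool],
    scan_limit: int = 10_000,
) -> dict[str, Any]:
    """Count records matching predicate, reading at most scan_limit records."""
    scanned = matched = 0
    truncated = False
    for part in parts:
        with gzip.open(part, "rt", encoding="utf-8") as fh:
            for line in fh:  # gzip file objects iterate line by line
                if not line.strip():
                    continue
                if scanned >= scan_limit:
                    # At least one record remains unread, so the result is a sample.
                    truncated = True
                    break
                scanned += 1
                if predicate(json.loads(line)):
                    matched += 1
        if truncated:
            break
    return {"matched": matched, "scanned": scanned, "scan_limit": scan_limit, "truncated": truncated}
```

Setting truncated only when a record was actually skipped means a corpus that exactly fills the limit is not misreported as a sample.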

License

MIT — see LICENSE.

DataCite bulk metadata licensing and access are governed by DataCite (public file documentation typically describes CC0 for metadata; confirm on Support). Member monthly access may be restricted. This software does not redistribute production datafiles.
