CrossRef Local (crossref-local)

by ywatanabe1989


Live demonstration of MCP server integration with Claude Code for epilepsy seizure prediction literature review:

  • Full-text search over titles, abstracts, and keywords across 167M papers (22ms response)


Built for the LLM era: features that matter for AI research assistants.

| Feature | Benefit |
|---|---|
| 📝 Abstracts | Abstract text for semantic understanding |
| 📊 Impact Factor | Filter by journal quality |
| 🔗 Citations | Prioritize influential papers |
| Speed | 167M records queried in milliseconds, no rate limits |

Perfect for: RAG systems, research assistants, literature review automation.

Install from PyPI:

```bash
pip install crossref-local
```

From source:

```bash
git clone https://github.com/ywatanabe1989/crossref-local
cd crossref-local && make install
```

Database setup (1.5 TB, ~2 weeks to build):

```bash
# 1. Download CrossRef data (~100 GB compressed)
aria2c "https://academictorrents.com/details/..."

# 2. Build SQLite database (takes days)
pip install dois2sqlite
dois2sqlite build /path/to/crossref-data ./data/crossref.db

# 3. Build FTS5 index (~60 hours) and citations table (takes days)
make fts-build-screen
make citations-build-screen
```
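The citations table built in step 3 presumably backs get_citing, get_cited and the /citations endpoints. Its actual schema is not documented here, so the following is only a minimal sketch, assuming a plain edge table with hypothetical columns citing_doi and cited_doi, built with Python's standard sqlite3 module on toy DOIs. It shows why one index per lookup direction matters.

```python
import sqlite3

# Hypothetical schema: one row per (citing work -> cited work) reference edge.
# The real crossref-local citations table may use different names and columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE citations (
        citing_doi TEXT NOT NULL,
        cited_doi  TEXT NOT NULL
    );
    -- One index per lookup direction keeps both queries fast on large tables.
    CREATE INDEX idx_citations_cited  ON citations (cited_doi);
    CREATE INDEX idx_citations_citing ON citations (citing_doi);
    """
)
conn.executemany(
    "INSERT INTO citations (citing_doi, cited_doi) VALUES (?, ?)",
    [
        ("10.1000/a", "10.1038/nature12373"),  # toy edges, not real data
        ("10.1000/b", "10.1038/nature12373"),
        ("10.1038/nature12373", "10.1000/c"),
    ],
)

def citing(doi):
    """Works whose reference lists include `doi` (uses the cited_doi index)."""
    rows = conn.execute(
        "SELECT citing_doi FROM citations WHERE cited_doi = ? ORDER BY citing_doi",
        (doi,),
    )
    return [r[0] for r in rows]

def cited(doi):
    """Works that `doi` itself references (uses the citing_doi index)."""
    rows = conn.execute(
        "SELECT cited_doi FROM citations WHERE citing_doi = ? ORDER BY cited_doi",
        (doi,),
    )
    return [r[0] for r in rows]

def citation_count(doi):
    """Number of incoming citations for `doi`."""
    return conn.execute(
        "SELECT COUNT(*) FROM citations WHERE cited_doi = ?", (doi,)
    ).fetchone()[0]

print(citing("10.1038/nature12373"))          # ['10.1000/a', '10.1000/b']
print(cited("10.1038/nature12373"))           # ['10.1000/c']
print(citation_count("10.1038/nature12373"))  # 2
```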
Python API:

```python
from crossref_local import search, get, count

# Full-text search (22ms for 541 matches across 167M records)
results = search("hippocampal sharp wave ripples")
for work in results:
    print(f"{work.title} ({work.year})")

# Get by DOI
work = get("10.1126/science.aax0758")
print(work.citation())

# Count matches
n = count("machine learning")  # 477,922 matches
```

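Async API:

```python
import asyncio

from crossref_local import aio

async def main():
    counts = await aio.count_many(["CRISPR", "neural network", "climate"])
    results = await aio.search("machine learning")

asyncio.run(main())  # a coroutine does nothing until it is run by an event loop
```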
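How crossref_local.aio implements count_many internally is not described here. As a sketch of the general pattern only, assuming a synchronous, database-backed count function (the hypothetical blocking_count below, which returns placeholder numbers), asyncio.gather combined with asyncio.to_thread (Python 3.9+) runs several blocking queries concurrently:

```python
import asyncio
import time

def blocking_count(query):
    """Hypothetical stand-in for a synchronous, database-backed count."""
    time.sleep(0.05)   # simulate query latency; sleep releases the GIL
    return len(query)  # placeholder result, not a real match count

async def count_many(queries):
    # Run each blocking call in a worker thread and await them together,
    # so total wall time is close to the slowest call rather than the sum.
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_count, q) for q in queries)
    )
    # gather preserves input order, so zip pairs each query with its result
    return dict(zip(queries, results))

counts = asyncio.run(count_many(["CRISPR", "neural network", "climate"]))
print(counts)  # {'CRISPR': 6, 'neural network': 14, 'climate': 7}
```

Offloading to threads keeps the event loop responsive while SQLite does the work; whether the real library uses threads, processes, or an async driver is an open assumption.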
CLI:

```bash
crossref-local search "CRISPR genome editing" -n 5
crossref-local search-by-doi 10.1038/nature12373
crossref-local status  # Configuration and database stats
```

With abstracts (-a flag):

```text
$ crossref-local search "RS-1 enhances CRISPR" -n 1 -a

Found 4 matches in 128.4ms

1. RS-1 enhances CRISPR/Cas9- and TALEN-mediated knock-in efficiency (2016)
   DOI: 10.1038/ncomms10548
   Journal: Nature Communications
   Abstract: Zinc-finger nuclease, transcription activator-like effector nuclease
   and CRISPR/Cas9 are becoming major tools for genome editing...
```

Start the FastAPI server:

```bash
crossref-local relay --host 0.0.0.0 --port 31291
```

Endpoints:

```bash
# Search works (FTS5)
curl "http://localhost:31291/works?q=CRISPR&limit=10"

# Get by DOI
curl "http://localhost:31291/works/10.1038/nature12373"

# Batch DOI lookup
curl -X POST "http://localhost:31291/works/batch" \
  -H "Content-Type: application/json" \
  -d '{"dois": ["10.1038/nature12373", "10.1126/science.aax0758"]}'

# Citation endpoints
curl "http://localhost:31291/citations/10.1038/nature12373/citing"
curl "http://localhost:31291/citations/10.1038/nature12373/cited"
curl "http://localhost:31291/citations/10.1038/nature12373/count"

# Collection endpoints
curl "http://localhost:31291/collections"
curl -X POST "http://localhost:31291/collections" \
  -H "Content-Type: application/json" \
  -d '{"name": "my_papers", "query": "CRISPR", "limit": 100}'
curl "http://localhost:31291/collections/my_papers/download?format=bibtex"

# Database info
curl "http://localhost:31291/info"
```
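The same endpoints can be called from Python with only the standard library. This is a sketch, not an official client: it assumes the server answers with JSON (not stated above), and the helper names search_request, batch_request and send are hypothetical. Building requests separately from sending them lets you inspect or log them without a running server.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:31291"  # address used in the relay examples

def search_request(q, limit=10):
    # GET /works?q=...&limit=... ; urlencode escapes spaces and special characters
    query = urllib.parse.urlencode({"q": q, "limit": limit})
    return urllib.request.Request(f"{BASE}/works?{query}", method="GET")

def batch_request(dois):
    # POST /works/batch with a JSON body of the form {"dois": [...]}
    body = json.dumps({"dois": dois}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/works/batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(req, timeout=10):
    # Only this step touches the network; assumes a JSON response body
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

req = search_request("CRISPR genome editing", limit=5)
print(req.full_url)  # http://localhost:31291/works?q=CRISPR+genome+editing&limit=5
```

With the relay running, send(batch_request(["10.1038/nature12373"])) would perform the batch lookup.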

HTTP mode (connect to a running server):

```bash
# On your local machine, if the server is remote: forward the port over SSH
ssh -L 31291:127.0.0.1:31291 your-server
```

```python
# Python client
from crossref_local import configure_http
configure_http("http://localhost:31291")
```

```bash
# Or via CLI
crossref-local --http search "CRISPR"
```

Run as an MCP (Model Context Protocol) server:

```bash
crossref-local mcp start
```

Local MCP client configuration:

```json
{
  "mcpServers": {
    "crossref-local": {
      "command": "crossref-local",
      "args": ["mcp", "start"],
      "env": {
        "CROSSREF_LOCAL_DB": "/path/to/crossref.db"
      }
    }
  }
}
```

Remote MCP via HTTP (recommended):

```bash
# On server: start persistent MCP server
crossref-local mcp start -t http --host 0.0.0.0 --port 8082
```

Client configuration:

```json
{
  "mcpServers": {
    "crossref-remote": {
      "url": "http://your-server:8082/mcp"
    }
  }
}
```
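If you maintain MCP client configs for several machines, the local and remote entries shown above can be generated instead of hand-edited. A small sketch; the helper names local_server and remote_server are hypothetical, and where the resulting JSON file must live depends on your MCP client.

```python
import json

def local_server(db_path):
    # Local entry: the client launches the server process itself
    # (presumably over stdio, since the remote variant passes -t http)
    return {
        "command": "crossref-local",
        "args": ["mcp", "start"],
        "env": {"CROSSREF_LOCAL_DB": db_path},
    }

def remote_server(host, port=8082):
    # Remote entry: the client connects to an already running HTTP MCP server
    return {"url": f"http://{host}:{port}/mcp"}

config = {
    "mcpServers": {
        "crossref-local": local_server("/path/to/crossref.db"),
        "crossref-remote": remote_server("your-server"),
    }
}
print(json.dumps(config, indent=2))
```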

Diagnose setup:

```bash
crossref-local mcp doctor        # Check dependencies and database
crossref-local mcp list-tools    # Show available MCP tools
crossref-local mcp installation  # Show client config examples
```

See docs/remote-deployment.md for systemd and Docker setup.

Available tools:

  • search - Full-text search across 167M+ papers

  • search_by_doi - Get paper by DOI

  • enrich_dois - Add citation counts and references to DOIs

  • status - Database statistics

  • cache_* - Paper collection management
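An MCP client invokes these tools with JSON-RPC 2.0 tools/call requests. The sketch below only builds such messages; the argument names (query, limit, doi) are assumptions, so check `crossref-local mcp list-tools` for the real input schemas. tool_call is a hypothetical helper name.

```python
import itertools
import json

_ids = itertools.count(1)  # MCP requires request ids unique within a session

def tool_call(name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Hypothetical argument names; the real schemas come from the server's tools/list
search_msg = tool_call("search", {"query": "CRISPR genome editing", "limit": 5})
doi_msg = tool_call("search_by_doi", {"doi": "10.1038/nature12373"})
print(json.dumps(search_msg))
```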

Impact factor calculation:

```python
from crossref_local.impact_factor import ImpactFactorCalculator

with ImpactFactorCalculator() as calc:
    result = calc.calculate_impact_factor("Nature", target_year=2023)
    print(f"IF: {result['impact_factor']:.3f}")  # 54.067
```

| Journal | IF 2023 |
|---|---|
| Nature | 54.07 |
| Science | 46.17 |
| Cell | 54.01 |
| PLOS ONE | 3.37 |
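The calculator's exact method is not described here. For reference, the conventional two-year impact factor for year Y is the number of citations received in Y by items published in Y-1 and Y-2, divided by the number of citable items published in those two years. A stdlib sketch with toy numbers, assuming that definition; what counts as a "citable item" in CrossRef metadata is a further assumption any real calculator has to make.

```python
def two_year_impact_factor(citations_by_pub_year, citable_items_by_year, target_year):
    """Conventional two-year journal impact factor for `target_year`.

    citations_by_pub_year: citations received during target_year, keyed by
        the publication year of the cited item.
    citable_items_by_year: number of citable items published per year.
    """
    window = (target_year - 1, target_year - 2)
    cites = sum(citations_by_pub_year.get(y, 0) for y in window)
    items = sum(citable_items_by_year.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no citable items in the two-year window")
    return cites / items

# Toy numbers, not real journal data
cites_in_2023 = {2022: 300, 2021: 500, 2020: 999}  # 2020 falls outside the window
items = {2022: 100, 2021: 100}
print(two_year_impact_factor(cites_in_2023, items, 2023))  # 4.0
```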

Citation network:

```python
from crossref_local import get_citing, get_cited, CitationNetwork

citing = get_citing("10.1038/nature12373")  # 1539 papers
cited = get_cited("10.1038/nature12373")

# Build visualization (like Connected Papers)
network = CitationNetwork("10.1038/nature12373", depth=2)
network.save_html("citation_network.html")  # requires: pip install crossref-local[viz]
```
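CitationNetwork(doi, depth=2) suggests a neighborhood expansion around a seed paper. How it traverses the graph (references only, citing papers too, or both) is not stated; the following is a breadth-first sketch over references only, on a hypothetical in-memory graph, to illustrate what the depth limit does.

```python
from collections import deque

def citation_neighborhood(seed, references, depth=2):
    """Collect works reachable from `seed` within `depth` citation hops.

    references: dict mapping a DOI to the DOIs it cites (toy data here).
    Returns {doi: hop_distance}; the seed has distance 0.
    """
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        doi = queue.popleft()
        if dist[doi] == depth:
            continue  # don't expand beyond the depth limit
        for ref in references.get(doi, []):
            if ref not in dist:  # first visit in BFS = shortest hop count
                dist[ref] = dist[doi] + 1
                queue.append(ref)
    return dist

refs = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],  # "e" is 3 hops away, so it is excluded at depth=2
}
print(citation_neighborhood("seed", refs, depth=2))
# {'seed': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2}
```

Each extra level of depth can multiply the node count by the average reference-list length, which is why small depths are typical for visualization.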

| Query | Matches | Time |
|---|---|---|
| hippocampal sharp wave ripples | 541 | 22ms |
| machine learning | 477,922 | 113ms |
| CRISPR genome editing | 12,170 | 257ms |

Searching 167M records in milliseconds via FTS5.
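FTS5 is SQLite's full-text search extension: it maintains an inverted index, so a MATCH query looks terms up in the index instead of scanning every row. A toy-scale sketch using Python's built-in sqlite3 (this requires an SQLite build with FTS5 enabled, which most Python distributions include). The table works_fts and its columns are hypothetical, not crossref-local's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical FTS5 table; doi is stored but not indexed for search.
conn.execute("CREATE VIRTUAL TABLE works_fts USING fts5(doi UNINDEXED, title, abstract)")
conn.executemany(
    "INSERT INTO works_fts (doi, title, abstract) VALUES (?, ?, ?)",
    [
        ("10.1000/1", "Hippocampal sharp wave ripples in memory",
         "Ripples recorded during sleep support consolidation"),
        ("10.1000/2", "CRISPR genome editing in mice",
         "We apply CRISPR to edit genes in vivo"),
        ("10.1000/3", "Machine learning for seizure prediction",
         "Sharp wave activity precedes seizures"),
    ],
)

def fts_search(query, limit=10):
    # Space-separated terms are ANDed; bm25() scores relevance (lower is better).
    sql = (
        "SELECT doi, title FROM works_fts WHERE works_fts MATCH ? "
        "ORDER BY bm25(works_fts) LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()

def fts_count(query):
    return conn.execute(
        "SELECT COUNT(*) FROM works_fts WHERE works_fts MATCH ?", (query,)
    ).fetchone()[0]

print(fts_search("sharp wave ripples"))
# [('10.1000/1', 'Hippocampal sharp wave ripples in memory')]
print(fts_count("sharp wave"))  # 2
```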

openalex-local, a sister project built on OpenAlex data:

| Feature | crossref-local | openalex-local |
|---|---|---|
| Works | 167M | 284M |
| Abstracts | ~21% | ~45-60% |
| Update frequency | Real-time | Monthly |
| DOI authority | ✓ (source) | Uses CrossRef |
| Citations | Raw references | Linked works |
| Concepts/Topics | | |
| Author IDs | | |
| Best for | DOI lookup, raw refs | Semantic search |

When to use CrossRef: Real-time DOI updates, raw reference parsing, authoritative metadata. When to use OpenAlex: Semantic search, citation analysis, topic discovery.


Problem and Solution

| # | Problem | Solution |
|---|---|---|
| 1 | The CrossRef public API is rate-limited, requires internet access, and is slow for bulk queries; bulk access to 167M works is the bottleneck for literature tools | Local SQLite + FTS5: the full CrossRef dump (~60 GB) is queryable offline, and crossref_search returns in milliseconds |

Part of SciTeX

CrossRef Local is part of SciTeX. When used inside the SciTeX framework, DOI resolution and citation checking integrate seamlessly:

```python
import scitex

# Resolve DOIs and enrich bibliography
scitex.scholar.enrich_bibtex("references.bib")

# Check citation accuracy
scitex.scholar.check_citations("manuscript.tex")
```

The SciTeX system follows the Four Freedoms for Research below, inspired by the Free Software Definition:

Four Freedoms for Research

  1. The freedom to run your research anywhere — your machine, your terms.

  2. The freedom to study how every step works — from raw data to final manuscript.

  3. The freedom to redistribute your workflows, not just your papers.

  4. The freedom to modify any module and share improvements with the community.

AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.


