Academia MCP
MCP server with tools to search, fetch, analyze, and report on scientific papers and datasets.
Features
ArXiv search and download
ACL Anthology search
Hugging Face datasets search
Semantic Scholar citations and references
Web search via Exa, Brave, or Tavily
Web page crawler, LaTeX compilation, PDF reading
Optional LLM-powered tools for document QA and research proposal workflows
Requirements
Python 3.12+
Install
Using pip (end users):
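Assuming the package is published on PyPI under the name `academia-mcp` (the package name is an assumption; check the project page):

```shell
pip install academia-mcp
```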
For development (uv + Makefile):
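Per the Makefile targets documented below, a development setup from a repository checkout might look like:

```shell
# from the root of a cloned checkout
make install   # editable install via uv
make validate  # black, flake8, mypy (strict)
make test      # pytest
```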
Quickstart
Run over HTTP (default transport):
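As a sketch, assuming the package exposes a `python -m academia_mcp` entry point and transport/port flags (both assumptions; consult `--help` for the actual CLI):

```shell
# serve over streamable HTTP on the default port
python -m academia_mcp --transport streamable-http --port 5056
```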
Run over stdio (for local MCP clients like Claude Desktop):
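With the same assumed entry point, a stdio run for local MCP clients might be:

```shell
python -m academia_mcp --transport stdio
```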
Notes:
Transports: stdio, sse, streamable-http. host/port are used for the HTTP transports and ignored for stdio. The default port is 5056 (or PORT).
Claude Desktop config
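A minimal sketch of a `claude_desktop_config.json` entry, assuming the server is launched via `python -m academia_mcp` over stdio (the command and paths below are placeholders):

```json
{
  "mcpServers": {
    "academia": {
      "command": "python",
      "args": ["-m", "academia_mcp", "--transport", "stdio"],
      "env": {
        "WORKSPACE_DIR": "/path/to/workspace"
      }
    }
  }
}
```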
Available tools (one-liners)
arxiv_search: Query arXiv with field-specific queries and filters.
arxiv_download: Fetch a paper by ID and convert to structured text (HTML/PDF modes).
anthology_search: Search ACL Anthology with fielded queries and optional date filtering.
hf_datasets_search: Find Hugging Face datasets with filters and sorting.
s2_get_citations: List papers citing a given arXiv paper (Semantic Scholar Graph).
s2_get_references: List papers referenced by a given arXiv paper.
visit_webpage: Fetch and normalize a web page.
web_search: Unified search wrapper; available when at least one of the Exa/Brave/Tavily keys is set.
exa_web_search, brave_web_search, tavily_web_search: Provider-specific search.
get_latex_templates_list, get_latex_template: Enumerate and fetch built-in LaTeX templates.
compile_latex: Compile LaTeX to PDF in WORKSPACE_DIR.
read_pdf: Extract text per page from a PDF.
download_pdf_paper, review_pdf_paper: Download and optionally review PDFs (requires LLM + workspace).
document_qa: Answer questions over provided document chunks (requires LLM).
extract_bitflip_info, generate_research_proposals, score_research_proposals: Research proposal helpers (requires LLM).
Availability notes:
Set WORKSPACE_DIR to enable compile_latex, read_pdf, download_pdf_paper, and review_pdf_paper.
Set OPENROUTER_API_KEY to enable the LLM tools (document_qa, review_pdf_paper, and the bitflip tools).
Set one or more of EXA_API_KEY, BRAVE_API_KEY, TAVILY_API_KEY to enable web_search and the provider-specific tools.
Environment variables
Set as needed, depending on which tools you use:
OPENROUTER_API_KEY: required for LLM-related tools.
BASE_URL: override the OpenRouter base URL.
DOCUMENT_QA_MODEL_NAME: override the default model for document_qa.
BITFLIP_MODEL_NAME: override the default model for the bitflip tools.
TAVILY_API_KEY: enables Tavily in web_search.
EXA_API_KEY: enables Exa in web_search and visit_webpage.
BRAVE_API_KEY: enables Brave in web_search.
WORKSPACE_DIR: directory for generated files (PDFs, temp artifacts).
PORT: HTTP port (default 5056).
You can put these in a .env file in the project root.
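For example, a minimal `.env` might look like this (all values are placeholders):

```shell
# .env — placeholder values, set only what you need
OPENROUTER_API_KEY=your-key-here
WORKSPACE_DIR=./workspace
PORT=5056
```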
Docker
Build the image:
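A typical build command (the image tag is arbitrary):

```shell
docker build -t academia_mcp .
```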
Run the server (HTTP):
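Assuming the image built above and a `.env` file in the current directory:

```shell
docker run --rm -p 5056:5056 --env-file .env academia_mcp
```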
Or use the prebuilt image: phoenix120/academia_mcp
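To use the published image instead of building locally:

```shell
docker pull phoenix120/academia_mcp
docker run --rm -p 5056:5056 phoenix120/academia_mcp
```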
Examples
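As a sketch, a minimal Python client using the official `mcp` SDK could call the server over stdio. The `python -m academia_mcp` entry point and the query syntax passed to `arxiv_search` are assumptions; the tool name comes from the list above.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the server as a subprocess over stdio.
    # The module entry point is an assumption; adjust to the real CLI.
    params = StdioServerParameters(command="python", args=["-m", "academia_mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Call the arxiv_search tool; the argument shape is an assumption.
            result = await session.call_tool("arxiv_search", {"query": "speculative decoding"})
            print(result.content)


asyncio.run(main())
```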
Makefile targets
make install: install the package in editable mode with uv
make validate: run black, flake8, and mypy (strict)
make test: run the test suite with pytest
make publish: build and publish using uv
LaTeX/PDF requirements
Only needed for the LaTeX/PDF tools. Ensure a LaTeX distribution is installed, with both pdflatex and latexmk on PATH. On Debian/Ubuntu:
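One possible package set (the full `texlive` metapackage is large; a slimmer selection may also work):

```shell
sudo apt-get update
sudo apt-get install -y texlive latexmk
```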