MCP Research Pipeline
Provides tools to fetch and process academic papers from arXiv, including extraction of text, sections, and metadata for multi-stage analysis.
This is my personal learning project for exploring Hermes Agent and MCP tree architecture. The goal is not to provide a production-grade academic paper review system, though it may grow into something more useful later. The main purpose is to understand how a larger agent runtime can call a custom MCP service, how work can be split across stage-specific MCP servers, and how an aggregator can enforce which tools are visible at each step.
This project is a stage-gated MCP tree for academic paper processing. A central
aggregator is the only entry point for the staged workflow for now. It loads one
MCP server over stdio for the current workflow phase, verifies that the server
exposes exactly the tools listed in src/config/phase_manifest.json, runs that
phase, then closes the stdio session before moving to the next phase.
The core rule is enforced in src/aggregator/router.py: Stage 2 cannot call
Stage 3, 4, or 5 tools because only the Stage 2 MCP server is connected and the
router rejects any tool name outside the active phase manifest.
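The name check described above can be sketched roughly as follows. This is a minimal illustration, assuming the manifest maps phase names to lists of tool names. `PHASE_MANIFEST`, `StageGateError`, and `check_tool_allowed` are hypothetical names, not the actual contents of src/aggregator/router.py.

```python
# Hypothetical in-memory stand-in for src/config/phase_manifest.json.
PHASE_MANIFEST = {
    "stage_2_context": [
        "extract_abstract",
        "extract_introduction",
        "summarize_context",
        "extract_research_questions",
    ],
    "stage_3_methodology": [
        "extract_methodology",
        "extract_datasets",
        "identify_frameworks",
        "identify_hardware",
    ],
}


class StageGateError(RuntimeError):
    """Raised when a tool outside the active phase is requested."""


def check_tool_allowed(phase: str, tool_name: str, manifest=PHASE_MANIFEST) -> None:
    """Raise StageGateError unless tool_name is listed for the given phase."""
    allowed = manifest[phase]
    if tool_name not in allowed:
        raise StageGateError(
            f"Tool '{tool_name}' is not active in phase '{phase}'. "
            f"Allowed tools for this phase: {', '.join(allowed)}"
        )


check_tool_allowed("stage_2_context", "extract_abstract")  # passes silently
try:
    check_tool_allowed("stage_2_context", "extract_methodology")
except StageGateError as exc:
    print(exc)
```

Because the check runs before any request is forwarded, a Stage 3 tool name is refused even though the tool exists elsewhere in the tree.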
In my local setup, Hermes Agent is the client/runtime I use to interact with this pipeline. This repository contains the MCP research pipeline itself; Hermes is kept as a separate supporting project and connects to this service through MCP.
Project Layout
mcp-research-pipeline/
|-- main.py
|-- pipeline_server.py
|-- Dockerfile
|-- docker-compose.yml
|-- requirements.txt
|-- README.md
`-- src/
    |-- aggregator/
    |   `-- router.py
    |-- config/
    |   `-- phase_manifest.json
    |-- db/
    |   `-- step_results.py
    |-- servers/
    |   |-- ingestion_server.py
    |   |-- context_server.py
    |   |-- methodology_server.py
    |   |-- results_server.py
    |   `-- report_server.py
    `-- output/
        |-- step_results.sqlite
        |-- report_<run_id>.json
        |-- report_<run_id>.md
        `-- papers/

main.py is the command-line entrypoint. pipeline_server.py exposes the
pipeline as an HTTP MCP server with a run_research_pipeline tool. The actual
router, stage servers, manifest, database helper, and generated outputs live
under src/.
Architecture
Each server is a separate FastMCP stdio server:
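Stage | Server | Active tools |
1 | ingestion_server.py | fetch_paper, extract_raw_text, detect_sections |
2 | context_server.py | extract_abstract, extract_introduction, summarize_context, extract_research_questions |
3 | methodology_server.py | extract_methodology, extract_datasets, identify_frameworks, identify_hardware |
4 | results_server.py | extract_results, extract_key_metrics, extract_conclusion, summarize_conclusion |
5 | report_server.py | compile_report, export_markdown, save_to_file |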
Intermediate outputs are saved in SQLite at src/output/step_results.sqlite.
Full stage outputs are persisted for recovery, but the simulated orchestrator
context passed between stages uses only compressed summaries from
StepResultsDB, not the raw full paper text.
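The split between persisted full outputs and summary-only context could look roughly like the sketch below. The class name `StepResultsSketch`, the table schema, and the method names are assumptions made for illustration; the real StepResultsDB in src/db/step_results.py may be organized differently.

```python
import json
import sqlite3


class StepResultsSketch:
    """Hypothetical store: keeps full stage output, hands out summaries only."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS step_results (
                run_id TEXT NOT NULL,
                phase TEXT NOT NULL,
                full_output TEXT NOT NULL,
                summary TEXT NOT NULL,
                PRIMARY KEY (run_id, phase)
            )
            """
        )

    def save(self, run_id: str, phase: str, full_output: dict, summary: str) -> None:
        # The full output is stored as JSON so a failed run can be inspected later.
        self.conn.execute(
            "INSERT OR REPLACE INTO step_results VALUES (?, ?, ?, ?)",
            (run_id, phase, json.dumps(full_output), summary),
        )
        self.conn.commit()

    def context_for_next_stage(self, run_id: str) -> dict:
        # Only the compressed summaries are returned, never full_output.
        rows = self.conn.execute(
            "SELECT phase, summary FROM step_results WHERE run_id = ? ORDER BY phase",
            (run_id,),
        ).fetchall()
        return dict(rows)


db = StepResultsSketch()
db.save("demo-run", "stage_1_ingestion", {"raw_text": "full paper text"}, "placeholder summary")
print(db.context_for_next_stage("demo-run"))
```

Keeping the raw text out of the returned context is what keeps the orchestrator's working context small, while the database still holds everything needed for recovery.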
Install
cd mcp-research-pipeline
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
Windows PowerShell:
cd mcp-research-pipeline
py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
Run
Process a real arXiv paper:
PYTHONPATH=src python main.py --paper https://arxiv.org/abs/1706.03762 --verify-gate
Windows PowerShell:
$env:PYTHONPATH = "src"
python main.py --paper https://arxiv.org/abs/1706.03762 --verify-gate
The --verify-gate flag intentionally attempts to call the Stage 3
extract_methodology tool while Stage 2 is active. The router blocks it before
the request reaches any server.
Expected progress output:
[stage_2_context] active tools: extract_abstract, extract_introduction, extract_research_questions, summarize_context
stage gate verified: Tool 'extract_methodology' is not active in phase 'stage_2_context'. Allowed tools for this phase: extract_abstract, extract_introduction, summarize_context, extract_research_questions
run_id: 20260610T120000Z
[stage_1_ingestion] active tools: detect_sections, extract_raw_text, fetch_paper
[stage_1_ingestion] tools unloaded
[stage_2_context] active tools: extract_abstract, extract_introduction, extract_research_questions, summarize_context
[stage_2_context] tools unloaded
[stage_3_methodology] active tools: extract_datasets, extract_methodology, identify_frameworks, identify_hardware
[stage_3_methodology] tools unloaded
[stage_4_results] active tools: extract_conclusion, extract_key_metrics, extract_results, summarize_conclusion
[stage_4_results] tools unloaded
[stage_5_report] active tools: compile_report, export_markdown, save_to_file
[stage_5_report] tools unloaded
Generated reports land in src/output/:
src/output/report_<run_id>.json
src/output/report_<run_id>.md
src/output/step_results.sqlite
Docker
The compose file is meant to run the HTTP MCP server from pipeline_server.py
so Hermes can connect to it as an external MCP service. It joins the external
Docker network hermes-mcp, which is also used by my Hermes setup.
Create the network once if it does not exist:
docker network create hermes-mcp
Build and run the service:
docker compose up --build
From the host, the MCP service is reachable at 127.0.0.1:8000. Inside the pipeline,
the aggregator starts each stage server as a local stdio child process, one stage
at a time. This preserves stdio transport while still packaging the full server
tree.
For a Hermes client running in the same Docker network, the MCP endpoint is:
http://research-mcp:8000/mcp
For a Hermes client running directly on the host, use:
http://127.0.0.1:8000/mcp
Stage Gate Enforcement
The enforcement has three layers:
1. src/config/phase_manifest.json maps each phase to its allowed tool names.
2. ResearchPipelineRouter.active_stage() starts only the MCP server for the active phase and verifies that session.list_tools() equals the manifest tools.
3. ActiveStageClient.call_tool() rejects any tool not listed for the active phase before calling session.call_tool().
Because the router closes the stdio session after every phase, tools from prior or future phases are not visible in the active MCP context.
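The verify-then-close lifecycle can be illustrated with a small synchronous sketch. `FakeStageSession` is a stand-in written for this example and is not the MCP SDK session class (which is asynchronous). The `active_stage` function here is a simplified, hypothetical analogue of ResearchPipelineRouter.active_stage(), not its actual code.

```python
from contextlib import contextmanager


class FakeStageSession:
    """Stand-in for an MCP stdio session; not the real SDK class."""

    def __init__(self, tools):
        self._tools = list(tools)
        self.closed = False

    def list_tools(self):
        return list(self._tools)

    def call_tool(self, name, arguments):
        return {"tool": name, "arguments": arguments}

    def close(self):
        self.closed = True


@contextmanager
def active_stage(phase, manifest, session):
    """Verify the session against the manifest, yield a gated caller, then close."""
    expected = set(manifest[phase])
    exposed = set(session.list_tools())
    try:
        if exposed != expected:
            raise RuntimeError(
                f"{phase}: exposed tools {sorted(exposed)} != manifest {sorted(expected)}"
            )

        def call_tool(name, arguments=None):
            # Reject before anything is sent to the stage server.
            if name not in expected:
                raise PermissionError(f"Tool '{name}' is not active in phase '{phase}'.")
            return session.call_tool(name, arguments or {})

        yield call_tool
    finally:
        # The session is closed whether the phase succeeded or failed.
        session.close()


manifest = {"stage_5_report": ["compile_report", "export_markdown", "save_to_file"]}
session = FakeStageSession(manifest["stage_5_report"])
with active_stage("stage_5_report", manifest, session) as call_tool:
    print(call_tool("compile_report", {"run_id": "demo"}))
print(session.closed)
```

Comparing sets rather than checking for a subset means a stage server that exposes an extra, unlisted tool fails verification just as one that is missing a tool does.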
Notes
PDF text extraction uses pypdf, so quality depends on the PDF text layer.
The summaries are heuristic and dependency-light. You can replace those tools with model-backed implementations later without changing the tree architecture.
fetch_paper accepts local PDF paths, direct PDF URLs, and arXiv abstract URLs.
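The three source types accepted by fetch_paper could be told apart roughly like this. The function name `resolve_paper_source` and its return shape are assumptions for illustration and do not describe the tool's real internals. The only external fact relied on is that arXiv serves the PDF for an abstract page under /pdf/<id>.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

ARXIV_HOSTS = ("arxiv.org", "www.arxiv.org")
ARXIV_ABS = re.compile(r"^/abs/(?P<paper_id>.+)$")


def resolve_paper_source(source: str) -> tuple[str, str]:
    """Return (kind, location) where kind is 'arxiv', 'pdf_url', or 'local_pdf'."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        match = ARXIV_ABS.match(parsed.path)
        if parsed.netloc in ARXIV_HOSTS and match:
            # Rewrite the abstract page URL to the corresponding PDF URL.
            return "arxiv", f"https://arxiv.org/pdf/{match.group('paper_id')}"
        return "pdf_url", source
    # Anything without an http(s) scheme is treated as a filesystem path.
    return "local_pdf", str(Path(source).expanduser())


print(resolve_paper_source("https://arxiv.org/abs/1706.03762"))
# -> ('arxiv', 'https://arxiv.org/pdf/1706.03762')
```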