MCP Research Pipeline

by KartikRane

This is my personal learning project for exploring Hermes Agent and MCP tree architecture. The goal is not to provide a production-grade academic paper review system, though it may grow into something more useful later. The main purpose is to understand how a larger agent runtime can call a custom MCP service, how work can be split across stage-specific MCP servers, and how an aggregator can enforce which tools are visible at each step.

This project is a stage-gated MCP tree for academic paper processing. For now, a central aggregator is the only entry point for the staged workflow. For each workflow phase, it starts one MCP server over stdio, verifies that the server exposes exactly the tools listed in src/config/phase_manifest.json, runs that phase, then closes the stdio session before moving to the next phase.

The core rule is enforced in src/aggregator/router.py: Stage 2 cannot call Stage 3, 4, or 5 tools because only the Stage 2 MCP server is connected and the router rejects any tool name outside the active phase manifest.
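The gate described above can be sketched in a few lines. This is only an illustration, not the code in src/aggregator/router.py: the manifest shape, the `StageGateError` class, and the `check_tool_allowed` helper are assumptions, and only the tool names and the error wording are taken from this README.

```python
# Hypothetical manifest shape; the real src/config/phase_manifest.json may differ.
PHASE_MANIFEST = {
    "stage_2_context": [
        "extract_abstract",
        "extract_introduction",
        "summarize_context",
        "extract_research_questions",
    ],
    "stage_3_methodology": [
        "extract_methodology",
        "identify_hardware",
        "identify_frameworks",
        "extract_datasets",
    ],
}


class StageGateError(PermissionError):
    """Raised when a tool outside the active phase is requested (hypothetical name)."""


def check_tool_allowed(phase: str, tool_name: str) -> None:
    """Reject any tool that the manifest does not list for the active phase."""
    allowed = PHASE_MANIFEST[phase]
    if tool_name not in allowed:
        raise StageGateError(
            f"Tool '{tool_name}' is not active in phase '{phase}'. "
            f"Allowed tools for this phase: {', '.join(allowed)}"
        )


# A Stage 2 tool passes silently; a Stage 3 tool is rejected before any call is made.
check_tool_allowed("stage_2_context", "summarize_context")
try:
    check_tool_allowed("stage_2_context", "extract_methodology")
except StageGateError as exc:
    print(f"stage gate verified: {exc}")
```

Doing the check in the router, rather than trusting each stage server, means a rejected request never leaves the aggregator process.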

In my local setup, Hermes Agent is the client/runtime I use to interact with this pipeline. This repository contains the MCP research pipeline itself; Hermes is kept as a separate supporting project and connects to this service through MCP.

Project Layout

mcp-research-pipeline/
|-- main.py
|-- pipeline_server.py
|-- Dockerfile
|-- docker-compose.yml
|-- requirements.txt
|-- README.md
`-- src/
    |-- aggregator/
    |   `-- router.py
    |-- config/
    |   `-- phase_manifest.json
    |-- db/
    |   `-- step_results.py
    |-- servers/
    |   |-- ingestion_server.py
    |   |-- context_server.py
    |   |-- methodology_server.py
    |   |-- results_server.py
    |   `-- report_server.py
    `-- output/
        |-- step_results.sqlite
        |-- report_<run_id>.json
        |-- report_<run_id>.md
        `-- papers/

main.py is the command-line entrypoint. pipeline_server.py exposes the pipeline as an HTTP MCP server with a run_research_pipeline tool. The actual router, stage servers, manifest, database helper, and generated outputs live under src/.

Architecture

Each server is a separate FastMCP stdio server:

| Stage | Server | Active tools |
|-------|--------|--------------|
| 1 | src/servers/ingestion_server.py | fetch_paper, extract_raw_text, detect_sections |
| 2 | src/servers/context_server.py | extract_abstract, extract_introduction, summarize_context, extract_research_questions |
| 3 | src/servers/methodology_server.py | extract_methodology, identify_hardware, identify_frameworks, extract_datasets |
| 4 | src/servers/results_server.py | extract_results, extract_conclusion, summarize_conclusion, extract_key_metrics |
| 5 | src/servers/report_server.py | compile_report, export_markdown, save_to_file |

Intermediate outputs are saved in SQLite at src/output/step_results.sqlite. Full stage outputs are persisted for recovery, but the simulated orchestrator context passed between stages uses only compressed summaries from StepResultsDB, not the raw full paper text.
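To make the split between full outputs and compressed summaries concrete, here is a minimal sketch using the standard-library sqlite3 module. The table schema and the `save_step` and `context_for_next_stage` helpers are assumptions for illustration; the real StepResultsDB in src/db/step_results.py may be organized differently.

```python
import json
import sqlite3

# Hypothetical schema; an in-memory database stands in for step_results.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE step_results (
           run_id TEXT, phase TEXT, full_output TEXT, summary TEXT,
           PRIMARY KEY (run_id, phase))"""
)


def save_step(run_id: str, phase: str, full_output: dict, summary: str) -> None:
    """Persist the full stage output for recovery, plus a short summary."""
    conn.execute(
        "INSERT OR REPLACE INTO step_results VALUES (?, ?, ?, ?)",
        (run_id, phase, json.dumps(full_output), summary),
    )
    conn.commit()


def context_for_next_stage(run_id: str) -> dict:
    """Return only the compressed summaries, never the full outputs."""
    rows = conn.execute(
        "SELECT phase, summary FROM step_results WHERE run_id = ? ORDER BY phase",
        (run_id,),
    )
    return dict(rows.fetchall())


save_step("demo", "stage_1_ingestion", {"raw_text": "x" * 50_000}, "3 sections detected")
save_step("demo", "stage_2_context", {"abstract": "sample abstract"}, "context summary")
print(context_for_next_stage("demo"))
```

The 50,000-character raw text stays in the database for recovery, while the context handed to the next stage contains only the two short summary strings.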

Install

cd mcp-research-pipeline
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt

Windows PowerShell:

cd mcp-research-pipeline
py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Run

Process a real arXiv paper:

PYTHONPATH=src python main.py --paper https://arxiv.org/abs/1706.03762 --verify-gate

Windows PowerShell:

$env:PYTHONPATH = "src"
python main.py --paper https://arxiv.org/abs/1706.03762 --verify-gate

The --verify-gate flag intentionally attempts to call the Stage 3 extract_methodology tool while Stage 2 is active. The router blocks it before the request reaches any server.

Expected progress output:

[stage_2_context] active tools: extract_abstract, extract_introduction, extract_research_questions, summarize_context
stage gate verified: Tool 'extract_methodology' is not active in phase 'stage_2_context'. Allowed tools for this phase: extract_abstract, extract_introduction, summarize_context, extract_research_questions
run_id: 20260610T120000Z
[stage_1_ingestion] active tools: detect_sections, extract_raw_text, fetch_paper
[stage_1_ingestion] tools unloaded
[stage_2_context] active tools: extract_abstract, extract_introduction, extract_research_questions, summarize_context
[stage_2_context] tools unloaded
[stage_3_methodology] active tools: extract_datasets, extract_methodology, identify_frameworks, identify_hardware
[stage_3_methodology] tools unloaded
[stage_4_results] active tools: extract_conclusion, extract_key_metrics, extract_results, summarize_conclusion
[stage_4_results] tools unloaded
[stage_5_report] active tools: compile_report, export_markdown, save_to_file
[stage_5_report] tools unloaded

Generated reports land in src/output/:

src/output/report_<run_id>.json
src/output/report_<run_id>.md
src/output/step_results.sqlite

Docker

The compose file is meant to run the HTTP MCP server from pipeline_server.py so Hermes can connect to it as an external MCP service. It joins the external Docker network hermes-mcp, which is also used by my Hermes setup.

Create the network once if it does not exist:

docker network create hermes-mcp

Build and run the service:

docker compose up --build

From the host, the MCP service is reachable at 127.0.0.1:8000. Inside the pipeline, the aggregator starts each stage server as a local stdio child process, one stage at a time. This keeps stdio transport between the aggregator and the stage servers while still packaging the full server tree as a single service.

For a Hermes client running in the same Docker network, the MCP endpoint is:

http://research-mcp:8000/mcp

For a Hermes client running directly on the host, use:

http://127.0.0.1:8000/mcp

Stage Gate Enforcement

The enforcement has three layers:

  1. src/config/phase_manifest.json maps each phase to its allowed tool names.

  2. ResearchPipelineRouter.active_stage() starts only the MCP server for the active phase and verifies session.list_tools() equals the manifest tools.

  3. ActiveStageClient.call_tool() rejects any tool not listed for the active phase before calling session.call_tool().

Because the router closes the stdio session after every phase, tools from prior or future phases are not visible in the active MCP context.
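The load, verify, and unload cycle from layer 2 can be sketched as an async context manager. This is an assumption-heavy illustration rather than the actual ResearchPipelineRouter: `FakeSession` is a stand-in that returns plain tool names (a real MCP client session returns richer tool objects), and the log lines only imitate the progress output shown earlier.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Stand-in for an MCP stdio client session (hypothetical, illustration only)."""

    def __init__(self, tools: list[str]) -> None:
        self._tools = tools
        self.closed = False

    async def list_tools(self) -> list[str]:
        return list(self._tools)

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def active_stage(phase: str, manifest: dict[str, list[str]], session: FakeSession):
    """Verify the server's tools match the manifest, then always close the session."""
    try:
        exposed = set(await session.list_tools())
        expected = set(manifest[phase])
        if exposed != expected:
            raise RuntimeError(
                f"{phase}: server tools {sorted(exposed)} != manifest {sorted(expected)}"
            )
        print(f"[{phase}] active tools: {', '.join(sorted(exposed))}")
        yield session
    finally:
        # Runs on success and on mismatch, so no stage session outlives its phase.
        await session.close()
        print(f"[{phase}] tools unloaded")


async def demo() -> bool:
    manifest = {"stage_5_report": ["compile_report", "export_markdown", "save_to_file"]}
    session = FakeSession(["save_to_file", "compile_report", "export_markdown"])
    async with active_stage("stage_5_report", manifest, session):
        pass  # tool calls for this phase would happen here
    return session.closed


print(asyncio.run(demo()))
```

Comparing sets rather than lists makes the check independent of the order in which a server reports its tools, while still failing on any extra or missing tool.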

Notes

  • PDF text extraction uses pypdf, so quality depends on the PDF text layer.

  • The summaries are heuristic and dependency-light. You can replace those tools with model-backed implementations later without changing the tree architecture.

  • fetch_paper accepts local PDF paths, direct PDF URLs, and arXiv abstract URLs.
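As a rough sketch of how the three input kinds accepted by fetch_paper could be told apart, the helper below classifies a source string and rewrites an arXiv abstract URL to its PDF URL. The function name `resolve_paper_source` and its return shape are hypothetical; the actual fetch_paper tool may resolve inputs differently.

```python
import re
from urllib.parse import urlparse


def resolve_paper_source(source: str) -> tuple[str, str]:
    """Classify a paper reference and return (kind, location)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        match = re.fullmatch(r"/abs/(.+)", parsed.path)
        if parsed.netloc in ("arxiv.org", "www.arxiv.org") and match:
            # arXiv serves the PDF for an abstract page under /pdf/<id>.
            return "arxiv", f"https://arxiv.org/pdf/{match.group(1)}"
        return "pdf_url", source
    # Anything without an http(s) scheme is treated as a local file path.
    return "local_path", source


print(resolve_paper_source("https://arxiv.org/abs/1706.03762"))
print(resolve_paper_source("papers/local.pdf"))
```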
