Research Paper Ingestion MCP Server

by marc-shade

Autonomous knowledge acquisition from academic research papers for AGI self-improvement.

Part of the Agentic System - a 24/7 autonomous AI framework with persistent memory.

Features

Paper Discovery

  • arXiv Integration: Search and download from arXiv.org

  • Semantic Scholar: Citation analysis and academic impact metrics

  • PDF Download: Automatic paper retrieval and storage

Knowledge Extraction

  • Insight Extraction: Identify key findings and contributions

  • Citation Analysis: Understand paper influence and relationships

  • Technique Identification: Extract novel methods and approaches

Memory Integration

  • Enhanced Memory: Store extracted knowledge for AGI learning

  • Structured Entities: Create searchable memory representations

  • Citation Graphs: Track knowledge lineage

Installation

cd ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp
pip install -r requirements.txt

Configuration

Add to ~/.claude.json:

{
  "mcpServers": {
    "research-paper-mcp": {
      "command": "python3",
      "args": [
        "${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp/server.py"
      ],
      "env": {},
      "disabled": false
    }
  }
}
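If you would rather script this edit than change the file by hand, the entry above can be merged in programmatically. The following is a minimal sketch, not part of the server: the helper name register_server is hypothetical, and it assumes the config file is plain JSON with a top-level "mcpServers" object as shown above.

```python
import json
from pathlib import Path


def register_server(config_path: Path, server_py: str) -> dict:
    """Merge the research-paper-mcp entry into a JSON config file.

    Hypothetical helper: existing entries under "mcpServers" are preserved,
    and the research-paper-mcp entry is added or overwritten.
    """
    # Start from the existing config, if any, so other servers survive.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["research-paper-mcp"] = {
        "command": "python3",
        "args": [server_py],
        "env": {},
        "disabled": False,
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

A call such as register_server(Path.home() / ".claude.json", "/opt/agentic/agentic-system/mcp-servers/research-paper-mcp/server.py") would apply it to the default location; back up the file first.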

Available Tools

search_arxiv

Search arXiv for research papers by query.

Parameters:

  • query (required): Search query (e.g., "recursive self-improvement AGI")

  • max_results: Maximum results (default: 10)

  • sort_by: Sort order, one of relevance, lastUpdatedDate, or submittedDate

Example:

results = mcp__research-paper-mcp__search_arxiv({
    "query": "meta-learning neural networks",
    "max_results": 20,
    "sort_by": "relevance"
})
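For orientation, the three sort_by values match the sortBy options of the public arXiv query API. The sketch below shows how the parameters above could map onto an API URL using only the standard library. It is an illustration under assumptions, not the server's code: the server depends on the arxiv Python wrapper, and the function name build_arxiv_url and the "all:" field prefix are choices made here.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
SORT_OPTIONS = {"relevance", "lastUpdatedDate", "submittedDate"}


def build_arxiv_url(query: str, max_results: int = 10, sort_by: str = "relevance") -> str:
    """Build an arXiv API query URL from search_arxiv-style parameters."""
    if sort_by not in SORT_OPTIONS:
        raise ValueError(f"sort_by must be one of {sorted(SORT_OPTIONS)}")
    params = {
        "search_query": f"all:{query}",  # assumption: search all fields
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

The API responds with an Atom feed, which the arxiv wrapper parses into result objects.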

search_semantic_scholar

Search Semantic Scholar for papers with citation metrics.

Parameters:

  • query (required): Search query

  • fields: Metadata fields to retrieve

  • limit: Maximum results (default: 10)

Example:

results = mcp__research-paper-mcp__search_semantic_scholar({
    "query": "transformer architecture attention",
    "fields": ["title", "authors", "citationCount", "year"],
    "limit": 15
})
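As a rough guide to what this tool wraps, the Semantic Scholar Graph API exposes a paper search endpoint that takes query, fields, and limit parameters and returns matches under a "data" key. The sketch below builds such a request URL and reads citation counts from a response payload. The function names and the default field list are assumptions for illustration; how the server itself issues the request (it uses aiohttp) is not shown in this README.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
DEFAULT_FIELDS = ["title", "authors", "citationCount", "year"]  # assumed default


def build_s2_url(query: str, fields=None, limit: int = 10) -> str:
    """Build a Semantic Scholar paper-search URL."""
    params = {
        "query": query,
        "fields": ",".join(fields or DEFAULT_FIELDS),
        "limit": limit,
    }
    return f"{S2_SEARCH}?{urlencode(params)}"


def citation_counts(payload: dict) -> list:
    """Return (title, citationCount) pairs, treating a missing count as 0."""
    return [
        (paper.get("title"), paper.get("citationCount") or 0)
        for paper in payload.get("data", [])
    ]
```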

download_paper

Download research paper PDF from URL.

Parameters:

  • url (required): PDF URL

  • paper_id (required): Unique identifier for filename

Example:

result = mcp__research-paper-mcp__download_paper({
    "url": "https://arxiv.org/pdf/1234.5678.pdf",
    "paper_id": "arxiv-1234.5678"
})
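The Storage section states that PDFs are saved as {paper_id}.pdf in the papers directory. A download step with that naming could look like the sketch below. It is an illustration only: the function names, the rejection of path separators in paper_id, and the check for the "%PDF-" header are assumptions added here, not documented server behavior.

```python
import shutil
import urllib.request
from pathlib import Path


def pdf_path(papers_dir: Path, paper_id: str) -> Path:
    """Return papers_dir/{paper_id}.pdf, rejecting ids that could escape the directory."""
    if not paper_id or "/" in paper_id or "\\" in paper_id or paper_id in {".", ".."}:
        raise ValueError(f"unsafe paper_id: {paper_id!r}")
    return papers_dir / f"{paper_id}.pdf"


def download_pdf(url: str, paper_id: str, papers_dir: Path, timeout: float = 30.0) -> Path:
    """Stream url to disk and verify that the file starts with the PDF header."""
    target = pdf_path(papers_dir, paper_id)
    papers_dir.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    with open(target, "rb") as saved:
        header = saved.read(5)
    if header != b"%PDF-":
        target.unlink()  # do not keep HTML error pages or other non-PDF content
        raise ValueError(f"{url} did not return a PDF")
    return target
```

Checking the header guards against a common failure mode where a server answers with an HTML error page instead of the paper.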

extract_insights

Extract key insights and findings from paper text.

Parameters:

  • paper_text (required): Full paper text or abstract

  • focus_areas: Optional list of areas to focus on (e.g., "methodology", "results")

Example:

insights = mcp__research-paper-mcp__extract_insights({
    "paper_text": paper_abstract,
    "focus_areas": ["methodology", "results"]
})

analyze_citations

Analyze citation relationships and paper influence.

Parameters:

  • paper_id (required): Semantic Scholar or arXiv paper ID

  • depth: Citation graph depth, from 1 to 3 (default: 1)

Example:

analysis = mcp__research-paper-mcp__analyze_citations({
    "paper_id": "arxiv:1706.03762",  # "Attention Is All You Need"
    "depth": 2
})
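To make the depth parameter concrete: depth 1 covers only the papers directly linked to the starting paper, depth 2 adds the papers linked to those, and so on. The sketch below shows a depth-limited breadth-first traversal over an in-memory graph. It is a hedged illustration; the function name and the dict-of-lists graph are stand-ins, and the server's actual traversal and data source are not described in this README.

```python
from collections import deque


def citation_neighborhood(graph: dict, paper_id: str, depth: int = 1) -> dict:
    """Return {paper_id: hops} for every paper within `depth` citation hops.

    `graph` maps a paper id to the list of paper ids it cites (hypothetical
    stand-in for data fetched from a citation API).
    """
    if not 1 <= depth <= 3:
        raise ValueError("depth must be between 1 and 3")
    distances = {paper_id: 0}
    queue = deque([paper_id])
    while queue:
        current = queue.popleft()
        if distances[current] == depth:
            continue  # do not expand beyond the requested depth
        for cited in graph.get(current, []):
            if cited not in distances:  # visit each paper once, even in cycles
                distances[cited] = distances[current] + 1
                queue.append(cited)
    return distances
```

Because the number of papers reached can grow quickly with each extra hop, capping depth at 3 keeps the number of lookups manageable.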

store_paper_knowledge

Store extracted knowledge in enhanced-memory for AGI learning.

Parameters:

  • paper_metadata (required): Paper metadata dict

  • insights (required): List of key insights

  • techniques: List of novel techniques

Example:

stored = mcp__research-paper-mcp__store_paper_knowledge({
    "paper_metadata": {
        "id": "arxiv-1234.5678",
        "title": "Novel AGI Approach",
        "authors": ["Smith", "Jones"],
        "year": 2024
    },
    "insights": [
        "Achieves 95% accuracy on benchmark",
        "10x faster than previous methods"
    ],
    "techniques": [
        "Recursive meta-optimization",
        "Self-modifying architectures"
    ]
})
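The Storage section notes that memory integration goes through enhanced-memory-mcp create_entities. One way the inputs above could be flattened into a single entity is sketched below. The entity shape (name, entityType, observations), the "paper:" name prefix, and the function name are assumptions made for illustration; the schema that enhanced-memory-mcp actually expects is not documented here.

```python
from typing import Optional


def build_paper_entity(paper_metadata: dict, insights: list,
                       techniques: Optional[list] = None) -> dict:
    """Flatten paper metadata, insights, and techniques into one entity dict."""
    missing = [key for key in ("id", "title") if not paper_metadata.get(key)]
    if missing:
        raise ValueError(f"paper_metadata is missing: {', '.join(missing)}")

    observations = [f"Title: {paper_metadata['title']}"]
    if paper_metadata.get("authors"):
        observations.append("Authors: " + ", ".join(paper_metadata["authors"]))
    if paper_metadata.get("year"):
        observations.append(f"Year: {paper_metadata['year']}")
    observations += [f"Insight: {text}" for text in insights]
    observations += [f"Technique: {text}" for text in (techniques or [])]

    return {
        "name": f"paper:{paper_metadata['id']}",  # assumed naming scheme
        "entityType": "research_paper",           # assumed entity type
        "observations": observations,
    }
```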

Usage Patterns

Autonomous Research Workflow

# 1. Search for relevant papers
arxiv_results = mcp__research-paper-mcp__search_arxiv({
    "query": "recursive self-improvement",
    "max_results": 10
})

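# 2. Get citation metrics
for paper in arxiv_results['papers']:
    scholar_data = mcp__research-paper-mcp__search_semantic_scholar({
        "query": paper['title'],
        "limit": 1
    })

    # Skip papers that Semantic Scholar could not match
    matches = scholar_data.get('papers') or []
    if not matches:
        continue

    # 3. Download high-impact papers (a missing citation count is treated as 0)
    if (matches[0].get('citationCount') or 0) > 50: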
        pdf = mcp__research-paper-mcp__download_paper({
            "url": paper['pdf_url'],
            "paper_id": paper['id']
        })

        # 4. Extract and store insights
        insights = mcp__research-paper-mcp__extract_insights({
            "paper_text": paper['abstract']
        })

        mcp__research-paper-mcp__store_paper_knowledge({
            "paper_metadata": paper,
            "insights": insights['insights']
        })
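The workflow above is written in tool-call notation, which is not directly executable Python. The sketch below restates the same steps as a function that takes the tool-calling mechanism as a parameter, so it can be run against a real MCP client or a test double. The call_tool callable and the function name are hypothetical; the result keys (papers, title, pdf_url, id, abstract, citationCount, insights) are taken from the workflow above.

```python
from typing import Any, Callable

# Hypothetical adapter: call_tool(tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], Any]


def ingest_high_impact(call_tool: ToolCaller, query: str,
                       min_citations: int = 50, max_results: int = 10) -> list:
    """Search, filter by citation count, download, extract, and store.

    Returns the ids of the papers that were stored.
    """
    stored = []
    results = call_tool("search_arxiv", {"query": query, "max_results": max_results})
    for paper in results.get("papers", []):
        scholar = call_tool("search_semantic_scholar",
                            {"query": paper["title"], "limit": 1})
        matches = scholar.get("papers") or []
        # Skip unmatched papers and papers at or below the citation threshold.
        if not matches or (matches[0].get("citationCount") or 0) <= min_citations:
            continue
        call_tool("download_paper", {"url": paper["pdf_url"], "paper_id": paper["id"]})
        insights = call_tool("extract_insights", {"paper_text": paper["abstract"]})
        call_tool("store_paper_knowledge",
                  {"paper_metadata": paper, "insights": insights["insights"]})
        stored.append(paper["id"])
    return stored
```

Passing the caller in keeps the workflow independent of any particular MCP client library and makes the filtering logic easy to test with canned responses.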

Citation Network Analysis

# Analyze citation influence
analysis = mcp__research-paper-mcp__analyze_citations({
    "paper_id": "influential-paper-id",
    "depth": 2
})

# Identify most influential papers in field
if analysis['citation_graph']['influential_citations'] > 100:
    # Download and study this foundational paper
    pass

Storage

  • Papers Directory: ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/research-papers/

  • PDFs: Saved as {paper_id}.pdf

  • Memory Integration: Via enhanced-memory-mcp create_entities

Dependencies

  • arxiv: arXiv API Python wrapper

  • aiohttp: Async HTTP client for Semantic Scholar API

  • mcp: Model Context Protocol SDK

Future Enhancements

  1. PDF Text Extraction: Parse full paper text from PDFs

  2. Figure/Diagram Analysis: Extract visual insights

  3. Code Repository Links: Find implementation code

  4. Related Papers: Automatic discovery of connected research

  5. Trend Detection: Identify emerging research directions

  6. LLM-Powered Insight Extraction: Use GPT-4 for deeper analysis

Integration with AGI System

This MCP server addresses the first item of Gap #1 from AGI_GAP_ANALYSIS.md:

Knowledge Acquisition Infrastructure

  • ✓ Research Paper Ingestion (arXiv + Semantic Scholar)

  • ⏳ Video Transcript Processing (separate MCP)

  • ⏳ GitHub Repository Analysis (future)

  • ⏳ Documentation Scraping (future)

  • ⏳ Knowledge Graph Integration (future)

Impact: The system can now search for, retrieve, and store findings from recent AI research without manual intervention.
