Research Paper Ingestion MCP Server

by marc-shade

Autonomous knowledge acquisition from academic research papers for AGI self-improvement.

Part of the Agentic System, a 24/7 autonomous AI framework with persistent memory.

Features

Paper Discovery

  • arXiv Integration: Search and download from arXiv.org

  • Semantic Scholar: Citation analysis and academic impact metrics

  • PDF Download: Automatic paper retrieval and storage

Knowledge Extraction

  • Insight Extraction: Identify key findings and contributions

  • Citation Analysis: Understand paper influence and relationships

  • Technique Identification: Extract novel methods and approaches

Memory Integration

  • Enhanced Memory: Store extracted knowledge for AGI learning

  • Structured Entities: Create searchable memory representations

  • Citation Graphs: Track knowledge lineage

Installation

cd ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp
pip install -r requirements.txt

Configuration

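Add the following entry to ~/.claude.json. If your MCP client does not expand ${AGENTIC_SYSTEM_PATH:-/opt/agentic} inside JSON, replace it with the absolute path: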

{
  "mcpServers": {
    "research-paper-mcp": {
      "command": "python3",
      "args": [
        "${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp/server.py"
      ],
      "env": {},
      "disabled": false
    }
  }
}
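If you prefer not to edit ~/.claude.json by hand, the entry can be merged programmatically. The sketch below is hypothetical (merge_server_entry and update_config_file are not part of this project). It assumes the file is valid JSON and that you pass an already-expanded absolute path to server.py, since Python does not expand ${AGENTIC_SYSTEM_PATH:-/opt/agentic} inside strings. Back up the file before writing to it.

```python
import copy
import json
from pathlib import Path


def merge_server_entry(config: dict, server_script: str) -> dict:
    """Return a copy of `config` with the research-paper-mcp entry added.

    Other entries under "mcpServers" (and other top-level keys) are kept;
    an existing research-paper-mcp entry is replaced.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["research-paper-mcp"] = {
        "command": "python3",
        "args": [server_script],
        "env": {},
        "disabled": False,
    }
    return merged


def update_config_file(config_path: Path, server_script: str) -> None:
    """Read the JSON config (if it exists), merge the entry, and write it back."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    merged = merge_server_entry(config, server_script)
    config_path.write_text(json.dumps(merged, indent=2))
```

For example: update_config_file(Path.home() / ".claude.json", "/opt/agentic/agentic-system/mcp-servers/research-paper-mcp/server.py"). Keeping the merge in a pure function makes it testable without touching the real file.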

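Available Tools

The examples below use the tool names as exposed to the MCP client (mcp__research-paper-mcp__<tool>). Because of the hyphens, they are illustrative call syntax rather than importable Python functions.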

search_arxiv

Search arXiv for research papers by query.

Parameters:

  • query (required): Search query (e.g., "recursive self-improvement AGI")

  • max_results: Maximum number of results (default: 10)

  • sort_by: Sort order: relevance, lastUpdatedDate, or submittedDate

Example:

results = mcp__research-paper-mcp__search_arxiv({
    "query": "meta-learning neural networks",
    "max_results": 20,
    "sort_by": "relevance"
})
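This README does not show how search_arxiv builds its request; the Dependencies section indicates it uses the arxiv Python wrapper. To illustrate what the parameters correspond to, the sketch below builds a request URL for arXiv's public Atom API (export.arxiv.org/api/query), whose sortBy parameter accepts the same three values. The function name, the all: field prefix, and the descending sort order are assumptions for illustration only.

```python
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"
VALID_SORTS = {"relevance", "lastUpdatedDate", "submittedDate"}


def build_arxiv_query_url(query: str, max_results: int = 10, sort_by: str = "relevance") -> str:
    """Build an arXiv API query URL (hypothetical helper, not the server's code)."""
    if sort_by not in VALID_SORTS:
        raise ValueError(f"sort_by must be one of {sorted(VALID_SORTS)}")
    params = {
        "search_query": f"all:{query}",  # assumption: search across all fields
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",       # assumption: newest/most relevant first
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

Fetching that URL returns an Atom XML feed with one entry per paper; the arxiv wrapper handles this parsing for you.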

search_semantic_scholar

Search Semantic Scholar for papers with citation metrics.

Parameters:

  • query (required): Search query

  • fields: List of metadata fields to return (e.g., title, authors, citationCount, year)

  • limit: Maximum number of results (default: 10)

Example:

results = mcp__research-paper-mcp__search_semantic_scholar({
    "query": "transformer architecture attention",
    "fields": ["title", "authors", "citationCount", "year"],
    "limit": 15
})
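Similarly, here is a sketch of an equivalent Semantic Scholar request, assuming the server calls the Graph API's /graph/v1/paper/search endpoint (this README does not confirm it). The fields list is sent as a single comma-separated string; build_s2_search_url is a hypothetical name.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_s2_search_url(query: str, fields=None, limit: int = 10) -> str:
    """Build a Semantic Scholar paper-search URL (hypothetical helper)."""
    params = {"query": query, "limit": limit}
    if fields:
        # The API expects fields as one comma-separated value.
        params["fields"] = ",".join(fields)
    return f"{S2_SEARCH}?{urlencode(params)}"
```

The API responds with JSON whose data array holds one object per paper, containing the requested fields.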

download_paper

Download a research paper PDF from a URL.

Parameters:

  • url (required): PDF URL

  • paper_id (required): Unique identifier, used as the saved filename ({paper_id}.pdf)

Example:

result = mcp__research-paper-mcp__download_paper({
    "url": "https://arxiv.org/pdf/1234.5678.pdf",
    "paper_id": "arxiv-1234.5678"
})
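A minimal sketch of the download step, assuming the PDF is fetched with the standard library and saved as {paper_id}.pdf (the naming stated under Storage). pdf_path_for and download_pdf are hypothetical helpers. The filename sanitization is an added assumption to cope with IDs such as old-style arXiv identifiers (hep-th/9901001) that contain a slash.

```python
import re
import shutil
import urllib.request
from pathlib import Path


def pdf_path_for(paper_id: str, papers_dir: Path) -> Path:
    """Map a paper ID to {paper_id}.pdf, replacing filename-unsafe characters."""
    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", paper_id)
    if not safe_id.strip("."):
        # Reject IDs like "." or ".." that would not form a usable filename.
        raise ValueError(f"unusable paper_id: {paper_id!r}")
    return papers_dir / f"{safe_id}.pdf"


def download_pdf(url: str, paper_id: str, papers_dir: Path) -> Path:
    """Stream the PDF at `url` to disk and return the saved path."""
    papers_dir.mkdir(parents=True, exist_ok=True)
    target = pdf_path_for(paper_id, papers_dir)
    with urllib.request.urlopen(url, timeout=60) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```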

extract_insights

Extract key insights and findings from paper text.

Parameters:

  • paper_text (required): Full paper text or abstract

  • focus_areas: Optional list of areas to focus on (e.g., methodology, results)

Example:

insights = mcp__research-paper-mcp__extract_insights({
    "paper_text": paper_abstract,
    "focus_areas": ["methodology", "results"]
})
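The README does not describe how extract_insights identifies findings (LLM-based extraction is listed under Future Enhancements). Purely to illustrate the input and output shape, here is a naive keyword heuristic: it keeps sentences containing contribution cue phrases and, if focus_areas is given, only those that also match a keyword for one of the areas. The cue phrases, the FOCUS_KEYWORDS mapping, and the function name are hypothetical. The returned {"insights": [...]} shape mirrors how the Autonomous Research Workflow example reads the result.

```python
import re

# Hypothetical cue phrases that often signal a contribution or finding.
CUE_PHRASES = ("we propose", "we introduce", "we present", "we show",
               "outperform", "achieve", "improve", "state-of-the-art")

# Hypothetical keywords per focus area.
FOCUS_KEYWORDS = {
    "methodology": ("method", "approach", "architecture", "algorithm", "propose", "introduce"),
    "results": ("result", "achieve", "outperform", "accuracy", "improve"),
}


def naive_extract_insights(paper_text: str, focus_areas=None) -> dict:
    """Return sentences that look like findings (illustrative heuristic only)."""
    sentences = re.split(r"(?<=[.!?])\s+", paper_text.strip())
    # Unknown focus areas fall back to using the area name itself as a keyword.
    keywords = [kw for area in (focus_areas or []) for kw in FOCUS_KEYWORDS.get(area, (area,))]
    insights = []
    for sentence in sentences:
        lowered = sentence.lower()
        if not any(cue in lowered for cue in CUE_PHRASES):
            continue
        if keywords and not any(kw in lowered for kw in keywords):
            continue
        insights.append(sentence)
    return {"insights": insights}
```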

analyze_citations

Analyze citation relationships and paper influence.

Parameters:

  • paper_id (required): Semantic Scholar or arXiv paper ID

  • depth: Citation graph depth, from 1 to 3 (default: 1)

Example:

analysis = mcp__research-paper-mcp__analyze_citations({
    "paper_id": "arxiv:1706.03762",  # "Attention Is All You Need"
    "depth": 2
})
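The depth parameter suggests a breadth-first expansion of the citation graph. The sketch below shows depth-limited BFS under that assumption; it is not the server's confirmed implementation. get_neighbors is a hypothetical callable (for instance, a wrapper around a citations or references lookup), and each paper is expanded at most once, so cycles in the graph do not cause infinite loops.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple


def citation_neighborhood(
    root: str,
    get_neighbors: Callable[[str], Iterable[str]],
    depth: int = 1,
) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Return {paper_id: hop distance} and the edges visited up to `depth` hops."""
    if not 1 <= depth <= 3:
        raise ValueError("depth must be between 1 and 3")
    levels = {root: 0}
    edges: List[Tuple[str, str]] = []
    queue = deque([root])
    while queue:
        paper = queue.popleft()
        if levels[paper] == depth:
            continue  # reached the depth limit; do not expand further
        for neighbor in get_neighbors(paper):
            edges.append((paper, neighbor))
            if neighbor not in levels:
                levels[neighbor] = levels[paper] + 1
                queue.append(neighbor)
    return levels, edges
```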

store_paper_knowledge

Store extracted knowledge in enhanced-memory-mcp for AGI learning.

Parameters:

  • paper_metadata (required): Dict of paper metadata (e.g., id, title, authors, year)

  • insights (required): List of key insights

  • techniques: Optional list of novel techniques

Example:

stored = mcp__research-paper-mcp__store_paper_knowledge({
    "paper_metadata": {
        "id": "arxiv-1234.5678",
        "title": "Novel AGI Approach",
        "authors": ["Smith", "Jones"],
        "year": 2024
    },
    "insights": [
        "Achieves 95% accuracy on benchmark",
        "10x faster than previous methods"
    ],
    "techniques": [
        "Recursive meta-optimization",
        "Self-modifying architectures"
    ]
})
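Since Storage says knowledge is written through enhanced-memory-mcp's create_entities, the sketch below shows one possible mapping from this tool's parameters to an entity. The name/entityType/observations shape follows the reference MCP memory server's create_entities input; whether enhanced-memory-mcp uses the same shape, along with the paper: name prefix and the research_paper type, are assumptions. build_paper_entity is a hypothetical helper.

```python
def build_paper_entity(paper_metadata: dict, insights: list, techniques=None) -> dict:
    """Build a create_entities-style payload for one paper (assumed format)."""
    if not paper_metadata.get("id"):
        raise ValueError("paper_metadata must include an 'id'")
    observations = [f"Title: {paper_metadata.get('title', 'unknown')}"]
    if paper_metadata.get("authors"):
        observations.append("Authors: " + ", ".join(paper_metadata["authors"]))
    if "year" in paper_metadata:
        observations.append(f"Year: {paper_metadata['year']}")
    # One observation per insight and technique keeps each fact searchable.
    observations += [f"Insight: {i}" for i in insights]
    observations += [f"Technique: {t}" for t in techniques or []]
    return {
        "entities": [{
            "name": f"paper:{paper_metadata['id']}",
            "entityType": "research_paper",
            "observations": observations,
        }]
    }
```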

Usage Patterns

Autonomous Research Workflow

# 1. Search for relevant papers
arxiv_results = mcp__research-paper-mcp__search_arxiv({
    "query": "recursive self-improvement",
    "max_results": 10
})

# 2. Get citation metrics
for paper in arxiv_results['papers']:
    scholar_data = mcp__research-paper-mcp__search_semantic_scholar({
        "query": paper['title'],
        "limit": 1
    })

    # 3. Download high-impact papers (skip titles with no Semantic Scholar match)
    if scholar_data['papers'] and scholar_data['papers'][0]['citationCount'] > 50:
        pdf = mcp__research-paper-mcp__download_paper({
            "url": paper['pdf_url'],
            "paper_id": paper['id']
        })

        # 4. Extract and store insights
        insights = mcp__research-paper-mcp__extract_insights({
            "paper_text": paper['abstract']
        })

        mcp__research-paper-mcp__store_paper_knowledge({
            "paper_metadata": paper,
            "insights": insights['insights']
        })
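The workflow above uses tool-call syntax that cannot run as plain Python. Here is a runnable sketch of the same control flow in which each tool is passed in as a callable, so it can be exercised with stubs. The function and parameter names are hypothetical; the response keys (papers, citationCount, pdf_url, abstract, insights) are taken from the example and have not been verified against the server.

```python
from typing import Any, Callable, Dict, List

Tool = Callable[[Dict[str, Any]], Any]


def run_research_cycle(
    search_arxiv: Tool,
    search_scholar: Tool,
    download: Tool,
    extract: Tool,
    store: Tool,
    query: str,
    min_citations: int = 50,
) -> List[str]:
    """Return the IDs of papers that passed the citation filter and were stored."""
    stored_ids: List[str] = []
    for paper in search_arxiv({"query": query, "max_results": 10})["papers"]:
        matches = search_scholar({"query": paper["title"], "limit": 1})["papers"]
        # Skip papers with no Semantic Scholar match or too few citations
        # (citationCount may be missing or None, so treat that as 0).
        if not matches or (matches[0].get("citationCount") or 0) <= min_citations:
            continue
        download({"url": paper["pdf_url"], "paper_id": paper["id"]})
        insights = extract({"paper_text": paper["abstract"]})["insights"]
        store({"paper_metadata": paper, "insights": insights})
        stored_ids.append(paper["id"])
    return stored_ids
```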

Citation Network Analysis

# Analyze citation influence
analysis = mcp__research-paper-mcp__analyze_citations({
    "paper_id": "influential-paper-id",
    "depth": 2
})

# Identify most influential papers in field
if analysis['citation_graph']['influential_citations'] > 100:
    # Download and study this foundational paper
    pass
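The structure of the analyze_citations response is only partially shown. Assuming it yields a list of paper dicts carrying Semantic Scholar's influentialCitationCount field, selecting foundational papers above the threshold of 100 used above might look like this (foundational_papers is a hypothetical helper):

```python
def foundational_papers(papers: list, threshold: int = 100) -> list:
    """Return papers whose influential citation count exceeds `threshold`, highest first."""
    # Missing or None counts are treated as 0 and therefore never selected.
    selected = [p for p in papers if (p.get("influentialCitationCount") or 0) > threshold]
    return sorted(selected, key=lambda p: p["influentialCitationCount"], reverse=True)
```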

Storage

  • Papers Directory: ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/research-papers/

  • PDFs: Saved as {paper_id}.pdf

  • Memory Integration: Via enhanced-memory-mcp create_entities
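The ${AGENTIC_SYSTEM_PATH:-/opt/agentic} paths use shell default-value syntax, which substitutes the default when the variable is unset or empty. A sketch of resolving the same storage locations in Python, assuming that behavior is what the server intends (papers_dir and pdf_path are hypothetical names):

```python
import os
from pathlib import Path


def papers_dir() -> Path:
    """Resolve the papers directory like ${AGENTIC_SYSTEM_PATH:-/opt/agentic}."""
    # `or` makes an empty value fall back to the default, matching the shell's ":-".
    base = os.environ.get("AGENTIC_SYSTEM_PATH") or "/opt/agentic"
    return Path(base) / "agentic-system" / "research-papers"


def pdf_path(paper_id: str) -> Path:
    """Location of a downloaded paper, saved as {paper_id}.pdf."""
    return papers_dir() / f"{paper_id}.pdf"
```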

Dependencies

  • arxiv: arXiv API Python wrapper

  • aiohttp: Async HTTP client for Semantic Scholar API

  • mcp: Model Context Protocol SDK

Future Enhancements

  1. PDF Text Extraction: Parse full paper text from PDFs

  2. Figure/Diagram Analysis: Extract visual insights

  3. Code Repository Links: Find implementation code

  4. Related Papers: Automatic discovery of connected research

  5. Trend Detection: Identify emerging research directions

  6. LLM-Powered Insight Extraction: Use GPT-4 for deeper analysis

Integration with AGI System

This MCP server closes Gap #1 from AGI_GAP_ANALYSIS.md:

Knowledge Acquisition Infrastructure

  • ✓ Research Paper Ingestion (arXiv + Semantic Scholar)

  • ⏳ Video Transcript Processing (separate MCP)

  • ⏳ GitHub Repository Analysis (future)

  • ⏳ Documentation Scraping (future)

  • ⏳ Knowledge Graph Integration (future)

Impact: the system can now autonomously discover, retrieve, and store knowledge from recent AI research papers.
