Skip to main content
Glama
marc-shade

Research Paper Ingestion MCP Server

by marc-shade

Research Paper Ingestion MCP Server

MCP Python License Part of Agentic System

Autonomous knowledge acquisition from academic research papers for AGI self-improvement.

Part of the Agentic System - a 24/7 autonomous AI framework with persistent memory.

Features

Paper Discovery

  • arXiv Integration: Search and download from arXiv.org

  • Semantic Scholar: Citation analysis and academic impact metrics

  • PDF Download: Automatic paper retrieval and storage

Knowledge Extraction

  • Insight Extraction: Identify key findings and contributions

  • Citation Analysis: Understand paper influence and relationships

  • Technique Identification: Extract novel methods and approaches

Memory Integration

  • Enhanced Memory: Store extracted knowledge for AGI learning

  • Structured Entities: Create searchable memory representations

  • Citation Graphs: Track knowledge lineage

Installation

cd ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp pip install -r requirements.txt

Configuration

Add to ~/.claude.json:

{ "mcpServers": { "research-paper-mcp": { "command": "python3", "args": [ "${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp/server.py" ], "env": {}, "disabled": false } } }

Available Tools

search_arxiv

Search arXiv for research papers by query.

Parameters:

  • query (required): Search query (e.g., "recursive self-improvement AGI")

  • max_results: Maximum results (default: 10)

  • sort_by: Sort order - relevance, lastUpdatedDate, submittedDate

Example:

results = mcp__research-paper-mcp__search_arxiv({ "query": "meta-learning neural networks", "max_results": 20, "sort_by": "relevance" })

search_semantic_scholar

Search Semantic Scholar for papers with citation metrics.

Parameters:

  • query (required): Search query

  • fields: Metadata fields to retrieve

  • limit: Maximum results (default: 10)

Example:

results = mcp__research-paper-mcp__search_semantic_scholar({ "query": "transformer architecture attention", "fields": ["title", "authors", "citationCount", "year"], "limit": 15 })

download_paper

Download research paper PDF from URL.

Parameters:

  • url (required): PDF URL

  • paper_id (required): Unique identifier for filename

Example:

result = mcp__research-paper-mcp__download_paper({ "url": "https://arxiv.org/pdf/1234.5678.pdf", "paper_id": "arxiv-1234.5678" })

extract_insights

Extract key insights and findings from paper text.

Parameters:

  • paper_text (required): Full paper text or abstract

  • focus_areas: Optional specific areas to focus on

Example:

insights = mcp__research-paper-mcp__extract_insights({ "paper_text": paper_abstract, "focus_areas": ["methodology", "results"] })

analyze_citations

Analyze citation relationships and paper influence.

Parameters:

  • paper_id (required): Semantic Scholar or arXiv paper ID

  • depth: Citation graph depth 1-3 (default: 1)

Example:

analysis = mcp__research-paper-mcp__analyze_citations({ "paper_id": "arxiv:1706.03762", # "Attention Is All You Need" "depth": 2 })

store_paper_knowledge

Store extracted knowledge in enhanced-memory for AGI learning.

Parameters:

  • paper_metadata (required): Paper metadata dict

  • insights (required): List of key insights

  • techniques: List of novel techniques

Example:

stored = mcp__research-paper-mcp__store_paper_knowledge({ "paper_metadata": { "id": "arxiv-1234.5678", "title": "Novel AGI Approach", "authors": ["Smith", "Jones"], "year": 2024 }, "insights": [ "Achieves 95% accuracy on benchmark", "10x faster than previous methods" ], "techniques": [ "Recursive meta-optimization", "Self-modifying architectures" ] })

Usage Patterns

Autonomous Research Workflow

# 1. Search for relevant papers arxiv_results = mcp__research-paper-mcp__search_arxiv({ "query": "recursive self-improvement", "max_results": 10 }) # 2. Get citation metrics for paper in arxiv_results['papers']: scholar_data = mcp__research-paper-mcp__search_semantic_scholar({ "query": paper['title'], "limit": 1 }) # 3. Download high-impact papers if scholar_data['papers'][0]['citationCount'] > 50: pdf = mcp__research-paper-mcp__download_paper({ "url": paper['pdf_url'], "paper_id": paper['id'] }) # 4. Extract and store insights insights = mcp__research-paper-mcp__extract_insights({ "paper_text": paper['abstract'] }) mcp__research-paper-mcp__store_paper_knowledge({ "paper_metadata": paper, "insights": insights['insights'] })

Citation Network Analysis

# Analyze citation influence analysis = mcp__research-paper-mcp__analyze_citations({ "paper_id": "influential-paper-id", "depth": 2 }) # Identify most influential papers in field if analysis['citation_graph']['influential_citations'] > 100: # Download and study this foundational paper pass

Storage

  • Papers Directory: ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/research-papers/

  • PDFs: Saved as {paper_id}.pdf

  • Memory Integration: Via enhanced-memory-mcp create_entities

Dependencies

  • arxiv: arXiv API Python wrapper

  • aiohttp: Async HTTP client for Semantic Scholar API

  • mcp: Model Context Protocol SDK

Future Enhancements

  1. PDF Text Extraction: Parse full paper text from PDFs

  2. Figure/Diagram Analysis: Extract visual insights

  3. Code Repository Links: Find implementation code

  4. Related Papers: Automatic discovery of connected research

  5. Trend Detection: Identify emerging research directions

  6. LLM-Powered Insight Extraction: Use GPT-4 for deeper analysis

Integration with AGI System

This MCP server closes Gap #1 from AGI_GAP_ANALYSIS.md:

Knowledge Acquisition Infrastructure

  • ✓ Research Paper Ingestion (arXiv + Semantic Scholar)

  • ⏳ Video Transcript Processing (separate MCP)

  • ⏳ GitHub Repository Analysis (future)

  • ⏳ Documentation Scraping (future)

  • ⏳ Knowledge Graph Integration (future)

Impact: System can now autonomously learn from the latest AI research!

-
security - not tested
F
license - not found
-
quality - not tested

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/marc-shade/research-paper-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server