Scientific Paper Harvester MCP Server

story-1.story.md•5.47 kB

# Story-1: MCP Server Foundation with Category Listing **Epic:** Scientific Paper Harvester MCP Server MVP **Status:** Approved **Assignee:** Developer **Sprint:** 1 **Estimated Effort:** Medium ## User Story As a developer or researcher, I want to run an MCP server that can list available categories from both arXiv and OpenAlex sources so that I can understand what scientific paper categories are available for querying and start using the system immediately. ## Acceptance Criteria - [x] **AC-1:** MCP server starts successfully using Node.js 20 LTS with ESM modules - [x] **AC-2:** `list_categories` tool works for arXiv source and returns valid category codes (e.g., cs.AI, physics.gen-ph) - [x] **AC-3:** `list_categories` tool works for OpenAlex source and returns valid concept IDs - [x] **AC-4:** CLI interface provides the same `list_categories` functionality for offline testing - [x] **AC-5:** Server responds to MCP protocol calls via stdio transport - [x] **AC-6:** Rate limiting is implemented per source to respect API limits (5 req/s max) ## Technical Requirements ### Prerequisites - Node.js 20 LTS installed - Basic MCP SDK understanding - TypeScript and ESM module support ### Implementation Tasks - [x] **T-1:** Set up project structure with TypeScript, ESM, and MCP SDK dependencies - [x] **T-2:** Create base driver interface and arXiv driver for category listing - [x] **T-3:** Create OpenAlex driver for concept/category listing - [x] **T-4:** Implement rate limiter service with per-source token bucket - [x] **T-5:** Build MCP server with `list_categories` tool using Zod validation - [x] **T-6:** Create CLI wrapper that calls the same underlying functions - [x] **T-7:** Add structured logging with Winston ### Definition of Done - [x] All acceptance criteria met - [x] Unit tests pass (>90% coverage) - [x] Integration tests pass for both MCP and CLI interfaces - [x] Code follows TypeScript/ESM standards - [x] Rate limiting respects API constraints - [x] Provide the user with exact instructions how to run the server for exploratory testing ## Technical Design ### Architecture Components - **MCP Server**: Main entry point with stdio transport - **Base Driver**: Abstract interface for data source drivers - **arXiv Driver**: Fetches categories from arXiv taxonomy - **OpenAlex Driver**: Fetches concepts from OpenAlex API - **Rate Limiter**: Token bucket implementation per source - **CLI Interface**: Command-line wrapper for offline testing ### Data Models ```typescript interface CategoryList { source: 'arxiv' | 'openalex'; categories: Category[]; } interface Category { id: string; // e.g., "cs.AI" or concept ID name: string; // Human readable name description?: string; // Optional description } interface RateLimiterState { [source: string]: { tokens: number; lastRefill: number; maxTokens: number; refillRate: number; } } ``` ### API Design - **MCP Tool**: `list_categories({ source: "arxiv" | "openalex" })` - **CLI Command**: `npx latest-science-mcp list-categories --source=arxiv` - **Response**: JSON array of category objects ## Test Plan ### Unit Tests - Rate limiter token bucket behavior with time simulation - arXiv driver category parsing from API response - OpenAlex driver concept listing and mapping - MCP tool parameter validation with Zod schemas ### Integration Tests - CLI command execution returns valid JSON - MCP server responds correctly to stdio tool calls - Rate limiting prevents excessive API calls during concurrent requests ## Dependencies ### Internal Dependencies - Project structure and build configuration - TypeScript compilation setup - ESM module resolution ### External Dependencies - arXiv API (public, no authentication) - OpenAlex API (public, no authentication) - @modelcontextprotocol/sdk for MCP protocol - Zod for runtime validation - Winston for logging - Axios for HTTP requests ## Chat Log **2025-01-XX**: Story created focusing on foundational MCP server setup with category listing as the first user-facing value. Decided to start with `list_categories` as it provides immediate utility and doesn't require complex text extraction pipeline. **2025-01-XX**: Implementation completed successfully. All acceptance criteria met: - ✅ MCP server running with Node.js 20 LTS + ESM - ✅ arXiv categories working (8 predefined categories including cs.AI, cs.LG, cs.CL, etc.) - ✅ OpenAlex concepts working (19 top-level concepts fetched from API) - ✅ CLI interface functional for both sources - ✅ MCP protocol responding correctly via stdio transport - ✅ Rate limiting implemented with token bucket per source - ✅ Full TypeScript implementation with proper error handling - ✅ README with usage instructions provided **Key Technical Decisions:** - Used predefined arXiv categories to avoid unnecessary API calls for static data - Implemented OpenAlex concept fetching with proper error handling and rate limiting - Fixed MCP SDK usage to use Zod schemas as second parameter (not inputSchema object) - Added comprehensive logging and error handling throughout **Testing Results:** - CLI tests: ✅ Both arXiv and OpenAlex sources working - MCP tests: ✅ Server responds correctly to protocol calls - Rate limiting: ✅ Token bucket implementation working - Error handling: ✅ Proper validation and error responses --- **Created:** 2025-01-XX **Last Updated:** 2025-01-XX **Completed:** 2025-01-XX

Loading blob content...

Latest Blog Posts

Don't Use Large Strings as Cache Keys
By punkpeye on January 11, 2026.
markdown
node-js
cache
What are Claude Skills?
By punkpeye on January 10, 2026.
mcp
skills
How to Test MCP Streamable HTTP Endpoints Using cURL
By punkpeye on January 2, 2026.
tutorial
bash

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/benedict2310/Scientific-Papers-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

story-1.story.md•5.47 kB