Paper Search MCP Server


02_server.md•10.6 kB

# MCP Server in Depth

> **File location**: `paper_search_mcp/server.py`
> **Difficulty**: ⭐⭐⭐⭐ (core file)
> **Updated**: December 2025 - factory function refactor

---

## Overview

`server.py` is the **core entry point** of the whole project. It is responsible for:

1. Creating the MCP server
2. Registering all tools (search, download, read)
3. Handling requests from the LLM

### 2025 best practices

This project adopts the following best practices:

| Practice | Description |
|------|------|
| Factory functions | Reduce code duplication |
| `logging` | Replaces `print()` |
| Unified error handling | Handled centrally in the factory functions |
| Searcher registry | Manages instances in one place |

---

## Architecture diagram

```mermaid
graph TB
    Client[Claude Desktop / LLM] -->|MCP Request| Server[FastMCP Server]
    Server --> Tools{Tools}
    Tools --> Search[search_*]
    Tools --> Download[download_*]
    Tools --> Read[read_*]
    Search --> Factory[_search]
    Download --> FactoryD[_download]
    Read --> FactoryR[_read]
    Factory --> Searchers[(SEARCHERS)]
    FactoryD --> Searchers
    FactoryR --> Searchers
    Searchers --> A[ArxivSearcher]
    Searchers --> P[PubMedSearcher]
    Searchers --> S[SemanticSearcher]
    Searchers --> C[CrossRefSearcher]
    Searchers --> More[...]
```

---

## Full code walkthrough

### 1. Imports and configuration

```python
"""
MCP Server - academic paper search service

2025 best-practice version:
- Uses factory functions to reduce code duplication
- Unified error handling
- Logging instead of print()
"""
from typing import List, Dict, Optional, Any
import logging

from mcp.server.fastmcp import FastMCP

from .academic_platforms.arxiv import ArxivSearcher
from .academic_platforms.pubmed import PubMedSearcher
# ... other imports

# Logging configuration
logger = logging.getLogger(__name__)

# MCP Server initialization
mcp = FastMCP("paper_search_server")
```

**💡 Key takeaways**:

1. **Module-level docstring**: describes what the module does and the design decisions behind it
2. **`logging`**: a proper logging system, more flexible than `print()`
3. **`FastMCP`**: a high-level wrapper around the MCP framework

---

### 2. Searcher registry

```python
# Searcher instances (singletons)
SEARCHERS = {
    'arxiv': ArxivSearcher(),
    'pubmed': PubMedSearcher(),
    'biorxiv': BioRxivSearcher(),
    'medrxiv': MedRxivSearcher(),
    'google_scholar': GoogleScholarSearcher(),
    'iacr': IACRSearcher(),
    'semantic': SemanticSearcher(),
    'crossref': CrossRefSearcher(),
}
```

**💡 Design advantages**:

1. **Centralized management**: all searchers are initialized in one place
2. **Singleton pattern**: each searcher is instantiated exactly once
3. **Easy to extend**: adding a new platform to the registry takes a single line

---

### 3. Factory functions (with error handling)

```python
async def _search(
    searcher_name: str,
    query: str,
    max_results: int = 10,
    **kwargs
) -> List[Dict]:
    """Generic search function"""
    searcher = SEARCHERS.get(searcher_name)
    if not searcher:
        logger.error(f"Unknown searcher: {searcher_name}")
        return []

    try:
        papers = searcher.search(query, max_results=max_results, **kwargs)
        return [paper.to_dict() for paper in papers]
    except Exception as e:
        logger.error(f"Search failed for {searcher_name}: {e}")
        return []


async def _download(
    searcher_name: str,
    paper_id: str,
    save_path: str = "./downloads"
) -> str:
    """Generic download function"""
    searcher = SEARCHERS.get(searcher_name)
    if not searcher:
        return f"Error: Unknown searcher {searcher_name}"

    try:
        return searcher.download_pdf(paper_id, save_path)
    except NotImplementedError as e:
        return str(e)
    except Exception as e:
        logger.error(f"Download failed for {searcher_name}: {e}")
        return f"Error downloading: {str(e)}"


async def _read(
    searcher_name: str,
    paper_id: str,
    save_path: str = "./downloads"
) -> str:
    """Generic read function"""
    searcher = SEARCHERS.get(searcher_name)
    if not searcher:
        return f"Error: Unknown searcher {searcher_name}"

    try:
        return searcher.read_paper(paper_id, save_path)
    except NotImplementedError as e:
        return str(e)
    except Exception as e:
        logger.error(f"Read failed for {searcher_name}: {e}")
        return f"Error reading paper: {str(e)}"
```

**💡 Advantages of the factory functions**:

| Advantage | Description |
|------|------|
| Code reuse | Written once, used in many places |
| Unified error handling | Every searcher goes through the same error-handling logic |
| Easy to maintain | A change in one place affects every tool |
| Easy to test | The factory functions can be tested in isolation |

---

### 4. MCP tool definitions

```python
# ============================================================
# arXiv tools
# ============================================================

@mcp.tool()
async def search_arxiv(query: str, max_results: int = 10) -> List[Dict]:
    """Search academic papers from arXiv.

    Args:
        query: Search query string (e.g., 'machine learning').
        max_results: Maximum number of papers to return (default: 10).

    Returns:
        List of paper metadata in dictionary format.
    """
    return await _search('arxiv', query, max_results)


@mcp.tool()
async def download_arxiv(paper_id: str, save_path: str = "./downloads") -> str:
    """Download PDF of an arXiv paper.

    Args:
        paper_id: arXiv paper ID (e.g., '2106.12345').
        save_path: Directory to save the PDF (default: './downloads').

    Returns:
        Path to the downloaded PDF file.
    """
    return await _download('arxiv', paper_id, save_path)


@mcp.tool()
async def read_arxiv_paper(paper_id: str, save_path: str = "./downloads") -> str:
    """Read and extract text content from an arXiv paper PDF.

    Args:
        paper_id: arXiv paper ID (e.g., '2106.12345').
        save_path: Directory where the PDF is/will be saved.

    Returns:
        str: The extracted text content of the paper.
    """
    return await _read('arxiv', paper_id, save_path)
```

**💡 How the tool definitions got simpler**:

Before the refactor (every tool carried the full logic):

```python
@mcp.tool()
async def search_arxiv(query: str, max_results: int = 10) -> List[Dict]:
    async with httpx.AsyncClient() as client:
        try:
            papers = arxiv_searcher.search(query, max_results=max_results)
            return [paper.to_dict() for paper in papers]
        except Exception as e:
            print(f"Error: {e}")
            return []
```

After the refactor (using the factory function):

```python
@mcp.tool()
async def search_arxiv(query: str, max_results: int = 10) -> List[Dict]:
    return await _search('arxiv', query, max_results)
```

---

### 5. Handling special tools

Some tools take special parameters and need to be handled individually:

```python
@mcp.tool()
async def search_semantic(
    query: str,
    year: Optional[str] = None,
    max_results: int = 10
) -> List[Dict]:
    """Search academic papers from Semantic Scholar.

    Args:
        query: Search query string.
        year: Optional year filter (e.g., '2019', '2016-2020').
        max_results: Maximum number of papers to return.
    """
    kwargs = {'year': year} if year else {}
    return await _search('semantic', query, max_results, **kwargs)


@mcp.tool()
async def search_iacr(
    query: str,
    max_results: int = 10,
    fetch_details: bool = True
) -> List[Dict]:
    """Search IACR ePrint Archive.

    Special parameters need separate handling.
    """
    searcher = SEARCHERS['iacr']
    try:
        papers = searcher.search(query, max_results, fetch_details)
        return [paper.to_dict() for paper in papers] if papers else []
    except Exception as e:
        logger.error(f"IACR search failed: {e}")
        return []
```

---

### 6. Server entry point

```python
if __name__ == "__main__":
    # Configure logging
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    )

    # Run the MCP server
    mcp.run(transport="stdio")
```

**💡 Key takeaways**:

1. **Logging configuration**: sets the format and the level
2. **`transport="stdio"`**: communicates over standard input/output (the default for Claude Desktop)

---

## Tool categories

### Search tools

| Tool | Platform | Special parameters |
|------|------|----------|
| `search_arxiv` | arXiv | - |
| `search_pubmed` | PubMed | - |
| `search_biorxiv` | bioRxiv | - |
| `search_medrxiv` | medRxiv | - |
| `search_semantic` | Semantic Scholar | `year` |
| `search_crossref` | CrossRef | `filter`, `sort` |
| `search_iacr` | IACR ePrint | `fetch_details` |
| `search_google_scholar` | Google Scholar | - |

### Download tools

| Tool | Notes |
|------|------|
| `download_arxiv` | Direct download |
| `download_semantic` | Requires a PDF URL |
| `download_pubmed` | ❌ Not supported |
| `download_crossref` | ❌ Not supported |

### Read tools

| Tool | Output format |
|------|----------|
| `read_arxiv_paper` | Markdown (PyMuPDF4LLM) |
| `read_semantic_paper` | Markdown |
| `read_pubmed_paper` | ❌ Returns an error message |

---

## Best practices summary

### ✅ Recommended

```python
# 1. Manage instances with a registry
SEARCHERS = {'arxiv': ArxivSearcher(), ...}

# 2. Use factory functions to reduce duplication
async def _search(name, query, max_results):
    return SEARCHERS[name].search(query, max_results)

# 3. Use logging
logger.error(f"Search failed: {e}")
```

### ❌ Avoid

```python
# 1. Repeating the same code in every tool
@mcp.tool()
async def search_arxiv(...):
    try:
        papers = arxiv_searcher.search(...)
        ...
    except Exception as e:
        print(f"Error: {e}")  # don't use print

# 2. Global instances scattered around
arxiv_searcher = ArxivSearcher()    # at the top of the file
pubmed_searcher = PubMedSearcher()  # hard to manage
```

---

## Extending: adding a new platform

Adding a new academic platform takes only 3 steps:

### 1. Create the searcher

```python
# paper_search_mcp/academic_platforms/new_platform.py
class NewPlatformSearcher(PaperSource):
    def search(self, query, max_results=10):
        ...

    def download_pdf(self, paper_id, save_path):
        ...

    def read_paper(self, paper_id, save_path):
        ...
```

### 2. Register the searcher

```python
# server.py
from .academic_platforms.new_platform import NewPlatformSearcher

SEARCHERS = {
    ...,
    'new_platform': NewPlatformSearcher(),
}
```

### 3. Add the tool

```python
@mcp.tool()
async def search_new_platform(query: str, max_results: int = 10) -> List[Dict]:
    """Search papers from New Platform."""
    return await _search('new_platform', query, max_results)
```

---

## References

- [FastMCP documentation](https://github.com/jlowin/fastmcp)
- [MCP official specification](https://modelcontextprotocol.io/)
- [Python logging module](https://docs.python.org/3/library/logging.html)
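
The walkthrough notes that the `_search` factory function can be tested in isolation. A minimal sketch of what that might look like is shown here, assuming nothing beyond the standard library. `FakePaper`, `FakeSearcher`, `BrokenSearcher`, and `run_checks` are hypothetical stand-ins invented for this illustration and are not part of the project; the `_search` body mirrors the one in `server.py`, and the registry is replaced with fakes so that no network access or MCP runtime is needed.

```python
import asyncio
import logging
from dataclasses import asdict, dataclass
from typing import Dict, List

logger = logging.getLogger(__name__)


@dataclass
class FakePaper:
    """Hypothetical stand-in for the project's paper model."""
    paper_id: str
    title: str

    def to_dict(self) -> Dict:
        return asdict(self)


class FakeSearcher:
    """Hypothetical searcher that returns canned results (no network)."""

    def search(self, query: str, max_results: int = 10) -> List[FakePaper]:
        papers = [FakePaper(f"fake-{i}", f"{query} paper {i}") for i in range(5)]
        return papers[:max_results]


class BrokenSearcher:
    """Hypothetical searcher that always fails, to exercise error handling."""

    def search(self, query: str, max_results: int = 10) -> List[FakePaper]:
        raise RuntimeError("upstream unavailable")


# Registry populated with fakes instead of the real platform searchers.
SEARCHERS = {"fake": FakeSearcher(), "broken": BrokenSearcher()}


async def _search(
    searcher_name: str, query: str, max_results: int = 10, **kwargs
) -> List[Dict]:
    """Same shape as the generic search function in server.py."""
    searcher = SEARCHERS.get(searcher_name)
    if not searcher:
        logger.error("Unknown searcher: %s", searcher_name)
        return []
    try:
        papers = searcher.search(query, max_results=max_results, **kwargs)
        return [paper.to_dict() for paper in papers]
    except Exception as e:
        logger.error("Search failed for %s: %s", searcher_name, e)
        return []


def run_checks() -> Dict[str, List[Dict]]:
    """Exercise the success path and both failure paths."""
    return {
        "ok": asyncio.run(_search("fake", "graph", max_results=2)),
        "unknown": asyncio.run(_search("missing", "graph")),
        "failed": asyncio.run(_search("broken", "graph")),
    }


if __name__ == "__main__":
    for name, result in run_checks().items():
        print(name, result)
```

Because an unknown searcher name and a raised exception both collapse to an empty list, a caller cannot tell "no results" apart from "search failed" by the return value alone; the log output is the only place the difference shows up.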
