Glama
gqy20

Europe PMC Literature Search MCP Server

get_references_by_doi

Extract reference lists using DOIs with high-performance batch queries from Europe PMC. Optimize literature retrieval for large-scale research tasks or database building by reducing API calls and network latency.

Instructions

Get a reference list by DOI (batch-optimized version, based on Europe PMC's batch query capability)

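Description:

  • Uses Europe PMC's batch query capability to fetch references

  • Combines multiple DOIs into a single query with the OR operator

  • Can be more than 10 times faster than the traditional one-request-per-DOI approach

  • Particularly suited to fetching large numbers of references quickly

  • Builds on the batch query behavior found in Europe PMC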

Parameters:

  • doi: required; digital object identifier (e.g. "10.1126/science.adf6218")

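Return value:

  • Contains the same base fields as the other versions

  • Additionally provides:

    • optimization: identifier of the optimization type

    • batch_info: batch processing information

      • batch_size: number of items in the batch

      • batch_time: time taken by the batch query

      • individual_time: estimated time for the equivalent individual queries

      • performance_improvement: speed-up factor

    • europe_pmc_batch_query: the batch query string that was used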

Use cases:

  • Large-scale reference retrieval

  • High-performance batch data processing

  • Time-critical research tasks

  • Building literature databases

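Performance characteristics:

  • 10 to 15 times faster than the traditional approach

  • Uses Europe PMC's native batch query capability

  • Reduces the number of API requests

  • Reduces the impact of network latency

  • Best suited to workloads with large numbers of references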

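How it works:

  • Uses the batch query syntax DOI:"xxx" OR DOI:"yyy"

  • Fetches information for multiple DOIs in a single request

  • Significantly reduces the number of API calls and the network overhead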
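
The DOI:"xxx" OR DOI:"yyy" syntax above can be sketched as follows. This is a minimal illustration, not the server's own code: `build_batch_query`, `chunked`, `build_batch_urls`, and the chunk size of 20 are hypothetical names and values chosen for the example, and "10.1000/example" is a made-up DOI. The URL is Europe PMC's public REST search endpoint; the sketch only builds request URLs and sends nothing.

```python
from urllib.parse import urlencode

# Europe PMC public REST search endpoint.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_batch_query(dois: list[str]) -> str:
    """Join DOIs into one query of the form DOI:"a" OR DOI:"b"."""
    cleaned = [d.strip() for d in dois if d and d.strip()]
    return " OR ".join(f'DOI:"{d}"' for d in cleaned)


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batch_urls(dois: list[str], chunk_size: int = 20) -> list[str]:
    """Build one search URL per chunk of DOIs (chunk_size is an assumed value)."""
    urls = []
    for chunk in chunked(dois, chunk_size):
        params = {
            "query": build_batch_query(chunk),
            "format": "json",
            "pageSize": len(chunk),
        }
        urls.append(f"{EUROPE_PMC_SEARCH}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    print(build_batch_query(["10.1126/science.adf6218", "10.1000/example"]))
    # prints: DOI:"10.1126/science.adf6218" OR DOI:"10.1000/example"
```

Chunking keeps each query string to a bounded length; with 45 DOIs and a chunk size of 20, the sketch produces three requests instead of 45.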

Input Schema

Name | Required | Description | Default
doi | Yes | - | -

Implementation Reference

  • MCP tool handler named 'get_references' (defaults to DOI lookups via get_references_by_doi_sync), orchestrates multi-source reference retrieval, merging, and deduplication.
    @mcp.tool(
        description="获取参考文献工具。通过文献标识符获取完整参考文献列表。",
        annotations=ToolAnnotations(
            title="参考文献",
            readOnlyHint=True,
            openWorldHint=False
        ),
        tags={"references", "citations", "bibliography"}
    )
    def get_references(
        identifier: str,
        id_type: str = "doi",
        sources: list[str] | None = None,
        max_results: int = 20,
        include_metadata: bool = True,
    ) -> dict[str, Any]:
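        """Reference retrieval tool. Fetches the list of references cited by a publication, given its identifier.
    
        Args:
            identifier: Publication identifier (DOI, PMID, PMCID, arXiv ID)
            id_type: Identifier type ["auto", "doi", "pmid", "pmcid"]
            sources: List of data sources; multiple sources can be queried
            max_results: Maximum number of references (20-100 recommended)
            include_metadata: Whether to include detailed metadata
    
        Returns:
            A dict containing the reference list, including citation details and statistics
        """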
        try:
            if not identifier or not identifier.strip():
                return {
                    "success": False,
                    "error": "文献标识符不能为空",
                    "identifier": identifier,
                    "sources_used": [],
                    "references_by_source": {},
                    "merged_references": [],
                    "total_count": 0,
                    "processing_time": 0,
                }
    
            # Fall back to the default sources when sources is None
            if sources is None:
                sources = ["europe_pmc", "crossref"]
    
            start_time = time.time()
            identifier = identifier.strip()
    
            # Auto-detect the identifier type
            if id_type == "auto":
                id_type = _extract_identifier_type(identifier)
    
            references_by_source = {}
            sources_used = []
    
            # Fetch references from each requested data source
            for source in sources:
                try:
                    if source == "europe_pmc" and "reference" in _reference_services:
                        service = _reference_services["reference"]
                        if id_type == "doi":
                            result = service.get_references_by_doi_sync(identifier)
                        else:
                            result = {"success": False, "error": "Europe PMC only supports DOI for references"}
                        # Success means no error and a non-empty reference list
                        error = result.get("error")
                        references = result.get("references", [])
                        if not error and references:
                            references_by_source[source] = references
                            sources_used.append(source)
                            logger.info(f"从Europe PMC获取到 {len(references)} 条参考文献")
                        else:
                            logger.warning(f"Europe PMC参考文献获取失败: {error or '无参考文献数据'}")
    
                    elif source == "crossref" and "reference" in _reference_services:
                        # Crossref reference retrieval
                        service = _reference_services["reference"]
                        references = service.get_references_crossref_sync(identifier)
                        if references:  # the Crossref method returns a reference list or None directly
                            references_by_source[source] = references
                            sources_used.append(source)
                            logger.info(f"从Crossref获取到 {len(references)} 条参考文献")
                        else:
                            logger.warning("Crossref参考文献获取失败: 无数据")
    
                    elif source == "pubmed" and "reference" in _reference_services:
                        # PubMed does not yet support reference retrieval; it is served indirectly through the reference service
                        service = _reference_services["reference"]
                        if id_type == "doi":
                            result = service.get_references_by_doi_sync(identifier)
                            error = result.get("error")
                            references = result.get("references", [])
                            if not error and references:
                                references_by_source[source] = references
                                sources_used.append(source)
                                logger.info(f"从PubMed获取到 {len(references)} 条参考文献")
                            else:
                                logger.warning(f"PubMed参考文献获取失败: {error or '无参考文献数据'}")
                        else:
                            logger.warning("PubMed参考文献获取仅支持DOI标识符")
    
                except Exception as e:
                    logger.error(f"从 {source} 获取参考文献失败: {e}")
                    continue
    
            # Merge and deduplicate references
            merged_references = _merge_and_deduplicate_references(
                references_by_source, include_metadata, logger
            )
    
            # Cap the number of references returned
            if len(merged_references) > max_results:
                merged_references = merged_references[:max_results]
    
            processing_time = round(time.time() - start_time, 2)
    
            return {
                "success": len(merged_references) > 0,
                "identifier": identifier,
                "id_type": id_type,
                "sources_used": sources_used,
                "references_by_source": references_by_source,
                "merged_references": merged_references,
                "total_count": len(merged_references),
                "processing_time": processing_time,
            }
    
        except Exception as e:
            logger.error(f"获取参考文献异常: {e}")
            # Raise a standard MCP error
            from mcp import McpError
            from mcp.types import ErrorData
            raise McpError(ErrorData(
                code=-32603,
                message=f"获取参考文献失败: {type(e).__name__}: {str(e)}"
            ))
  • Core helper function implementing reference retrieval by DOI: queries Crossref API for references, enriches metadata from Europe PMC, applies deduplication.
    def get_references_by_doi_sync(self, doi: str) -> dict[str, Any]:
        """Fetch references synchronously."""
        start_time = time.time()
    
        try:
            self.logger.info(f"开始同步获取 DOI {doi} 的参考文献")
    
            # 1. Fetch the reference list from Crossref
            references = self.get_references_crossref_sync(doi)
    
            if references is None:
                return {
                    "references": [],
                    "message": "Crossref 查询失败",
                    "error": "未能从 Crossref 获取参考文献列表",
                    "total_count": 0,
                    "processing_time": time.time() - start_time,
                }
    
            if not references:
                return {
                    "references": [],
                    "message": "未找到参考文献",
                    "error": None,
                    "total_count": 0,
                    "processing_time": time.time() - start_time,
                }
    
            # 2. Fill in missing fields from Europe PMC
            enriched_references = []
            for ref in references:
                doi_ref = ref.get("doi")
                if doi_ref and not (ref.get("abstract") or ref.get("pmid")):
                    self.logger.info(f"使用 Europe PMC 补全: {doi_ref}")
    
                    europe_pmc_info = self.search_europe_pmc_by_doi_sync(doi_ref)
                    if europe_pmc_info:
                        formatted_info = self._format_europe_pmc_metadata(europe_pmc_info)
                        for key, value in formatted_info.items():
                            if value and not ref.get(key):
                                ref[key] = value
    
                    time.sleep(0.2)  # throttle requests
    
                enriched_references.append(ref)
    
            # 3. Deduplicate
            final_references = self.deduplicate_references(enriched_references)
    
            processing_time = time.time() - start_time
    
            return {
                "references": final_references,
                "message": f"成功获取 {len(final_references)} 条参考文献 (同步版本)",
                "error": None,
                "total_count": len(final_references),
                "enriched_count": len(
                    [r for r in final_references if r.get("source") == "europe_pmc"]
                ),
                "processing_time": round(processing_time, 2),
            }
    
        except Exception as e:
            processing_time = time.time() - start_time
            self.logger.error(f"同步获取参考文献异常: {e}")
            return {
                "references": [],
                "message": "获取参考文献失败",
                "error": str(e),
                "total_count": 0,
                "processing_time": round(processing_time, 2),
            }
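
The enrichment step in the function above copies a Europe PMC value into a reference only when the reference has no value for that key. A minimal standalone sketch of that rule follows; `needs_enrichment` and `fill_missing_fields` are hypothetical names introduced for illustration, and the sample records are made up.

```python
from typing import Any


def needs_enrichment(ref: dict[str, Any]) -> bool:
    """A reference is looked up only if it has a DOI but no abstract and no PMID."""
    return bool(ref.get("doi")) and not (ref.get("abstract") or ref.get("pmid"))


def fill_missing_fields(ref: dict[str, Any], extra: dict[str, Any]) -> dict[str, Any]:
    """Copy truthy values from `extra` into `ref` where `ref` has no truthy value."""
    for key, value in extra.items():
        if value and not ref.get(key):
            ref[key] = value
    return ref


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    ref = {"title": "A", "doi": "10.1000/example", "pmid": None, "source": "crossref"}
    extra = {"title": "B", "pmid": "123", "abstract": "", "source": "europe_pmc"}
    if needs_enrichment(ref):
        fill_missing_fields(ref, extra)
    print(ref)
```

Under this rule, existing values win: the title stays "A" and `source` stays "crossref", while the missing `pmid` is filled in. One consequence worth checking against the full source is that a count of references whose `source` equals "europe_pmc" would not pick up enriched Crossref records, unless a later step rewrites that field.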
  • Registers the reference tools (including get_references handler) by injecting services like reference_service providing get_references_by_doi_sync.
    reference_services = {
        "europe_pmc": europe_pmc_service,
        "crossref": crossref_service,
        "pubmed": pubmed_service,
        "reference": reference_service,
    }
    register_reference_tools(mcp, reference_services, logger)
  • Supporting helper: Directly fetches raw reference list from Crossref API by DOI.
    def get_references_crossref_sync(self, doi: str) -> list[dict[str, Any]] | None:
        """Fetch Crossref references synchronously."""
        try:
            url = f"https://api.crossref.org/works/{doi}"
            self.logger.info(f"请求 Crossref: {url}")
    
            resp = self.session.get(url, timeout=20)
            if resp.status_code != 200:
                self.logger.warning(f"Crossref 失败,状态码: {resp.status_code}")
                return None
    
            message = resp.json().get("message", {})
            refs_raw = message.get("reference", [])
    
            if not refs_raw:
                self.logger.info("Crossref 未返回参考文献")
                return []
    
            references = []
            for ref in refs_raw:
                author_raw = ref.get("author")
                authors = None
                if author_raw:
                    authors = [a.strip() for a in re.split("[;,]", author_raw) if a.strip()]
    
                references.append(
                    {
                        "title": ref.get("article-title") or ref.get("title"),
                        "authors": authors,
                        "journal": ref.get("journal-title") or ref.get("journal"),
                        "year": ref.get("year"),
                        "doi": ref.get("DOI") or ref.get("doi"),
                        "source": "crossref",
                    }
                )
    
            self.logger.info(f"Crossref 获取到 {len(references)} 条参考文献")
            return references
    
        except Exception as e:
            self.logger.error(f"Crossref 异常: {e}")
            return None
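
The per-item mapping in the helper above can be exercised offline. The sketch below lifts that mapping into a standalone function; `parse_crossref_reference` is a hypothetical name, and the sample items are made-up data shaped like entries of the `reference` array in a Crossref works response.

```python
import re
from typing import Any


def parse_crossref_reference(ref: dict[str, Any]) -> dict[str, Any]:
    """Map one raw Crossref reference item to the simplified record used above."""
    author_raw = ref.get("author")
    authors = None
    if author_raw:
        # Split the free-text author string on semicolons and commas.
        authors = [a.strip() for a in re.split("[;,]", author_raw) if a.strip()]
    return {
        "title": ref.get("article-title") or ref.get("title"),
        "authors": authors,
        "journal": ref.get("journal-title") or ref.get("journal"),
        "year": ref.get("year"),
        "doi": ref.get("DOI") or ref.get("doi"),
        "source": "crossref",
    }


if __name__ == "__main__":
    # Made-up sample item, for illustration only.
    sample = {
        "author": "Smith J; Doe A",
        "article-title": "An example title",
        "journal-title": "Example Journal",
        "year": "2020",
        "DOI": "10.1000/example",
    }
    print(parse_crossref_reference(sample))
```

Because the split pattern includes commas, an author string written as "Smith, J" comes back as two entries, "Smith" and "J". Items without an `author` key yield `authors` of `None`, not an empty list.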
  • Helper function in tool handler for merging references from multiple sources and deduplicating.
    def _merge_and_deduplicate_references(
        references_by_source: dict[str, list[dict[str, Any]]], include_metadata: bool, logger
    ) -> list[dict[str, Any]]:
        """Merge and deduplicate references."""
        try:
            all_references = []
            seen_dois = set()
            seen_titles = set()
    
            for source, references in references_by_source.items():
                for ref in references:
                    # Build a standardized reference record
                    std_ref = {
                        "title": ref.get("title", ""),
                        "authors": ref.get("authors", []),
                        "journal": ref.get("journal", ""),
                        "publication_date": ref.get("publication_date", ""),
                        "doi": ref.get("doi", ""),
                        "pmid": ref.get("pmid", ""),
                        "pmcid": ref.get("pmcid", ""),
                        "source": source,
                    }
    
                    # Deduplication logic
                    doi = std_ref["doi"]
                    title = std_ref["title"]
                    is_duplicate = False
    
                    if doi and doi in seen_dois:
                        is_duplicate = True
                    elif title and title.lower() in seen_titles:
                        is_duplicate = True
    
                    if not is_duplicate:
                        if doi:
                            seen_dois.add(doi)
                        if title:
                            seen_titles.add(title.lower())
    
                        # Add metadata
                        if include_metadata:
                            std_ref.update(
                                {
                                    "abstract": ref.get("abstract", ""),
                                    "volume": ref.get("volume", ""),
                                    "issue": ref.get("issue", ""),
                                    "pages": ref.get("pages", ""),
                                    "issn": ref.get("issn", ""),
                                    "publisher": ref.get("publisher", ""),
                                }
                            )
    
                        all_references.append(std_ref)
    
            # Sort by relevance (here, simply by source priority)
            source_priority = {"europe_pmc": 1, "pubmed": 2, "crossref": 3}
            all_references.sort(key=lambda x: source_priority.get(x.get("source", ""), 4))
    
            return all_references
    
        except Exception as e:
            logger.error(f"合并和去重参考文献失败: {e}")
            return []
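
The merge helper above treats a reference as a duplicate when its DOI has been seen, or failing that, when its lowercased title has been seen. DOIs are compared verbatim there. The sketch below is a variant, not the server's behavior: it adds a DOI normalization step so that differently cased or URL-prefixed DOIs collapse to one entry. `normalize_doi` and `dedupe_references` are hypothetical names, and the sample records are made up.

```python
from typing import Any


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common resolver prefixes (an assumed policy)."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def dedupe_references(refs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first occurrence of each reference, matching on DOI or title."""
    seen_dois: set[str] = set()
    seen_titles: set[str] = set()
    unique = []
    for ref in refs:
        doi = normalize_doi(ref.get("doi") or "")
        title = (ref.get("title") or "").strip().lower()
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue  # duplicate of an earlier record
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(ref)
    return unique


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    refs = [
        {"doi": "10.1000/ABC", "title": "First"},
        {"doi": "https://doi.org/10.1000/abc", "title": "Other"},
        {"doi": "", "title": "first"},
        {"doi": "10.1000/xyz", "title": "Second"},
    ]
    print(len(dedupe_references(refs)))  # prints: 2
```

As in the original helper, the first record seen wins, so the order in which sources are iterated decides which copy of a duplicate survives.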


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gqy20/article-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server