gqy20

Europe PMC Literature Search MCP Server

get_references_by_doi

Extract reference lists using DOIs with high-performance batch queries from Europe PMC. Optimize literature retrieval for large-scale research tasks or database building by reducing API calls and network latency.

Instructions

Get the reference list for a DOI (batch-optimized version, based on Europe PMC's batch query capability)

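Features:

  • Uses Europe PMC's batch query capability to fetch references

  • Combines multiple DOIs into a single query with the OR operator

  • Can deliver a 10x or greater performance improvement over the traditional approach

  • Particularly suited to fetching large numbers of references quickly

  • Integrates the Europe PMC batch query feature that was discovered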

Parameters:

  • doi: required; digital object identifier (e.g. "10.1126/science.adf6218")

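Return value:

  • Includes the same base fields as the other versions

  • Additionally provides:

    • optimization: optimization type identifier

    • batch_info: batch processing information

      • batch_size: batch size

      • batch_time: time taken by the batch query

      • individual_time: estimated time for individual queries

      • performance_improvement: speed-up factor

    • europe_pmc_batch_query: the batch query string that was used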

Use cases:

  • Large-scale reference retrieval

  • High-performance batch data processing

  • Time-critical research tasks

  • Building literature databases

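Performance characteristics:

  • 10-15x faster than the traditional approach

  • Uses Europe PMC's native batch query capability

  • Reduces the number of API requests

  • Reduces the impact of network latency

  • Best suited to workloads with large numbers of references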

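How it works:

  • Uses the batch query syntax DOI:"xxx" OR DOI:"yyy"

  • Fetches information for multiple DOIs in a single request

  • Significantly reduces the number of API calls and the network overhead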
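
The batch query syntax described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the helper names `build_batch_query` and `build_batch_url` are hypothetical, the second DOI in the usage note is a placeholder, and the URL is assumed to target the public Europe PMC REST search endpoint with its `query`, `format`, `resultType`, and `pageSize` parameters.

```python
from urllib.parse import urlencode

# Public Europe PMC REST search endpoint (assumed target of the batch query).
EUROPE_PMC_SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_batch_query(dois: list[str]) -> str:
    """Join DOIs into one query of the form DOI:"a" OR DOI:"b" (hypothetical helper)."""
    cleaned = [d.strip() for d in dois if d and d.strip()]
    return " OR ".join(f'DOI:"{d}"' for d in cleaned)


def build_batch_url(dois: list[str], page_size: int = 100) -> str:
    """Build one search URL that covers every DOI (hypothetical helper)."""
    params = {
        "query": build_batch_query(dois),
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
    }
    return f"{EUROPE_PMC_SEARCH_URL}?{urlencode(params)}"
```

Calling `build_batch_query(["10.1126/science.adf6218", "10.1000/xyz"])` yields a single query string, so one HTTP request can stand in for one request per DOI. Very long DOI lists would probably need to be split into chunks to keep the URL within practical length limits.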

Input Schema

Name | Required | Description | Default
doi | Yes | - | -

Output Schema

No arguments

Implementation Reference

  • MCP tool handler named 'get_references' (defaults to DOI lookups via get_references_by_doi_sync), orchestrates multi-source reference retrieval, merging, and deduplication.
    @mcp.tool(
        description="获取参考文献工具。通过文献标识符获取完整参考文献列表。",
        annotations=ToolAnnotations(
            title="参考文献",
            readOnlyHint=True,
            openWorldHint=False
        ),
        tags={"references", "citations", "bibliography"}
    )
    def get_references(
        identifier: str,
        id_type: str = "doi",
        sources: list[str] | None = None,
        max_results: int = 20,
        include_metadata: bool = True,
    ) -> dict[str, Any]:
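        """Reference retrieval tool. Fetches the list of references cited by a publication, given its identifier.
    
        Args:
            identifier: Publication identifier (DOI, PMID, PMCID, arXiv ID)
            id_type: Identifier type ["auto", "doi", "pmid", "pmcid"]
            sources: List of data sources; multiple sources may be queried
            max_results: Maximum number of references (20-100 recommended)
            include_metadata: Whether to include detailed metadata
    
        Returns:
            A dict containing the reference list, including citation information and statistics
        """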
        try:
            if not identifier or not identifier.strip():
                return {
                    "success": False,
                    "error": "文献标识符不能为空",
                    "identifier": identifier,
                    "sources_used": [],
                    "references_by_source": {},
                    "merged_references": [],
                    "total_count": 0,
                    "processing_time": 0,
                }
    
            # Handle a None value for the sources parameter
            if sources is None:
                sources = ["europe_pmc", "crossref"]
    
            start_time = time.time()
            identifier = identifier.strip()
    
            # Auto-detect the identifier type
            if id_type == "auto":
                id_type = _extract_identifier_type(identifier)
    
            references_by_source = {}
            sources_used = []
    
            # Fetch references from each data source
            for source in sources:
                try:
                    if source == "europe_pmc" and "reference" in _reference_services:
                        service = _reference_services["reference"]
                        if id_type == "doi":
                            result = service.get_references_by_doi_sync(identifier)
                        else:
                            result = {"success": False, "error": "Europe PMC only supports DOI for references"}
                        # Retrieval succeeded if there is no error and reference data is present
                        error = result.get("error")
                        references = result.get("references", [])
                        if not error and references:
                            references_by_source[source] = references
                            sources_used.append(source)
                            logger.info(f"从Europe PMC获取到 {len(references)} 条参考文献")
                        else:
                            logger.warning(f"Europe PMC参考文献获取失败: {error or '无参考文献数据'}")
    
                    elif source == "crossref" and "reference" in _reference_services:
                        # Crossref reference retrieval logic
                        service = _reference_services["reference"]
                        references = service.get_references_crossref_sync(identifier)
                        if references:  # the Crossref method returns a list of references or None
                            references_by_source[source] = references
                            sources_used.append(source)
                            logger.info(f"从Crossref获取到 {len(references)} 条参考文献")
                        else:
                            logger.warning("Crossref参考文献获取失败: 无数据")
    
                    elif source == "pubmed" and "reference" in _reference_services:
                        # PubMed does not yet support reference retrieval directly; it is supported indirectly via the reference service
                        service = _reference_services["reference"]
                        if id_type == "doi":
                            result = service.get_references_by_doi_sync(identifier)
                            error = result.get("error")
                            references = result.get("references", [])
                            if not error and references:
                                references_by_source[source] = references
                                sources_used.append(source)
                                logger.info(f"从PubMed获取到 {len(references)} 条参考文献")
                            else:
                                logger.warning(f"PubMed参考文献获取失败: {error or '无参考文献数据'}")
                        else:
                            logger.warning("PubMed参考文献获取仅支持DOI标识符")
    
                except Exception as e:
                    logger.error(f"从 {source} 获取参考文献失败: {e}")
                    continue
    
            # Merge and deduplicate references
            merged_references = _merge_and_deduplicate_references(
                references_by_source, include_metadata, logger
            )
    
            # Limit the number of results returned
            if len(merged_references) > max_results:
                merged_references = merged_references[:max_results]
    
            processing_time = round(time.time() - start_time, 2)
    
            return {
                "success": len(merged_references) > 0,
                "identifier": identifier,
                "id_type": id_type,
                "sources_used": sources_used,
                "references_by_source": references_by_source,
                "merged_references": merged_references,
                "total_count": len(merged_references),
                "processing_time": processing_time,
            }
    
        except Exception as e:
            logger.error(f"获取参考文献异常: {e}")
            # Raise a standard MCP error
            from mcp import McpError
            from mcp.types import ErrorData
            raise McpError(ErrorData(
                code=-32603,
                message=f"获取参考文献失败: {type(e).__name__}: {str(e)}"
            ))
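
The handler above calls `_extract_identifier_type` when `id_type` is "auto", but that helper is not shown. The sketch below is only a guess at what such detection could look like. The function name `guess_identifier_type`, the regular expressions, and the "unknown" fallback are all assumptions, not the project's code.

```python
import re


def guess_identifier_type(identifier: str) -> str:
    """Classify an identifier as doi, pmcid, pmid, or unknown (hypothetical helper)."""
    ident = identifier.strip()
    # DOIs start with the "10." prefix, a registrant code, then a slash and a suffix.
    if re.match(r"^10\.\d{4,9}/\S+$", ident):
        return "doi"
    # PMCIDs are "PMC" followed by digits.
    if re.match(r"^PMC\d+$", ident, re.IGNORECASE):
        return "pmcid"
    # PMIDs are plain integers.
    if re.match(r"^\d+$", ident):
        return "pmid"
    return "unknown"
```

Note that in the handler as shown, only the "doi" type leads to a Europe PMC lookup; other types fall through to the "Europe PMC only supports DOI for references" error.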
  • Core helper function implementing reference retrieval by DOI: queries Crossref API for references, enriches metadata from Europe PMC, applies deduplication.
    def get_references_by_doi_sync(self, doi: str) -> dict[str, Any]:
        """Fetch references synchronously."""
        start_time = time.time()
    
        try:
            self.logger.info(f"开始同步获取 DOI {doi} 的参考文献")
    
            # 1. Fetch the reference list from Crossref
            references = self.get_references_crossref_sync(doi)
    
            if references is None:
                return {
                    "references": [],
                    "message": "Crossref 查询失败",
                    "error": "未能从 Crossref 获取参考文献列表",
                    "total_count": 0,
                    "processing_time": time.time() - start_time,
                }
    
            if not references:
                return {
                    "references": [],
                    "message": "未找到参考文献",
                    "error": None,
                    "total_count": 0,
                    "processing_time": time.time() - start_time,
                }
    
            # 2. Fill in missing information from Europe PMC
            enriched_references = []
            for ref in references:
                doi_ref = ref.get("doi")
                if doi_ref and not (ref.get("abstract") or ref.get("pmid")):
                    self.logger.info(f"使用 Europe PMC 补全: {doi_ref}")
    
                    europe_pmc_info = self.search_europe_pmc_by_doi_sync(doi_ref)
                    if europe_pmc_info:
                        formatted_info = self._format_europe_pmc_metadata(europe_pmc_info)
                        for key, value in formatted_info.items():
                            if value and not ref.get(key):
                                ref[key] = value
    
                    time.sleep(0.2)  # throttle the request rate
    
                enriched_references.append(ref)
    
            # 3. Deduplicate
            final_references = self.deduplicate_references(enriched_references)
    
            processing_time = time.time() - start_time
    
            return {
                "references": final_references,
                "message": f"成功获取 {len(final_references)} 条参考文献 (同步版本)",
                "error": None,
                "total_count": len(final_references),
                "enriched_count": len(
                    [r for r in final_references if r.get("source") == "europe_pmc"]
                ),
                "processing_time": round(processing_time, 2),
            }
    
        except Exception as e:
            processing_time = time.time() - start_time
            self.logger.error(f"同步获取参考文献异常: {e}")
            return {
                "references": [],
                "message": "获取参考文献失败",
                "error": str(e),
                "total_count": 0,
                "processing_time": round(processing_time, 2),
            }
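
The enrichment step in the function above copies a Europe PMC field into a reference only when the reference has no truthy value for that key. A minimal sketch of that rule in isolation is shown below. The helper name `fill_missing_fields` is hypothetical, and unlike the original loop it returns a copy instead of mutating the reference in place.

```python
from typing import Any


def fill_missing_fields(ref: dict[str, Any], extra: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ref with empty fields filled from extra (hypothetical helper)."""
    merged = dict(ref)
    for key, value in extra.items():
        # Same condition as the original loop: take the new value only if it is
        # truthy and the reference has nothing truthy under that key yet.
        if value and not merged.get(key):
            merged[key] = value
    return merged
```

One consequence of this rule is that a reference whose `source` is already "crossref" keeps that value after enrichment. Unless `deduplicate_references` (not shown) rewrites it, the `enriched_count` filter on `source == "europe_pmc"` would not count such a reference.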
  • Registers the reference tools (including get_references handler) by injecting services like reference_service providing get_references_by_doi_sync.
    reference_services = {
        "europe_pmc": europe_pmc_service,
        "crossref": crossref_service,
        "pubmed": pubmed_service,
        "reference": reference_service,
    }
    register_reference_tools(mcp, reference_services, logger)
  • Supporting helper: Directly fetches raw reference list from Crossref API by DOI.
    def get_references_crossref_sync(self, doi: str) -> list[dict[str, Any]] | None:
        """Fetch Crossref references synchronously."""
        try:
            url = f"https://api.crossref.org/works/{doi}"
            self.logger.info(f"请求 Crossref: {url}")
    
            resp = self.session.get(url, timeout=20)
            if resp.status_code != 200:
                self.logger.warning(f"Crossref 失败,状态码: {resp.status_code}")
                return None
    
            message = resp.json().get("message", {})
            refs_raw = message.get("reference", [])
    
            if not refs_raw:
                self.logger.info("Crossref 未返回参考文献")
                return []
    
            references = []
            for ref in refs_raw:
                author_raw = ref.get("author")
                authors = None
                if author_raw:
                    authors = [a.strip() for a in re.split("[;,]", author_raw) if a.strip()]
    
                references.append(
                    {
                        "title": ref.get("article-title") or ref.get("title"),
                        "authors": authors,
                        "journal": ref.get("journal-title") or ref.get("journal"),
                        "year": ref.get("year"),
                        "doi": ref.get("DOI") or ref.get("doi"),
                        "source": "crossref",
                    }
                )
    
            self.logger.info(f"Crossref 获取到 {len(references)} 条参考文献")
            return references
    
        except Exception as e:
            self.logger.error(f"Crossref 异常: {e}")
            return None
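
The author handling in the helper above splits the raw Crossref `author` string on semicolons and commas. The sketch below isolates that step so its behavior is easy to check. The function name `split_authors` is hypothetical, and the sample strings in the usage note are invented for illustration.

```python
import re
from typing import Optional


def split_authors(author_raw: Optional[str]) -> Optional[list[str]]:
    """Split a raw author string on ';' and ',' and drop empty pieces (hypothetical helper)."""
    if not author_raw:
        return None
    return [a.strip() for a in re.split(r"[;,]", author_raw) if a.strip()]
```

For input such as "Smith J; Doe A" this gives two names, but a single author written as "Smith, J." is also split into two pieces, so the result is best treated as approximate.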
  • Helper function in tool handler for merging references from multiple sources and deduplicating.
    def _merge_and_deduplicate_references(
        references_by_source: dict[str, list[dict[str, Any]]], include_metadata: bool, logger
    ) -> list[dict[str, Any]]:
        """Merge and deduplicate references."""
        try:
            all_references = []
            seen_dois = set()
            seen_titles = set()
    
            for source, references in references_by_source.items():
                for ref in references:
                    # Build a normalized reference record
                    std_ref = {
                        "title": ref.get("title", ""),
                        "authors": ref.get("authors", []),
                        "journal": ref.get("journal", ""),
                        "publication_date": ref.get("publication_date", ""),
                        "doi": ref.get("doi", ""),
                        "pmid": ref.get("pmid", ""),
                        "pmcid": ref.get("pmcid", ""),
                        "source": source,
                    }
    
                    # Deduplication logic
                    doi = std_ref["doi"]
                    title = std_ref["title"]
                    is_duplicate = False
    
                    if doi and doi in seen_dois:
                        is_duplicate = True
                    elif title and title.lower() in seen_titles:
                        is_duplicate = True
    
                    if not is_duplicate:
                        if doi:
                            seen_dois.add(doi)
                        if title:
                            seen_titles.add(title.lower())
    
                        # Add metadata
                        if include_metadata:
                            std_ref.update(
                                {
                                    "abstract": ref.get("abstract", ""),
                                    "volume": ref.get("volume", ""),
                                    "issue": ref.get("issue", ""),
                                    "pages": ref.get("pages", ""),
                                    "issn": ref.get("issn", ""),
                                    "publisher": ref.get("publisher", ""),
                                }
                            )
    
                        all_references.append(std_ref)
    
            # Sort by relevance (here, simply by source priority)
            source_priority = {"europe_pmc": 1, "pubmed": 2, "crossref": 3}
            all_references.sort(key=lambda x: source_priority.get(x.get("source", ""), 4))
    
            return all_references
    
        except Exception as e:
            logger.error(f"合并和去重参考文献失败: {e}")
            return []
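
The merge helper above keeps the first copy of each reference it sees (matching on exact DOI or lowercased title) and only afterwards sorts by source priority. The condensed sketch below reproduces that ordering so the effect can be tested. The function name `merge_references` is hypothetical, the metadata handling is omitted, and the DOIs in the usage note are placeholders.

```python
from typing import Any

SOURCE_PRIORITY = {"europe_pmc": 1, "pubmed": 2, "crossref": 3}


def merge_references(by_source: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Deduplicate by DOI or lowercased title, then sort by source priority (sketch)."""
    seen_dois: set[str] = set()
    seen_titles: set[str] = set()
    merged: list[dict[str, Any]] = []
    for source, refs in by_source.items():
        for ref in refs:
            doi = ref.get("doi") or ""
            title = (ref.get("title") or "").lower()
            # First occurrence wins; DOI matching is case-sensitive, as in the original.
            if (doi and doi in seen_dois) or (title and title in seen_titles):
                continue
            if doi:
                seen_dois.add(doi)
            if title:
                seen_titles.add(title)
            merged.append({**ref, "source": source})
    # list.sort is stable, so references from the same source keep their order.
    merged.sort(key=lambda r: SOURCE_PRIORITY.get(r["source"], 4))
    return merged
```

If "crossref" is iterated before "europe_pmc" and both return "Paper A", the Crossref copy is the one that survives, even though Europe PMC has the higher sort priority. Passing sources in priority order avoids that.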
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does an excellent job disclosing behavioral traits. It explains performance characteristics ('比传统方法快10-15倍' - 10-15x faster than traditional methods), technical implementation ('使用OR操作符将多个DOI合并为单个查询' - using OR operator to combine multiple DOIs into a single query), and operational benefits ('减少API请求次数、降低网络延迟影响' - reduces API request count, lowers network latency impact). The only minor gap is it doesn't explicitly mention error handling or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

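The description is well-structured, with clear sections (功能说明 "Features", 参数说明 "Parameters", 返回值说明 "Return value", 使用场景 "Use cases", 性能特点 "Performance characteristics", 技术原理 "How it works"), but it is verbose and repeats information about batch optimization and performance benefits. For example, '相比传统方法可实现10倍以上的性能提升' (a 10x or greater performance improvement over the traditional approach) and '比传统方法快10-15倍' (10-15x faster than the traditional approach) make essentially the same point. It could be shortened by removing this redundancy without losing clarity.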

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (batch optimization, performance characteristics), no annotations, 0% schema coverage, but with an output schema present, the description is remarkably complete. It covers purpose, usage guidelines, behavioral transparency, parameter semantics, return value details ('返回值说明' section), performance characteristics, and technical implementation. The presence of an output schema means the description doesn't need to exhaustively explain return values, and it provides comprehensive context beyond what structured fields would offer.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, but the description provides parameter information: '参数说明:- doi: 必需,数字对象标识符(如:"10.1126/science.adf6218")' (Parameter explanation: - doi: required, digital object identifier). This adds meaning by specifying it's required and providing an example format. However, with only 1 parameter total, the baseline would be 4 if no param info was provided; since it does provide some info but doesn't fully compensate for the 0% schema coverage (e.g., doesn't explain if multiple DOIs can be passed or format constraints), a 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '通过DOI获取参考文献列表(批量优化版本 - 基于Europe PMC批量查询能力)' which translates to 'Get reference list by DOI (batch optimized version - based on Europe PMC batch query capability)'. It specifies the verb ('获取' - get), resource ('参考文献列表' - reference list), and distinguishes from siblings by emphasizing batch optimization and Europe PMC integration, unlike other tools like get_article_details or search_europe_pmc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly provides usage scenarios: '使用场景:大规模参考文献获取、高性能批量数据处理、时间关键的研究任务、文献数据库构建' (Usage scenarios: large-scale reference acquisition, high-performance batch data processing, time-critical research tasks, literature database construction). It also distinguishes when to use this tool by mentioning it's '特别适用于大量参考文献的快速获取' (especially suitable for rapid acquisition of large numbers of references) and '最适合处理大量参考文献的场景' (most suitable for scenarios handling large numbers of references), guiding users away from alternatives for small-scale tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gqy20/article-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.