
get_protein_data_tool

Retrieve comprehensive protein structure data from the Protein Data Bank by specifying a PDB ID and desired data types like basic info, sequences, or structural analysis.

Instructions

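Comprehensive protein data tool: retrieve a complete protein information package

This tool is the core entry point for protein data retrieval; it fetches all the information you need in a single call.

Args:
    pdb_id: PDB ID (e.g. "5G53")
    data_types: list of data types to retrieve
        - "basic": basic information (title, method, resolution, etc.)
        - "sequence": amino acid sequence information
        - "structure": secondary structure analysis
        - "all": retrieve all data
    chain_id: specific chain ID (e.g. "A"; optional)
    ctx: FastMCP Context, used for progress feedback and logging

Returns:
    A complete protein data package containing all requested data types

Examples:
    # Retrieve all data
    get_protein_data("5G53", ["all"])

    # Retrieve only basic information and sequence
    get_protein_data("1A3N", ["basic", "sequence"])

    # Retrieve data for a specific chain
    get_protein_data("2HHB", ["all"], "A")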
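
How data_types is resolved can be summarized in a short sketch. Going by the implementation reference on this page, omitting data_types falls back to all three types, and any list containing "all" expands to the same three. The helper name resolve_data_types is hypothetical and does not appear in the source; it only mirrors the two checks shown there.

```python
from __future__ import annotations

# The three concrete data types named in the tool description.
ALL_TYPES = ["basic", "sequence", "structure"]


def resolve_data_types(data_types: list[str] | None) -> list[str]:
    """Hypothetical helper: apply the default and expand the "all" option."""
    if data_types is None or "all" in data_types:
        return list(ALL_TYPES)
    return list(data_types)


print(resolve_data_types(None))
print(resolve_data_types(["all"]))
print(resolve_data_types(["basic", "sequence"]))
```

Under this reading, passing ["all"] and passing nothing at all request the same work.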

Input Schema

Name | Required | Description | Default
pdb_id | Yes | - | -
data_types | No | - | -
chain_id | No | - | -
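
For readers who prefer the JSON Schema view, the following is a rough sketch of a schema consistent with the handler signature shown in the implementation reference (pdb_id: str, data_types: list[str] | None, chain_id: str | None). It is an assumption, not the schema the server actually emits, which may express nullable fields differently; INPUT_SCHEMA and missing_required are hypothetical names.

```python
from typing import Any

# Hypothetical schema derived from the handler signature on this page;
# the schema generated by the server itself may be shaped differently.
INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "pdb_id": {"type": "string"},
        "data_types": {"type": ["array", "null"], "items": {"type": "string"}},
        "chain_id": {"type": ["string", "null"]},
    },
    "required": ["pdb_id"],
}


def missing_required(arguments: dict[str, Any]) -> list[str]:
    """Return the required argument names absent from a tool-call payload."""
    return [name for name in INPUT_SCHEMA["required"] if name not in arguments]


print(missing_required({"pdb_id": "5G53", "data_types": ["all"]}))
print(missing_required({"data_types": ["all"]}))
```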

Output Schema

Name | Required | Description | Default

No arguments

Implementation Reference

  • Main MCP tool handler 'get_protein_data_tool' decorated with @mcp.tool(), handles input validation, defaulting data_types, and delegates to core helper. Includes detailed schema in docstring.
    async def get_protein_data_tool(
        pdb_id: str,
        data_types: list[str] | None = None,
        chain_id: str | None = None,
        ctx: Context | None = None,
    ) -> dict[str, Any]:
        """
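        Comprehensive protein data tool: retrieve a complete protein information package.

        This tool is the core entry point for protein data retrieval; it fetches
        all the information you need in a single call.

        Args:
            pdb_id: PDB ID (e.g. "5G53")
            data_types: list of data types to retrieve
                - "basic": basic information (title, method, resolution, etc.)
                - "sequence": amino acid sequence information
                - "structure": secondary structure analysis
                - "all": retrieve all data
            chain_id: specific chain ID (e.g. "A"; optional)
            ctx: FastMCP Context, used for progress feedback and logging

        Returns:
            A complete protein data package containing all requested data types

        Examples:
            # Retrieve all data
            get_protein_data("5G53", ["all"])

            # Retrieve only basic information and sequence
            get_protein_data("1A3N", ["basic", "sequence"])

            # Retrieve data for a specific chain
            get_protein_data("2HHB", ["all"], "A")
        """
        # If no data types are specified, default to all three types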
        if data_types is None:
            data_types = ["basic", "sequence", "structure"]
        return await get_protein_data(pdb_id, data_types, chain_id, ctx)
  • Core implementation logic for retrieving protein data: validates PDB ID, fetches basic info from RCSB, downloads PDB for sequence extraction, performs DSSP secondary structure analysis.
    async def get_protein_data(
        pdb_id: str, data_types: list[str], chain_id: str | None = None, ctx: Context | None = None
    ) -> dict[str, Any]:
        """
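        Comprehensive protein data tool: retrieve a complete protein information package.

        This tool is the core entry point for protein data retrieval; it fetches
        all the information you need in a single call.

        Args:
            pdb_id: PDB ID (e.g. "5G53")
            data_types: list of data types to retrieve
                - "basic": basic information (title, method, resolution, etc.)
                - "sequence": amino acid sequence information
                - "structure": secondary structure analysis
                - "all": retrieve all data
            chain_id: specific chain ID (e.g. "A"; optional)
            ctx: FastMCP Context, used for progress feedback and logging

        Returns:
            A complete protein data package containing all requested data types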
        """
        try:
            if ctx:
                await ctx.info(f"📊 Starting protein data retrieval: {pdb_id}")
                await ctx.report_progress(0, 100, "Initializing...")
            if not validate_pdb_id(pdb_id):
                return format_error_response(
                    "Invalid PDB ID format",
                    f"Expected: 4 characters (a digit followed by three digits or letters); got: {pdb_id}",
                )

            # Expand the "all" option
            if "all" in data_types:
                data_types = ["basic", "sequence", "structure"]

            # Verify that the PDB ID exists
            if not _validate_pdb_exists(pdb_id):
                if ctx:
                    await ctx.error(f"❌ PDB ID {pdb_id} does not exist")
                return format_error_response("PDB ID not found", f"PDB ID {pdb_id} was not found in the RCSB database")
    
            result_data = {}
    
            # Retrieve basic information
            if "basic" in data_types:
                if ctx:
                    await ctx.report_progress(25, 100, "Retrieving basic information...")
                    await ctx.info(f"🔍 Querying basic information for {pdb_id}...")

                entry_info = _get_entry_info(pdb_id)
                if entry_info:
                    struct_data = entry_info.get("struct", {})
                    result_data["basic"] = {
                        "pdb_id": pdb_id,
                        "title": struct_data.get("title", "Unknown title"),
                        "method": struct_data.get("pdbx_descriptor", "Unknown method"),
                        "resolution": struct_data.get("pdbx_resolution", None),
                        "deposition_date": struct_data.get("pdbx_deposit_date", "Unknown"),
                        "authors": [
                            author.get("name", "Unknown") for author in entry_info.get("audit_author", [])
                        ],
                    }
                    if ctx:
                        await ctx.info("✅ Basic information retrieved")
                else:
                    result_data["basic"] = {"error": "Unable to retrieve basic information"}
    
            # Retrieve sequence information
            if "sequence" in data_types:
                try:
                    if ctx:
                        await ctx.report_progress(50, 100, "Extracting sequence information...")
                        await ctx.info("🧬 Downloading and parsing the PDB file...")

                    # Download the PDB file and extract the sequence
                    pdb_url = f"{RCSB_DOWNLOAD_URL}/{pdb_id}.pdb"
                    local_pdb_file = f"{pdb_id}.pdb"

                    if download_file(pdb_url, local_pdb_file):
                        sequence_data = extract_sequence_from_pdb(local_pdb_file, chain_id)
                        if sequence_data:
                            result_data["sequence"] = {
                                "chain_id": sequence_data.get("chain_id", chain_id or "A"),
                                "sequence_1_letter": sequence_data.get("sequence_1_letter", ""),
                                "sequence_3_letter": sequence_data.get("sequence_3_letter", ""),
                                "length": sequence_data.get("length", 0),
                            }
                            if ctx:
                                await ctx.info(f"✅ Sequence extraction complete (length: {sequence_data.get('length', 0)})")
                        else:
                            result_data["sequence"] = {"error": "Unable to extract sequence information"}
                    else:
                        result_data["sequence"] = {"error": "PDB file download failed"}
                except Exception as e:
                    result_data["sequence"] = {"error": f"Sequence extraction failed: {str(e)}"}
    
            # Retrieve secondary structure information (depends on the extracted sequence)
            if "structure" in data_types:
                if "sequence" in result_data and "sequence_1_letter" in result_data["sequence"]:
                    sequence = result_data["sequence"]["sequence_1_letter"]
                    try:
                        if ctx:
                            await ctx.report_progress(75, 100, "Analyzing secondary structure...")
                            await ctx.info("🔬 Running DSSP secondary structure analysis...")

                        secondary_structure = calculate_dssp(pdb_id, sequence)
                        result_data["structure"] = {
                            "dssp_prediction": secondary_structure,
                            "sequence_length": len(sequence),
                            "composition": {
                                "helix": secondary_structure.count("H"),
                                "strand": secondary_structure.count("E"),
                                "coil": secondary_structure.count("C"),
                            },
                        }
                        if ctx:
                            await ctx.info("✅ Secondary structure analysis complete")
                    except Exception as e:
                        result_data["structure"] = {"error": f"Secondary structure analysis failed: {str(e)}"}
                else:
                    result_data["structure"] = {"error": "Sequence information must be retrieved first"}
    
            # Compute the success rate over the requested data types
            successful_types = [
                dt for dt in data_types if dt in result_data and "error" not in result_data[dt]
            ]
            success_rate = len(successful_types) / len(data_types) * 100 if data_types else 0

            if ctx:
                await ctx.report_progress(100, 100, "Done")
                await ctx.info(f"✅ Retrieved data for {pdb_id} ({success_rate:.0f}% success rate)")

            return format_success_response(
                {
                    "pdb_id": pdb_id,
                    "requested_data_types": data_types,
                    "data": result_data,
                    "success_rate": success_rate,
                    "chain_id": chain_id,
                },
                f"Retrieved data for {pdb_id}: {', '.join(successful_types)} ({success_rate:.0f}%)",
            )

        except Exception as e:
            if ctx:
                await ctx.error(f"❌ Data retrieval failed: {str(e)}")
            return format_error_response("Data retrieval error", f"get_protein_data failed: {str(e)}")
  • Registration of all MCP tools including 'get_protein_data_tool' via @mcp.tool() decorators within the register_all_tools function.
    def register_all_tools(mcp) -> None:
        """
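        Register the 3 core integrated tools with the FastMCP server.

        Optimized tool design:
        1. find_protein_structures - protein structure discovery tool
        2. get_protein_data - comprehensive protein data tool
        3. download_structure - structure file tool

        Args:
            mcp: FastMCP server instance
        """

        # Tool 1: protein structure discovery - combines search, examples, and validation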
        @mcp.tool()
        async def find_protein_structures_tool(
            keywords: str | None = None,
            category: str | None = None,
            pdb_id: str | None = None,
            max_results: int = 10,
            ctx: Context | None = None,
        ) -> dict[str, Any]:
            """
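            Protein structure discovery tool: a single entry point for search, examples, and validation.

            This is the starting point for protein research; it helps you discover and validate PDB structures.

            Args:
                keywords: search keywords (e.g. "hemoglobin", "kinase", "DNA")
                category: preset category ("癌症靶点" cancer targets, "病毒蛋白" viral proteins, "酶类" enzymes, "抗体" antibodies, "膜蛋白" membrane proteins, "核糖体" ribosomes)
                pdb_id: validate or look up a specific PDB ID directly (e.g. "1A3N")
                max_results: maximum number of search results (default 10, maximum 100)
                ctx: FastMCP Context, used for progress feedback and logging

            Returns:
                A combined response containing the list of PDB structures, validation results, and example data

            Examples:
                # Search for hemoglobin-related structures
                find_protein_structures(keywords="hemoglobin")

                # Get cancer target examples
                find_protein_structures(category="癌症靶点")

                # Validate a PDB ID
                find_protein_structures(pdb_id="1A3N")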
            """
            return await find_protein_structures(keywords, category, pdb_id, max_results, ctx)
    
        # Tool 2: comprehensive protein data - fetch all protein information in one call
        @mcp.tool()
        async def get_protein_data_tool(
            pdb_id: str,
            data_types: list[str] | None = None,
            chain_id: str | None = None,
            ctx: Context | None = None,
        ) -> dict[str, Any]:
            """
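            Comprehensive protein data tool: retrieve a complete protein information package.

            This tool is the core entry point for protein data retrieval; it fetches
            all the information you need in a single call.

            Args:
                pdb_id: PDB ID (e.g. "5G53")
                data_types: list of data types to retrieve
                    - "basic": basic information (title, method, resolution, etc.)
                    - "sequence": amino acid sequence information
                    - "structure": secondary structure analysis
                    - "all": retrieve all data
                chain_id: specific chain ID (e.g. "A"; optional)
                ctx: FastMCP Context, used for progress feedback and logging

            Returns:
                A complete protein data package containing all requested data types

            Examples:
                # Retrieve all data
                get_protein_data("5G53", ["all"])

                # Retrieve only basic information and sequence
                get_protein_data("1A3N", ["basic", "sequence"])

                # Retrieve data for a specific chain
                get_protein_data("2HHB", ["all"], "A")
            """
            # If no data types are specified, default to all three types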
            if data_types is None:
                data_types = ["basic", "sequence", "structure"]
            return await get_protein_data(pdb_id, data_types, chain_id, ctx)
    
        # Tool 3: structure files - download and manage protein structure files
        @mcp.tool()
        async def download_structure_tool(
            pdb_id: str,
            file_format: str = "pdb",
            save_local: bool = False,
            ctx: Context | None = None,
        ) -> dict[str, Any]:
            """
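            Structure file tool: download and manage protein structure files.

            This tool handles all file-related operations, from downloading to format guidance.

            Args:
                pdb_id: PDB ID (e.g. "5G53")
                file_format: file format
                    - "pdb": standard PDB format (recommended, human-readable)
                    - "mmcif": macromolecular Crystallographic Information File format (modern standard)
                    - "cif": Crystallographic Information File format
                    - "mmtf": Macromolecular Transmission Format (binary, fast)
                save_local: whether to save to a local file (default False returns the content)
                ctx: FastMCP Context, used for progress feedback and logging

            Returns:
                File content or download information, plus format notes and usage guidance

            Examples:
                # Get the PDB file content
                download_structure("1A3N")

                # Download in mmCIF format and save locally
                download_structure("2HHB", "mmcif", True)

                # Get the fast MMTF format
                download_structure("6VSB", "mmtf")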
            """
            return await download_structure(pdb_id, file_format, save_local, ctx)
  • The create_server function that initializes FastMCP instance and calls register_all_tools(mcp) to register the tools including get_protein_data_tool.
    def create_server(name: str = "protein-mcp", version: str = "0.1.5") -> FastMCP:
        """Create and configure the FastMCP server instance."""
        mcp = FastMCP(
            name=name,
            version=version,
        )

        # Register all tools
        register_all_tools(mcp)

        return mcp
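
The core function listed above reports partial results: each requested data type ends up holding either data or an error entry, and a success rate summarizes the outcome. The following is a minimal sketch of that bookkeeping, assuming the same dictionary shapes as the listing; summarize is a hypothetical name and the sample values are placeholders, not real PDB data.

```python
from __future__ import annotations

from typing import Any


def summarize(result_data: dict[str, Any], data_types: list[str]) -> tuple[list[str], float]:
    """Return the data types that succeeded and the success rate in percent."""
    successful = [
        dt for dt in data_types if dt in result_data and "error" not in result_data[dt]
    ]
    rate = len(successful) / len(data_types) * 100 if data_types else 0.0
    return successful, rate


# Placeholder results: "basic" succeeded, "structure" carries an error entry.
partial = {
    "basic": {"pdb_id": "1A3N", "title": "placeholder title"},
    "structure": {"error": "sequence information must be retrieved first"},
}
print(summarize(partial, ["basic", "structure"]))  # (['basic'], 50.0)
```

One practical consequence visible in the listing: requesting "structure" without "sequence" produces an error entry rather than an analysis, because the secondary structure step reads the sequence extracted earlier in the same call.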

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions the tool returns a '完整蛋白质数据包' (complete protein data package) but doesn't disclose behavioral traits like rate limits, authentication requirements, error conditions, or whether it's a read-only operation. The description adds minimal context beyond basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately structured with sections (Args, Returns, Examples) but includes some redundancy (e.g., repeating the tool name in the examples). The Chinese text is clear but could be more front-loaded; the core purpose is stated early but followed by less essential details. Some sentences, such as '这个工具是蛋白质数据获取的核心' (this tool is the core of protein data retrieval), add minimal value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (3 parameters, no annotations, but has output schema), the description is reasonably complete. It explains parameters well, provides examples, and states the return value. The output schema existence means it doesn't need to detail return structure. However, it lacks behavioral context and sibling tool differentiation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates well by explaining all 3 schema parameters: pdb_id (with example), data_types (with detailed options and meanings), and chain_id (optional, with example). It provides clear semantic meaning beyond the bare schema, though it also lists a ctx parameter that is absent from the schema without explaining the discrepancy.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
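
One value constraint the description leaves implicit is the PDB ID format. The error message in the implementation reference describes it as four characters, a digit followed by three digits or letters. Below is a minimal sketch of that rule; the source's own validate_pdb_id is not shown on this page, so looks_like_pdb_id is a hypothetical stand-in and may differ from it, for instance in how it treats letter case.

```python
import re

# A leading ASCII digit followed by exactly three ASCII letters or digits.
_PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(pdb_id: str) -> bool:
    """Hypothetical format check based on the rule stated in the error message."""
    return _PDB_ID_PATTERN.fullmatch(pdb_id) is not None


for candidate in ["5G53", "1A3N", "ABCD", "1A3"]:
    print(candidate, looks_like_pdb_id(candidate))
```

A format check like this only rejects malformed IDs; whether the entry exists is a separate lookup, which the implementation performs after validation.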

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '获取完整蛋白质信息包' (get complete protein information package) and '一次性获取你需要的所有信息' (get all needed information at once). It specifies the resource (protein data) and action (retrieve), though it doesn't explicitly differentiate from sibling tools like download_structure_tool or find_protein_structures_tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus the sibling tools (download_structure_tool, find_protein_structures_tool). It mentions this is the '核心' (core) tool for protein data, but doesn't specify scenarios where alternatives might be more appropriate or any prerequisites for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gqy20/protein-mcp'
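
The same request can be sketched in Python with the standard library. This assumes the endpoint responds with a JSON object, which this page does not state; server_url and fetch_server are hypothetical helper names, and only the URL itself comes from the curl command above.

```python
import json
import urllib.request

# Base path taken from the curl example; everything else is illustrative.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server entry."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """Send the GET request and decode the body, assuming it is a JSON object."""
    request = urllib.request.Request(server_url(owner, name), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(server_url("gqy20", "protein-mcp"))
```

Calling fetch_server("gqy20", "protein-mcp") performs the network request; the fields of the returned object are not documented here, so inspect the response before relying on any of them.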

If you have feedback or need assistance with the MCP directory API, please join our Discord server.