
build_phylogenetic_profile

Constructs phylogenetic profiles to analyze gene distribution across species, supporting evolutionary research and homologous gene analysis.

Instructions

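Phylogenetic profile builder (MCP interface wrapper).

Args:
  gene_symbols: list of gene symbols
  species_set: species to include (defaults to common model organisms)
  include_domain_info: whether to include domain information

Returns: phylogenetic profile data

Examples:
  # Analyze the distribution of the p53 family across vertebrates
  build_phylogenetic_profile(["TP53", "TP63", "TP73"], ["human", "mouse", "zebrafish"])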
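
To make the return value more concrete, here is a minimal sketch of reading a result. The dictionary is hand-written mock data, not real output. Its keys (`presence_matrix`, `species_set`) follow the return statement of the core implementation in the Implementation Reference, and `missing_species` is a hypothetical helper name introduced only for this illustration.

```python
from typing import Any


def missing_species(profile: dict[str, Any]) -> dict[str, list[str]]:
    """Map each gene to the species in which no ortholog was detected."""
    # An error payload carries no matrix, so it yields an empty report.
    matrix = profile.get("presence_matrix", {})
    return {
        gene: [species for species, present in row.items() if not present]
        for gene, row in matrix.items()
    }


# Hand-written mock result (illustrative values only, not real Ensembl data).
mock_profile = {
    "gene_symbols": ["GENE_A", "GENE_B"],
    "species_set": ["human", "mouse", "zebrafish"],
    "presence_matrix": {
        "GENE_A": {"human": True, "mouse": True, "zebrafish": True},
        "GENE_B": {"human": True, "mouse": False, "zebrafish": False},
    },
}

print(missing_species(mock_profile))
```

Since the tool can also return an error dictionary, a caller would probably want to check for an `error` key before interpreting the matrix.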

Input Schema

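Name | Required | Description | Default
gene_symbols | Yes | List of gene symbols |
species_set | No | Species to include (defaults to common model organisms) | None
include_domain_info | No | Whether to include domain information | True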

Implementation Reference

  • Core implementation of the build_phylogenetic_profile tool. Performs ortholog analysis for each gene using analyze_gene_evolution, constructs presence/absence matrix across species, computes family evolution analysis, and returns phylogenetic profile data.
    async def build_phylogenetic_profile(
        gene_symbols: list[str],
        species_set: list[str] | None = None,
        include_domain_info: bool = True,
        query_executor: QueryExecutor | None = None,
    ) -> dict[str, Any]:
        """
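        Build a phylogenetic profile: analyze how multiple genes are distributed across a given set of species.
    
        Intended for studying:
        - Gene family evolution
        - Species-specific gene loss
        - Functional conservation
        - Comparative genomics
    
        Args:
            gene_symbols: List of gene symbols
            species_set: Species to include (defaults to common model organisms)
            include_domain_info: Whether to include domain information
            query_executor: Query executor instance
    
        Returns:
            Phylogenetic profile data, including the presence/absence matrix and the evolutionary analysis
    
        Examples:
            # Analyze the distribution of the p53 family across vertebrates
            build_phylogenetic_profile(["TP53", "TP63", "TP73"], ["human", "mouse", "zebrafish"])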
        """
        if query_executor is None:
            query_executor = QueryExecutor()
    
        if species_set is None:
            species_set = [
                "human",
                "mouse",
                "rat",
                "zebrafish",
                "fruitfly",
                "worm",
                "yeast",
            ]
    
        try:
            # Analyze the genes in batch
            results = {}
            service_unavailable_count = 0
    
            for gene_symbol in gene_symbols:
                # Analyze the orthology relationships of each gene
                gene_result = await analyze_gene_evolution(
                    gene_symbol,
                    species_set,
                    include_sequence_info=include_domain_info,
                    query_executor=query_executor,
                )
    
                # Check the service status
                if (
                    gene_result.get("error")
                    and gene_result.get("status") == "service_unavailable"
                ):
                    service_unavailable_count += 1
    
                results[gene_symbol] = gene_result
    
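            # If every gene query failed, report the service as unavailable
            if service_unavailable_count == len(gene_symbols):
                return {
                    "error": "系统发育分析服务不可用",  # "Phylogenetic analysis service unavailable"
                    "gene_symbols": gene_symbols,
                    "status": "service_unavailable",
                    # "Ensembl API service unavailable; cannot build the phylogenetic profile"
                    "message": "Ensembl API服务不可用,无法构建系统发育图谱",
                    # "Retry later", "Check the network connection", "Verify the gene symbols"
                    "suggestions": ["稍后重试", "检查网络连接", "确认基因符号正确"],
                    "alternative_resources": [
                        {
                            "name": "Ensembl Web界面",  # "Ensembl web interface"
                            "url": "https://www.ensembl.org/Homo_sapiens/Search",
                            # "Search for the gene manually on the Ensembl website"
                            "description": "在Ensembl网站手动搜索基因",
                        },
                        {
                            "name": "NCBI Gene",
                            "url": "https://www.ncbi.nlm.nih.gov/gene",
                            "description": "NCBI基因数据库",  # "NCBI gene database"
                        },
                    ],
                }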
    
            # Build the presence/absence matrix
            presence_matrix = _build_presence_absence_matrix(results, species_set)
    
            # Analyze gene family evolution
            family_analysis = _analyze_gene_family_evolution(results, species_set)
    
            return {
                "gene_symbols": gene_symbols,
                "species_set": species_set,
                "presence_matrix": presence_matrix,
                "family_analysis": family_analysis,
                "individual_results": results,
                "summary": {
                    "total_genes": len(gene_symbols),
                    "total_species": len(species_set),
                    "successful_queries": len(gene_symbols) - service_unavailable_count,
                    "failed_queries": service_unavailable_count,
                    "conservation_patterns": _identify_conservation_patterns(
                        presence_matrix
                    ),
                },
            }
    
        except Exception as e:
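            return {
                "error": str(e),
                "gene_symbols": gene_symbols,
                "error_type": "phylogenetic_profile_error",
                "suggestions": [
                    "检查基因符号列表是否正确",  # "Check that the gene symbol list is correct"
                    "确认物种列表格式正确",  # "Confirm the species list is correctly formatted"
                    "减少基因数量后重试",  # "Retry with fewer genes"
                ],
                "troubleshooting": {
                    "gene_count": len(gene_symbols),
                    "species_count": len(species_set) if species_set else 0,
                    "possible_causes": [
                        "某些基因符号不存在",  # "Some gene symbols do not exist"
                        "网络连接问题",  # "Network connection problem"
                        "Ensembl API限制",  # "Ensembl API limits"
                    ],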
                },
            }
  • MCP tool registration and wrapper handler for build_phylogenetic_profile. Registers the tool with FastMCP, performs validation, and delegates to the internal implementation.
    async def build_phylogenetic_profile(
        gene_symbols: list[str],
        species_set: list[str] | None = None,
        include_domain_info: bool = True,
    ) -> PhylogeneticProfileResult:
        """
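        Phylogenetic profile builder (MCP interface wrapper).
    
        Args:
            gene_symbols: List of gene symbols
            species_set: Species to include (defaults to common model organisms)
            include_domain_info: Whether to include domain information
    
        Returns:
            Phylogenetic profile data
    
        Examples:
            # Analyze the distribution of the p53 family across vertebrates
            build_phylogenetic_profile(["TP53", "TP63", "TP73"], ["human", "mouse", "zebrafish"])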
        """
        try:
            return await _build_phylogenetic_profile_internal(
                gene_symbols, species_set, include_domain_info, _query_executor
            )
        except ValidationError as e:
            return format_simple_error(
                e, query=str(gene_symbols), operation="build_phylogenetic_profile"
            )
        except Exception as e:
            return format_simple_error(
                e, query=str(gene_symbols), operation="build_phylogenetic_profile"
            )
  • Type definition (TypedDict) for the return type of build_phylogenetic_profile, defining the structure of the phylogenetic profile result.
    from typing import Any, TypedDict
    
    
    class PhylogeneticProfileResult(TypedDict):
        """Result type for a phylogenetic profile."""
    
        query_genes: list[str]
        phylogenetic_data: dict[str, list[dict[str, Any]]]
        domain_info: dict[str, list[dict[str, Any]]] | None
        profile_metadata: dict[str, Any]
  • Helper function that builds the presence/absence matrix from ortholog query results across the specified species set.
    def _build_presence_absence_matrix(
        results: dict, species_set: list[str]
    ) -> dict[str, dict]:
        """Build the presence/absence matrix."""
        matrix = {}
    
        for gene_symbol, gene_result in results.items():
            gene_row = {}
            orthologs_data = gene_result.get("result", {}).get("orthologs", [])
            present_species = set()
    
            # Normalize the organism names reported for the orthologs
            for ortholog in orthologs_data:
                organism_name = ortholog.get("organism_name", "").lower()
                # Normalize: lowercase, with underscores replaced by spaces
                normalized_name = organism_name.replace("_", " ")
                present_species.add(normalized_name)
    
            for species in species_set:
                species_lower = species.lower()
                # Try several possible spellings of the species name
                species_variants = [
                    species_lower,
                    species_lower.replace(" ", "_"),
                    species_lower.replace(" ", ""),
                ]
    
                # A species counts as present on an exact match or when a
                # variant is a substring of any normalized organism name
                gene_row[species] = any(
                    variant in present_species
                    or any(variant in present for present in present_species)
                    for variant in species_variants
                )
    
            matrix[gene_symbol] = gene_row
    
        return matrix
  • Helper function that analyzes gene family evolution by calculating conservation scores and identifying most/least conserved genes.
    def _analyze_gene_family_evolution(
        results: dict, species_set: list[str]
    ) -> dict[str, Any]:
        """Analyze gene family evolution."""
        # Compute a conservation score for each gene
        conservation_scores = {}
        for gene_symbol, gene_result in results.items():
            conservation_scores[gene_symbol] = _calculate_conservation_score(gene_result)
    
        # Identify the most and least conserved genes
        # (note: max()/min() raise ValueError if results is empty)
        most_conserved = max(conservation_scores.items(), key=lambda x: x[1])
        least_conserved = min(conservation_scores.items(), key=lambda x: x[1])
    
        return {
            "conservation_scores": conservation_scores,
            "most_conserved_gene": most_conserved[0],
            "least_conserved_gene": least_conserved[0],
            "conservation_range": most_conserved[1] - least_conserved[1],
        }
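
The name-matching step in the presence/absence helper is easy to misread, so the following sketch restates it in simplified form and runs it on mock data. The mock `results` structure only mirrors the keys the helper reads (`result`, `orthologs`, `organism_name`); whether the upstream service reports scientific or common names is not shown on this page and is an assumption here. `presence_fraction` is a hypothetical stand-in for `_calculate_conservation_score`, whose real definition is not shown.

```python
from typing import Any


def presence_absence_matrix(
    results: dict[str, dict[str, Any]], species_set: list[str]
) -> dict[str, dict[str, bool]]:
    """Simplified restatement of the helper above, for experimentation."""
    matrix: dict[str, dict[str, bool]] = {}
    for gene_symbol, gene_result in results.items():
        orthologs = gene_result.get("result", {}).get("orthologs", [])
        # Lowercase names and turn underscores into spaces, as the helper does.
        present = {
            o.get("organism_name", "").lower().replace("_", " ") for o in orthologs
        }
        row: dict[str, bool] = {}
        for species in species_set:
            s = species.lower()
            variants = [s, s.replace(" ", "_"), s.replace(" ", "")]
            # Exact match, or substring match against any normalized name.
            row[species] = any(
                v in present or any(v in name for name in present) for v in variants
            )
        matrix[gene_symbol] = row
    return matrix


def presence_fraction(row: dict[str, bool]) -> float:
    """Hypothetical score: share of queried species in which the gene was found."""
    return sum(row.values()) / len(row) if row else 0.0


# Mock ortholog results (illustrative only, not real Ensembl output).
mock_results = {
    "GENE_A": {"result": {"orthologs": [
        {"organism_name": "Mus_musculus"},
        {"organism_name": "Danio_rerio"},
    ]}},
    "GENE_B": {"result": {"orthologs": [{"organism_name": "mouse"}]}},
}

species = ["mouse", "mus musculus", "danio"]
matrix = presence_absence_matrix(mock_results, species)
scores = {gene: presence_fraction(row) for gene, row in matrix.items()}
most_conserved = max(scores.items(), key=lambda item: item[1])
least_conserved = min(scores.items(), key=lambda item: item[1])

print(matrix)
print(most_conserved[0], least_conserved[0])
```

With this mock data, the query "mouse" does not match the organism name "Mus_musculus", while "danio" matches "Danio_rerio" through the substring rule. If the upstream data uses scientific names, common-name queries could therefore be reported as absent, and short names could match unintended organisms.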

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gqy20/genome-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.