build_phylogenetic_profile
Construct phylogenetic profiles to analyze gene distribution across species, supporting evolutionary research and comparative genomics.
Instructions
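Phylogenetic profile builder (MCP interface wrapper).

Args:
- gene_symbols: list of gene symbols
- species_set: set of species (defaults to common model organisms)
- include_domain_info: whether to include domain information

Returns: phylogenetic profile data

Examples: to analyze the distribution of the p53 family across vertebrates, call `build_phylogenetic_profile(["TP53", "TP63", "TP73"], ["human", "mouse", "zebrafish"])`.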
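For clients that reach the tool over MCP rather than calling the Python function directly, the example above corresponds to a `tools/call` request. The following is a minimal sketch of that payload, assuming the standard MCP JSON-RPC framing; the helper name `make_tool_call` and the `request_id` value are illustrative and not part of this server.

```python
import json


def make_tool_call(gene_symbols, species_set=None, include_domain_info=True, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for build_phylogenetic_profile.

    Hypothetical helper for illustration; not part of genome_mcp.
    """
    arguments = {
        "gene_symbols": list(gene_symbols),
        "include_domain_info": include_domain_info,
    }
    # species_set is optional; omit it so the server falls back to its default.
    if species_set is not None:
        arguments["species_set"] = list(species_set)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "build_phylogenetic_profile", "arguments": arguments},
    }


request = make_tool_call(["TP53", "TP63", "TP73"], ["human", "mouse", "zebrafish"])
print(json.dumps(request, indent=2))
```

Leaving `species_set` out of the arguments, instead of sending an empty list, is a deliberate choice here: the implementation only substitutes its default species when the value is `None`.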
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
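| gene_symbols | Yes | List of gene symbols to profile | |
| species_set | No | Species to include; when omitted, the implementation uses human, mouse, rat, zebrafish, fruitfly, worm, and yeast | null |
| include_domain_info | No | Whether to include domain information | true |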
Input Schema (JSON Schema)
```json
{
  "properties": {
    "gene_symbols": {
      "items": {
        "type": "string"
      },
      "type": "array"
    },
    "include_domain_info": {
      "default": true,
      "type": "boolean"
    },
    "species_set": {
      "default": null,
      "items": {
        "type": "string"
      },
      "type": "array"
    }
  },
  "required": [
    "gene_symbols"
  ],
  "type": "object"
}
```
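To make the schema's single required field and its two defaults concrete, here is a minimal sketch that checks an arguments object by hand. The function name `check_arguments` is hypothetical, and a real client would more likely rely on a JSON Schema validator. The sketch also assumes that a `null` `species_set` means "use the server default", which matches the Python signature but is looser than a strict reading of `"type": "array"`.

```python
def check_arguments(arguments: dict) -> dict:
    """Validate arguments against the schema above and fill in defaults.

    Hypothetical helper for illustration; not part of genome_mcp.
    """
    if "gene_symbols" not in arguments:
        raise ValueError("gene_symbols is required")

    def is_string_list(value) -> bool:
        return isinstance(value, list) and all(isinstance(v, str) for v in value)

    if not is_string_list(arguments["gene_symbols"]):
        raise TypeError("gene_symbols must be an array of strings")

    # Defaults taken from the schema: species_set -> null, include_domain_info -> true.
    checked = {
        "gene_symbols": arguments["gene_symbols"],
        "species_set": arguments.get("species_set"),
        "include_domain_info": arguments.get("include_domain_info", True),
    }
    # Assumption: None is accepted and means "let the server pick its default species".
    if checked["species_set"] is not None and not is_string_list(checked["species_set"]):
        raise TypeError("species_set must be an array of strings")
    if not isinstance(checked["include_domain_info"], bool):
        raise TypeError("include_domain_info must be a boolean")
    return checked


print(check_arguments({"gene_symbols": ["TP53", "TP63"]}))
```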
Implementation Reference
- src/genome_mcp/core/tools.py:550-583 (handler) MCP tool handler for 'build_phylogenetic_profile'. It delegates to the internal implementation from evolution_tools.py and converts a ValidationError, or any other exception, into a formatted error response.

  ```python
  @mcp.tool()
  async def build_phylogenetic_profile(
      gene_symbols: list[str],
      species_set: list[str] = None,
      include_domain_info: bool = True,
  ) -> PhylogeneticProfileResult:
      """
      Phylogenetic profile builder - MCP interface wrapper

      Args:
          gene_symbols: list of gene symbols
          species_set: set of species (defaults to common model organisms)
          include_domain_info: whether to include domain information

      Returns:
          Phylogenetic profile data

      Examples:
          # Analyze the distribution of the p53 family across vertebrates
          build_phylogenetic_profile(["TP53", "TP63", "TP73"], ["human", "mouse", "zebrafish"])
      """
      try:
          return await _build_phylogenetic_profile_internal(
              gene_symbols, species_set, include_domain_info, _query_executor
          )
      except ValidationError as e:
          return format_simple_error(
              e, query=str(gene_symbols), operation="build_phylogenetic_profile"
          )
      except Exception as e:
          return format_simple_error(
              e, query=str(gene_symbols), operation="build_phylogenetic_profile"
          )
  ```
- Core implementation that runs an ortholog search for each gene across the species set, builds a presence/absence matrix, and derives evolutionary insights from it.

  ```python
  async def build_phylogenetic_profile(
      gene_symbols: list[str],
      species_set: list[str] = None,
      include_domain_info: bool = True,
      query_executor: QueryExecutor = None,
  ) -> dict[str, Any]:
      """
      Build a phylogenetic profile: analyze how multiple genes are distributed
      across a given set of species.

      Intended for studying:
      - Gene family evolution
      - Species-specific gene loss
      - Functional conservation analysis
      - Comparative genomics

      Args:
          gene_symbols: list of gene symbols
          species_set: set of species (defaults to common model organisms)
          include_domain_info: whether to include domain information
          query_executor: query executor instance

      Returns:
          Phylogenetic profile data, including a presence/absence matrix and
          an evolutionary analysis

      Examples:
          # Analyze the distribution of the p53 family across vertebrates
          build_phylogenetic_profile(["TP53", "TP63", "TP73"], ["human", "mouse", "zebrafish"])
      """
      if query_executor is None:
          query_executor = QueryExecutor()

      if species_set is None:
          species_set = [
              "human",
              "mouse",
              "rat",
              "zebrafish",
              "fruitfly",
              "worm",
              "yeast",
          ]

      try:
          # Analyze the genes as a batch
          results = {}
          service_unavailable_count = 0

          for gene_symbol in gene_symbols:
              # Analyze the homology relationships of each gene.
              # Note: include_domain_info is forwarded as include_sequence_info.
              gene_result = await analyze_gene_evolution(
                  gene_symbol,
                  species_set,
                  include_sequence_info=include_domain_info,
                  query_executor=query_executor,
              )

              # Check the service status
              if (
                  gene_result.get("error")
                  and gene_result.get("status") == "service_unavailable"
              ):
                  service_unavailable_count += 1

              results[gene_symbol] = gene_result

          # If every gene query failed, report the service as unavailable.
          # Note: an empty gene_symbols list also satisfies this check (0 == 0).
          if service_unavailable_count == len(gene_symbols):
              return {
                  "error": "Phylogenetic analysis service unavailable",
                  "gene_symbols": gene_symbols,
                  "status": "service_unavailable",
                  "message": "The Ensembl API is unavailable; cannot build the phylogenetic profile",
                  "suggestions": [
                      "Retry later",
                      "Check the network connection",
                      "Confirm the gene symbols are correct",
                  ],
                  "alternative_resources": [
                      {
                          "name": "Ensembl web interface",
                          "url": "https://www.ensembl.org/Homo_sapiens/Search",
                          "description": "Search for the gene manually on the Ensembl website",
                      },
                      {
                          "name": "NCBI Gene",
                          "url": "https://www.ncbi.nlm.nih.gov/gene",
                          "description": "NCBI gene database",
                      },
                  ],
              }

          # Build the presence/absence matrix
          presence_matrix = _build_presence_absence_matrix(results, species_set)

          # Analyze gene family evolution
          family_analysis = _analyze_gene_family_evolution(results, species_set)

          return {
              "gene_symbols": gene_symbols,
              "species_set": species_set,
              "presence_matrix": presence_matrix,
              "family_analysis": family_analysis,
              "individual_results": results,
              "summary": {
                  "total_genes": len(gene_symbols),
                  "total_species": len(species_set),
                  "successful_queries": len(gene_symbols) - service_unavailable_count,
                  "failed_queries": service_unavailable_count,
                  "conservation_patterns": _identify_conservation_patterns(
                      presence_matrix
                  ),
              },
          }

      except Exception as e:
          return {
              "error": str(e),
              "gene_symbols": gene_symbols,
              "error_type": "phylogenetic_profile_error",
              "suggestions": [
                  "Check that the gene symbol list is correct",
                  "Confirm the species list is formatted correctly",
                  "Retry with fewer genes",
              ],
              "troubleshooting": {
                  "gene_count": len(gene_symbols),
                  "species_count": len(species_set) if species_set else 0,
                  "possible_causes": [
                      "Some gene symbols do not exist",
                      "Network connectivity problems",
                      "Ensembl API limits",
                  ],
              },
          }
  ```
- src/genome_mcp/core/tools.py:550-583 (registration) The @mcp.tool() decorator on the handler registers this function as an MCP tool.
- Helper function that constructs the presence/absence matrix from the ortholog results. A species counts as present when any of its name variants equals, or is a substring of, a normalized ortholog organism name.

  ```python
  def _build_presence_absence_matrix(
      results: dict, species_set: list[str]
  ) -> dict[str, dict]:
      """Build the presence/absence matrix."""
      matrix = {}

      for gene_symbol, gene_result in results.items():
          gene_row = {}
          orthologs_data = gene_result.get("result", {}).get("orthologs", [])
          present_species = set()

          # Normalize the organism names
          for ortholog in orthologs_data:
              organism_name = ortholog.get("organism_name", "").lower()
              # Lowercased above; underscores are replaced with spaces here
              normalized_name = organism_name.replace("_", " ")
              present_species.add(normalized_name)

          for species in species_set:
              species_lower = species.lower()
              # Check the possible spellings of the species name
              species_variants = [
                  species_lower,
                  species_lower.replace(" ", "_"),
                  species_lower.replace(" ", ""),
              ]

              # Exact match first, then substring match against every present name
              gene_row[species] = any(
                  variant in present_species
                  or any(variant in present for present in present_species)
                  for variant in species_variants
              )

          matrix[gene_symbol] = gene_row

      return matrix
  ```
- src/genome_mcp/core/tools.py:551-556 (schema) The function signature and return type define the input/output schema for the tool.

  ```python
  async def build_phylogenetic_profile(
      gene_symbols: list[str],
      species_set: list[str] = None,
      include_domain_info: bool = True,
  ) -> PhylogeneticProfileResult:
      """
  ```
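The name matching in `_build_presence_absence_matrix` is easiest to see on toy data. The sketch below re-implements only the matching rule (normalize `organism_name`, then test each species variant for equality or substring containment) and then lists the genes present in every species. The ortholog records are invented for illustration, and `match_species` and `fully_conserved` are hypothetical names, not functions from the repository.

```python
def match_species(species: str, organism_names: list[str]) -> bool:
    """Mirror the helper's rule: exact or substring match on normalized names."""
    present = {name.lower().replace("_", " ") for name in organism_names}
    s = species.lower()
    variants = [s, s.replace(" ", "_"), s.replace(" ", "")]
    return any(
        v in present or any(v in p for p in present)
        for v in variants
    )


def fully_conserved(matrix: dict[str, dict[str, bool]]) -> list[str]:
    """Return the genes marked present in every species of the matrix."""
    return [gene for gene, row in matrix.items() if all(row.values())]


# Invented ortholog organism names, for illustration only.
toy_orthologs = {
    "TP53": ["homo_sapiens", "mus_musculus", "danio_rerio"],
    "TP63": ["homo_sapiens", "mus_musculus"],
}
species_set = ["homo sapiens", "mus musculus", "danio rerio"]

matrix = {
    gene: {sp: match_species(sp, names) for sp in species_set}
    for gene, names in toy_orthologs.items()
}
print(matrix["TP63"])            # danio rerio is absent for TP63
print(fully_conserved(matrix))   # only TP53 is present everywhere

# Substring matching cuts both ways: a common name matches only if it
# appears inside the normalized organism name.
print(match_species("human", ["homo_sapiens"]))   # False
print(match_species("mus", ["mus_musculus"]))     # True (substring hit)
```

With scientific names the rule behaves as expected. With the common names used in the default `species_set`, a match depends on how `organism_name` is reported upstream, which this page does not show.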