kegg_pathway_enrichment

Analyze gene lists to identify significantly enriched KEGG pathways, revealing biological processes and functions associated with your genes.

Instructions

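KEGG pathway enrichment analysis tool (MVP version)

Analyzes how a gene list is enriched across KEGG pathways and identifies significantly associated biological pathways.

Args:
    gene_list: list of genes (e.g. ["TP53", "BRCA1", "BRCA2"])
    organism: organism code (default "hsa", human)
    pvalue_threshold: p-value significance threshold (default 0.05)
    min_gene_count: minimum number of genes in a pathway (default 2)

Returns:
    Pathway enrichment results, containing:
    - the list of significantly enriched pathways
    - p-values and FDR-corrected statistical significance
    - fold enrichment and gene counts
    - analysis parameters and metadata

Examples:
    # Analyze pathway enrichment for cancer-related genes
    kegg_pathway_enrichment(["TP53", "BRCA1", "BRCA2", "EGFR"])

    # Analyze pathway enrichment for mouse genes
    kegg_pathway_enrichment(["Trp53", "Brca1"], organism="mmu")

    # Use a stricter significance threshold
    kegg_pathway_enrichment(["TP53", "BRCA1"], pvalue_threshold=0.01)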
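The description mentions p-values, FDR correction, and fold enrichment without saying how they are computed. The sketch below shows one common way such numbers are obtained: a one-sided hypergeometric test, Benjamini-Hochberg adjustment, and a ratio of observed to expected hit rates. This is an assumption for illustration, not the server's confirmed method; the function names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k): chance of at least k pathway genes among N query genes.

    M = background genes, n = genes annotated to the pathway (assumed inputs).
    """
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def fold_enrichment(k: int, M: int, n: int, N: int) -> float:
    """Observed hit rate in the query divided by the background rate."""
    return (k / N) / (n / M)


# Hypothetical counts: 3 of 4 query genes fall in a 100-gene pathway,
# against a background of 20000 genes.
p = hypergeom_sf(3, 20000, 100, 4)
print(p, fold_enrichment(3, 20000, 100, 4))
```

With these definitions, `pvalue_threshold` would typically be compared against either the raw or the adjusted p-value; which one the tool uses is not stated on this page.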

Input Schema

Name              Required  Description  Default
gene_list         Yes
organism          No                     hsa
pvalue_threshold  No
min_gene_count    No
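For readers calling the tool from an MCP client, the input schema maps onto the `arguments` object of a JSON-RPC `tools/call` request. The sketch below only builds that payload; `build_tool_call` is a hypothetical helper, the transport (stdio, HTTP) is left out, and the defaults for `pvalue_threshold` and `min_gene_count` are taken from the handler signature quoted under Implementation Reference.

```python
import json
from typing import Any


def build_tool_call(
    gene_list: list[str],
    organism: str = "hsa",
    pvalue_threshold: float = 0.05,
    min_gene_count: int = 2,
    request_id: int = 1,
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for kegg_pathway_enrichment."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "kegg_pathway_enrichment",
            "arguments": {
                "gene_list": gene_list,
                "organism": organism,
                "pvalue_threshold": pvalue_threshold,
                "min_gene_count": min_gene_count,
            },
        },
    }


payload = build_tool_call(["TP53", "BRCA1", "BRCA2", "EGFR"])
print(json.dumps(payload, indent=2))
```

Only `gene_list` is required; the other three arguments could be omitted from the request and the server-side defaults would apply.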

Output Schema

Name               Required  Description  Default
query_info         Yes
query_genes        Yes
analysis_metadata  Yes
enriched_pathways  Yes

Implementation Reference

  • The main async handler function `kegg_pathway_enrichment` for the MCP tool, decorated with `@mcp.tool()` for automatic registration. Implements validation, query parsing using QueryParser, execution via QueryExecutor, result formatting, and error handling.
    @mcp.tool()
    async def kegg_pathway_enrichment(
        gene_list: list[str],
        organism: str = "hsa",
        pvalue_threshold: float = 0.05,
        min_gene_count: int = 2,
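    ) -> KEGGResult:
        """
        KEGG pathway enrichment analysis tool (MVP version)
    
        Analyzes how a gene list is enriched across KEGG pathways and
        identifies significantly associated biological pathways.
    
        Args:
            gene_list: list of genes (e.g. ["TP53", "BRCA1", "BRCA2"])
            organism: organism code (default "hsa", human)
            pvalue_threshold: p-value significance threshold (default 0.05)
            min_gene_count: minimum number of genes in a pathway (default 2)
    
        Returns:
            Pathway enrichment results, containing:
            - the list of significantly enriched pathways
            - p-values and FDR-corrected statistical significance
            - fold enrichment and gene counts
            - analysis parameters and metadata
    
        Examples:
            # Analyze pathway enrichment for cancer-related genes
            kegg_pathway_enrichment(["TP53", "BRCA1", "BRCA2", "EGFR"])
    
            # Analyze pathway enrichment for mouse genes
            kegg_pathway_enrichment(["Trp53", "Brca1"], organism="mmu")
    
            # Use a stricter significance threshold
            kegg_pathway_enrichment(["TP53", "BRCA1"], pvalue_threshold=0.01)
        """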
        try:
            # Validate the KEGG analysis parameters
            (
                validated_gene_list,
                validated_organism,
                validated_pvalue_threshold,
                validated_min_gene_count,
            ) = validate_kegg_params(
                gene_list=gene_list,
                organism=organism,
                pvalue_threshold=pvalue_threshold,
                min_gene_count=min_gene_count,
            )
    
            # Parse into a pathway enrichment query with QueryParser
            parsed = QueryParser.parse(
                validated_gene_list, query_type="pathway_enrichment"
            )
    
            # Update the parsed query with the validated parameters
            parsed.params.update(
                {
                    "gene_list": validated_gene_list,
                    "organism": validated_organism,
                    "pvalue_threshold": validated_pvalue_threshold,
                    "min_gene_count": validated_min_gene_count,
                }
            )
    
            # Execute the query
            result = await _query_executor.execute(parsed)
    
            # Format the result
            if "result" in result:
                enrichment_data = result["result"]
    
                # Attach query information
                enrichment_data["query_info"] = {
                    "gene_list": validated_gene_list,
                    "analysis_date": "2025-10-24",
                    "organism": validated_organism,
                    "method": "KEGG Pathway Enrichment",
                    "parameters": {
                        "pvalue_threshold": validated_pvalue_threshold,
                        "min_gene_count": validated_min_gene_count,
                    },
                }
    
                return enrichment_data
            elif "error" in result:
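                return {
                    "error": result["error"],
                    "query_genes": gene_list,
                    "organism": organism,
                    "suggestions": [
                        "检查基因ID格式是否正确",  # Check that the gene ID format is correct
                        "确认生物体代码是否支持",  # Confirm that the organism code is supported
                        "验证网络连接是否正常",  # Verify that the network connection works
                    ],
                }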
            else:
                return {
                    "error": "Unknown error occurred during pathway enrichment analysis",
                    "query_genes": gene_list,
                    "organism": organism,
                }
    
        except ValidationError as e:
            return format_simple_error(
                e, query=str(gene_list), operation="kegg_pathway_enrichment"
            )
        except Exception as e:
            return format_simple_error(
                e, query=str(gene_list), operation="kegg_pathway_enrichment"
            )
  • `validate_kegg_params` function defining and enforcing the input schema: gene_list (list[str]), organism (str, KEGG codes), pvalue_threshold (float), min_gene_count (int). Filters and normalizes inputs, raises ValidationError on invalid data.
    def validate_kegg_params(
        gene_list: list,
        organism: str = "hsa",
        pvalue_threshold: float = 0.05,
        min_gene_count: int = 2,
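    ) -> tuple[list, str, float, int]:
        """
        Validate the KEGG pathway analysis parameters.
    
        Args:
            gene_list: list of genes
            organism: organism code
            pvalue_threshold: p-value threshold
            min_gene_count: minimum gene count
    
        Returns:
            Tuple of validated parameters
    
        Raises:
            ValidationError: if gene list validation fails
        """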
        if not gene_list or not isinstance(gene_list, list):
            raise ValidationError("Gene list must be a non-empty list")
    
        # Keep only valid gene symbols (non-empty strings, whitespace stripped)
        valid_genes = []
        for gene in gene_list:
            if isinstance(gene, str) and gene.strip():
                valid_genes.append(gene.strip())
    
        if not valid_genes:
            raise ValidationError("No valid gene symbols found in the list")
    
        # Validate organism (unsupported codes fall back to "hsa")
        valid_organisms = {"hsa", "mmu", "rno", "dre", "cel", "scf"}
        if organism not in valid_organisms:
            organism = "hsa"
    
        # Validate pvalue_threshold (must lie strictly between 0 and 1)
        try:
            pvalue_threshold = float(pvalue_threshold)
            if pvalue_threshold <= 0 or pvalue_threshold >= 1:
                pvalue_threshold = 0.05
        except (ValueError, TypeError):
            pvalue_threshold = 0.05
    
        # Validate min_gene_count (clamped to the range 1..10)
        try:
            min_gene_count = int(min_gene_count)
            if min_gene_count < 1:
                min_gene_count = 1
            elif min_gene_count > 10:
                min_gene_count = 10
        except (ValueError, TypeError):
            min_gene_count = 2
    
        return valid_genes, organism, pvalue_threshold, min_gene_count
  • `KEGGResult` TypedDict defining the output schema: query_genes (list[str]), enriched_pathways (list[dict]), analysis_metadata (dict), query_info (dict).
    class KEGGResult(TypedDict):
        """Result type for KEGG pathway enrichment analysis"""
    
        query_genes: list[str]
        enriched_pathways: list[dict[str, Any]]
        analysis_metadata: dict[str, Any]
        query_info: dict[str, Any]
  • Invocation of `create_mcp_tools(mcp)` which registers all tools including kegg_pathway_enrichment via their @mcp.tool() decorators.
    create_mcp_tools(mcp)
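The `validate_kegg_params` listing above raises only for an unusable gene list; every other invalid input is silently replaced or clamped. The sketch below is a condensed restatement of that listing, not the server's module: `normalize_kegg_params` and the local `ValidationError` are stand-ins introduced here so the example runs on its own.

```python
class ValidationError(ValueError):
    """Local stand-in for the server's ValidationError (assumed, not the real class)."""


# Organism codes copied verbatim from the listing above.
VALID_ORGANISMS = {"hsa", "mmu", "rno", "dre", "cel", "scf"}


def normalize_kegg_params(gene_list, organism="hsa", pvalue_threshold=0.05, min_gene_count=2):
    """Condensed restatement of the validation rules shown in the listing."""
    if not gene_list or not isinstance(gene_list, list):
        raise ValidationError("Gene list must be a non-empty list")
    genes = [g.strip() for g in gene_list if isinstance(g, str) and g.strip()]
    if not genes:
        raise ValidationError("No valid gene symbols found in the list")
    if organism not in VALID_ORGANISMS:
        organism = "hsa"  # silent fallback, no error raised
    try:
        pvalue_threshold = float(pvalue_threshold)
        if pvalue_threshold <= 0 or pvalue_threshold >= 1:
            pvalue_threshold = 0.05
    except (ValueError, TypeError):
        pvalue_threshold = 0.05
    try:
        min_gene_count = max(1, min(10, int(min_gene_count)))  # clamp to 1..10
    except (ValueError, TypeError):
        min_gene_count = 2
    return genes, organism, pvalue_threshold, min_gene_count


# Every argument here is partly invalid, yet no exception is raised.
result = normalize_kegg_params(
    [" TP53 ", "", 42, "BRCA1"],
    organism="xyz",
    pvalue_threshold=1.5,
    min_gene_count=50,
)
print(result)  # (['TP53', 'BRCA1'], 'hsa', 0.05, 10)
```

Because an unsupported organism code becomes "hsa" without warning, a caller may want to compare the `organism` echoed back in `query_info` against the value it sent.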
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It does well by describing what the tool returns (enrichment results with statistical significance, fold enrichment, gene counts, metadata) and mentions it's an 'MVP version' which implies potential limitations. However, it doesn't disclose important behavioral aspects like computational requirements, potential rate limits, error conditions, or whether this is a read-only vs. write operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
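Since the description does not spell out error conditions, a caller has to infer them from the handler shown under Implementation Reference, which returns a dictionary with an `error` key (and sometimes `suggestions`) instead of raising. A minimal sketch of handling both shapes follows; `summarize_result` is a hypothetical helper, the sample dictionaries are hand-written, and the shape returned by `format_simple_error` is not shown on this page, so treating it the same way is an assumption.

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> str:
    """Return a one-line summary for either a success or an error payload."""
    if "error" in result:
        hints = result.get("suggestions") or []
        suffix = f" (hints: {'; '.join(hints)})" if hints else ""
        return f"failed: {result['error']}{suffix}"
    pathways = result.get("enriched_pathways", [])
    return f"ok: {len(pathways)} enriched pathway(s)"


# Hand-written sample payloads, not real tool output.
success = {"query_genes": ["TP53"], "enriched_pathways": [{"name": "example"}]}
failure = {"error": "timeout", "query_genes": ["TP53"], "organism": "hsa"}

print(summarize_result(success))
print(summarize_result(failure))
```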

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized. It starts with a clear purpose statement, then provides parameter details in a labeled Args section, followed by Returns information, and concludes with practical Examples. Every section earns its place by adding specific value, with no redundant or unnecessary content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (statistical enrichment analysis with 4 parameters) and the presence of an output schema (which handles return value documentation), the description is quite complete. It covers purpose, parameters, returns, and examples. The main gap is the lack of behavioral context about computational requirements or limitations, which would be helpful for a statistical analysis tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing detailed parameter explanations in the Args section. Each of the 4 parameters gets clear semantic meaning: gene_list is explained with examples, organism with default and context, pvalue_threshold with statistical significance context, and min_gene_count with pathway size requirements. The description adds substantial value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs 'KEGG pathway enrichment analysis' on gene lists to identify significantly enriched biological pathways. It specifies the exact function (analyzing gene lists in KEGG pathways) and distinguishes it from sibling tools like 'analyze_gene_evolution' or 'build_phylogenetic_profile' by focusing specifically on pathway enrichment rather than evolutionary analysis or phylogenetic profiling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (analyzing gene lists for pathway enrichment) and includes helpful examples showing different use cases. However, it doesn't explicitly state when NOT to use this tool or mention specific alternatives among the sibling tools, though the examples implicitly guide usage through parameter variations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gqy20/genome-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.