Glama
by taehojo

predict_variant_effect

Analyze how genetic variants affect gene expression, splicing, transcription factor binding, and chromatin accessibility using AI-powered regulatory predictions.

Instructions

Predict the regulatory impact of a genetic variant using AlphaGenome AI.

Powered by Google DeepMind's AlphaGenome model for accurate regulatory predictions.

Analyzes how a single nucleotide change affects:

  • Gene expression (RNA-seq predictions)

  • Splicing patterns

  • Transcription factor binding

  • Chromatin accessibility

  • Histone modifications

Perfect for: variant interpretation, GWAS follow-up, clinical genomics research.

Example: "Analyze chr17:41234567A>T with AlphaGenome"

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| chromosome | Yes | Chromosome (chr1-chr22, chrX, chrY) | |
| position | Yes | Genomic position (1-based, positive integer) | |
| ref | Yes | Reference allele (A, T, G, or C) | |
| alt | Yes | Alternate allele (A, T, G, or C) | |
| output_types | No | Optional: specific analyses to run | all |
| tissue_type | No | Optional: tissue context (UBERON term, e.g., "UBERON:0001157" for brain) | |
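
The schema constraints above can be checked client-side before calling the tool. A minimal sketch in Python, mirroring the schema's regex patterns and minimum (the `validate_variant_params` helper is illustrative, not part of the server):

```python
import re

# Patterns copied from the tool's inputSchema.
CHROM_RE = re.compile(r'^chr([1-9]|1[0-9]|2[0-2]|X|Y)$')
ALLELE_RE = re.compile(r'^[ATGCatgc]+$')

def validate_variant_params(params: dict) -> list[str]:
    """Return a list of validation errors (empty if the params are valid)."""
    errors = []
    if not CHROM_RE.match(params.get('chromosome', '')):
        errors.append('chromosome must match chr1-chr22, chrX, or chrY')
    position = params.get('position')
    if not isinstance(position, int) or position < 1:
        errors.append('position must be a positive 1-based integer')
    for key in ('ref', 'alt'):
        if not ALLELE_RE.match(params.get(key, '')):
            errors.append(f'{key} must contain only A, T, G, or C')
    return errors

# The example from the description, chr17:41234567 A>T:
print(validate_variant_params(
    {'chromosome': 'chr17', 'position': 41234567, 'ref': 'A', 'alt': 'T'}
))  # []
```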

Implementation Reference

  • Defines the MCP Tool schema for 'predict_variant_effect' including name, description, and detailed inputSchema for variant parameters.
    export const PREDICT_VARIANT_TOOL: Tool = {
      name: 'predict_variant_effect',
      description: `Predict the regulatory impact of a genetic variant using AlphaGenome AI.
    
    Powered by Google DeepMind's AlphaGenome model for accurate regulatory predictions.
    
    Analyzes how a single nucleotide change affects:
    - Gene expression (RNA-seq predictions)
    - Splicing patterns
    - Transcription factor binding
    - Chromatin accessibility
    - Histone modifications
    
    Perfect for: variant interpretation, GWAS follow-up, clinical genomics research.
    
    Example: "Analyze chr17:41234567A>T with AlphaGenome"`,
      inputSchema: {
        type: 'object',
        properties: {
          chromosome: {
            type: 'string',
            description: 'Chromosome (chr1-chr22, chrX, chrY)',
            pattern: '^chr([1-9]|1[0-9]|2[0-2]|X|Y)$',
          },
          position: {
            type: 'number',
            description: 'Genomic position (1-based, positive integer)',
            minimum: 1,
          },
          ref: {
            type: 'string',
            description: 'Reference allele (A, T, G, or C)',
            pattern: '^[ATGCatgc]+$',
          },
          alt: {
            type: 'string',
            description: 'Alternate allele (A, T, G, or C)',
            pattern: '^[ATGCatgc]+$',
          },
          output_types: {
            type: 'array',
            items: {
              type: 'string',
              enum: [
                'rna_seq',
                'cage',
                'splice',
                'histone',
                'tf_binding',
                'dnase',
                'atac',
                'contact_map',
              ],
            },
            description: 'Optional: specific analyses to run (default: all)',
          },
          tissue_type: {
            type: 'string',
            description: 'Optional: tissue context (UBERON term, e.g., "UBERON:0001157" for brain)',
          },
        },
        required: ['chromosome', 'position', 'ref', 'alt'],
      },
    };
  • MCP server CallTool handler specifically for 'predict_variant_effect': validates input, calls AlphaGenomeClient.predictVariant(), formats result with formatVariantResult, and returns formatted text content.
    case 'predict_variant_effect': {
      // Validate input
      const params = validateInput(variantPredictionSchema, args) as VariantPredictionParams;
    
      // Call AlphaGenome API
      const result = await getClient().predictVariant(params);
    
      // Format output
      const formatted = formatVariantResult(result);
    
      return {
        content: [
          {
            type: 'text',
            text: formatted,
          },
        ],
      };
    }
  • src/index.ts:99-101 (registration)
    Registers the listTools handler which returns ALL_TOOLS array including PREDICT_VARIANT_TOOL.
    server.setRequestHandler(ListToolsRequestSchema, async () => {
      return { tools: ALL_TOOLS };
    });
  • AlphaGenomeClient.predictVariant method: bridges to Python subprocess with action 'predict_variant', maps TS params to Python format, handles errors.
    async predictVariant(params: VariantPredictionParams): Promise<VariantResult> {
      try {
        const result = await this.callPythonBridge<VariantResult>('predict_variant', {
          chromosome: params.chromosome,
          position: params.position,
          reference_bases: params.ref,
          alternate_bases: params.alt,
          output_types: params.output_types,
          tissue_type: params.tissue_type || 'brain',
        });
    
        return result;
      } catch (error) {
        if (error instanceof ApiError) {
          throw error;
        }
        throw new ApiError(`Variant prediction failed: ${error}`, 500);
      }
    }
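
The bridge protocol itself is not shown in the excerpt. A plausible minimal sketch of the Python side, assuming the subprocess receives JSON requests with `action` and `params` keys and dispatches to a registered handler (the registry and request shape here are assumptions, not the repository's actual code):

```python
import json

# Hypothetical registry mapping bridge actions to handler functions.
HANDLERS = {}

def register(action):
    def wrap(fn):
        HANDLERS[action] = fn
        return fn
    return wrap

def handle_request(raw: str) -> dict:
    """Dispatch one JSON request shaped like {"action": ..., "params": {...}}."""
    request = json.loads(raw)
    handler = HANDLERS.get(request.get('action'))
    if handler is None:
        return {'error': f"unknown action: {request.get('action')}"}
    try:
        return {'result': handler(request.get('params', {}))}
    except Exception as exc:  # surfaced to the TypeScript side as an ApiError
        return {'error': str(exc)}

@register('predict_variant')
def _predict_variant(params):
    # The real bridge would call predict_variant_effect(client, params) here.
    return {'variant': f"{params['chromosome']}:{params['position']}"}

# One request/response round trip (requests would arrive over the
# subprocess's stdin in the real bridge):
response = handle_request(
    '{"action": "predict_variant", "params": {"chromosome": "chr17", "position": 41234567}}'
)
```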
  • Core implementation in Python bridge: calls AlphaGenome API predict_variant, processes outputs for multiple modalities (RNA-seq, splicing, TF binding), computes impact levels and returns structured result matching TS types.
    def predict_variant_effect(client, params: Dict[str, Any]) -> Dict[str, Any]:
        """
        Predict the regulatory impact of a genetic variant.
    
        Args:
            client: AlphaGenome client instance
            params: Dictionary with chromosome, position, reference_bases, alternate_bases, etc.
    
        Returns:
            Dictionary with variant predictions matching TypeScript VariantResult type
        """
        # Extract parameters
        chromosome = params.get('chromosome')
        position = params.get('position')
        ref_bases = params.get('reference_bases', params.get('ref'))
        alt_bases = params.get('alternate_bases', params.get('alt'))
        tissue_type = params.get('tissue_type', 'brain')
        output_types = params.get('output_types', ALL_MODALITIES)
    
        # Map tissue type to ontology term
        ontology_term = TISSUE_ONTOLOGY_MAP.get(tissue_type.lower(), "UBERON:0000955")
    
        # Create variant object
        variant = genome.Variant(
            chromosome=chromosome,
            position=position,
            reference_bases=ref_bases,
            alternate_bases=alt_bases
        )
    
        # Create interval (resize to standard 1MB size like reference implementation)
        interval = variant.reference_interval.resize(dna_client.SEQUENCE_LENGTH_1MB)
    
        # Call AlphaGenome API
        outputs = client.predict_variant(
            interval=interval,
            variant=variant,
            ontology_terms=[ontology_term],
            requested_outputs=output_types if isinstance(output_types, list) else ALL_MODALITIES
        )
    
        # Process predictions for each modality
        predictions = {}
    
        # RNA-seq effect
        if hasattr(outputs, 'alternate') and hasattr(outputs, 'reference'):
            if outputs.alternate.rna_seq and outputs.reference.rna_seq:
                rna_effect = safe_max_effect(
                    outputs.alternate.rna_seq.values,
                    outputs.reference.rna_seq.values,
                    'RNA_SEQ'
                )
                rna_fc = calculate_fold_change(
                    outputs.alternate.rna_seq.values,
                    outputs.reference.rna_seq.values
                )
    
                ref_mean = float(np.mean(outputs.reference.rna_seq.values))
                alt_mean = float(np.mean(outputs.alternate.rna_seq.values))
    
                predictions['rna_seq'] = {
                    'reference_score': ref_mean,
                    'alternate_score': alt_mean,
                    'fold_change': rna_fc,
                    'confidence': 0.85  # Placeholder - would need actual confidence from model
                }
    
            # Splice site analysis
            if outputs.alternate.splice_sites and outputs.reference.splice_sites:
                splice_effect = safe_max_effect(
                    outputs.alternate.splice_sites.values,
                    outputs.reference.splice_sites.values,
                    'SPLICE_SITES'
                )
    
                ref_mean = float(np.mean(outputs.reference.splice_sites.values))
                alt_mean = float(np.mean(outputs.alternate.splice_sites.values))
    
                predictions['splice'] = {
                    'reference_score': ref_mean,
                    'alternate_score': alt_mean,
                    'delta': splice_effect,
                    'consequence': 'splice_site_disruption' if splice_effect > 0.2 else 'minimal_impact'
                }
    
            # Transcription factor binding
            tf_binding = []
            if outputs.alternate.chip_tf and outputs.reference.chip_tf:
                tf_effect = safe_max_effect(
                    outputs.alternate.chip_tf.values,
                    outputs.reference.chip_tf.values,
                    'CHIP_TF'
                )
    
                if tf_effect > 0.1:  # Significant TF binding change
                    ref_mean = float(np.mean(outputs.reference.chip_tf.values))
                    alt_mean = float(np.mean(outputs.alternate.chip_tf.values))
    
                    tf_binding.append({
                        'factor': 'TF_Binding',  # Would need metadata for specific TF names
                        'ref_score': ref_mean,
                        'alt_score': alt_mean,
                        'change': tf_effect
                    })
    
            if tf_binding:
                predictions['tf_binding'] = tf_binding
    
        # Determine impact level
        max_effect = max([
            predictions.get('rna_seq', {}).get('fold_change', 0),
            predictions.get('splice', {}).get('delta', 0),
            max([tf.get('change', 0) for tf in predictions.get('tf_binding', [])], default=0)
        ], default=0)
    
        if abs(max_effect) > 0.5:
            impact_level = 'high'
            clinical_sig = 'likely_pathogenic'
        elif abs(max_effect) > 0.2:
            impact_level = 'moderate'
            clinical_sig = 'uncertain_significance'
        else:
            impact_level = 'low'
            clinical_sig = 'likely_benign'
    
        # Build interpretation
        interpretation = {
            'impact_level': impact_level,
            'clinical_significance': clinical_sig,
            'recommendations': [
                'Further functional validation recommended' if impact_level == 'high' else 'Standard clinical follow-up',
                f'Tissue type: {tissue_type}',
                f'Ontology term: {ontology_term}'
            ]
        }
    
        # Return structured result matching TypeScript VariantResult interface
        return {
            'variant': f"{chromosome}:{position}{ref_bases}>{alt_bases}",
            'gene_context': None,  # Would need gene annotation data
            'predictions': predictions,
            'interpretation': interpretation
        }
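
The helpers `safe_max_effect` and `calculate_fold_change` are referenced above but not shown. Plausible implementations, assuming the former reports the largest absolute per-position difference between the alt and ref tracks and the latter a log2 fold change of mean signal (both are sketches consistent with the thresholds used above, not the repository's actual code):

```python
import numpy as np

def safe_max_effect(alt_values, ref_values, modality: str) -> float:
    """Largest absolute per-position difference between alt and ref tracks.

    Returns 0.0 for empty tracks so downstream thresholds stay safe.
    """
    alt = np.asarray(alt_values, dtype=float)
    ref = np.asarray(ref_values, dtype=float)
    if alt.size == 0 or ref.size == 0:
        return 0.0
    n = min(alt.size, ref.size)  # guard against shape mismatches
    return float(np.max(np.abs(alt.ravel()[:n] - ref.ravel()[:n])))

def calculate_fold_change(alt_values, ref_values, pseudocount: float = 1e-6) -> float:
    """log2 fold change of mean signal; 0.0 means no change."""
    alt_mean = float(np.mean(alt_values)) + pseudocount
    ref_mean = float(np.mean(ref_values)) + pseudocount
    return float(np.log2(alt_mean / ref_mean))

# Identical tracks produce a zero effect and zero fold change:
track = np.array([0.1, 0.5, 0.9])
print(safe_max_effect(track, track, 'RNA_SEQ'))  # 0.0
print(calculate_fold_change(track, track))       # 0.0
```

A log2 fold change fits the impact thresholds above because 0 means "no change", so `abs(max_effect) > 0.5` corresponds to roughly a 1.4x shift in mean signal.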
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It explains what the tool does (predicts regulatory impact) and lists specific analyses, but doesn't disclose behavioral traits like computational requirements, runtime, accuracy limitations, data sources, or error conditions. The description adds value but doesn't provide comprehensive behavioral context for a complex AI prediction tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized. It starts with the core purpose, explains the technology, lists specific analyses, provides usage context, and includes an example. Most sentences earn their place, though the 'Powered by Google DeepMind's AlphaGenome model' line could be integrated more seamlessly with the opening sentence.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex AI prediction tool with 6 parameters and no output schema or annotations, the description provides good purpose and usage context but lacks details about output format, limitations, or behavioral characteristics. The example helps, but greater coverage would be needed for a higher score given the tool's complexity and the lack of structured output documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 6 parameters thoroughly. The description mentions 'single nucleotide change', which aligns with the ref/alt parameters, and lists analysis types that map to the output_types enum values, but it doesn't add significant meaning beyond what's in the schema descriptions. A baseline of 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Predict the regulatory impact of a genetic variant using AlphaGenome AI', with specific analyses listed (gene expression, splicing, TF binding, etc.). It distinguishes itself from sibling tools by focusing on single nucleotide variant prediction using a specific AI model (AlphaGenome), unlike tools such as 'analyze_gwas_locus' or 'batch_score_variants'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use it: 'Perfect for: variant interpretation, GWAS follow-up, clinical genomics research' and includes an example. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the many sibling tools, which would be needed for a score of 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
