Glama
by winternewt

Normalize VCF

normalize_vcf
Idempotent

Normalize a VCF to a quality-filtered genotype Parquet for reuse in PRS computation. Strips chr prefix, renames id to rsid, parses GT, applies quality filters, and writes compressed Parquet.

Instructions

Normalize a VCF to a quality-filtered genotype Parquet (background task).

Strips the chr prefix, renames id→rsid, computes genotype from GT, applies optional quality filters (FILTER allow-list, min DP, min QUAL), and writes zstd-compressed Parquet. The output is a drop-in genotype source for compute_prs / compute_prs_batch (pass it as genotypes_path), so a VCF is normalized once and reused across many scores.
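The per-record steps listed above can be sketched in plain Python. This is only an illustration, not the server's implementation: the function name `normalize_record`, the single-sample layout, the output column names other than rsid, the representation of the genotype as a list of allele strings, and the handling of missing QUAL/DP values (kept rather than dropped) are all assumptions. Writing the rows to zstd-compressed Parquet is omitted.

```python
def normalize_record(line, pass_filters=("PASS", "."), min_depth=None, min_qual=None):
    """Normalize one single-sample VCF data line.

    Returns a dict for the output row, or None if the record is filtered out.
    Hypothetical helper: names and missing-value handling are assumptions.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, _info, fmt, sample = fields[:10]

    # FILTER allow-list: keep only records whose FILTER value is allowed.
    if pass_filters is not None and flt not in pass_filters:
        return None

    # Minimum QUAL; a missing QUAL (".") is kept in this sketch.
    if min_qual is not None and qual != "." and float(qual) < min_qual:
        return None

    # Pair FORMAT keys with the sample's values, e.g. GT:DP -> 0/1:30.
    sample_fields = dict(zip(fmt.split(":"), sample.split(":")))

    # Minimum depth; a missing DP is kept in this sketch.
    dp = sample_fields.get("DP", ".")
    if min_depth is not None and dp != "." and int(dp) < min_depth:
        return None

    # Compute the genotype from GT by mapping allele indices to bases.
    alleles = [ref] + alt.split(",")
    indices = sample_fields.get("GT", ".").replace("|", "/").split("/")
    if "." in indices:
        return None  # no-call genotypes are dropped in this sketch
    genotype = [alleles[int(i)] for i in indices]

    return {
        "chrom": chrom[3:] if chrom.startswith("chr") else chrom,  # strip chr prefix
        "pos": int(pos),
        "rsid": vid,  # id column renamed to rsid
        "ref": ref,
        "alt": alt,
        "genotype": genotype,
    }
```

Under these assumptions, a heterozygous record on chr1 with GT 0/1 would yield chrom "1" and a genotype holding the reference and alternate alleles, while the same record would be dropped if its DP fell below min_depth.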

Runs as a real MCP background task: the client gets a task id immediately and polls for the result. Normalization is the slow step (seconds to minutes depending on VCF size).
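The submit-then-poll pattern described above might look roughly like the following on the client side. This is a generic sketch, not a real MCP client API: `get_status` is a hypothetical callable supplied by the caller, and the status dictionary shape and state names ("completed", "failed", "cancelled") are assumptions made for illustration.

```python
import time


def wait_for_task(get_status, task_id, interval_s=2.0, timeout_s=600.0, sleep=time.sleep):
    """Poll a background task until it finishes and return its result.

    get_status is a hypothetical callable: task_id -> dict with a "state" key
    and, once completed, a "result" key. It stands in for whatever polling
    call the actual client exposes.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(task_id)
        state = status["state"]
        if state == "completed":
            return status["result"]
        if state in ("failed", "cancelled"):
            raise RuntimeError(f"task {task_id} ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {state!r} after {timeout_s} s")
        sleep(interval_s)  # normalization can take seconds to minutes
```

The polling interval and timeout are arbitrary defaults; since normalization time depends on VCF size, a caller would likely tune both.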

Input Schema

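| Name | Required | Description | Default |
|------|----------|-------------|---------|
| vcf_path | Yes | | |
| output_path | No | | |
| pass_filters | No | | |
| min_depth | No | | |
| min_qual | No | | |
| sex | No | | |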
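Because the schema gives only parameter names and required flags, a caller may want to check an argument set before submitting it. The sketch below does this with a hypothetical helper, `build_arguments`; the parameter names and the required flag on vcf_path come from the table above, while the example values and their types (a list for pass_filters, numbers for min_depth and min_qual) are assumptions, and sex is left out because its accepted values are not documented here.

```python
REQUIRED = {"vcf_path"}
OPTIONAL = {"output_path", "pass_filters", "min_depth", "min_qual", "sex"}


def build_arguments(**kwargs):
    """Validate parameter names and drop unset (None) optional values."""
    unknown = set(kwargs) - REQUIRED - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    provided = {k: v for k, v in kwargs.items() if v is not None}
    missing = REQUIRED - set(provided)
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")
    return provided


# Example values are illustrative only; their types are assumed.
args = build_arguments(
    vcf_path="sample.vcf.gz",
    pass_filters=["PASS"],
    min_depth=10,
    min_qual=20.0,
)
```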

Output Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| output_path | Yes | Path to the normalized Parquet file. | |
| n_variants | Yes | Number of variant rows written. | |
| message | Yes | Human-readable summary. | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (idempotentHint, readOnlyHint), the description adds that it runs as a background task, is slow (seconds to minutes), writes zstd-compressed Parquet, and details the transformations (chr stripping, renaming). This provides full behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is about 10 lines, well structured with clear sections, and front-loaded with the main purpose. Every sentence adds value, though a few could be tightened.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Because an output schema is present, the description appropriately focuses on input and workflow. It explains the background-task behavior and the reuse of the output across PRS scores. Details such as error handling are missing, but the description is otherwise complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. It covers three of the six parameters (pass_filters, min_depth, min_qual) as 'optional quality filters' and implies output_path via 'writes Parquet', but it does not describe sex or vcf_path. This partial compensation leaves room for improvement.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies the exact actions (strip chr prefix, rename id, compute genotype, apply filters, write Parquet) and clearly distinguishes the tool's purpose from siblings by stating it outputs a drop-in genotype source for compute_prs tools. It uses specific verbs and resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states that the tool is meant to be used once per VCF before running compute_prs/compute_prs_batch, and it explains the background-task behavior. It could be stronger by mentioning when not to use the tool (e.g., if the VCF is already normalized), but the current guidance is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/winternewt/just-prs-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.