Normalize VCF
normalize_vcf
Normalize a VCF to a quality-filtered genotype Parquet for reuse in PRS computation. Strips the chr prefix, renames id to rsid, parses GT, applies quality filters, and writes compressed Parquet.
Instructions
Normalize a VCF to a quality-filtered genotype Parquet (background task).
Strips the chr prefix, renames id→rsid, computes genotype from GT, applies
optional quality filters (FILTER allow-list, min DP, min QUAL), and writes
zstd-compressed Parquet. The output is a drop-in genotype source for
compute_prs / compute_prs_batch (pass it as genotypes_path),
so a VCF is normalized once and reused across many scores.
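
The per-record steps described above can be sketched in plain Python. This is an illustrative sketch, not the tool's actual implementation: the function names, the output column names other than rsid, and the choice to encode genotype as an alternate-allele count are all assumptions, and it reads only the first sample column.

```python
from typing import Any, Dict, Iterable, Optional


def genotype_from_gt(gt: str) -> Optional[int]:
    """Count non-reference alleles in a GT string such as '0/1' or '1|1'.

    Assumption: genotype is encoded as an alt-allele count. Missing calls
    ('./.' or '.') yield None.
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")


def normalize_record(
    line: str,
    pass_filters: Optional[Iterable[str]] = None,
    min_depth: Optional[int] = None,
    min_qual: Optional[float] = None,
) -> Optional[Dict[str, Any]]:
    """Normalize one VCF data line; return None if it is skipped or filtered out."""
    if line.startswith("#"):
        return None  # header lines carry no genotype
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT/sample columns to read GT from
    chrom, pos, vid, ref, alt, qual, flt, _info, fmt, sample = fields[:10]

    # FILTER allow-list: keep the row only if its FILTER value is listed.
    if pass_filters is not None and flt not in set(pass_filters):
        return None
    # Minimum QUAL: a missing QUAL ('.') fails the threshold.
    if min_qual is not None and (qual == "." or float(qual) < min_qual):
        return None

    # Pair FORMAT keys with the sample's values, e.g. GT:DP -> 0/1:30.
    call = dict(zip(fmt.split(":"), sample.split(":")))
    if min_depth is not None:
        dp = call.get("DP", ".")
        if dp == "." or int(dp) < min_depth:
            return None

    return {
        "chrom": chrom[3:] if chrom.lower().startswith("chr") else chrom,
        "pos": int(pos),
        "rsid": vid,  # VCF ID column renamed to rsid
        "ref": ref,
        "alt": alt,
        "genotype": genotype_from_gt(call.get("GT", ".")),
    }


row = normalize_record(
    "chr1\t12345\trs123\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
    pass_filters=["PASS"],
    min_depth=10,
    min_qual=20.0,
)
print(row)
```

Rows that survive the filters would then be collected and written as zstd-compressed Parquet, for example with pyarrow's `pyarrow.parquet.write_table(table, path, compression="zstd")`; whether the tool itself uses pyarrow is not stated here.
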
Runs as a real MCP background task: the client gets a task id immediately and polls for the result. Normalization is the slow step (seconds to minutes depending on VCF size).
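
The submit-then-poll pattern can be sketched as a small helper. Everything here is hypothetical: `get_status` stands in for whatever call your MCP client exposes to query a task by id, and the `state`, `result`, and `error` keys and their values are assumed names, not the protocol's actual fields.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    get_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a background task until it finishes, fails, or times out.

    `get_status` is a hypothetical callable returning a dict with an assumed
    'state' key ('working', 'completed', or 'failed').
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status["state"] == "completed":
            return status["result"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "task failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("normalization task did not finish in time")
        sleep(interval_s)  # normalization can take seconds to minutes


# Usage with a fake status source that completes on the third poll.
fake_states = iter([
    {"state": "working"},
    {"state": "working"},
    {"state": "completed", "result": {"n_variants": 3}},
])
result = wait_for_task(lambda: next(fake_states), sleep=lambda s: None)
print(result)
```

A generous timeout suits this tool, since large VCFs may take minutes to normalize.
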
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
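| vcf_path | Yes | Path to the input VCF. | |
| output_path | No | Path for the normalized Parquet file. | |
| pass_filters | No | Allow-list of FILTER values to keep. | |
| min_depth | No | Minimum DP (read depth) to keep. | |
| min_qual | No | Minimum QUAL to keep. | |
| sex | No | | |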
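
As an illustration of the schema above, a small helper can build the arguments payload and omit any optional parameter left unset. The argument types are assumptions, since the schema does not document them: `pass_filters` is taken to be a list of FILTER strings, `min_depth` an integer, and `min_qual` a float. The helper name and the file name are hypothetical, and `sex` is passed through untouched because its accepted values are not listed here.

```python
from typing import Any, Dict, List, Optional


def build_normalize_args(
    vcf_path: str,
    output_path: Optional[str] = None,
    pass_filters: Optional[List[str]] = None,
    min_depth: Optional[int] = None,
    min_qual: Optional[float] = None,
    sex: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a normalize_vcf arguments dict, dropping unset optional fields."""
    if not vcf_path:
        raise ValueError("vcf_path is required")
    optional = {
        "output_path": output_path,
        "pass_filters": pass_filters,
        "min_depth": min_depth,
        "min_qual": min_qual,
        "sex": sex,
    }
    args: Dict[str, Any] = {"vcf_path": vcf_path}
    # Only send what the caller set, so the tool's own defaults apply otherwise.
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


args = build_normalize_args("sample.vcf.gz", pass_filters=["PASS"], min_depth=10)
print(args)
```
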
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| output_path | Yes | Path to the normalized Parquet file. | |
| n_variants | Yes | Number of variant rows written. | |
| message | Yes | Human-readable summary. | |