Compute PRS
compute_prs
Compute a polygenic risk score from a VCF file using a specified PGS score. Returns the score, match rate, variant counts, and trait information.
Instructions
Compute a polygenic risk score for one VCF against one PGS score.
Downloads the harmonized scoring file (cached) and scores the genotypes.
Pass genotypes_path to reuse a normalized Parquet from normalize_vcf
(avoids re-reading the VCF); otherwise the VCF is read directly. Returns
the score, match rate, variant counts, trait, and (when data permits) a
theoretical percentile.
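
To make the returned fields concrete, here is a minimal sketch of additive scoring, assuming the score is the sum of effect weight times effect-allele dosage over matched variants. The function name `score_genotypes`, the variant ID format, and the example weights are hypothetical and are not taken from the tool's implementation.

```python
def score_genotypes(weights, dosages):
    """Illustrative additive PRS (assumed model, not the tool's actual code).

    weights: {variant_id: effect_weight} from a scoring file
    dosages: {variant_id: effect-allele count (0, 1, or 2)} from the genotypes
    """
    # Only variants present in both the scoring file and the genotypes count.
    matched = [v for v in weights if v in dosages]
    score = sum(weights[v] * dosages[v] for v in matched)
    total = len(weights)
    return {
        "score": score,
        "variants_matched": len(matched),
        "variants_total": total,
        "match_rate": len(matched) / total if total else 0.0,
    }


# Hypothetical data: three scoring variants, two of them found in the genotypes.
weights = {"1:1000:A:G": 0.5, "2:2000:C:T": -0.25, "3:3000:G:A": 0.1}
dosages = {"1:1000:A:G": 2, "2:2000:C:T": 1}
result = score_genotypes(weights, dosages)
```

In this sketch an unmatched variant contributes nothing to the score, which is why a low match_rate makes the score harder to interpret.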
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| vcf_path | Yes | | |
| pgs_id | Yes | | |
| genome_build | No | | |
| genotypes_path | No | | |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| pgs_id | Yes | PGS Catalog Score ID | |
| score | Yes | Computed polygenic risk score | |
| variants_matched | Yes | Number of scoring variants matched in VCF | |
| variants_total | Yes | Total number of variants in scoring file | |
| match_rate | Yes | Fraction of scoring variants matched (0-1) | |
| trait_reported | No | Reported trait for the score | |
| performance | No | Best available performance metric from PGS Catalog | |
| has_allele_frequencies | No | Whether the scoring file contained allelefrequency_effect data | |
| theoretical_mean | No | Theoretical population mean PRS computed from allele frequencies: sum(w_i * 2 * p_i) | |
| theoretical_std | No | Theoretical population SD of PRS: sqrt(sum(w_i^2 * 2 * p_i * (1-p_i))) | |
| percentile | No | Estimated population percentile (0-100) from theoretical distribution | |
| ancestry | No | Ancestry superpopulation used for percentile (AFR, AMR, EAS, EUR, SAS) | |
| percentile_method | No | Method used to compute percentile: 'reference_panel', 'theoretical', or 'auroc_approx' | |
| absolute_risk | No | Absolute disease risk estimate based on PRS z-score and prevalence data | |
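
The theoretical_mean and theoretical_std formulas in the table can be sketched directly. The percentile step below assumes the score is approximately normally distributed in the population, which is an assumption of this sketch and not something the page confirms; the function name `theoretical_distribution` and the example values are hypothetical.

```python
import math


def theoretical_distribution(score, weights, freqs):
    """Return (mean, std, percentile) from weights w_i and effect-allele
    frequencies p_i, using the formulas listed in the output schema."""
    # theoretical_mean = sum(w_i * 2 * p_i)
    mean = sum(w * 2 * p for w, p in zip(weights, freqs))
    # theoretical_std = sqrt(sum(w_i^2 * 2 * p_i * (1 - p_i)))
    std = math.sqrt(sum(w ** 2 * 2 * p * (1 - p) for w, p in zip(weights, freqs)))
    if std == 0:
        # No variance (for example, all frequencies are 0 or 1): no percentile.
        return mean, std, None
    z = (score - mean) / std
    # Assumed normal approximation: percentile = 100 * Phi(z).
    percentile = 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return mean, std, percentile


# Hypothetical weights and allele frequencies for two variants.
weights = [0.5, -0.25]
freqs = [0.5, 0.2]
mean, std, _ = theoretical_distribution(0.0, weights, freqs)
# A score equal to the theoretical mean sits at the 50th percentile.
_, _, pct_at_mean = theoretical_distribution(mean, weights, freqs)
```

The variance formula treats variants as independent and in Hardy-Weinberg equilibrium, so linkage between scoring variants would make the real spread differ from this estimate.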