diversity_pca
Run principal component analysis on genotype dosage data to evaluate population structure, detect outliers, and generate PCA coordinates with variance explained.
Instructions
Principal component analysis of population structure.
Runs PCA on the alt-allele dosage matrix after dropping monomorphic markers,
mean-imputing missing calls, and applying Patterson scaling. Writes
pca_coords.csv (per-sample PC coordinates) and reports the variance explained,
plus any samples whose PC1 or PC2 coordinate lies more than outlier_sd standard
deviations from the mean. Pass metadata_tsv together with group_column to add a
group column (population label per sample) for colouring the PC plot. For large
marker sets, pass method="allelematrix" together with max_markers to avoid a
full VCF export.
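The preprocessing and decomposition described above can be sketched with NumPy as follows. This is a minimal illustration, not the tool's actual implementation: the function name patterson_pca, its return values, and the use of an SVD are assumptions made for the example. It takes Patterson scaling to mean centring each marker and dividing by sqrt(p(1 - p)), where p is the estimated alt-allele frequency (half the mean dosage).

```python
import numpy as np


def patterson_pca(dosage, n_components, outlier_sd):
    """Illustrative PCA on a (samples x markers) alt-allele dosage matrix.

    dosage holds 0/1/2 alt-allele counts, with NaN for missing calls.
    Returns (coords, var_explained, outlier_rows, n_markers_used).
    Hypothetical helper, not part of the tool's API.
    """
    X = np.asarray(dosage, dtype=float)

    # Drop markers with no calls at all, then markers whose observed
    # dosages are all identical (monomorphic).
    called = ~np.isnan(X).all(axis=0)
    X = X[:, called]
    polymorphic = np.nanmax(X, axis=0) > np.nanmin(X, axis=0)
    X = X[:, polymorphic]
    if X.shape[1] == 0:
        raise ValueError("no polymorphic markers left after filtering")

    # Mean-impute missing calls marker by marker.
    mean = np.nanmean(X, axis=0)
    X = np.where(np.isnan(X), mean, X)

    # Patterson scaling: centre, then divide by sqrt(p * (1 - p)).
    p = mean / 2.0
    Z = (X - mean) / np.sqrt(p * (1.0 - p))

    # PCA via SVD of the scaled matrix; sample coordinates are U * S.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    k = min(n_components, S.size)
    coords = U[:, :k] * S[:k]
    var_explained = (S**2 / np.sum(S**2))[:k]

    # Flag samples more than outlier_sd standard deviations from the
    # mean on PC1 or PC2 (only PC1 if a single component was requested).
    head = coords[:, :2]
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.abs(head - head.mean(axis=0)) / head.std(axis=0)
    outlier_rows = np.flatnonzero((z > outlier_sd).any(axis=1))

    return coords, var_explained, outlier_rows, X.shape[1]
```

Rows of coords correspond to samples in input order, so a group label read from the metadata TSV could be attached by matching on the sample id before plotting.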
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| method | No | Genotype source: 'vcf' (full export, cached) or 'allelematrix' (paged, server-side subset). | vcf |
| region | No | Restrict analysis to a genomic window: 'chrom' or 'chrom:start-end' (1-based). | |
| id_column | No | Column in the metadata TSV holding the individual/accession id. | individual |
| outlier_sd | No | Flag samples whose PC1 or PC2 coordinate lies more than this many standard deviations from the mean. | |
| output_dir | No | Directory for the output CSV(s). | ./gigwa_results/<module>/ |
| max_markers | No | Cap the number of markers analysed (evenly-spaced subsample); omit to use all. | |
| group_column | No | Column in the metadata TSV holding the group/population label. | |
| metadata_tsv | No | Path to a metadata TSV (import_metadata format) used to define groups. | |
| n_components | No | Number of principal components to compute. | |
| variant_set_db_id | Yes | BrAPI variantSetDbId identifying the run (MODULE§project§run); from list_variant_sets / list_content. | |
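To make the region and max_markers semantics concrete, here is a small sketch of how the two documented region forms might be parsed and how an evenly-spaced marker subsample could be chosen. Both helpers (parse_region, evenly_spaced_indices) are hypothetical names introduced for illustration, and the linspace-based spacing is an assumption; the tool's actual selection rule may differ.

```python
import numpy as np


def parse_region(region):
    """Parse 'chrom' or 'chrom:start-end' (1-based) into (chrom, start, end).

    start and end are None when only a chromosome name is given.
    Hypothetical helper, not part of the tool's API.
    """
    chrom, sep, span = region.partition(":")
    if not chrom:
        raise ValueError(f"missing chromosome name in region {region!r}")
    if not sep:
        return chrom, None, None
    start_text, dash, end_text = span.partition("-")
    if not dash:
        raise ValueError(f"expected 'chrom:start-end', got {region!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based interval in region {region!r}")
    return chrom, start, end


def evenly_spaced_indices(n_markers, max_markers):
    """Pick at most max_markers indices spread evenly over 0..n_markers-1.

    max_markers=None (parameter omitted) keeps every marker.
    Assumed strategy: equally spaced positions, rounded to integers.
    """
    if max_markers is None or max_markers >= n_markers:
        return np.arange(n_markers)
    if max_markers < 1:
        raise ValueError("max_markers must be at least 1")
    return np.linspace(0, n_markers - 1, num=max_markers).round().astype(int)
```

For instance, parse_region("chr1:100-200") splits the window into its chromosome and bounds, while evenly_spaced_indices(10, 4) selects the first and last markers plus two in between.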
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |