import_dartseq
Import DArTseq SNP and Silico-DArT xlsx reports into Gigwa by converting them to VCF and uploading to a database, project, and run. Optionally anchor markers to a reference genome or reuse precomputed mapping positions.
Instructions
Import DArTseq data from xlsx report(s) into Gigwa.
Converts the DArTseq SNP and/or Silico-DArT xlsx report(s) to a standard VCF
and uploads it to create or append to a database (module), project and run.
The 2-row genotype calling is done in Python, so reference homozygotes are not
mis-imported as heterozygous, which is what Gigwa's built-in DArT parser does.
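The 2-row calling step can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes the usual 2-row SNP layout in which each marker has a reference-allele row and a SNP-allele row, each scored "1" (allele present), "0" (absent) or "-" (missing) per sample, and it assumes diploid samples. The function name `call_genotype` is hypothetical.

```python
def call_genotype(ref_call: str, snp_call: str) -> str:
    """Combine the two rows of one 2-row DArTseq SNP marker into a VCF GT.

    ref_call / snp_call are the per-sample scores from the reference-allele
    row and the SNP-allele row (assumed encoding: "1", "0" or "-").
    """
    valid = ("0", "1")
    if ref_call not in valid or snp_call not in valid:
        return "./."          # missing or unrecognised score
    if ref_call == "1" and snp_call == "0":
        return "0/0"          # only the reference allele seen
    if ref_call == "0" and snp_call == "1":
        return "1/1"          # only the SNP allele seen
    if ref_call == "1" and snp_call == "1":
        return "0/1"          # both alleles seen
    return "./."              # neither allele seen: treat as missing


# One marker, four samples: reference row and SNP row side by side.
ref_row = ["1", "0", "1", "-"]
snp_row = ["0", "1", "1", "-"]
genotypes = [call_genotype(r, s) for r, s in zip(ref_row, snp_row)]
```

Reading both rows together is what keeps a sample with only the reference allele at 0/0 rather than heterozygous.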
Provide at least one of snp_xlsx / silico_xlsx (absolute paths). SNP
and Silico use different allele models, so importing both into the same run is
unusual; prefer separate runs unless you specifically intend to combine them.
If reference_fasta is given, the SNP markers' tag sequences are aligned to it.
It may be a reference genome FASTA or a prebuilt minimap2 .mmi index; an .mmi
is loaded directly with no re-indexing and is preferred for large genomes.
Uniquely mapped markers (mapq ≥ min_mapq) are imported genome-anchored, with
their real chromosome and position; the rest stay on an Unmapped contig.
Without reference_fasta, all markers go on Unmapped.
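The anchoring rule above can be sketched as a small decision function. This is an illustration under assumptions, not the tool's implementation: the `Hit` type and `place_marker` name are hypothetical, the alignment itself is left out, and how the tool numbers positions on the Unmapped contig is not stated here, so the sketch returns None for them. The threshold of 20 in the usage lines is only an example value.

```python
from typing import NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """Best alignment of one marker's tag sequence (hypothetical shape)."""
    contig: str
    pos: int      # 1-based position on the contig
    mapq: int     # mapping quality reported by the aligner


UNMAPPED = "Unmapped"


def place_marker(best_hit: Optional[Hit], min_mapq: int) -> Tuple[str, Optional[int]]:
    """Return (contig, position) for a marker.

    A marker is genome-anchored only if it has a hit whose mapping quality
    is at least min_mapq; otherwise it goes on the Unmapped contig.
    """
    if best_hit is not None and best_hit.mapq >= min_mapq:
        return best_hit.contig, best_hit.pos
    return UNMAPPED, None


confident = place_marker(Hit("chr1A", 1234, 60), min_mapq=20)
ambiguous = place_marker(Hit("chr2B", 99, 3), min_mapq=20)
no_hit = place_marker(None, min_mapq=20)
```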
positions_csv reuses a mapping already produced by
map_dartseq_to_reference (its dartseq_positions.csv) instead of
re-aligning, which is much faster when you have already inspected the mapping.
Provide either reference_fasta or positions_csv, not both.
Set clear_project_data=True to replace any existing data in the project,
skip_monomorphic=True to drop non-variant markers, and wait=False to
return immediately with a progress token instead of blocking until done.
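The argument rules stated above (at least one xlsx report, absolute report paths, and reference_fasta or positions_csv but not both) can be checked before calling the tool. The sketch below is an assumption about how a caller might do that; `check_import_args` is a hypothetical helper, not part of the tool, and the module, project, run and file names in the example are made up.

```python
import os
from typing import List, Optional


def check_import_args(
    snp_xlsx: Optional[str] = None,
    silico_xlsx: Optional[str] = None,
    reference_fasta: Optional[str] = None,
    positions_csv: Optional[str] = None,
) -> List[str]:
    """Return a list of problems with the arguments (empty if none found)."""
    problems = []
    if not snp_xlsx and not silico_xlsx:
        problems.append("provide at least one of snp_xlsx / silico_xlsx")
    for name, path in (("snp_xlsx", snp_xlsx), ("silico_xlsx", silico_xlsx)):
        if path and not os.path.isabs(path):
            problems.append(f"{name} must be an absolute path")
    if reference_fasta and positions_csv:
        problems.append("provide either reference_fasta or positions_csv, not both")
    return problems


# Example arguments for a SNP-only import that reuses an existing mapping.
example_args = {
    "module": "example_db",          # hypothetical database name
    "project": "example_project",    # hypothetical project name
    "run": "run1",                   # hypothetical run name
    "snp_xlsx": "/data/dart/Report_SNP.xlsx",
    "positions_csv": "/data/dart/dartseq_positions.csv",
    "skip_monomorphic": True,
    "wait": False,
}
issues = check_import_args(
    snp_xlsx=example_args.get("snp_xlsx"),
    silico_xlsx=example_args.get("silico_xlsx"),
    reference_fasta=example_args.get("reference_fasta"),
    positions_csv=example_args.get("positions_csv"),
)
```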
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| run | Yes | Target run name within the project. | |
| wait | No | Block until the import finishes (True) or return a progress token immediately (False). | |
| module | Yes | Target Gigwa database (module) name. | |
| ploidy | No | Sample ploidy. | 2 |
| project | Yes | Target project name within the database. | |
| min_mapq | No | Minimum mapping quality for a tag to count as uniquely mapped. | |
| snp_xlsx | No | Path to a DArTseq SNP xlsx report. | |
| technology | No | Free-text genotyping technology label (e.g. 'DArTseq', 'WGS', 'GBS'). | DArTseq |
| silico_xlsx | No | Path to a Silico-DArT xlsx report. | |
| positions_csv | No | Path to a dartseq_positions.csv (from map_dartseq_to_reference) to reuse instead of re-aligning. | |
| reference_fasta | No | Path to a reference genome FASTA or a prebuilt minimap2 .mmi index, for genome-anchoring. | |
| skip_monomorphic | No | Drop non-variant (monomorphic) markers during import. | |
| clear_project_data | No | Replace any existing data in the project before importing. | |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |