Server Configuration

Describes the environment variables used to configure the server. All of them are optional.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| GIGWA_URL | No | The base URL of the Gigwa server (without /rest suffix). | https://gigwa.icarda.org:8443/gigwa/ |
| GIGWA_PASS | No | The password for authentication. | |
| GIGWA_USER | No | The username for authentication. | |
| GIGWA_TIMEOUT | No | Read/request timeout in seconds. | 120 |
| GIGWA_URL_OTHER | No | The base URL of the "other" Gigwa server. | https://gigwa.icarda.org:8443/gigwa/ |
| GIGWA_PASS_OTHER | No | The password for authentication. | |
| GIGWA_USER_OTHER | No | The username for authentication. | |
| GIGWA_CONNECT_TIMEOUT | No | TCP connect timeout in seconds. | 10 |
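
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a settings loader. The `GigwaSettings` class and `load_settings` function are hypothetical names introduced for this example; they are not taken from the server's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Default taken from the table above.
DEFAULT_URL = "https://gigwa.icarda.org:8443/gigwa/"


@dataclass(frozen=True)
class GigwaSettings:  # hypothetical container, for illustration only
    url: str
    user: Optional[str]
    password: Optional[str]
    timeout: float
    connect_timeout: float


def load_settings(env: Mapping[str, str] = os.environ) -> GigwaSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return GigwaSettings(
        url=env.get("GIGWA_URL", DEFAULT_URL),
        user=env.get("GIGWA_USER") or None,       # empty string counts as unset
        password=env.get("GIGWA_PASS") or None,
        timeout=float(env.get("GIGWA_TIMEOUT", "120")),
        connect_timeout=float(env.get("GIGWA_CONNECT_TIMEOUT", "10")),
    )


settings = load_settings({"GIGWA_TIMEOUT": "300"})
print(settings.url, settings.timeout, settings.connect_timeout)
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to try without touching the real environment.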

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| --- | --- |
| tools | {"listChanged": false} |
| prompts | {"listChanged": false} |
| resources | {"subscribe": false, "listChanged": false} |
| experimental | {} |

Tools

Functions exposed to the LLM to take actions

gigwa_connect

Switch the active Gigwa server at runtime — no restart needed.

Re-points every subsequent tool (and the gigwa:// resources) at url for the rest of the session. Credentials are never passed through the chat; they are resolved from the environment. A named profile reads GIGWA_USER_/GIGWA_PASS_ followed by the profile name (for example GIGWA_USER_OTHER/GIGWA_PASS_OTHER), and anonymous=True sends none. With neither, the default GIGWA_USER/GIGWA_PASS are used only when reconnecting to the configured GIGWA_URL. Switching to any other server without a profile connects anonymously, so your home credentials are never transmitted to a different host unasked. The new connection is verified with a live round-trip before this returns; on failure the previous connection is restored.
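
The credential rules above can be sketched as a small decision function. This is a reading of the description, not the server's implementation: `resolve_credentials`, its parameters, the upper-casing of the profile name and the trailing-slash normalisation of URLs are all assumptions made for the example.

```python
from typing import Mapping, Optional, Tuple

Credentials = Optional[Tuple[str, str]]  # None means "connect anonymously"


def resolve_credentials(
    url: str,
    home_url: str,
    env: Mapping[str, str],
    profile: Optional[str] = None,
    anonymous: bool = False,
) -> Credentials:
    """Pick credentials from the environment, never from chat input."""
    if anonymous:
        return None
    if profile:
        # Assumed naming: GIGWA_USER_<PROFILE> / GIGWA_PASS_<PROFILE>
        suffix = "_" + profile.upper()
        user = env.get("GIGWA_USER" + suffix)
        password = env.get("GIGWA_PASS" + suffix)
    elif url.rstrip("/") == home_url.rstrip("/"):
        # Default credentials only go to the configured home server.
        user, password = env.get("GIGWA_USER"), env.get("GIGWA_PASS")
    else:
        # Any other host without a profile: send nothing.
        return None
    return (user, password) if user and password else None


env = {
    "GIGWA_USER": "alice", "GIGWA_PASS": "s3cret",
    "GIGWA_USER_OTHER": "bob", "GIGWA_PASS_OTHER": "pw",
}
home = "https://home.example/gigwa/"
print(resolve_credentials("https://elsewhere.example/gigwa", home, env))
```

The last call prints `None`: a switch to an unrelated host without a profile carries no credentials.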

gigwa_server_info

Check connectivity to the configured Gigwa server.

Generates an auth token with the configured credentials and reports the server URL and (best-effort) version. Use this first to confirm the connection works before importing data.

list_content

List the databases, projects and runs currently hosted on the Gigwa server.

import_dartseq

Import DArTseq data from xlsx report(s) into Gigwa.

Converts the DArTseq SNP and/or Silico-DArT xlsx report(s) to a standard VCF — doing the 2-row genotype calling in Python (so reference homozygotes are not mis-imported as heterozygous, as Gigwa's built-in DArT parser does) — and uploads it to create/append a database (module), project and run.
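
To make the 2-row genotype calling concrete, here is a minimal sketch. It assumes the common DArTseq 2-row layout, in which each marker has one row scoring the reference allele and one row scoring the SNP allele, with `1` = present, `0` = absent and `-` = missing. The function name `call_two_row` is hypothetical, and treating "both absent" as missing is an assumption of this example.

```python
def call_two_row(ref_row: str, snp_row: str) -> str:
    """Combine the two presence/absence scores of one marker in one sample
    into a VCF-style genotype string."""
    if ref_row not in ("0", "1") or snp_row not in ("0", "1"):
        return "./."                      # missing score in either row
    ref_present = ref_row == "1"
    snp_present = snp_row == "1"
    if ref_present and snp_present:
        return "0/1"                      # both alleles seen: heterozygous
    if ref_present:
        return "0/0"                      # only the reference allele: hom-ref
    if snp_present:
        return "1/1"                      # only the SNP allele: hom-alt
    return "./."                          # neither allele seen (assumed missing)


for pair in [("1", "0"), ("0", "1"), ("1", "1"), ("-", "-")]:
    print(pair, call_two_row(*pair))
```

A reference homozygote is the `("1", "0")` case: reading the two rows together yields `0/0` rather than a heterozygous call.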

Provide at least one of snp_xlsx / silico_xlsx (absolute paths). SNP and Silico use different allele models; importing both into the same run is unusual — prefer separate runs unless you specifically intend to combine them.

If reference_fasta is given (a reference genome FASTA or a prebuilt minimap2 .mmi index — an .mmi is loaded directly with no re-indexing, preferred for large genomes), the SNP markers' tag sequences are aligned to it and uniquely-mapped markers (mapq ≥ min_mapq) are imported genome-anchored (real chromosome/position); the rest stay on an Unmapped contig. Without it, all markers go on Unmapped.

positions_csv reuses a mapping already produced by map_dartseq_to_reference (its dartseq_positions.csv) instead of re-aligning — much faster when you've already inspected the mapping. Provide either reference_fasta or positions_csv, not both.

Set clear_project_data=True to replace any existing data in the project, skip_monomorphic=True to drop non-variant markers, and wait=False to return immediately with a progress token instead of blocking until done.

import_vcf

Import a VCF file (.vcf or .vcf.gz) into Gigwa.

Uploads the VCF to create/append a database (module), project and run. technology is optional free-text (e.g. 'WGS', 'GBS'). Use clear_project_data=True to replace existing project data and wait=False to return a progress token instead of blocking.

map_dartseq_to_reference

Guess genomic positions for DArTseq SNP markers by aligning their tag sequences.

Aligns each marker's ~69 bp AlleleSequence tag to reference_fasta (a reference genome FASTA, or a prebuilt minimap2 .mmi index) and reports the inferred chromosome, position and strand of each SNP. Writes dartseq_positions.csv (allele_id, chrom, pos, strand, mapq, ref, alt, status). The result can be passed to import_dartseq (positions_csv=) to import the data genome-anchored instead of on an Unmapped contig.

backend: "auto" uses the minimap2 CLI when available (streams over multi-part indexes → bounded RAM, best for large multi-gigabase genomes), falling back to the in-process mappy binding. Markers are classified unique (mapq ≥ min_mapq), multi (ambiguous), or unmapped.

get_import_progress

Report the current status of a running import, given its progress token.

abort_import

Abort a running import (or other long process), given its progress token.

Asks Gigwa to cancel the process identified by progress_token (the token returned by import_dartseq / import_vcf when run with wait=False). Returns whether the abort request was accepted; poll get_import_progress afterwards to confirm it stopped.

validate_metadata

Validate an individual-metadata file against a Gigwa database without importing.

metadata_type is the name of the ID column in the file that links rows to genotype entities — for individual metadata this is the individual column (the header must match exactly, case-sensitive). tsv_path is a TSV whose first column header equals metadata_type.

import_metadata

Import individual metadata (per-individual attributes) into an existing Gigwa database.

The file is a TSV whose first column header equals metadata_type (individual for individual metadata) and whose values match the individual/sample names already present in the database. Remaining columns become searchable attributes. By default the file is validated first; set validate_first=False to skip that check.
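
The file layout described above might look like the output of the following sketch. The attribute columns (`country`, `year`) and both helper functions are hypothetical; only the rule that the first column header must equal metadata_type exactly comes from the description.

```python
import csv
import io


def build_metadata_tsv(rows, metadata_type="individual"):
    """Write dict rows as a TSV with the ID column first."""
    attribute_columns = sorted({k for row in rows for k in row} - {metadata_type})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[metadata_type] + attribute_columns,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def first_header_matches(tsv_text, metadata_type="individual"):
    """Case-sensitive check that the first column header is the ID column."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"), [])
    return bool(header) and header[0] == metadata_type


tsv = build_metadata_tsv([
    {"individual": "112", "country": "SY", "year": "1998"},
    {"individual": "156", "country": "MA", "year": "2004"},
])
print(tsv)
```

Because the match is case-sensitive, a header of `Individual` would fail the check in this sketch.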

get_germplasm_metadata

Fetch server-stored per-individual metadata (germplasm attributes) for a database.

Reads the attributes already stored in Gigwa (imported earlier via import_metadata or a BrAPI source) for the module of variant_set_db_id, via BrAPI germplasm. Writes germplasm_metadata.csv (one row per accession, attribute columns) that can be fed back to the grouping tools (diversity_fst / diversity_by_group via metadata_tsv). Returns an empty-result note when the Gigwa build does not expose germplasm attributes (some 2.12 builds do not).

qc_call_rate

Per-sample and per-marker call rate (missingness) QC for a variant set.

Flags samples/markers below the given thresholds. Writes call_rate_samples.csv and call_rate_markers.csv and returns a summary with the overall call rate and the worst offenders. variant_set_db_id is a BrAPI variantSetDbId (from list_content / BrAPI variantsets). For large production sets pass method="allelematrix" with max_markers (e.g. 20000) to estimate from a server-side marker subset instead of a full VCF export. region ("chrom" or "chrom:start-end", 1-based; from list_sequences) restricts the analysis to one genomic window — available on every QC/diversity tool.
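
For readers unfamiliar with call rate, the following sketch shows the calculation on a tiny matrix. It assumes genotypes are held as alt-allele dosages (0, 1, 2) with `None` for a missing call; the function names and the 0.5 threshold are illustrative and not the tool's parameters.

```python
def call_rates(genotypes):
    """genotypes[m][s] is the dosage of sample s at marker m, or None if missing.
    Returns (per_sample, per_marker) call rates as fractions in [0, 1]."""
    n_markers = len(genotypes)
    n_samples = len(genotypes[0]) if genotypes else 0
    if n_markers == 0 or n_samples == 0:
        return [], []
    per_marker = [
        sum(g is not None for g in row) / n_samples for row in genotypes
    ]
    per_sample = [
        sum(row[s] is not None for row in genotypes) / n_markers
        for s in range(n_samples)
    ]
    return per_sample, per_marker


def below(rates, threshold):
    """Indices whose call rate falls under the threshold."""
    return [i for i, rate in enumerate(rates) if rate < threshold]


matrix = [
    [0, 1, None],
    [2, None, None],
    [0, 0, 1],
    [1, 2, None],
]
per_sample, per_marker = call_rates(matrix)
print(per_sample, below(per_sample, 0.5))
```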

qc_heterozygosity

Per-sample observed heterozygosity QC, flagging outliers.

High Ho relative to the cohort suggests contamination or off-types; very low Ho suggests selfed/inbred or duplicated material. Flags samples more than outlier_sd standard deviations from the mean. Writes heterozygosity_samples.csv. For large sets pass method="allelematrix" + max_markers to avoid a full VCF export.
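
The outlier rule can be sketched as below. This assumes dosage-coded genotypes (1 = heterozygous, `None` = missing) and uses the population standard deviation; whether the tool uses the population or sample form is not stated, so treat that choice as an assumption.

```python
from statistics import mean, pstdev


def observed_heterozygosity(sample_calls):
    """Fraction of called genotypes that are heterozygous (dosage 1)."""
    called = [g for g in sample_calls if g is not None]
    if not called:
        return float("nan")
    return sum(g == 1 for g in called) / len(called)


def flag_outliers(ho_by_sample, outlier_sd=3.0):
    """Names of samples whose Ho lies more than outlier_sd SDs from the mean."""
    values = list(ho_by_sample.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []
    return sorted(
        name for name, ho in ho_by_sample.items()
        if abs(ho - mu) > outlier_sd * sd
    )


ho = {"a": 0.10, "b": 0.11, "c": 0.09, "d": 0.10, "e": 0.60}
print(flag_outliers(ho, outlier_sd=1.5))
```

With only five samples a single extreme value cannot exceed 2 SDs of a population SD, which is why the example lowers `outlier_sd`; on realistic cohort sizes a threshold such as 3 behaves as expected.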

qc_duplicate_accessions

Detect duplicate / clonal accessions via pairwise identity-by-state (IBS).

Computes IBS allele-sharing similarity between every pair of samples and groups pairs at or above similarity_threshold into duplicate sets — the core genebank "cleaning" check for mislabelled duplicates and clones. By default subsamples to max_markers evenly-spaced markers for speed (set to 0/None to use all). Writes duplicate_pairs.csv and duplicate_groups.csv. For large sets pass method="allelematrix" to fetch the marker subset without a full export.
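
A minimal sketch of pairwise IBS and duplicate grouping follows. It assumes dosage-coded genotypes, scores each shared marker as 1 − |g1 − g2| / 2, and merges pairs at or above the threshold with a union-find (single-linkage) pass. The grouping strategy is an assumption; the description does not say how pairs are combined into sets.

```python
from itertools import combinations


def ibs_similarity(a, b):
    """Mean allele sharing over markers called in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("nan")
    return sum(1 - abs(x - y) / 2 for x, y in shared) / len(shared)


def duplicate_groups(dosages, similarity_threshold=0.99):
    """Group samples connected by at least one pair at/above the threshold."""
    names = sorted(dosages)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if ibs_similarity(dosages[a], dosages[b]) >= similarity_threshold:
            parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return [members for members in groups.values() if len(members) > 1]


dosages = {
    "acc1": [0, 2, 1, 0],
    "acc2": [0, 2, 1, 0],
    "acc3": [2, 0, 1, 2],
    "acc4": [0, 2, 1, None],
}
print(duplicate_groups(dosages))
```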

qc_maf_filter

Report markers that would be filtered by MAF / missingness (no changes applied).

Computes per-marker minor-allele frequency and missing rate, and counts how many markers are monomorphic, below maf_threshold, or above max_missing missing. Writes marker_filter_stats.csv. This is a report only — it does not modify Gigwa. For large sets pass method="allelematrix" + max_markers to sample server-side.
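
The per-marker report can be sketched as follows, again assuming dosage-coded genotypes with `None` for missing and a non-empty row per marker. The default thresholds and the reason labels are illustrative, not the tool's defaults.

```python
def marker_stats(row):
    """Return (maf, missing_rate) for one marker; maf is None if nothing is called."""
    called = [g for g in row if g is not None]
    missing = 1 - len(called) / len(row)
    if not called:
        return None, missing
    p_alt = sum(called) / (2 * len(called))
    return min(p_alt, 1 - p_alt), missing


def filter_reasons(row, maf_threshold=0.05, max_missing=0.2):
    """List why a marker would be filtered; an empty list means it passes."""
    maf, missing = marker_stats(row)
    reasons = []
    if missing > max_missing:
        reasons.append("missing")
    if maf is not None:
        if maf == 0:
            reasons.append("monomorphic")
        elif maf < maf_threshold:
            reasons.append("low_maf")
    return reasons


print(filter_reasons([0, 0, 0, 0]))
print(filter_reasons([0] * 19 + [1]))
```

Like the tool, the sketch only reports: nothing is removed from the input.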

diversity_summary

Per-marker diversity statistics (MAF, He, Ho, PIC) and dataset means.

He is Nei's gene diversity (1 - Σpᵢ²), Ho is observed heterozygosity, PIC is polymorphism information content. Writes diversity_markers.csv. For large sets pass method="allelematrix" + max_markers to sample server-side.
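
For a biallelic marker these statistics reduce to a few lines. The sketch below assumes dosage-coded genotypes with at least one call per marker and uses the standard biallelic form of PIC, 1 − p² − q² − 2p²q²; the function name is hypothetical.

```python
def diversity_stats(row):
    """MAF, He, Ho and PIC for one biallelic marker (dosages 0/1/2, None = missing)."""
    called = [g for g in row if g is not None]
    n = len(called)
    p_alt = sum(called) / (2 * n)
    freqs = [1 - p_alt, p_alt]
    he = 1 - sum(f * f for f in freqs)            # Nei's gene diversity
    ho = sum(g == 1 for g in called) / n          # observed heterozygosity
    pic = he - 2 * freqs[0] ** 2 * freqs[1] ** 2  # biallelic PIC
    return {"maf": min(freqs), "he": he, "ho": ho, "pic": pic}


print(diversity_stats([0, 1, 2, 1]))
```

At equal allele frequencies He reaches its biallelic maximum of 0.5 and PIC its maximum of 0.375.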

diversity_pca

Principal component analysis of population structure.

Runs PCA on the alt-allele dosage matrix (monomorphic markers dropped, missing mean-imputed, Patterson scaling). Writes pca_coords.csv (per-sample PC coordinates) and reports variance explained plus any PC1/PC2 outlier samples (beyond outlier_sd SD). Pass metadata_tsv + group_column to add a group column (population label per sample) for colouring the PC plot. For large sets pass method="allelematrix" + max_markers to avoid a full VCF export.

diversity_kinship

VanRaden genomic relationship (kinship) matrix.

Computes G = ZZ'/(2 Σp(1-p)) from alt dosage. Writes the full matrix as kinship_matrix.csv (samples × samples) and reports the most-related pairs and the diagonal (self-relationship / inbreeding) range. For large sets pass method="allelematrix" + max_markers to avoid a full VCF export.
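
The formula above can be worked through on a toy matrix. This pure-Python sketch assumes a samples × markers dosage matrix with no missing values and at least one polymorphic marker, and takes Z as the dosage centred by twice the alt-allele frequency; it is meant to show the arithmetic, not to scale.

```python
def vanraden_kinship(dosage):
    """dosage[s][m]: alt-allele dosage (0/1/2) of sample s at marker m."""
    n_samples, n_markers = len(dosage), len(dosage[0])
    # Alt-allele frequency per marker
    p = [sum(row[m] for row in dosage) / (2 * n_samples) for m in range(n_markers)]
    denom = 2 * sum(pm * (1 - pm) for pm in p)
    # Centre each dosage by its expected value 2p
    z = [[row[m] - 2 * p[m] for m in range(n_markers)] for row in dosage]
    return [
        [sum(a * b for a, b in zip(zi, zj)) / denom for zj in z]
        for zi in z
    ]


G = vanraden_kinship([
    [0, 2],
    [2, 0],
    [1, 1],
])
for row in G:
    print(row)
```

Here both markers have p = 0.5, so the denominator is 1 and G equals ZZ' directly: the two opposite homozygotes get a self-relationship of 2 and a mutual value of −2, while the double heterozygote sits at 0.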

diversity_fst

Pairwise Weir & Cockerham Fst between groups of samples.

Define the groups one of two ways:

  • groups_json — a JSON object mapping each group name to a list of accession names (or callset ids), e.g. {"north": ["112","156"], "south": ["11","42"]}.

  • metadata_tsv + group_column — read groups from a metadata TSV (the same file format used by import_metadata), keyed on id_column (default individual) and grouped by group_column.

Writes fst_pairwise.csv with the Fst for every group pair. (Server-side BrAPI attributes are not used for grouping — that endpoint is unavailable on the target Gigwa 2.12 build.)
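
The two ways of defining groups can be sketched as follows. The helper names and the `region` column are hypothetical; the JSON shape and the default `individual` ID column come from the description above.

```python
import csv
import io
import json
from itertools import combinations


def groups_from_json(groups_json):
    """Parse {"group": [names...]} and normalise member names to strings."""
    groups = json.loads(groups_json)
    return {name: [str(m) for m in members] for name, members in groups.items()}


def groups_from_tsv(tsv_text, group_column, id_column="individual"):
    """Build the same mapping from a metadata TSV."""
    groups = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups.setdefault(row[group_column], []).append(row[id_column])
    return groups


groups = groups_from_json('{"north": ["112","156"], "south": ["11","42"]}')
pairs = list(combinations(sorted(groups), 2))  # one Fst value per pair
print(groups, pairs)

tsv = "individual\tregion\n112\tnorth\n11\tsouth\n156\tnorth\n"
print(groups_from_tsv(tsv, "region"))
```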

diversity_by_group

Per-population diversity: He, Ho, Fis, MAF, % polymorphic, allelic richness.

Define groups the same way as diversity_fst — either groups_json {group: [names]} or metadata_tsv + group_column. For each group computes n, % polymorphic markers, mean MAF, Nei's He, observed Ho, Fis (1−Ho/He), mean observed allelic richness, and rarefied allelic richness (rarefied to the smallest group's gene-copy count so unequal group sizes are comparable). Writes diversity_by_group.csv.

diversity_core_collection

Select a core collection that maximises captured allelic diversity.

Greedy allele-coverage selection (Core-Hunter style): repeatedly add the accession that contributes the most not-yet-captured marker-alleles. Pick the core size directly, or as fraction of all accessions (default 10%). Writes core_collection.csv (rank, accession, cumulative allele coverage) and reports the fraction of total allelic diversity the core captures.
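
The greedy step can be sketched with plain sets. Each accession is represented by the set of marker-alleles it carries (the `"m1:0"` labels are an illustrative encoding), and ties are broken alphabetically, which is an assumption of this example.

```python
def greedy_core(alleles_by_accession, size):
    """Return [(accession, cumulative_coverage), ...] in selection order."""
    remaining = dict(alleles_by_accession)
    total = set().union(*remaining.values())
    captured, core = set(), []
    while remaining and len(core) < size:
        # Accession adding the most not-yet-captured alleles; first name wins ties
        best = max(sorted(remaining), key=lambda acc: len(remaining[acc] - captured))
        captured |= remaining.pop(best)
        core.append((best, len(captured) / (len(total) or 1)))
    return core


accessions = {
    "A": {"m1:0", "m2:0"},
    "B": {"m1:0", "m2:1"},
    "C": {"m1:1", "m2:1", "m3:1"},
    "D": {"m1:0"},
}
print(greedy_core(accessions, size=2))
```

In this toy set, C is picked first (three new alleles), then A, at which point all five marker-alleles are captured.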

diversity_structure

Lightweight population-structure clustering (PCA + K-means, in-Python).

Reduces the alt-dosage matrix with PCA (Patterson scaling), then runs K-means for K in k_min..k_max and picks the K with the highest pseudo-F (Calinski-Harabasz) between/within variance ratio — a clear maximum when groups are well separated. Writes structure_clusters.csv (sample, assigned cluster at the best K, PC coords) and reports the chosen K with cluster sizes. (No external ADMIXTURE binary — computed entirely in Python, consistent with the rest of the analysis layer.)

diversity_tree

UPGMA dendrogram of accessions from IBS allele-sharing distance (Newick).

Builds a pairwise IBS similarity matrix, converts to distance (1 − IBS), and writes a UPGMA tree as tree.nwk (standard Newick, loadable in FigTree / iTOL / ape). Marker subsampling (max_markers) keeps it tractable on large sets.
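
A compact UPGMA sketch that emits Newick from a distance matrix is shown below. It is a textbook implementation written for this example, with branch lengths formatted to four decimals; the tool's own output format may differ in detail.

```python
def upgma_newick(names, dist):
    """names: sample labels; dist[i][j]: symmetric distance, e.g. 1 - IBS."""
    # Active clusters: id -> (newick fragment, size, height of its root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])  # closest pair
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)
        height = dab / 2
        newick = f"({na}:{height - ha:.4f},{nb}:{height - hb:.4f})"
        # Size-weighted average distance from the merged cluster to the others
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (sa * dac + sb * dbc) / (sa + sb)
        del d[(a, b)]
        clusters[next_id] = (newick, sa + sb, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


tree = upgma_newick(
    ["A", "B", "C"],
    [[0.0, 0.2, 0.6],
     [0.2, 0.0, 0.6],
     [0.6, 0.6, 0.0]],
)
print(tree)
```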

audit_import_quality

Scan a Gigwa instance for databases imported with genotype-encoding artifacts.

With no variant_set_db_id this audits every run on the instance; pass one to audit a single variant set. For each run it pulls a bounded genotype sample (up to max_markers markers × max_samples callsets) via paged BrAPI search/allelematrix — cheap and constant-cost regardless of how large the variant set is, so it is safe to run across a whole production instance without exporting multi-GB VCFs. The aggregate genotype-class fractions it needs are estimated tightly from the sample (a true zero hom-alt class stays zero; a rare-but-real one shows up). It flags two import failure modes plus two weaker signals:

  • BROKEN — cohort mean Ho above het_threshold (DArT 2-row mis-call), or homozygous-alt genotypes far below their HWE expectation given the alt-allele frequency (lost hom-alt class; the HWE test avoids false positives on low-MAF / mostly-monomorphic panels where near-zero hom-alt is genuine).

  • SUSPECT — call rate above complete_call_rate (no missing data, often missing forced to 0/0), monomorphic fraction above monomorphic_threshold, or AD/DP depth fields present but uniformly zero (a VCF synthesised from genotype calls with fabricated depth/likelihoods — the same converter often miscalls GT too).

Writes import_quality_scan.csv (one row per run) under output_dir (default ./gigwa_results/) and returns a summary ranked worst-first. Read-only — it never modifies Gigwa.
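
The two BROKEN checks can be illustrated from aggregate genotype-class counts. Under Hardy-Weinberg equilibrium the expected number of hom-alt genotypes is q²·n, where q is the alt-allele frequency. The function name, the `min_expected` and `max_ratio` parameters and every default value in this sketch are hypothetical; the actual thresholds and test used by the tool are not given in the description.

```python
def audit_flags(n_hom_ref, n_het, n_hom_alt,
                het_threshold=0.5, min_expected=5.0, max_ratio=0.1):
    """Flag suspicious genotype-class fractions in a sampled set of calls."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return []
    q = (n_het + 2 * n_hom_alt) / (2 * n)   # alt-allele frequency
    expected_hom_alt = q * q * n            # HWE expectation
    flags = []
    if n_het / n > het_threshold:
        flags.append("high_het")
    # Only test for a lost hom-alt class when a meaningful number is expected,
    # so low-MAF panels with genuinely near-zero hom-alt are not flagged.
    if expected_hom_alt >= min_expected and n_hom_alt < max_ratio * expected_hom_alt:
        flags.append("lost_hom_alt")
    return flags


print(audit_flags(400, 600, 0))    # mis-called panel
print(audit_flags(980, 20, 0))     # low-MAF panel, zero hom-alt is plausible
print(audit_flags(490, 420, 90))   # counts consistent with HWE at q = 0.3
```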

count_variants

Count variants matching filters, computed server-side (nothing is downloaded).

Fast way to size a query before pulling data. Filter by genomic region (reference_name + optional start/end, from list_sequences), minor-allele frequency (min_maf/max_maf) and/or max_missing_data (0–1 fraction). With no filters this returns the total variant count of the set. variant_set_db_id is a BrAPI variantSetDbId (from list_variant_sets / list_content).

search_variants

Search variants matching filters server-side and write the matching list to CSV.

Same filters as count_variants (region / MAF / missing-data). Returns variant metadata only (id, chrom, pos, ref, alt) — no genotypes are fetched — and writes variant_search.csv. Use count_variants first to size the result; max_variants caps how many are retrieved. For downstream genotype analysis on a filtered subset, use the region/min_maf options on the QC/diversity tools instead.

list_sequences

List the reference sequences (chromosomes/contigs) available in a variant set.

Use this to discover valid reference_name values for the region filters on count_variants / search_variants / the QC & diversity tools.

list_variant_sets

List every variant set (run) with its exact BrAPI variantSetDbId.

The other tools take a variant_set_db_id; this returns those ids directly (plus name and variant/callset counts when the server provides them), complementing the human-readable database/project/run view from list_content.

export_genotypes

Export a variant set to a file in the given format.

format is one of Gigwa's export formats. Which are available depends on the Gigwa build — VCF (default), PLINK and FLAPJACK are commonly supported; others (HAPMAP, DARWIN, …) may not be, in which case the tool reports the formats this instance actually offers. The export runs server-side and is streamed to output_path. For large sets this can take a while; raise timeout (seconds).

Prompts

Interactive templates invoked by user choice

| Name | Description |
| --- | --- |
| import_and_qc | Import a genotype dataset (DArTseq xlsx or VCF) into Gigwa, then run standard QC. |
| diversity_report | Produce a population diversity / structure report for a variant set. |
| qc_triage | Run the full QC suite on a variant set and give a go/no-go verdict. |
| explore_instance | Get an overview of the whole Gigwa instance and flag anything that needs attention. |
| region_scan | Characterise variants and diversity within one genomic region. |

Resources

Contextual data attached and managed by the client

| Name | Description |
| --- | --- |
| Tool catalog | A categorised catalog of every tool with its EDAM operation/topic annotations. |
| Gigwa server info | Configured connection info: the target URL and auth mode. Deliberately makes no network call: reading a resource must be side-effect-free, and the server should not generate outbound traffic during directory inspection. Use the gigwa_server_info tool to actually test the live connection and fetch the server version. |


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gkanogiannis/Gigwa-MCP'
