import_dartseq

Import DArTseq SNP and Silico-DArT xlsx reports into Gigwa by converting them to VCF and uploading to a database, project, and run. Optionally anchor markers to a reference genome or reuse precomputed mapping positions.

Instructions

Import DArTseq data from xlsx report(s) into Gigwa.

Converts the DArTseq SNP and/or Silico-DArT xlsx report(s) to a standard VCF, then uploads it to create or append to a database (module), project and run. The 2-row genotype calling is done in Python, so reference homozygotes are not mis-imported as heterozygous, which is what happens with Gigwa's built-in DArT parser.
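As a rough illustration of what 2-row genotype calling involves, here is a minimal sketch. It assumes the usual DArTseq 2-row layout (one row scoring presence of the reference allele, one row scoring presence of the SNP allele, with "-" for missing) and diploid samples. The function name call_two_row_genotype is hypothetical and is not taken from the tool's source.

```python
from typing import Optional

MISSING = "./."


def _parse_score(score: Optional[str]) -> Optional[int]:
    """Return 0 or 1 for a presence/absence score, or None when it is missing."""
    if score is None:
        return None
    text = str(score).strip()
    if text in ("", "-"):
        return None
    value = int(float(text))  # xlsx cells may arrive as "1" or "1.0"
    if value not in (0, 1):
        raise ValueError(f"unexpected 2-row score: {score!r}")
    return value


def call_two_row_genotype(ref_row: Optional[str], alt_row: Optional[str]) -> str:
    """Combine the two per-allele scores of one marker into a diploid VCF GT string.

    Assumption: ref_row scores the reference allele and alt_row the SNP allele.
    """
    ref, alt = _parse_score(ref_row), _parse_score(alt_row)
    if ref is None or alt is None:
        return MISSING
    if ref == 1 and alt == 0:
        return "0/0"  # only the reference allele seen: reference homozygote
    if ref == 0 and alt == 1:
        return "1/1"  # only the SNP allele seen: alternate homozygote
    if ref == 1 and alt == 1:
        return "0/1"  # both alleles seen: heterozygote
    return MISSING    # neither allele seen


print(call_two_row_genotype("1", "0"))
```

The point of reading both rows together is that a sample scored 1 on the reference row and 0 on the SNP row comes out as 0/0 rather than as a heterozygote.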

Provide at least one of snp_xlsx / silico_xlsx (absolute paths). SNP and Silico use different allele models; importing both into the same run is unusual — prefer separate runs unless you specifically intend to combine them.

If reference_fasta is given, the SNP markers' tag sequences are aligned to it. It may be a reference genome FASTA or a prebuilt minimap2 .mmi index; an .mmi is loaded directly with no re-indexing and is preferred for large genomes. Uniquely mapped markers (mapq ≥ min_mapq) are imported genome-anchored with their real chromosome and position; the rest stay on an Unmapped contig. Without reference_fasta, all markers go on Unmapped.
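The anchoring rule can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the Hit record, the anchor_markers function, the running index used as a position on the Unmapped contig, and the threshold of 20 in the usage lines are all hypothetical, and the alignment step itself (done with minimap2 by the tool) is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

UNMAPPED = "Unmapped"


@dataclass
class Hit:
    """One alignment of a marker's tag sequence (hypothetical record)."""
    contig: str
    pos: int   # 1-based position on the contig
    mapq: int  # mapping quality reported by the aligner


def anchor_markers(
    hits_by_marker: Dict[str, List[Hit]], min_mapq: int
) -> Dict[str, Tuple[str, int]]:
    """Assign each marker a (contig, position) pair.

    A marker keeps its genomic coordinates only when its best hit reaches
    min_mapq; otherwise it is placed on the Unmapped contig at a running index.
    """
    placed: Dict[str, Tuple[str, int]] = {}
    for index, (marker, hits) in enumerate(hits_by_marker.items(), start=1):
        best = max(hits, key=lambda h: h.mapq, default=None)
        if best is not None and best.mapq >= min_mapq:
            placed[marker] = (best.contig, best.pos)
        else:
            placed[marker] = (UNMAPPED, index)
    return placed


example = {
    "m1": [Hit("chr1", 1500, 60)],
    "m2": [Hit("chr2", 88, 3), Hit("chr5", 910, 2)],  # ambiguous, low mapq
    "m3": [],                                          # no alignment at all
}
result = anchor_markers(example, min_mapq=20)
print(result)
```

With no reference at all, every marker would have an empty hit list and so would land on Unmapped, which matches the behavior described above.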

positions_csv reuses a mapping already produced by map_dartseq_to_reference (its dartseq_positions.csv) instead of re-aligning — much faster when you've already inspected the mapping. Provide either reference_fasta or positions_csv, not both.
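The parameter constraints described so far (three required names, at least one absolute xlsx path, and reference_fasta and positions_csv being mutually exclusive) could be checked on the caller's side before invoking the tool. The sketch below is a hypothetical client-side helper, check_import_args, and does not claim to reproduce the server's own validation or error messages.

```python
import os
from typing import Any, Dict, List


def check_import_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the args look usable."""
    problems: List[str] = []

    # module, project and run are the three required parameters.
    for name in ("module", "project", "run"):
        if not args.get(name):
            problems.append(f"missing required parameter: {name}")

    # At least one report must be supplied, and paths must be absolute.
    reports = [name for name in ("snp_xlsx", "silico_xlsx") if args.get(name)]
    if not reports:
        problems.append("provide at least one of snp_xlsx / silico_xlsx")
    for name in reports:
        if not os.path.isabs(args[name]):
            problems.append(f"{name} must be an absolute path")

    # Either re-align against a reference or reuse an existing mapping, never both.
    if args.get("reference_fasta") and args.get("positions_csv"):
        problems.append("provide either reference_fasta or positions_csv, not both")

    return problems


# Illustrative values only; the paths and names are made up.
ok_args = {
    "module": "demo_db",
    "project": "demo_project",
    "run": "run1",
    "snp_xlsx": "/data/report_snp.xlsx",
    "positions_csv": "/data/dartseq_positions.csv",
}
print(check_import_args(ok_args))
```

Returning a list of problems instead of raising on the first one lets an agent report every issue in a single pass.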

Set clear_project_data=True to replace any existing data in the project, skip_monomorphic=True to drop non-variant markers, and wait=False to return immediately with a progress token instead of blocking until done.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| run | Yes | Target run name within the project. | |
| wait | No | Block until the import finishes (True) or return a progress token immediately (False). | |
| module | Yes | Target Gigwa database (module) name. | |
| ploidy | No | Sample ploidy (default 2). | |
| project | Yes | Target project name within the database. | |
| min_mapq | No | Minimum mapping quality for a tag to count as uniquely mapped. | |
| snp_xlsx | No | Path to a DArTseq SNP xlsx report. | |
| technology | No | Free-text genotyping technology label (e.g. 'DArTseq', 'WGS', 'GBS'). | DArTseq |
| silico_xlsx | No | Path to a Silico-DArT xlsx report. | |
| positions_csv | No | Path to a dartseq_positions.csv (from map_dartseq_to_reference) to reuse instead of re-aligning. | |
| reference_fasta | No | Path to a reference genome FASTA or a prebuilt minimap2 .mmi index, for genome-anchoring. | |
| skip_monomorphic | No | Drop non-variant (monomorphic) markers during import. | |
| clear_project_data | No | Replace any existing data in the project before importing. | |

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| result | Yes | | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It explains the conversion process (2-row genotype calling), the effect of reference_fasta (genome-anchored vs Unmapped), and the behavior of positions_csv. It also describes async behavior with wait=False. However, it does not mention error handling, idempotency, or performance, which are minor gaps for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear opening sentence followed by paragraphs each addressing a specific aspect (conversion, file inputs, anchoring, parameters). It uses efficient language without redundancy, and every sentence contributes necessary information. Though long, it is appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 13 parameters and 3 required, the description covers all major behavioral aspects: input requirements, conversion details, anchoring options, and key flags. The presence of an output schema (not shown) likely covers return values, so description completeness is high. It addresses the main use cases and parameter interactions, leaving few gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, which provides a baseline of 3. The description adds substantial value beyond the schema: it explains the requirement of at least one of snp_xlsx/silico_xlsx, the implications of combining both, the difference between an .mmi index and a FASTA for reference_fasta, and the mutual exclusivity of reference_fasta and positions_csv. These additions raise the score to 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: importing DArTseq data from xlsx reports into Gigwa. It specifies the verb ('Import'), resource ('DArTseq data from xlsx report(s)'), and destination ('Gigwa'). It distinguishes from siblings by explaining the specific file types and conversion process, and notes the unusual case of combining SNP and Silico-DArT in one run, implying separate tools for other formats.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit guidance on when to use this tool: at least one xlsx file must be provided, and it warns against combining SNP and Silico in the same run unless intentional. It explains the choice between reference_fasta and positions_csv and their mutual exclusivity. It also details the effects of clear_project_data, skip_monomorphic, and wait, giving clear context for decision-making.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gkanogiannis/Gigwa-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.