get_sequence
Retrieve DNA sequences from Ensembl using genomic coordinates or gene/transcript identifiers for analysis and research purposes.
Instructions
Get DNA sequence for genomic coordinates or gene/transcript ID
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| region | Yes | Genomic region (chr:start-end) or feature ID | |
| species | No | Species name | homo_sapiens |
| format | No | Output format: json or fasta | fasta |
| mask | No | Repeat masking type: hard or soft | |
| multiple_sequences | No | Return multiple sequences if applicable | false |
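As an illustration of the schema above, the arguments can be described with a TypeScript type and two sample calls. This is only a sketch: the `GetSequenceArgs` name is an assumption made for this example rather than an identifier from the source, and the coordinates are arbitrary.

```typescript
// Hypothetical type mirroring the input schema; the name is not from the source.
interface GetSequenceArgs {
  region: string;               // "chr:start-end" or a feature ID such as an ENSG identifier
  species?: string;             // defaults to homo_sapiens when omitted
  format?: 'json' | 'fasta';    // defaults to fasta when omitted
  mask?: 'hard' | 'soft';       // optional repeat masking
  multiple_sequences?: boolean; // defaults to false when omitted
}

// Lookup by genomic coordinates (arbitrary example range) with soft masking.
const byRegion: GetSequenceArgs = {
  region: '1:1000-2000',
  species: 'homo_sapiens',
  mask: 'soft',
};

// Lookup by feature ID, requesting JSON output; species is left to its default.
const byFeatureId: GetSequenceArgs = {
  region: 'ENSG00000139618',
  format: 'json',
};

console.log(JSON.stringify(byRegion));
console.log(JSON.stringify(byFeatureId));
```

Only `region` is required, so the second object is a complete request even though it omits `species`, `mask`, and `multiple_sequences`.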
Implementation Reference
- src/index.ts:1032-1079 (handler) Core implementation of the get_sequence tool. Validates input with isValidSequenceArgs, selects the Ensembl REST API endpoint depending on whether the input is a feature ID (ENS...) or a genomic region, applies the optional mask and multiple_sequences params, and fetches the sequence data. The response is returned as text: pretty-printed JSON when format is json or the response body is not a string, otherwise the raw string.

      private async handleGetSequence(args: any) {
        if (!isValidSequenceArgs(args)) {
          throw new McpError(ErrorCode.InvalidParams, 'Invalid sequence arguments');
        }

        try {
          const species = this.getDefaultSpecies(args.species);
          const format = args.format || 'fasta';
          const region = this.formatGenomicRegion(args.region);

          let endpoint: string;
          const params: any = {};

          if (region.startsWith('ENS')) {
            // Feature ID
            endpoint = `/sequence/id/${region}`;
            params.type = 'genomic';
          } else {
            // Genomic region
            endpoint = `/sequence/region/${species}/${region}`;
          }

          if (args.mask) {
            params.mask = args.mask;
          }

          if (args.multiple_sequences) {
            params.multiple_sequences = 1;
          }

          const response = await this.apiClient.get(endpoint, { params });

          return {
            content: [
              {
                type: 'text',
                text:
                  format === 'json'
                    ? JSON.stringify(response.data, null, 2)
                    : typeof response.data === 'string'
                      ? response.data
                      : JSON.stringify(response.data, null, 2),
              },
            ],
          };
        } catch (error) {
          return this.handleError(error, 'fetching sequence');
        }
      }
- src/index.ts:842-843 (registration) Tool handler registration in the CallToolRequestSchema switch statement, dispatching calls to handleGetSequence.

      case 'get_sequence':
        return this.handleGetSequence(args);
- src/index.ts:613-626 (registration) Tool definition and input schema registration in the ListToolsRequestSchema response, including name, description, and JSON schema for parameters.

        name: 'get_sequence',
        description: 'Get DNA sequence for genomic coordinates or gene/transcript ID',
        inputSchema: {
          type: 'object',
          properties: {
            region: { type: 'string', description: 'Genomic region (chr:start-end) or feature ID' },
            species: { type: 'string', description: 'Species name (default: homo_sapiens)' },
            format: { type: 'string', enum: ['json', 'fasta'], description: 'Output format (default: fasta)' },
            mask: { type: 'string', enum: ['hard', 'soft'], description: 'Repeat masking type (optional)' },
            multiple_sequences: { type: 'boolean', description: 'Return multiple sequences if applicable (default: false)' },
          },
          required: ['region'],
        },
      },
- src/index.ts:169-182 (schema) Type guard and validation function for get_sequence tool input arguments, ensuring correct types and values before processing.

      const isValidSequenceArgs = (
        args: any
      ): args is {
        region: string;
        species?: string;
        format?: string;
        mask?: string;
        multiple_sequences?: boolean;
      } => {
        return (
          typeof args === 'object' &&
          args !== null &&
          typeof args.region === 'string' &&
          args.region.length > 0 &&
          (args.species === undefined || typeof args.species === 'string') &&
          (args.format === undefined || ['json', 'fasta'].includes(args.format)) &&
          (args.mask === undefined || ['hard', 'soft'].includes(args.mask)) &&
          (args.multiple_sequences === undefined || typeof args.multiple_sequences === 'boolean')
        );
      };
- src/index.ts:894-907 (helper) Utility function called by handleGetSequence on the region parameter. As excerpted, every branch returns the input unchanged, so the function documents the accepted formats rather than normalizing or validating them.

      private formatGenomicRegion(region: string): string {
        // Handle different region formats and ensure proper formatting
        // Support formats like: chr1:1000-2000, 1:1000-2000, ENSG00000139618
        if (region.includes(':') && region.includes('-')) {
          // Already in proper format
          return region;
        } else if (region.startsWith('ENS')) {
          // Gene/transcript/exon ID
          return region;
        } else {
          // Assume it's a chromosome name, return as-is
          return region;
        }
      }
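The endpoint selection described in the handler can be sketched as a standalone function that turns validated arguments into a request URL. This is a minimal sketch, not the project's code: `SequenceArgs`, `ENSEMBL_BASE`, and `buildSequenceUrl` are hypothetical names, the public `https://rest.ensembl.org` server is assumed as the target, and the `content-type` query parameter is an addition based on the Ensembl REST convention for choosing an output format; the excerpted handler does not set it.

```typescript
// Hypothetical standalone sketch; none of these names come from src/index.ts.
type SequenceArgs = {
  region: string;
  species?: string;
  format?: 'json' | 'fasta';
  mask?: 'hard' | 'soft';
  multiple_sequences?: boolean;
};

// Assumption: requests go to the public Ensembl REST server.
const ENSEMBL_BASE = 'https://rest.ensembl.org';

function buildSequenceUrl(args: SequenceArgs): string {
  const species = args.species ?? 'homo_sapiens';
  const format = args.format ?? 'fasta';
  const params = new URLSearchParams();

  let path: string;
  if (args.region.startsWith('ENS')) {
    // Feature ID: look up by identifier and ask for the genomic sequence.
    path = `/sequence/id/${args.region}`;
    params.set('type', 'genomic');
  } else {
    // Genomic region: the species is part of the path.
    path = `/sequence/region/${species}/${args.region}`;
  }

  if (args.mask) {
    params.set('mask', args.mask);
  }
  if (args.multiple_sequences) {
    params.set('multiple_sequences', '1');
  }

  // Assumption (not in the excerpted handler): request the output format explicitly.
  params.set('content-type', format === 'json' ? 'application/json' : 'text/x-fasta');

  return `${ENSEMBL_BASE}${path}?${params.toString()}`;
}

console.log(buildSequenceUrl({ region: 'ENSG00000139618', format: 'json', mask: 'hard' }));
console.log(buildSequenceUrl({ region: '1:1000-2000', multiple_sequences: true }));
```

Keeping URL construction in a pure function like this makes the branching between the two endpoints easy to test without any network access.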