Glama

ingest_data

Add text, HTML, or Markdown content to a local search index for private document retrieval, using a source identifier to update existing entries.

Instructions

Ingest content as a string, not from a file. Use for: fetched web pages (format: html), copied text (format: text), or markdown strings (format: markdown). The source identifier enables re-ingestion to update existing content. For files on disk, use ingest_file instead.
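As a rough illustration, the arguments for two typical calls might look like the sketch below. The `IngestDataInput` shape mirrors the tool's documented input. `buildSourceId` is a hypothetical helper, not part of the server, that assembles the `{type}://{date}` or `{type}://{date}/{detail}` identifiers the schema recommends for non-web content.

```typescript
// Shapes mirror the documented tool input; everything else here is illustrative.
type ContentFormat = 'text' | 'html' | 'markdown'

interface IngestDataInput {
  content: string
  metadata: { source: string; format: ContentFormat }
}

// Hypothetical helper (not part of the server): builds "{type}://{date}"
// or "{type}://{date}/{detail}" using the UTC calendar date.
function buildSourceId(type: string, date: Date, detail?: string): string {
  const day = date.toISOString().slice(0, 10) // YYYY-MM-DD
  return detail ? `${type}://${day}/${detail}` : `${type}://${day}`
}

// A fetched web page: the URL itself is the source identifier.
const pageArgs: IngestDataInput = {
  content: '<html><body><h1>Title</h1><p>Body text</p></body></html>',
  metadata: { source: 'https://example.com/page', format: 'html' },
}

// A Markdown note: a URL-scheme style identifier stands in for a URL.
const noteArgs: IngestDataInput = {
  content: '# Meeting notes\n\n- Ship the release',
  metadata: {
    source: buildSourceId('note', new Date('2024-12-30T00:00:00Z'), 'meeting'),
    format: 'markdown',
  },
}
```

Sending `noteArgs` again later with the same `source` and new `content` is what the instructions describe as re-ingestion to update existing content.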

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| content | Yes | The content to ingest (text, HTML, or Markdown) | |
| metadata | Yes | | |
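A client may want to check arguments before sending them. The following is a minimal sketch of such a check, assuming the nested `metadata` fields (`source` and `format`, both required) given in the tool's JSON Schema. `isIngestDataInput` is a hypothetical name, and the server's own validation may differ.

```typescript
type ContentFormat = 'text' | 'html' | 'markdown'

interface IngestDataInput {
  content: string
  metadata: { source: string; format: ContentFormat }
}

// Allowed values of metadata.format, per the schema's enum.
const FORMATS: readonly string[] = ['text', 'html', 'markdown']

// Hypothetical type guard: true only when both required top-level fields
// and both required metadata fields are present with the expected types.
function isIngestDataInput(value: unknown): value is IngestDataInput {
  if (typeof value !== 'object' || value === null) return false
  const candidate = value as Record<string, unknown>
  if (typeof candidate['content'] !== 'string') return false

  const metadata = candidate['metadata']
  if (typeof metadata !== 'object' || metadata === null) return false
  const meta = metadata as Record<string, unknown>

  const source = meta['source']
  const format = meta['format']
  return typeof source === 'string' && typeof format === 'string' && FORMATS.includes(format)
}
```

The guard only checks shape. It does not verify that `source` follows the recommended URL or URL-scheme form.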

Implementation Reference

  • Primary execution logic for the ingest_data tool. Handles HTML to Markdown conversion if needed, saves processed content to a raw-data file using saveRawData, then delegates ingestion to handleIngestFile with rollback on failure.
    async handleIngestData(
      args: IngestDataInput
    ): Promise<{ content: [{ type: 'text'; text: string }] }> {
      try {
        let contentToSave = args.content
        let formatToSave: ContentFormat = args.metadata.format

        // For HTML content, convert to Markdown first
        if (args.metadata.format === 'html') {
          console.error(`Parsing HTML from: ${args.metadata.source}`)
          const markdown = await parseHtml(args.content, args.metadata.source)

          if (!markdown.trim()) {
            throw new Error(
              'Failed to extract content from HTML. The page may have no readable content.'
            )
          }

          contentToSave = markdown
          formatToSave = 'markdown' // Save as .md file
          console.error(`Converted HTML to Markdown: ${markdown.length} characters`)
        }

        // Save content to raw-data directory
        const rawDataPath = await saveRawData(
          this.dbPath,
          args.metadata.source,
          contentToSave,
          formatToSave
        )
        console.error(`Saved raw data: ${args.metadata.source} -> ${rawDataPath}`)

        // Call existing ingest_file internally with rollback on failure
        try {
          return await this.handleIngestFile({ filePath: rawDataPath })
        } catch (ingestError) {
          // Rollback: delete the raw-data file if ingest fails
          try {
            await unlink(rawDataPath)
            console.error(`Rolled back raw-data file: ${rawDataPath}`)
          } catch {
            console.warn(`Failed to rollback raw-data file: ${rawDataPath}`)
          }
          throw ingestError
        }
      } catch (error) {
        // Error handling: suppress stack trace in production
        const errorMessage =
          process.env['NODE_ENV'] === 'production'
            ? (error as Error).message
            : (error as Error).stack || (error as Error).message

        console.error('Failed to ingest data:', errorMessage)
        throw new Error(`Failed to ingest data: ${errorMessage}`)
      }
    }
  • MCP tool registration in listTools handler, defining name, description, and detailed inputSchema for ingest_data.
    name: 'ingest_data',
    description:
      'Ingest content as a string, not from a file. Use for: fetched web pages (format: html), copied text (format: text), or markdown strings (format: markdown). The source identifier enables re-ingestion to update existing content. For files on disk, use ingest_file instead.',
    inputSchema: {
      type: 'object',
      properties: {
        content: {
          type: 'string',
          description: 'The content to ingest (text, HTML, or Markdown)',
        },
        metadata: {
          type: 'object',
          properties: {
            source: {
              type: 'string',
              description:
                'Source identifier. For web pages, use the URL (e.g., "https://example.com/page"). For other content, use URL-scheme format: "{type}://{date}" or "{type}://{date}/{detail}". Examples: "clipboard://2024-12-30", "chat://2024-12-30/project-discussion", "note://2024-12-30/meeting".',
            },
            format: {
              type: 'string',
              enum: ['text', 'html', 'markdown'],
              description: 'Content format: "text", "html", or "markdown"',
            },
          },
          required: ['source', 'format'],
        },
      },
      required: ['content', 'metadata'],
    },
    },
  • TypeScript interfaces defining the input structure for ingest_data: IngestDataMetadata and IngestDataInput, used for type safety and schema validation.
    /**
     * ingest_data tool input metadata
     */
    export interface IngestDataMetadata {
      /** Source identifier: URL ("https://...") or custom ID ("clipboard://2024-12-30") */
      source: string
      /** Content format */
      format: ContentFormat
    }

    /**
     * ingest_data tool input
     */
    export interface IngestDataInput {
      /** Content to ingest (text, HTML, or Markdown) */
      content: string
      /** Content metadata */
      metadata: IngestDataMetadata
    }
  • Core helper function called by the handler to persist the ingested content to a secure raw-data file path derived from the source identifier.
    export async function saveRawData(
      dbPath: string,
      source: string,
      content: string,
      format: ContentFormat
    ): Promise<string> {
      const filePath = generateRawDataPath(dbPath, source, format)

      // Ensure directory exists
      await mkdir(dirname(filePath), { recursive: true })

      // Write content to file
      await writeFile(filePath, content, 'utf-8')

      return filePath
    }
  • Generates the deterministic file path for raw-data storage using base64url encoding of normalized source, ensuring uniqueness and security.
    export function generateRawDataPath(dbPath: string, source: string, format: ContentFormat): string {
      const normalizedSource = normalizeSource(source)
      const encoded = encodeBase64Url(normalizedSource)
      const extension = formatToExtension(format)

      // Use resolve to ensure absolute path (required by validateFilePath)
      return resolve(getRawDataDir(dbPath), `${encoded}.${extension}`)
    }
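The helpers used by `generateRawDataPath` are not shown on this page. The sketch below fills them in with stand-ins to illustrate why the path is deterministic. All of these bodies are assumptions: the real `normalizeSource` may do more than trim whitespace, the `raw-data` directory name and the `txt` and `html` extensions are guesses (only `.md` for Markdown is confirmed by the handler's comment), and `sketchRawDataPath` is a hypothetical name.

```typescript
import { join, resolve } from 'node:path'

type ContentFormat = 'text' | 'html' | 'markdown'

// Assumption: stand-in normalization; the real function is not shown.
const normalizeSource = (source: string): string => source.trim()

// base64url uses only [A-Za-z0-9_-], so the result is safe as a file name
// even when the source contains "/", ":" or "?".
const encodeBase64Url = (value: string): string =>
  Buffer.from(value, 'utf-8').toString('base64url')

// Assumption: only markdown -> md is confirmed by the handler's comment.
const EXTENSIONS: Record<ContentFormat, string> = {
  text: 'txt',
  html: 'html',
  markdown: 'md',
}

// Assumption: the directory name is a guess.
const getRawDataDir = (dbPath: string): string => join(dbPath, 'raw-data')

// Hypothetical re-creation of the path logic using the stand-ins above.
function sketchRawDataPath(dbPath: string, source: string, format: ContentFormat): string {
  const encoded = encodeBase64Url(normalizeSource(source))
  return resolve(getRawDataDir(dbPath), `${encoded}.${EXTENSIONS[format]}`)
}

const first = sketchRawDataPath('/tmp/db', 'https://example.com/page', 'markdown')
const second = sketchRawDataPath('/tmp/db', 'https://example.com/page', 'markdown')
console.log(first === second) // same source and format give the same path
```

Because the same source and format always map to the same file, ingesting again under an existing identifier overwrites the earlier raw-data file instead of creating a second one.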

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shinpr/mcp-local-rag'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.