
ingest_data

Add text, HTML, or Markdown content to a local search index for private document retrieval, using a source identifier to update existing entries.

Instructions

Ingest content as a string, not from a file. Use for: fetched web pages (format: html), copied text (format: text), or markdown strings (format: markdown). The source identifier enables re-ingestion to update existing content. For files on disk, use ingest_file instead.
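The sketch below shows what string-based arguments for ingest_data might look like for clipboard text and for a fetched web page. The makeSourceId helper is hypothetical and not part of mcp-local-rag. It follows the "{type}://{date}" / "{type}://{date}/{detail}" source convention given in the tool's input schema. The IngestDataInput type here only mirrors the documented shape.

```typescript
// Mirrors the documented input shape (illustrative copy, not imported from the server).
type ContentFormat = 'text' | 'html' | 'markdown'

interface IngestDataInput {
  content: string
  metadata: { source: string; format: ContentFormat }
}

// Hypothetical helper: builds "{type}://{date}" or "{type}://{date}/{detail}".
// Assumption: the date part is the UTC calendar date (YYYY-MM-DD).
function makeSourceId(type: string, date: Date, detail?: string): string {
  // toISOString() is always UTC, e.g. "2024-12-30T00:00:00.000Z"
  const day = date.toISOString().slice(0, 10)
  return detail ? `${type}://${day}/${detail}` : `${type}://${day}`
}

// Copied text: a custom URL-scheme source identifier
const clipboardArgs: IngestDataInput = {
  content: 'Remember to rotate the API keys on Friday.',
  metadata: {
    source: makeSourceId('clipboard', new Date(Date.UTC(2024, 11, 30))),
    format: 'text',
  },
}

// Fetched web page: the page URL itself is the source identifier
const pageArgs: IngestDataInput = {
  content: '<html><body><h1>Hello</h1></body></html>',
  metadata: { source: 'https://example.com/page', format: 'html' },
}
```

Sending the same source again, such as the same URL after the page has changed, is what lets the server update the existing entry instead of adding a second copy.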

Input Schema

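| Name | Required | Description | Default |
|---|---|---|---|
| content | Yes | The content to ingest (text, HTML, or Markdown) | |
| metadata | Yes | Object with two required fields: source (string, the source identifier) and format ("text", "html", or "markdown") | |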
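The table corresponds to the JSON inputSchema in the tool registration. A client that wants to reject malformed arguments before sending them could use a type guard like the hypothetical isIngestDataInput below. It is a sketch that mirrors the documented schema, not validation code taken from the server.

```typescript
// Shape taken from the documented schema; this copy is for illustration only.
type ContentFormat = 'text' | 'html' | 'markdown'

interface IngestDataInput {
  content: string
  metadata: { source: string; format: ContentFormat }
}

const FORMATS: readonly string[] = ['text', 'html', 'markdown']

// Hypothetical runtime check: content must be a string, and metadata must be
// an object whose source is a string and whose format is one of the enum values.
function isIngestDataInput(value: unknown): value is IngestDataInput {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  if (typeof v['content'] !== 'string') return false

  const meta = v['metadata']
  if (typeof meta !== 'object' || meta === null) return false
  const m = meta as Record<string, unknown>

  const source = m['source']
  const format = m['format']
  return typeof source === 'string' && typeof format === 'string' && FORMATS.includes(format)
}
```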

Implementation Reference

  • Primary execution logic for the ingest_data tool. If the format is html, it first converts the content to Markdown. It then saves the content to a raw-data file with saveRawData and delegates indexing to handleIngestFile, deleting the raw-data file if ingestion fails.
    async handleIngestData(
      args: IngestDataInput
    ): Promise<{ content: [{ type: 'text'; text: string }] }> {
      try {
        let contentToSave = args.content
        let formatToSave: ContentFormat = args.metadata.format
    
        // For HTML content, convert to Markdown first
        if (args.metadata.format === 'html') {
          console.error(`Parsing HTML from: ${args.metadata.source}`)
          const markdown = await parseHtml(args.content, args.metadata.source)
    
          if (!markdown.trim()) {
            throw new Error(
              'Failed to extract content from HTML. The page may have no readable content.'
            )
          }
    
          contentToSave = markdown
          formatToSave = 'markdown' // Save as .md file
          console.error(`Converted HTML to Markdown: ${markdown.length} characters`)
        }
    
        // Save content to raw-data directory
        const rawDataPath = await saveRawData(
          this.dbPath,
          args.metadata.source,
          contentToSave,
          formatToSave
        )
    
        console.error(`Saved raw data: ${args.metadata.source} -> ${rawDataPath}`)
    
        // Call existing ingest_file internally with rollback on failure
        try {
          return await this.handleIngestFile({ filePath: rawDataPath })
        } catch (ingestError) {
          // Rollback: delete the raw-data file if ingest fails
          try {
            await unlink(rawDataPath)
            console.error(`Rolled back raw-data file: ${rawDataPath}`)
          } catch {
            console.warn(`Failed to rollback raw-data file: ${rawDataPath}`)
          }
          throw ingestError
        }
      } catch (error) {
        // Error handling: suppress stack trace in production
        const errorMessage =
          process.env['NODE_ENV'] === 'production'
            ? (error as Error).message
            : (error as Error).stack || (error as Error).message
    
        console.error('Failed to ingest data:', errorMessage)
    
        throw new Error(`Failed to ingest data: ${errorMessage}`)
      }
    }
  • Tool definition returned by the listTools handler, declaring the name, description, and inputSchema for ingest_data.
    {
      name: 'ingest_data',
      description:
        'Ingest content as a string, not from a file. Use for: fetched web pages (format: html), copied text (format: text), or markdown strings (format: markdown). The source identifier enables re-ingestion to update existing content. For files on disk, use ingest_file instead.',
      inputSchema: {
        type: 'object',
        properties: {
          content: {
            type: 'string',
            description: 'The content to ingest (text, HTML, or Markdown)',
          },
          metadata: {
            type: 'object',
            properties: {
              source: {
                type: 'string',
                description:
                  'Source identifier. For web pages, use the URL (e.g., "https://example.com/page"). For other content, use URL-scheme format: "{type}://{date}" or "{type}://{date}/{detail}". Examples: "clipboard://2024-12-30", "chat://2024-12-30/project-discussion", "note://2024-12-30/meeting".',
              },
              format: {
                type: 'string',
                enum: ['text', 'html', 'markdown'],
                description: 'Content format: "text", "html", or "markdown"',
              },
            },
            required: ['source', 'format'],
          },
        },
        required: ['content', 'metadata'],
      },
    },
  • TypeScript interfaces defining the input structure for ingest_data (IngestDataMetadata and IngestDataInput). They provide compile-time type checking only and do not validate arguments at runtime.
    /**
     * ingest_data tool input metadata
     */
    export interface IngestDataMetadata {
      /** Source identifier: URL ("https://...") or custom ID ("clipboard://2024-12-30") */
      source: string
      /** Content format */
      format: ContentFormat
    }
    
    /**
     * ingest_data tool input
     */
    export interface IngestDataInput {
      /** Content to ingest (text, HTML, or Markdown) */
      content: string
      /** Content metadata */
      metadata: IngestDataMetadata
    }
  • Helper called by the handler to write the content to a raw-data file whose path is derived from the source identifier by generateRawDataPath, creating the directory first if needed.
    export async function saveRawData(
      dbPath: string,
      source: string,
      content: string,
      format: ContentFormat
    ): Promise<string> {
      const filePath = generateRawDataPath(dbPath, source, format)
    
      // Ensure directory exists
      await mkdir(dirname(filePath), { recursive: true })
    
      // Write content to file
      await writeFile(filePath, content, 'utf-8')
    
      return filePath
    }
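  • Derives the deterministic raw-data file path by base64url-encoding the normalized source identifier. Because base64url is reversible and uses only filesystem-safe characters, distinct normalized sources map to distinct file names, re-ingesting a source overwrites the same file, and a source such as "../x" cannot escape the raw-data directory.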
    export function generateRawDataPath(dbPath: string, source: string, format: ContentFormat): string {
      const normalizedSource = normalizeSource(source)
      const encoded = encodeBase64Url(normalizedSource)
      const extension = formatToExtension(format)
      // Use resolve to ensure absolute path (required by validateFilePath)
      return resolve(getRawDataDir(dbPath), `${encoded}.${extension}`)
    }
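The path helpers above call normalizeSource, encodeBase64Url, formatToExtension, and getRawDataDir without showing them. The sketch below fills them in with labeled assumptions to demonstrate the properties the encoding is meant to provide. The same source always maps to the same file, so re-ingestion overwrites it, and the file name contains only base64url characters, so a source like "../../etc/passwd" stays inside the raw-data directory. The following details are assumptions: trim-only normalization, the "txt" and "html" extensions, and a "raw-data" subdirectory under dbPath. Node's built-in "base64url" Buffer encoding stands in for encodeBase64Url.

```typescript
import { Buffer } from 'node:buffer'
import { basename, resolve } from 'node:path'

type ContentFormat = 'text' | 'html' | 'markdown'

// Assumption: markdown -> "md" matches the handler's "Save as .md file" comment;
// the text and html extensions are guesses for illustration.
const EXTENSIONS: Record<ContentFormat, string> = { markdown: 'md', html: 'html', text: 'txt' }

// Hypothetical: the real normalization rules are not shown; trimming only.
function normalizeSource(source: string): string {
  return source.trim()
}

// Hypothetical: assumes raw data lives in "raw-data" under dbPath.
function getRawDataDir(dbPath: string): string {
  return resolve(dbPath, 'raw-data')
}

// base64url uses only A-Z, a-z, 0-9, "-" and "_" (no "/", "." or padding),
// so the result is always a single, safe path segment.
function encodeBase64Url(value: string): string {
  return Buffer.from(value, 'utf8').toString('base64url')
}

function generateRawDataPath(dbPath: string, source: string, format: ContentFormat): string {
  const encoded = encodeBase64Url(normalizeSource(source))
  // resolve() returns an absolute path, as the original comment requires
  return resolve(getRawDataDir(dbPath), `${encoded}.${EXTENSIONS[format]}`)
}

// Inverse mapping (illustrative): since base64url is reversible and never
// contains ".", everything before the last "." decodes back to the source.
function sourceFromRawDataPath(filePath: string): string {
  const name = basename(filePath)
  const encoded = name.slice(0, name.lastIndexOf('.'))
  return Buffer.from(encoded, 'base64url').toString('utf8')
}
```

One trade-off worth checking: base64url output is about a third longer than its input, so a very long URL can produce a file name close to the 255-byte limit that many filesystems impose.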


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shinpr/mcp-local-rag'
