Glama

ingestUrls

Extract and process content from multiple URLs with configurable scraping options and metadata for knowledge base ingestion.

Instructions

Ingests content from a list of URLs. Supports scraping options and metadata.

Input Schema

Name          Required  Description  Default
namespaceId   No        -            -
ingestConfig  Yes       -            -
tenantId      No        -            -
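
As an illustration of the input shape, the sketch below builds one set of arguments and checks it with a hand-written guard that mirrors only the visible parts of IngestUrlsSchema from the Implementation Reference. Several things here are assumptions rather than facts from the server: the string value 'URLS_LIST' standing in for SourceSyncIngestionSource.URLS_LIST, the example URLs and metadata keys, and the helper names isStringArray and isIngestUrlsArgs. scrapeOptions and chunkConfig are left out because their schemas are not shown. The real server validates with Zod, not with this guard.

```typescript
// Hypothetical stand-in for SourceSyncIngestionSource.URLS_LIST;
// the real enum value is not shown in the source.
const URLS_LIST = 'URLS_LIST' as const

type Metadata = Record<string, string | string[]>

interface IngestUrlsArgs {
  namespaceId?: string // listed as not required
  tenantId?: string // listed as not required; its exact schema is not shown
  ingestConfig: {
    source: typeof URLS_LIST
    config: { urls: string[]; metadata?: Metadata }
  }
}

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === 'string')

// Mirrors the visible ingestConfig rules only: literal source, string URLs,
// and metadata values that are strings or string arrays.
// namespaceId and tenantId are not checked here.
function isIngestUrlsArgs(value: unknown): value is IngestUrlsArgs {
  if (typeof value !== 'object' || value === null) return false
  const ingestConfig = (value as Record<string, unknown>).ingestConfig
  if (typeof ingestConfig !== 'object' || ingestConfig === null) return false
  const { source, config } = ingestConfig as Record<string, unknown>
  if (source !== URLS_LIST) return false
  if (typeof config !== 'object' || config === null) return false
  const { urls, metadata } = config as Record<string, unknown>
  if (!isStringArray(urls)) return false
  if (metadata === undefined) return true
  if (typeof metadata !== 'object' || metadata === null || Array.isArray(metadata)) {
    return false
  }
  return Object.values(metadata).every(
    (v) => typeof v === 'string' || isStringArray(v),
  )
}

// Example arguments (all values hypothetical).
const exampleArgs = {
  namespaceId: 'ns-example',
  ingestConfig: {
    source: URLS_LIST,
    config: {
      urls: ['https://example.com/docs', 'https://example.com/blog'],
      metadata: { team: 'docs', tags: ['public', 'v1'] },
    },
  },
}

console.log(isIngestUrlsArgs(exampleArgs)) // prints true
```

Only ingestConfig is required, so namespaceId and tenantId could be dropped from exampleArgs without failing this check.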

Implementation Reference

  • src/index.ts:251-268 (registration)
    Registers the 'ingestUrls' MCP tool with server.tool, passing the description, the input schema (IngestUrlsSchema.shape), and a handler. The handler creates a SourceSyncApiClient through createClient, using namespaceId and tenantId, and delegates to the client's ingestUrls method inside safeApiCall.
    server.tool(
      'ingestUrls',
      'Ingests content from a list of URLs. Supports scraping options and metadata.',
      IngestUrlsSchema.shape,
      async (params) => {
        return safeApiCall(async () => {
          const { namespaceId, tenantId, ingestConfig } = params
    
          // Create a client with the provided parameters
          const client = createClient({ namespaceId, tenantId })
    
          // Direct passthrough to the API
          return await client.ingestUrls({
            ingestConfig,
          })
        })
      },
    )
  • Zod schema (IngestUrlsSchema) that validates the ingestUrls tool input: an optional namespaceId, a tenantId, and an ingestConfig holding the URL list, optional scrape options, optional metadata, and an optional chunk config.
    export const IngestUrlsSchema = z.object({
      namespaceId: namespaceIdSchema.optional(),
      ingestConfig: z.object({
        source: z.literal(SourceSyncIngestionSource.URLS_LIST),
        config: z.object({
          urls: z.array(z.string()),
          scrapeOptions: ScrapeOptionsSchema.optional(),
          metadata: z.record(z.union([z.string(), z.array(z.string())])).optional(),
        }),
        chunkConfig: chunkConfigSchema.optional(),
      }),
      tenantId: tenantIdSchema,
    })
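  • Handler function (ingestUrls method) in the SourceSyncApiClient class that executes the core tool logic: it sends a JSON POST request to the SourceSync API endpoint '/v1/ingest/urls' containing the client's namespaceId and the ingestConfig. Because chunkConfig is set after the spread, the client's static CHUNK_CONFIG replaces any chunkConfig supplied by the caller.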
    public async ingestUrls({
      ingestConfig,
    }: Omit<
      SourceSyncIngestUrlsRequest,
      'namespaceId'
    >): Promise<SourceSyncIngestResponse> {
      return this.client
        .url('/v1/ingest/urls')
        .json({
          namespaceId: this.namespaceId,
          ingestConfig: {
            ...ingestConfig,
            chunkConfig: SourceSyncApiClient.CHUNK_CONFIG,
          },
        } satisfies SourceSyncIngestUrlsRequest)
        .post()
        .json<SourceSyncIngestResponse>()
    }
  • TypeScript type definition (SourceSyncIngestUrlsRequest) for the request structure used in the ingestUrls API call, imported and used for type safety in src/sourcesync.ts.
    export type SourceSyncIngestUrlsRequest = {
      namespaceId: string
      ingestConfig: {
        source: SourceSyncIngestionSource.URLS_LIST
        config: {
          urls: string[]
          scrapeOptions?: SourceSyncScrapeOptions
          metadata?: Record<string, any>
        }
        chunkConfig?: SourceSyncChunkConfig
      }
    }
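
The request construction in the ingestUrls client method above can be sketched in isolation to show the effect of the spread order. This is a minimal sketch under assumptions: buildRequestBody is a hypothetical helper, the chunkSize and chunkOverlap fields and their values are invented stand-ins for SourceSyncChunkConfig and SourceSyncApiClient.CHUNK_CONFIG (neither is shown in the source), and no HTTP request is made.

```typescript
// Hypothetical shape; the real SourceSyncChunkConfig fields are not shown.
type ChunkConfig = { chunkSize: number; chunkOverlap: number }

// Hypothetical stand-in for SourceSyncApiClient.CHUNK_CONFIG.
const CHUNK_CONFIG: ChunkConfig = { chunkSize: 400, chunkOverlap: 50 }

interface IngestConfig {
  source: string
  config: { urls: string[] }
  chunkConfig?: ChunkConfig
}

// Same object layout as the body passed to .json(...) in the client method:
// spread the caller's ingestConfig first, then set chunkConfig.
function buildRequestBody(namespaceId: string, ingestConfig: IngestConfig) {
  return {
    namespaceId,
    ingestConfig: {
      ...ingestConfig,
      chunkConfig: CHUNK_CONFIG, // later key wins over the spread value
    },
  }
}

const body = buildRequestBody('ns-example', {
  source: 'URLS_LIST',
  config: { urls: ['https://example.com'] },
  chunkConfig: { chunkSize: 9999, chunkOverlap: 0 }, // caller-supplied, overridden
})

console.log(body.ingestConfig.chunkConfig.chunkSize) // prints 400
```

With this layout, a chunkConfig passed through the tool input passes schema validation but does not reach the API unchanged; reversing the order (default first, spread second) would let the caller's value win instead.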

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pbteja1998/sourcesyncai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.