Glama

ingestUrls

Extract and process content from multiple URLs with configurable scraping options and metadata for knowledge base ingestion.

Instructions

Ingests content from a list of URLs. Supports scraping options and metadata.

Input Schema

Name          Required  Description  Default
namespaceId   No
ingestConfig  Yes
tenantId      No
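As a sketch of how these parameters fit together, here is a minimal well-formed arguments object for the tool; the field names follow the schema above, but the URL, namespace, and metadata values are illustrative assumptions, not taken from the source.

```typescript
// Hypothetical example arguments for the ingestUrls tool; the values
// are illustrative only, the shape follows IngestUrlsSchema.
const exampleArgs = {
  namespaceId: "ns_example", // optional; falls back to the client's namespace
  ingestConfig: {
    source: "URLS_LIST", // must be the URLS_LIST ingestion source
    config: {
      urls: ["https://example.com/docs/page-1"],
      metadata: { topic: "docs" }, // optional; string or string[] values
    },
  },
};

console.log(exampleArgs.ingestConfig.config.urls.length);
```

Only `ingestConfig` is required; `tenantId` and `namespaceId` can be omitted when the client already carries them.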

Implementation Reference

  • src/index.ts:251-268 (registration)
    Registration of the 'ingestUrls' MCP tool using server.tool, including description, input schema (IngestUrlsSchema.shape), and handler function that creates a SourceSyncApiClient and delegates to its ingestUrls method.
    server.tool(
      'ingestUrls',
      'Ingests content from a list of URLs. Supports scraping options and metadata.',
      IngestUrlsSchema.shape,
      async (params) => {
        return safeApiCall(async () => {
          const { namespaceId, tenantId, ingestConfig } = params
    
          // Create a client with the provided parameters
          const client = createClient({ namespaceId, tenantId })
    
          // Direct passthrough to the API
          return await client.ingestUrls({
            ingestConfig,
          })
        })
      },
    )
  • Zod schema (IngestUrlsSchema) defining the input validation for the ingestUrls tool, including optional namespaceId, ingestConfig with URLs list, scrape options, metadata, chunk config, and tenantId.
    export const IngestUrlsSchema = z.object({
      namespaceId: namespaceIdSchema.optional(),
      ingestConfig: z.object({
        source: z.literal(SourceSyncIngestionSource.URLS_LIST),
        config: z.object({
          urls: z.array(z.string()),
          scrapeOptions: ScrapeOptionsSchema.optional(),
          metadata: z.record(z.union([z.string(), z.array(z.string())])).optional(),
        }),
        chunkConfig: chunkConfigSchema.optional(),
      }),
      tenantId: tenantIdSchema,
    })
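IngestUrlsSchema leans on helper schemas (namespaceIdSchema, ScrapeOptionsSchema, chunkConfigSchema, tenantIdSchema) that are not shown here. As a rough, zod-free approximation of the core shape check it performs, under the assumption that those helpers validate plain strings and objects:

```typescript
// Hand-rolled approximation (NOT the real zod schema) of the shape
// checks IngestUrlsSchema performs on incoming tool arguments.
function looksLikeUrlsIngestInput(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const ic = v.ingestConfig as Record<string, unknown> | undefined;
  // source must be the URLS_LIST literal, mirroring z.literal(...)
  if (!ic || ic.source !== "URLS_LIST") return false;
  const cfg = ic.config as Record<string, unknown> | undefined;
  const urls = cfg?.urls;
  // urls must be an array of strings, mirroring z.array(z.string())
  return Array.isArray(urls) && urls.every((u) => typeof u === "string");
}

console.log(
  looksLikeUrlsIngestInput({
    ingestConfig: { source: "URLS_LIST", config: { urls: ["https://example.com"] } },
  }),
); // prints true
```

The real schema additionally validates the optional scrapeOptions, metadata, and chunkConfig fields through the helper schemas.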
  • Handler function (ingestUrls method) in SourceSyncApiClient class that executes the core tool logic: constructs and sends JSON POST request to SourceSync API endpoint '/v1/ingest/urls' with namespaceId, ingestConfig, and default chunk config.
    public async ingestUrls({
      ingestConfig,
    }: Omit<
      SourceSyncIngestUrlsRequest,
      'namespaceId'
    >): Promise<SourceSyncIngestResponse> {
      return this.client
        .url('/v1/ingest/urls')
        .json({
          namespaceId: this.namespaceId,
          ingestConfig: {
            ...ingestConfig,
            chunkConfig: SourceSyncApiClient.CHUNK_CONFIG,
          },
        } satisfies SourceSyncIngestUrlsRequest)
        .post()
        .json<SourceSyncIngestResponse>()
    }
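Note that the method merges the client's own namespaceId into the body and overrides any caller-supplied chunkConfig with the static CHUNK_CONFIG default. That merge can be sketched as a pure function; the concrete chunk-config values below are placeholder assumptions, not the real CHUNK_CONFIG.

```typescript
// Sketch of the request-body construction done by ingestUrls; the
// chunk-config values are assumptions, not the client's actual default.
const DEFAULT_CHUNK_CONFIG = { chunkSize: 512, chunkOverlap: 50 };

function buildIngestUrlsBody(
  namespaceId: string,
  ingestConfig: { source: "URLS_LIST"; config: { urls: string[] } },
) {
  return {
    namespaceId,
    ingestConfig: {
      ...ingestConfig,
      // The spread order means a caller-supplied chunkConfig would be
      // replaced by the client default, mirroring the method above.
      chunkConfig: DEFAULT_CHUNK_CONFIG,
    },
  };
}

const body = buildIngestUrlsBody("ns_example", {
  source: "URLS_LIST",
  config: { urls: ["https://example.com"] },
});
console.log(body.ingestConfig.chunkConfig.chunkSize); // prints 512
```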
  • TypeScript type definition (SourceSyncIngestUrlsRequest) for the request structure used in the ingestUrls API call, imported and used for type safety in src/sourcesync.ts.
    export type SourceSyncIngestUrlsRequest = {
      namespaceId: string
      ingestConfig: {
        source: SourceSyncIngestionSource.URLS_LIST
        config: {
          urls: string[]
          scrapeOptions?: SourceSyncScrapeOptions
          metadata?: Record<string, any>
        }
        chunkConfig?: SourceSyncChunkConfig
      }
    }
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full disclosure burden but offers minimal behavioral detail. 'Ingests' implies a write/mutation operation, but there is no information about required permissions, rate limits, side effects, or what 'ingests' actually means operationally. The mention of 'scraping options and metadata' hints at capabilities but lacks specifics about behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with zero waste. The first sentence states the core purpose, the second adds key capabilities. However, it's arguably too brief given the tool's complexity and lack of annotations, potentially sacrificing completeness for brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex ingestion tool with 3 parameters, nested objects, no annotations, and no output schema, the description is inadequate. It doesn't explain what 'ingests' means operationally, what happens to the ingested content, error conditions, or return values. The lack of behavioral context and incomplete parameter coverage makes this insufficient for safe agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate, but it does so only partially. It mentions a 'list of URLs' (mapping to the urls array), 'scraping options' (mapping to scrapeOptions), and 'metadata' (mapping to the metadata object), covering 3 of the 6 nested parameters. However, it omits namespaceId, tenantId, chunkConfig, and the required source field, leaving significant gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Ingests content from a list of URLs' specifies the verb (ingests) and the resource (content from URLs). It distinguishes the tool from siblings like ingestFile, ingestText, and ingestWebsite by focusing on URL lists, but it doesn't explicitly differentiate it from fetchUrlContent, which may have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description mentions 'Supports scraping options and metadata' but doesn't clarify when to choose ingestUrls over ingestWebsite, ingestSitemap, or fetchUrlContent. No prerequisites, exclusions, or comparative context is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
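To illustrate the kind of disclosure the review asks for, here is one hypothetical rewrite of the tool description. The behavioral details (auth, defaults, sibling-tool contrasts) are illustrative assumptions about the API, not documented facts.

```typescript
// Hypothetical improved description addressing the gaps noted above;
// every behavioral claim in it is an illustrative assumption.
const improvedDescription = [
  "Ingests content from an explicit list of URLs into the current namespace (write operation).",
  "Each URL is fetched and scraped per scrapeOptions, chunked, and stored for later retrieval.",
  "Requires API credentials with ingest permission; namespaceId defaults to the client's namespace.",
  "Use ingestWebsite to crawl a whole site, or fetchUrlContent to read a page without storing it.",
].join(" ");

console.log(improvedDescription.startsWith("Ingests")); // prints true
```

A description in this style names the side effect, the auth requirement, the default behavior, and the decision boundary against sibling tools in four sentences.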
