Glama

ingestWebsite

Crawl and ingest website content recursively with customizable depth, link limits, and path filters. Organize extracted data into chunks for efficient processing and integration into knowledge bases.

Instructions

Crawls and ingests content from a website recursively. Supports depth control and path filtering.
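To make "depth control and path filtering" concrete, here is a minimal sketch of a bounded breadth-first crawl over an in-memory link graph. It is not SourceSync's crawler: the option names (maxDepth, maxLinks, includePaths, excludePaths) come from the tool's schema, but prefix matching, counting the start page toward maxLinks, and the `planCrawl` and `isAllowed` helpers are all assumptions made for illustration.

```typescript
// Illustrative model only; the real crawl happens server-side and may differ.
interface CrawlLimits {
  maxDepth?: number
  maxLinks?: number
  includePaths?: string[]
  excludePaths?: string[]
}

// Hypothetical link graph: page path -> paths linked from that page.
type LinkGraph = Map<string, string[]>

// Assumption: filters are plain path prefixes and exclusions win over inclusions.
function isAllowed(path: string, limits: CrawlLimits): boolean {
  const { includePaths, excludePaths } = limits
  if (excludePaths?.some((prefix) => path.startsWith(prefix))) return false
  if (includePaths && includePaths.length > 0) {
    return includePaths.some((prefix) => path.startsWith(prefix))
  }
  return true
}

// Returns the pages that would be visited, in breadth-first order.
function planCrawl(graph: LinkGraph, startPath: string, limits: CrawlLimits): string[] {
  const maxDepth = limits.maxDepth ?? Infinity
  const maxLinks = limits.maxLinks ?? Infinity
  const seen = new Set<string>([startPath])
  const visited: string[] = []
  let frontier: string[] = [startPath]

  // Depth 0 is the start page; each loop iteration goes one level deeper.
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const nextFrontier: string[] = []
    for (const path of frontier) {
      if (visited.length >= maxLinks) return visited
      visited.push(path)
      for (const link of graph.get(path) ?? []) {
        if (!seen.has(link) && isAllowed(link, limits)) {
          seen.add(link)
          nextFrontier.push(link)
        }
      }
    }
    frontier = nextFrontier
  }
  return visited
}
```

In this model, lowering maxDepth trims whole levels of the site, while maxLinks caps the total page count regardless of depth.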

Input Schema

Name          Required  Description  Default
ingestConfig  Yes
namespaceId   No
tenantId      No
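As a rough illustration of what these arguments might look like in practice, the sketch below types the three top-level fields and fills in an example value. The nested fields follow the Zod schema listed under Implementation Reference. Several things are assumptions: the string value of `WEBSITE_SOURCE` (the real `SourceSyncIngestionSource.WEBSITE` value is not shown on this page), the string types for namespaceId and tenantId, all example values, and the `checkCrawlLimits` helper, which applies stricter checks than the published schema does.

```typescript
// Assumed stand-in for SourceSyncIngestionSource.WEBSITE (actual value not shown).
const WEBSITE_SOURCE = 'WEBSITE'

interface WebsiteCrawlConfig {
  url: string
  maxDepth?: number
  maxLinks?: number
  includePaths?: string[]
  excludePaths?: string[]
  metadata?: Record<string, string | string[]>
}

interface IngestWebsiteArgs {
  namespaceId?: string // assumed to be a string
  tenantId?: string // assumed to be a string
  ingestConfig: {
    source: typeof WEBSITE_SOURCE
    config: WebsiteCrawlConfig
  }
}

// Hypothetical example values.
const exampleArgs: IngestWebsiteArgs = {
  namespaceId: 'my-namespace',
  ingestConfig: {
    source: WEBSITE_SOURCE,
    config: {
      url: 'https://example.com/docs',
      maxDepth: 2,
      maxLinks: 50,
      includePaths: ['/docs'],
      excludePaths: ['/docs/archive'],
      metadata: { team: 'platform', tags: ['docs', 'public'] },
    },
  },
}

// Optional client-side guard. The schema only requires numbers; rejecting
// negative or fractional limits is an extra assumption made here.
function checkCrawlLimits(args: IngestWebsiteArgs): string[] {
  const problems: string[] = []
  const { config } = args.ingestConfig
  if (config.url.trim() === '') problems.push('url must not be empty')
  for (const key of ['maxDepth', 'maxLinks'] as const) {
    const value = config[key]
    if (value !== undefined && (!Number.isInteger(value) || value < 0)) {
      problems.push(`${key} must be a non-negative integer`)
    }
  }
  return problems
}
```

Only ingestConfig is required; namespaceId and tenantId can be left out, in which case the client presumably falls back to its own configuration.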

Implementation Reference

  • Core handler method in SourceSyncApiClient. It performs the website ingestion by POSTing to the /v1/ingest/website API endpoint, adding the client's namespaceId and its static CHUNK_CONFIG to the request body.
    public async ingestWebsite({
      ingestConfig,
    }: Omit<
      SourceSyncIngestWebsiteRequest,
      'namespaceId'
    >): Promise<SourceSyncIngestResponse> {
      return this.client
        .url('/v1/ingest/website')
        .json({
          namespaceId: this.namespaceId,
          ingestConfig: {
            ...ingestConfig,
            chunkConfig: SourceSyncApiClient.CHUNK_CONFIG,
          },
        } satisfies SourceSyncIngestWebsiteRequest)
        .post()
        .json<SourceSyncIngestResponse>()
    }
  • src/index.ts:291-307 (registration)
    MCP server.tool registration for the 'ingestWebsite' tool, which creates a SourceSync client and calls its ingestWebsite method.
    server.tool(
      'ingestWebsite',
      'Crawls and ingests content from a website recursively. Supports depth control and path filtering.',
      IngestWebsiteSchema.shape,
      async (params) => {
        return safeApiCall(async () => {
          const { namespaceId, ingestConfig, tenantId } = params

          // Create a client with the provided parameters
          const client = createClient({ namespaceId, tenantId })

          // Direct passthrough to the API
          return await client.ingestWebsite({
            ingestConfig,
          })
        })
      },
    )
  • Zod schema defining the input parameters for the ingestWebsite tool, including namespaceId, ingestConfig with website-specific options, and tenantId.
    export const IngestWebsiteSchema = z.object({
      namespaceId: namespaceIdSchema.optional(),
      ingestConfig: z.object({
        source: z.literal(SourceSyncIngestionSource.WEBSITE),
        config: z.object({
          url: z.string(),
          maxDepth: z.number().optional(),
          maxLinks: z.number().optional(),
          includePaths: z.array(z.string()).optional(),
          excludePaths: z.array(z.string()).optional(),
          metadata: z.record(z.union([z.string(), z.array(z.string())])).optional(),
        }),
        chunkConfig: chunkConfigSchema.optional(),
      }),
      tenantId: tenantIdSchema,
    })
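One detail worth noting in the handler above is the order of the object spread: chunkConfig is set after `...ingestConfig`, so a chunkConfig supplied by the caller (which the schema accepts) appears to be replaced by the client's static value. The sketch below isolates that request-body construction as a pure function. `buildWebsiteIngestBody`, the simplified types, and the values in `FIXED_CHUNK_CONFIG` are hypothetical; the real SourceSyncApiClient.CHUNK_CONFIG is not shown on this page.

```typescript
// Simplified, assumed types; the real request types live in the server's source.
type ChunkConfig = Record<string, unknown>

interface WebsiteIngestConfig {
  source: string
  config: { url: string; maxDepth?: number; maxLinks?: number }
  chunkConfig?: ChunkConfig
}

interface WebsiteIngestBody {
  namespaceId: string
  ingestConfig: WebsiteIngestConfig & { chunkConfig: ChunkConfig }
}

// Hypothetical values standing in for SourceSyncApiClient.CHUNK_CONFIG.
const FIXED_CHUNK_CONFIG: ChunkConfig = { chunkSize: 400, chunkOverlap: 50 }

// Mirrors the body passed to .json(...) in the handler: the namespace comes
// from the client, and the fixed chunk config is spread in last, so it wins.
function buildWebsiteIngestBody(
  clientNamespaceId: string,
  ingestConfig: WebsiteIngestConfig,
): WebsiteIngestBody {
  return {
    namespaceId: clientNamespaceId,
    ingestConfig: {
      ...ingestConfig,
      chunkConfig: FIXED_CHUNK_CONFIG,
    },
  }
}

// Usage: the caller's chunkConfig does not survive into the request body.
const body = buildWebsiteIngestBody('my-namespace', {
  source: 'WEBSITE', // assumed enum value
  config: { url: 'https://example.com', maxDepth: 1 },
  chunkConfig: { chunkSize: 9999 },
})
console.log(JSON.stringify(body, null, 2))
```

If per-request chunking is needed, it would have to be handled in the client itself, since the tool's chunkConfig input does not seem to reach the API in the code shown.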

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pbteja1998/sourcesyncai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.