ingestWebsite
Crawl and ingest website content recursively with customizable depth, link limits, and path filters. Organize extracted data into chunks for efficient processing and integration into knowledge bases.
Instructions
Crawls and ingests content from a website recursively. Supports depth control and path filtering.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| ingestConfig | Yes | | |
| namespaceId | No | | |
| tenantId | No | | |
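
As an illustration of the input shape, the sketch below builds one possible argument object for `ingestWebsite`. The field names follow the Zod schema listed under Implementation Reference. Everything else is an assumption: the string types for `namespaceId` and `tenantId`, the literal `'WEBSITE'` (standing in for `SourceSyncIngestionSource.WEBSITE`, whose actual value is not shown on this page), and all URLs, paths, and metadata values.

```typescript
// Local mirror of the tool's input shape, written out for illustration only.
// ASSUMPTION: 'WEBSITE' stands in for SourceSyncIngestionSource.WEBSITE,
// and the two ID fields are plain strings.
interface WebsiteIngestInput {
  namespaceId?: string
  tenantId?: string
  ingestConfig: {
    source: 'WEBSITE'
    config: {
      url: string
      maxDepth?: number
      maxLinks?: number
      includePaths?: string[]
      excludePaths?: string[]
      metadata?: Record<string, string | string[]>
    }
  }
}

// Hypothetical values: crawl the docs area of example.com, two levels deep.
const exampleInput: WebsiteIngestInput = {
  namespaceId: 'my-namespace',
  ingestConfig: {
    source: 'WEBSITE',
    config: {
      url: 'https://example.com/docs',
      maxDepth: 2,
      maxLinks: 50,
      includePaths: ['/docs'],
      excludePaths: ['/docs/archive'],
      metadata: { team: 'docs', tags: ['public', 'reference'] },
    },
  },
}

console.log(JSON.stringify(exampleInput, null, 2))
```

Only `url` is required inside `config`; the depth, link, and path options can be omitted.
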
Implementation Reference
- src/sourcesync.ts:412-429 (handler) Core handler method in SourceSyncApiClient that executes the website ingestion by posting to the /v1/ingest/website API endpoint with chunk config.

      public async ingestWebsite({
        ingestConfig,
      }: Omit<
        SourceSyncIngestWebsiteRequest,
        'namespaceId'
      >): Promise<SourceSyncIngestResponse> {
        return this.client
          .url('/v1/ingest/website')
          .json({
            namespaceId: this.namespaceId,
            ingestConfig: {
              ...ingestConfig,
              chunkConfig: SourceSyncApiClient.CHUNK_CONFIG,
            },
          } satisfies SourceSyncIngestWebsiteRequest)
          .post()
          .json<SourceSyncIngestResponse>()
      }

- src/index.ts:291-307 (registration) MCP server.tool registration for the 'ingestWebsite' tool, which creates a SourceSync client and calls its ingestWebsite method.

      server.tool(
        'ingestWebsite',
        'Crawls and ingests content from a website recursively. Supports depth control and path filtering.',
        IngestWebsiteSchema.shape,
        async (params) => {
          return safeApiCall(async () => {
            const { namespaceId, ingestConfig, tenantId } = params

            // Create a client with the provided parameters
            const client = createClient({ namespaceId, tenantId })

            // Direct passthrough to the API
            return await client.ingestWebsite({
              ingestConfig,
            })
          })
        },

- src/schemas.ts:232-247 (schema) Zod schema defining the input parameters for the ingestWebsite tool, including namespaceId, ingestConfig with website-specific options, and tenantId.

      export const IngestWebsiteSchema = z.object({
        namespaceId: namespaceIdSchema.optional(),
        ingestConfig: z.object({
          source: z.literal(SourceSyncIngestionSource.WEBSITE),
          config: z.object({
            url: z.string(),
            maxDepth: z.number().optional(),
            maxLinks: z.number().optional(),
            includePaths: z.array(z.string()).optional(),
            excludePaths: z.array(z.string()).optional(),
            metadata: z.record(z.union([z.string(), z.array(z.string())])).optional(),
          }),
          chunkConfig: chunkConfigSchema.optional(),
        }),
        tenantId: tenantIdSchema,
      })
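
One detail of the handler listed above is easy to miss: it spreads the caller's `ingestConfig` and then sets `chunkConfig` to the client's static `CHUNK_CONFIG`, so in that snippet a caller-supplied `chunkConfig` appears to be replaced, and `namespaceId` comes from the client instance. The sketch below reproduces only that merge in isolation. `buildWebsiteRequestBody`, the `ChunkConfig` fields, `DEFAULT_CHUNK_CONFIG`, and all values are hypothetical stand-ins and not part of the project.

```typescript
// Hypothetical chunk settings; the real CHUNK_CONFIG fields are not shown in the source.
interface ChunkConfig {
  chunkSize: number
  chunkOverlap: number
}

interface IngestConfig {
  source: string
  config: { url: string; maxDepth?: number }
  chunkConfig?: ChunkConfig
}

// Stand-in for SourceSyncApiClient.CHUNK_CONFIG (values are made up).
const DEFAULT_CHUNK_CONFIG: ChunkConfig = { chunkSize: 400, chunkOverlap: 50 }

// Mirrors the object literal the handler passes to .json(): a key written
// after a spread wins over the spread key, so chunkConfig is always the default.
function buildWebsiteRequestBody(namespaceId: string, ingestConfig: IngestConfig) {
  return {
    namespaceId,
    ingestConfig: { ...ingestConfig, chunkConfig: DEFAULT_CHUNK_CONFIG },
  }
}

const body = buildWebsiteRequestBody('my-namespace', {
  source: 'WEBSITE',
  config: { url: 'https://example.com', maxDepth: 1 },
  chunkConfig: { chunkSize: 9999, chunkOverlap: 0 },
})

console.log(body.ingestConfig.chunkConfig.chunkSize) // prints 400
```

If per-request chunking were wanted, the spread order would have to be reversed so that the caller's value comes last; whether that is intended is not stated in the source.
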