ingestWebsite
Crawl and ingest website content recursively with depth control and path filtering for knowledge management.
Instructions
Crawls and ingests content from a website recursively. Supports depth control and path filtering.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| namespaceId | No | | |
| ingestConfig | Yes | | |
| tenantId | No | | |
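As an illustration of what a call's arguments might look like, here is a minimal sketch of an input object typed against a local mirror of the schema. The string `'WEBSITE'` is an assumed placeholder for the value of `SourceSyncIngestionSource.WEBSITE`, which the source does not show; the namespace ID, URL, and path values are hypothetical, and the matching rules for `includePaths` and `excludePaths` (prefix, glob, or regex) are not specified here. `chunkConfig` is omitted because its shape is not shown.

```typescript
// Local mirror of the documented input shape; not the package's own type.
type MetadataValue = string | string[]

interface IngestWebsiteInput {
  namespaceId?: string
  tenantId?: string
  ingestConfig: {
    // Assumption: placeholder for SourceSyncIngestionSource.WEBSITE.
    source: 'WEBSITE'
    config: {
      url: string
      maxDepth?: number
      maxLinks?: number
      includePaths?: string[]
      excludePaths?: string[]
      metadata?: Record<string, MetadataValue>
    }
  }
}

// Hypothetical values throughout.
const exampleInput: IngestWebsiteInput = {
  namespaceId: 'example-namespace',
  ingestConfig: {
    source: 'WEBSITE',
    config: {
      url: 'https://example.com/docs',
      maxDepth: 2, // limit how far the crawl follows links
      maxLinks: 50, // cap on the number of links
      includePaths: ['/docs'],
      excludePaths: ['/docs/archive'],
      metadata: { team: 'support', tags: ['docs', 'public'] },
    },
  },
}
```

Only `ingestConfig` is required; `namespaceId` and `tenantId` can be left out, as the table indicates.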
Implementation Reference
- src/sourcesync.ts:412-429 (handler) Core implementation of ingestWebsite in SourceSyncApiClient: sends a POST to the /v1/ingest/website API endpoint with the client's own namespaceId and the supplied ingestConfig, always setting chunkConfig to SourceSyncApiClient.CHUNK_CONFIG (a chunkConfig passed in by the caller is overwritten).

  ```typescript
  public async ingestWebsite({
    ingestConfig,
  }: Omit<
    SourceSyncIngestWebsiteRequest,
    'namespaceId'
  >): Promise<SourceSyncIngestResponse> {
    return this.client
      .url('/v1/ingest/website')
      .json({
        namespaceId: this.namespaceId,
        ingestConfig: {
          ...ingestConfig,
          chunkConfig: SourceSyncApiClient.CHUNK_CONFIG,
        },
      } satisfies SourceSyncIngestWebsiteRequest)
      .post()
      .json<SourceSyncIngestResponse>()
  }
  ```
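The handler builds its request body with an object spread followed by an explicit `chunkConfig` key. Because later keys win in an object literal, the client-level default replaces any `chunkConfig` the caller supplies. The sketch below isolates that behavior. `buildRequestBody`, `DEFAULT_CHUNK_CONFIG`, and the `chunkSize` field are hypothetical stand-ins, since the real contents of `SourceSyncApiClient.CHUNK_CONFIG` are not shown.

```typescript
type ChunkConfig = Record<string, unknown>

interface WebsiteIngestConfig {
  source: string
  config: { url: string }
  chunkConfig?: ChunkConfig
}

// Hypothetical stand-in for SourceSyncApiClient.CHUNK_CONFIG.
const DEFAULT_CHUNK_CONFIG: ChunkConfig = { chunkSize: 400 }

// Mirrors the body construction in the handler: spread first, then override.
function buildRequestBody(namespaceId: string, ingestConfig: WebsiteIngestConfig) {
  return {
    namespaceId,
    ingestConfig: {
      ...ingestConfig,
      chunkConfig: DEFAULT_CHUNK_CONFIG, // later key replaces the spread value
    },
  }
}

const body = buildRequestBody('example-namespace', {
  source: 'WEBSITE',
  config: { url: 'https://example.com' },
  chunkConfig: { chunkSize: 9999 }, // caller-supplied value, discarded below
})

console.log(body.ingestConfig.chunkConfig)
```

If per-call chunking were wanted, swapping the order (default first, then the spread) would let the caller's value take precedence; the source as shown does not do this.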
- src/index.ts:291-308 (registration) MCP server.tool registration for 'ingestWebsite'; wraps the client.ingestWebsite call with safeApiCall and parameter handling.

  ```typescript
  server.tool(
    'ingestWebsite',
    'Crawls and ingests content from a website recursively. Supports depth control and path filtering.',
    IngestWebsiteSchema.shape,
    async (params) => {
      return safeApiCall(async () => {
        const { namespaceId, ingestConfig, tenantId } = params

        // Create a client with the provided parameters
        const client = createClient({ namespaceId, tenantId })

        // Direct passthrough to the API
        return await client.ingestWebsite({
          ingestConfig,
        })
      })
    },
  )
  ```
- src/schemas.ts:232-247 (schema) Zod schema definition for IngestWebsite tool input validation, including namespaceId, ingestConfig with website-specific config, and tenantId.

  ```typescript
  export const IngestWebsiteSchema = z.object({
    namespaceId: namespaceIdSchema.optional(),
    ingestConfig: z.object({
      source: z.literal(SourceSyncIngestionSource.WEBSITE),
      config: z.object({
        url: z.string(),
        maxDepth: z.number().optional(),
        maxLinks: z.number().optional(),
        includePaths: z.array(z.string()).optional(),
        excludePaths: z.array(z.string()).optional(),
        metadata: z.record(z.union([z.string(), z.array(z.string())])).optional(),
      }),
      chunkConfig: chunkConfigSchema.optional(),
    }),
    tenantId: tenantIdSchema,
  })
  ```
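To make the constraints in the `config` object concrete, here is a dependency-free sketch that checks the same rules by hand. It is not how the server validates input (that is done by Zod); `validateWebsiteConfig` and `isStringArray` are hypothetical helpers written for illustration. Note that the schema only requires `url` to be a string, so a malformed URL would pass this check as well.

```typescript
function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === 'string')
}

// Returns a list of problems; an empty list means the config looks valid.
function validateWebsiteConfig(input: unknown): string[] {
  if (typeof input !== 'object' || input === null) {
    return ['config must be an object']
  }
  const c = input as Record<string, unknown>
  const errors: string[] = []

  // url is the only required field.
  if (typeof c.url !== 'string') errors.push('url must be a string')

  // Optional numeric limits.
  for (const key of ['maxDepth', 'maxLinks']) {
    const v = c[key]
    if (v !== undefined && (typeof v !== 'number' || Number.isNaN(v))) {
      errors.push(`${key} must be a number`)
    }
  }

  // Optional path filters.
  for (const key of ['includePaths', 'excludePaths']) {
    const v = c[key]
    if (v !== undefined && !isStringArray(v)) {
      errors.push(`${key} must be an array of strings`)
    }
  }

  // Optional metadata: each value is a string or an array of strings.
  const m = c.metadata
  if (m !== undefined) {
    if (typeof m !== 'object' || m === null || Array.isArray(m)) {
      errors.push('metadata must be an object')
    } else {
      for (const [k, v] of Object.entries(m as Record<string, unknown>)) {
        if (typeof v !== 'string' && !isStringArray(v)) {
          errors.push(`metadata.${k} must be a string or string[]`)
        }
      }
    }
  }
  return errors
}

console.log(validateWebsiteConfig({ url: 'https://example.com', maxDepth: 2 }))
console.log(validateWebsiteConfig({ maxDepth: '2', metadata: { owner: 42 } }))
```

The second call fails on three counts: the missing `url`, the string-valued `maxDepth`, and the numeric metadata value.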
- src/sourcesync.types.ts:445-460 (helper) TypeScript type definition for SourceSyncIngestWebsiteRequest used by the API client.

  ```typescript
  export type SourceSyncIngestWebsiteRequest = {
    namespaceId: string
    ingestConfig: {
      source: SourceSyncIngestionSource.WEBSITE
      config: {
        url: string
        maxDepth?: number
        maxLinks?: number
        includePaths?: string[]
        excludePaths?: string[]
        scrapeOptions?: SourceSyncScrapeOptions
        metadata?: Record<string, any>
      }
      chunkConfig?: SourceSyncChunkConfig
    }
  }
  ```