ingestWebsite
Crawl a website recursively and ingest its content, with depth control and path filtering.
Instructions
Crawls and ingests content from a website recursively. Supports depth control and path filtering.
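SourceSync performs the crawl server-side. This page does not say how `maxDepth`, `maxLinks`, `includePaths`, and `excludePaths` are interpreted. The sketch below is a rough mental model only, not SourceSync's algorithm. It runs a breadth-first crawl over a hypothetical in-memory link graph and rests on these assumptions:

- depth counts link hops from the start URL;
- `maxLinks` caps the number of pages kept;
- the path filters are plain prefix matches, and exclusions take precedence;
- the filters decide which pages are kept, but links are still followed through pages that are not kept.

```typescript
// Hypothetical in-memory site: each page path maps to the paths it links to.
type LinkGraph = Record<string, string[]>;

interface CrawlOptions {
  maxDepth?: number; // assumed: link hops from the start page (0 = start page only)
  maxLinks?: number; // assumed: cap on the number of pages kept
  includePaths?: string[]; // assumed: path prefixes a page must match to be kept
  excludePaths?: string[]; // assumed: path prefixes that drop a page; checked first
}

// Decide whether a page is kept, using prefix matching (an assumption).
function isKept(pagePath: string, opts: CrawlOptions): boolean {
  if (opts.excludePaths?.some((prefix) => pagePath.startsWith(prefix))) return false;
  if (opts.includePaths && opts.includePaths.length > 0) {
    return opts.includePaths.some((prefix) => pagePath.startsWith(prefix));
  }
  return true;
}

// Breadth-first crawl from `start`, honoring the depth limit, link cap, and path filters.
function crawlSite(graph: LinkGraph, start: string, opts: CrawlOptions = {}): string[] {
  const maxDepth = opts.maxDepth ?? Infinity;
  const maxLinks = opts.maxLinks ?? Infinity;
  const seen = new Set<string>([start]);
  const kept: string[] = [];
  const queue: Array<[string, number]> = [[start, 0]];

  while (queue.length > 0 && kept.length < maxLinks) {
    const [pagePath, depth] = queue.shift()!;
    if (isKept(pagePath, opts)) kept.push(pagePath);
    if (depth >= maxDepth) continue; // do not follow links past the depth limit
    for (const next of graph[pagePath] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push([next, depth + 1]);
      }
    }
  }
  return kept;
}

const site: LinkGraph = {
  "/": ["/docs", "/blog"],
  "/docs": ["/docs/intro", "/docs/internal"],
  "/docs/intro": ["/docs/intro/setup"],
  "/blog": ["/blog/post-1"],
};

console.log(crawlSite(site, "/", { maxDepth: 1 }));
console.log(crawlSite(site, "/", { includePaths: ["/docs"], excludePaths: ["/docs/internal"] }));
```

Under these assumptions, `maxDepth: 1` keeps `/`, `/docs`, and `/blog`. The filtered run keeps `/docs`, `/docs/intro`, and `/docs/intro/setup`, because `/` and `/blog` are traversed but not kept. Naive prefix matching also lets `/docs` match a path such as `/docs-old`. A real crawler might instead match whole path segments, globs, or regular expressions.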
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
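| ingestConfig | Yes | Website ingestion settings. Contains `source` (the website ingestion source), `config`, and an optional `chunkConfig`. `config` holds `url` plus optional `maxDepth`, `maxLinks`, `includePaths`, `excludePaths`, and `metadata`. The client currently replaces `chunkConfig` with its default `CHUNK_CONFIG`. | |
| namespaceId | No | Namespace to ingest the content into; passed to `createClient`. | |
| tenantId | No | Tenant identifier; passed to `createClient`. | |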
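The sketch below shows the arguments an MCP client might pass to this tool. Its type is inferred from `IngestWebsiteSchema`, with several assumptions:

- `namespaceId` and `tenantId` are strings, since their schemas are not shown;
- `"WEBSITE"` is a placeholder for `SourceSyncIngestionSource.WEBSITE`, whose value is not shown;
- the path filters take URL-path prefixes;
- every concrete value is hypothetical.

`chunkConfig` is left out because the client overrides it.

```typescript
// Assumption: the string value of SourceSyncIngestionSource.WEBSITE is not shown
// in this reference; "WEBSITE" is a placeholder.
const WEBSITE_SOURCE = "WEBSITE";

// Shape inferred from IngestWebsiteSchema. namespaceId and tenantId are assumed
// to be strings because their schemas are not shown.
interface IngestWebsiteArgs {
  namespaceId?: string;
  tenantId?: string;
  ingestConfig: {
    source: string;
    config: {
      url: string;
      maxDepth?: number;
      maxLinks?: number;
      includePaths?: string[];
      excludePaths?: string[];
      metadata?: Record<string, string | string[]>;
    };
    chunkConfig?: unknown; // defined by chunkConfigSchema (not shown)
  };
}

// Hypothetical values: crawl a docs site two levels deep, keep /guides pages,
// skip the archive, and attach metadata.
const exampleArgs: IngestWebsiteArgs = {
  namespaceId: "my-namespace",
  ingestConfig: {
    source: WEBSITE_SOURCE,
    config: {
      url: "https://docs.example.com",
      maxDepth: 2,
      maxLinks: 50,
      includePaths: ["/guides"],
      excludePaths: ["/guides/archive"],
      metadata: { team: "docs", tags: ["public", "guides"] },
    },
  },
};

console.log(JSON.stringify(exampleArgs, null, 2));
```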
Implementation Reference
- `src/index.ts:291-308` (registration): MCP tool registration for `ingestWebsite` using `server.tool()`. It passes the tool name, the description, the input schema (`IngestWebsiteSchema`), and a thin async handler. The handler creates a `SourceSyncApiClient` instance and delegates to its `ingestWebsite` method.

  ```typescript
  server.tool(
    'ingestWebsite',
    'Crawls and ingests content from a website recursively. Supports depth control and path filtering.',
    IngestWebsiteSchema.shape,
    async (params) => {
      return safeApiCall(async () => {
        const { namespaceId, ingestConfig, tenantId } = params

        // Create a client with the provided parameters
        const client = createClient({ namespaceId, tenantId })

        // Direct passthrough to the API
        return await client.ingestWebsite({
          ingestConfig,
        })
      })
    },
  )
  ```
- `src/sourcesync.ts:412-429` (handler): core implementation in the `SourceSyncApiClient` class. It sends an HTTP POST to the SourceSync.ai endpoint `/v1/ingest/website` with the client's `namespaceId` and the `ingestConfig`. `chunkConfig` is set after `...ingestConfig` is spread, so the default `SourceSyncApiClient.CHUNK_CONFIG` replaces any `chunkConfig` the caller supplies rather than merging with it.

  ```typescript
  public async ingestWebsite({
    ingestConfig,
  }: Omit<
    SourceSyncIngestWebsiteRequest,
    'namespaceId'
  >): Promise<SourceSyncIngestResponse> {
    return this.client
      .url('/v1/ingest/website')
      .json({
        namespaceId: this.namespaceId,
        ingestConfig: {
          ...ingestConfig,
          // Set after the spread, so it overrides any caller-supplied chunkConfig
          chunkConfig: SourceSyncApiClient.CHUNK_CONFIG,
        },
      } satisfies SourceSyncIngestWebsiteRequest)
      .post()
      .json<SourceSyncIngestResponse>()
  }
  ```
- `src/schemas.ts:232-247` (schema): Zod schema (`IngestWebsiteSchema`) defining the input parameters for the `ingestWebsite` tool. It covers:
  - an optional `namespaceId`;
  - a required `ingestConfig` with website-specific fields (`url`, `maxDepth`, etc.) and an optional `chunkConfig`;
  - `tenantId`.

  ```typescript
  export const IngestWebsiteSchema = z.object({
    namespaceId: namespaceIdSchema.optional(),
    ingestConfig: z.object({
      source: z.literal(SourceSyncIngestionSource.WEBSITE),
      config: z.object({
        url: z.string(),
        maxDepth: z.number().optional(),
        maxLinks: z.number().optional(),
        includePaths: z.array(z.string()).optional(),
        excludePaths: z.array(z.string()).optional(),
        metadata: z.record(z.union([z.string(), z.array(z.string())])).optional(),
      }),
      chunkConfig: chunkConfigSchema.optional(),
    }),
    tenantId: tenantIdSchema,
  })
  ```
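The schema accepts an optional `chunkConfig`. However, `ingestWebsite()` spreads `ingestConfig` first and only then sets `chunkConfig`, so the caller's value never reaches the API. The sketch below isolates that object literal and contrasts it with one possible alternative: keep the caller's value and fall back to the default. `DEFAULT_CHUNK_CONFIG` and its field names are placeholders, because the real `SourceSyncApiClient.CHUNK_CONFIG` is not shown. `"WEBSITE"` likewise stands in for `SourceSyncIngestionSource.WEBSITE`.

```typescript
// Placeholder default: the real SourceSyncApiClient.CHUNK_CONFIG value and its
// field names are not shown in this reference.
const DEFAULT_CHUNK_CONFIG: Record<string, unknown> = { chunkSize: 512, chunkOverlap: 50 };

interface WebsiteIngestConfig {
  source: string;
  config: { url: string };
  chunkConfig?: Record<string, unknown>;
}

// Mirrors the literal in ingestWebsite(): chunkConfig comes after the spread,
// so it always replaces the caller's value.
function buildBodyAsWritten(namespaceId: string, ingestConfig: WebsiteIngestConfig) {
  return { namespaceId, ingestConfig: { ...ingestConfig, chunkConfig: DEFAULT_CHUNK_CONFIG } };
}

// Hypothetical alternative: keep the caller's chunkConfig, else use the default.
function buildBodyWithFallback(namespaceId: string, ingestConfig: WebsiteIngestConfig) {
  return {
    namespaceId,
    ingestConfig: { ...ingestConfig, chunkConfig: ingestConfig.chunkConfig ?? DEFAULT_CHUNK_CONFIG },
  };
}

const callerConfig: WebsiteIngestConfig = {
  source: "WEBSITE", // placeholder for SourceSyncIngestionSource.WEBSITE
  config: { url: "https://docs.example.com" },
  chunkConfig: { chunkSize: 128 },
};

// First line logs the default config; second logs the caller's { chunkSize: 128 }.
console.log(buildBodyAsWritten("my-namespace", callerConfig).ingestConfig.chunkConfig);
console.log(buildBodyWithFallback("my-namespace", callerConfig).ingestConfig.chunkConfig);
```

Whether the override is intentional (for example, to enforce server-side chunking defaults) is not stated. If it is, removing `chunkConfig` from the tool schema would make the contract explicit.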