ingestSitemap
Extract website content automatically by processing sitemap.xml files with configurable path filtering and link limits for structured data ingestion.
Instructions
Ingests content from a website using its sitemap.xml. Supports path filtering and link limits.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| namespaceId | No | ||
| ingestConfig | Yes | ||
| tenantId | No | ||
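As a rough illustration of the table above, the following sketch builds one possible argument object for the tool. The type is a hand-written mirror of the fields listed in IngestSitemapSchema (with chunkConfig left out), not the project's own type. The `'SITEMAP'` string is an assumed placeholder for whatever `SourceSyncIngestionSource.SITEMAP` resolves to, and the namespace, URL, path patterns, and metadata values are all hypothetical. The pattern syntax accepted by `includePaths` and `excludePaths` is not specified in this reference.

```typescript
// Hand-written mirror of the fields in IngestSitemapSchema (chunkConfig omitted).
type SitemapIngestArgs = {
  namespaceId?: string
  tenantId?: string
  ingestConfig: {
    // ASSUMPTION: the real value is SourceSyncIngestionSource.SITEMAP;
    // its underlying string is not shown in this reference.
    source: string
    config: {
      url: string
      maxLinks?: number
      includePaths?: string[]
      excludePaths?: string[]
      metadata?: Record<string, string | string[]>
    }
  }
}

// Hypothetical values throughout; only the field names come from the schema.
const exampleArgs: SitemapIngestArgs = {
  namespaceId: 'docs-namespace',
  ingestConfig: {
    source: 'SITEMAP', // placeholder, see the assumption above
    config: {
      url: 'https://example.com/sitemap.xml',
      maxLinks: 50, // cap on how many sitemap links to ingest
      includePaths: ['/docs/'], // placeholder pattern
      excludePaths: ['/docs/archive/'], // placeholder pattern
      metadata: { team: 'docs', tags: ['public', 'reference'] },
    },
  },
}

console.log(JSON.stringify(exampleArgs, null, 2))
```

Only `ingestConfig` is marked as required in the table; `namespaceId` and `tenantId` can be left out of the object.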
Implementation Reference
- src/sourcesync.ts:388-407 (handler) The core handler implementation in SourceSyncApiClient.ingestSitemap that performs the HTTP POST to the SourceSync API endpoint /v1/ingest/sitemap with the sitemap configuration.

  ```typescript
  /**
   * Ingest a sitemap
   */
  public async ingestSitemap({
    ingestConfig,
  }: Omit<
    SourceSyncIngestSitemapRequest,
    'namespaceId'
  >): Promise<SourceSyncIngestResponse> {
    return this.client
      .url('/v1/ingest/sitemap')
      .json({
        namespaceId: this.namespaceId,
        ingestConfig: {
          ...ingestConfig,
          chunkConfig: SourceSyncApiClient.CHUNK_CONFIG,
        },
      } satisfies SourceSyncIngestSitemapRequest)
      .post()
      .json<SourceSyncIngestResponse>()
  }
  ```
- src/index.ts:270-288 (registration) MCP server registration of the 'ingestSitemap' tool, including name, description, input schema, and handler that delegates to SourceSyncApiClient.ingestSitemap.

  ```typescript
  // Add ingestSitemap tool
  server.tool(
    'ingestSitemap',
    'Ingests content from a website using its sitemap.xml. Supports path filtering and link limits.',
    IngestSitemapSchema.shape,
    async (params) => {
      return safeApiCall(async () => {
        const { namespaceId, ingestConfig, tenantId } = params

        // Create a client with the provided parameters
        const client = createClient({ namespaceId, tenantId })

        // Direct passthrough to the API
        return await client.ingestSitemap({
          ingestConfig,
        })
      })
    },
  )
  ```
- src/schemas.ts:216-230 (schema) Zod schema definition for validating inputs to the ingestSitemap tool, including namespaceId, ingestConfig with sitemap url and options, and tenantId.

  ```typescript
  export const IngestSitemapSchema = z.object({
    namespaceId: namespaceIdSchema.optional(),
    ingestConfig: z.object({
      source: z.literal(SourceSyncIngestionSource.SITEMAP),
      config: z.object({
        url: z.string(),
        maxLinks: z.number().optional(),
        includePaths: z.array(z.string()).optional(),
        excludePaths: z.array(z.string()).optional(),
        metadata: z.record(z.union([z.string(), z.array(z.string())])).optional(),
      }),
      chunkConfig: chunkConfigSchema.optional(),
    }),
    tenantId: tenantIdSchema,
  })
  ```
- src/sourcesync.types.ts:429-443 (schema) TypeScript type definition for SourceSyncIngestSitemapRequest, defining the structure expected by the SourceSync API for sitemap ingestion.

  ```typescript
  export type SourceSyncIngestSitemapRequest = {
    namespaceId: string
    ingestConfig: {
      source: SourceSyncIngestionSource.SITEMAP
      config: {
        url: string
        maxLinks?: number
        includePaths?: string[]
        excludePaths?: string[]
        scrapeOptions?: SourceSyncScrapeOptions
        metadata?: Record<string, any>
      }
      chunkConfig?: SourceSyncChunkConfig
    }
  }
  ```
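One detail of the handler is easy to miss: the request body spreads the caller's ingestConfig and then sets chunkConfig, so a caller-supplied chunkConfig appears to be replaced by the client's static value. The sketch below reproduces only that body-building step as a standalone function. `buildSitemapBody`, the simplified types, and the contents of `CHUNK_CONFIG` are hypothetical stand-ins; the real `SourceSyncApiClient.CHUNK_CONFIG` is not shown in this reference.

```typescript
// Simplified stand-in for SourceSyncChunkConfig (the real type is not shown here).
type ChunkConfig = Record<string, unknown>

type IngestConfig = {
  source: string
  config: { url: string; maxLinks?: number }
  chunkConfig?: ChunkConfig
}

// ASSUMPTION: placeholder contents, not the real SourceSyncApiClient.CHUNK_CONFIG.
const CHUNK_CONFIG: ChunkConfig = { chunkSize: 400 }

// Hypothetical helper mirroring the body built inside ingestSitemap:
// spread the caller's config first, then set chunkConfig so it always wins.
function buildSitemapBody(namespaceId: string, ingestConfig: IngestConfig) {
  return {
    namespaceId,
    ingestConfig: {
      ...ingestConfig,
      chunkConfig: CHUNK_CONFIG,
    },
  }
}

const body = buildSitemapBody('docs-namespace', {
  source: 'SITEMAP', // placeholder for SourceSyncIngestionSource.SITEMAP
  config: { url: 'https://example.com/sitemap.xml', maxLinks: 10 },
  chunkConfig: { chunkSize: 9999 }, // caller value, overwritten by the spread order
})

console.log(body.ingestConfig.chunkConfig)
```

If that reading of the handler is right, passing chunkConfig through the tool has no effect on the request that is sent, even though the Zod schema accepts it.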