
SourceSync.ai MCP Server

by scmdr

ingestWebsite

Crawl and ingest website content recursively with depth control and path filtering for knowledge management.

Instructions

Crawls and ingests content from a website recursively. Supports depth control and path filtering.

Input Schema

Name          Required  Description  Default
namespaceId   No
ingestConfig  Yes
tenantId      No
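
As a rough illustration of these inputs, the arguments for a depth-limited crawl restricted to one subtree of a site might look like the sketch below. The field names follow the Zod schema listed under Implementation Reference; the URL, path patterns, and metadata are made-up values. The string 'WEBSITE' is only a stand-in for SourceSyncIngestionSource.WEBSITE, whose actual value is not shown on this page, and the pattern syntax accepted by includePaths and excludePaths is likewise an assumption.

```typescript
// Assumption: stand-in for SourceSyncIngestionSource.WEBSITE (real enum value not shown on this page).
const WEBSITE_SOURCE = 'WEBSITE' as const

// Local mirror of the tool's input fields, taken from the Zod schema on this page.
type IngestWebsiteArgs = {
  namespaceId?: string
  tenantId?: string
  ingestConfig: {
    source: typeof WEBSITE_SOURCE
    config: {
      url: string
      maxDepth?: number
      maxLinks?: number
      includePaths?: string[]
      excludePaths?: string[]
      metadata?: Record<string, string | string[]>
    }
  }
}

// Hypothetical arguments: crawl two levels deep, at most 50 links, docs pages only.
const exampleArgs: IngestWebsiteArgs = {
  ingestConfig: {
    source: WEBSITE_SOURCE,
    config: {
      url: 'https://example.com/docs',
      maxDepth: 2,
      maxLinks: 50,
      includePaths: ['/docs'], // assumed prefix-style pattern
      excludePaths: ['/docs/archive'], // assumed prefix-style pattern
      metadata: { team: 'platform', tags: ['docs', 'public'] },
    },
  },
}
```

Only ingestConfig, with its source and config.url, is required here; namespaceId and tenantId are left out because the table marks them as optional.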

Implementation Reference

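  • Core implementation of ingestWebsite in SourceSyncApiClient: sends a POST request to the /v1/ingest/website API endpoint with the client's namespaceId and the supplied ingestConfig. Because chunkConfig is set after the spread, the client's static CHUNK_CONFIG replaces any chunkConfig passed by the caller.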
    public async ingestWebsite({
      ingestConfig,
    }: Omit<
      SourceSyncIngestWebsiteRequest,
      'namespaceId'
    >): Promise<SourceSyncIngestResponse> {
      return this.client
        .url('/v1/ingest/website')
        .json({
          namespaceId: this.namespaceId,
          ingestConfig: {
            ...ingestConfig,
            chunkConfig: SourceSyncApiClient.CHUNK_CONFIG,
          },
        } satisfies SourceSyncIngestWebsiteRequest)
        .post()
        .json<SourceSyncIngestResponse>()
    }
  • src/index.ts:291-308 (registration)
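    MCP server.tool registration for 'ingestWebsite': reads namespaceId, ingestConfig, and tenantId from the tool parameters, creates a client with createClient, and forwards ingestConfig to client.ingestWebsite inside safeApiCall.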
    server.tool(
      'ingestWebsite',
      'Crawls and ingests content from a website recursively. Supports depth control and path filtering.',
      IngestWebsiteSchema.shape,
      async (params) => {
        return safeApiCall(async () => {
          const { namespaceId, ingestConfig, tenantId } = params
    
          // Create a client with the provided parameters
          const client = createClient({ namespaceId, tenantId })
    
          // Direct passthrough to the API
          return await client.ingestWebsite({
            ingestConfig,
          })
        })
      },
    )
  • Zod schema definition for IngestWebsite tool input validation, including namespaceId, ingestConfig with website-specific config, and tenantId.
    export const IngestWebsiteSchema = z.object({
      namespaceId: namespaceIdSchema.optional(),
      ingestConfig: z.object({
        source: z.literal(SourceSyncIngestionSource.WEBSITE),
        config: z.object({
          url: z.string(),
          maxDepth: z.number().optional(),
          maxLinks: z.number().optional(),
          includePaths: z.array(z.string()).optional(),
          excludePaths: z.array(z.string()).optional(),
          metadata: z.record(z.union([z.string(), z.array(z.string())])).optional(),
        }),
        chunkConfig: chunkConfigSchema.optional(),
      }),
      tenantId: tenantIdSchema,
    })
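  • TypeScript type definition for SourceSyncIngestWebsiteRequest used by the API client. Unlike the Zod schema, it requires namespaceId, accepts an optional scrapeOptions field, and types metadata as Record<string, any>.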
    export type SourceSyncIngestWebsiteRequest = {
      namespaceId: string
      ingestConfig: {
        source: SourceSyncIngestionSource.WEBSITE
        config: {
          url: string
          maxDepth?: number
          maxLinks?: number
          includePaths?: string[]
          excludePaths?: string[]
          scrapeOptions?: SourceSyncScrapeOptions
          metadata?: Record<string, any>
        }
        chunkConfig?: SourceSyncChunkConfig
      }
    }
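
The way the client method assembles its request body can be sketched in isolation. This is a minimal illustration, not the server's code: buildWebsiteRequest, DEFAULT_CHUNK_CONFIG, and the chunkSize/chunkOverlap fields are hypothetical names and values (the real SourceSyncApiClient.CHUNK_CONFIG and SourceSyncChunkConfig are not shown on this page), and 'WEBSITE' again stands in for the enum member. It shows that spreading ingestConfig first and setting chunkConfig afterwards means a caller-supplied chunkConfig does not survive.

```typescript
// Hypothetical chunk settings; the real CHUNK_CONFIG values are not shown on this page.
type ChunkConfig = { chunkSize: number; chunkOverlap: number }
const DEFAULT_CHUNK_CONFIG: ChunkConfig = { chunkSize: 400, chunkOverlap: 50 }

// Trimmed-down stand-in for the ingestConfig part of SourceSyncIngestWebsiteRequest.
type WebsiteIngestConfig = {
  source: 'WEBSITE' // assumption: stand-in for SourceSyncIngestionSource.WEBSITE
  config: { url: string; maxDepth?: number }
  chunkConfig?: ChunkConfig
}

// Mirrors the body construction in ingestWebsite: spread first, then set chunkConfig.
function buildWebsiteRequest(namespaceId: string, ingestConfig: WebsiteIngestConfig) {
  return {
    namespaceId,
    ingestConfig: {
      ...ingestConfig,
      // A key written after the spread wins over the same key inside the spread.
      chunkConfig: DEFAULT_CHUNK_CONFIG,
    },
  }
}

const request = buildWebsiteRequest('ns_123', {
  source: 'WEBSITE',
  config: { url: 'https://example.com', maxDepth: 1 },
  chunkConfig: { chunkSize: 9999, chunkOverlap: 0 }, // replaced by the default
})

console.log(request.ingestConfig.chunkConfig.chunkSize) // 400
```

If per-request chunking were wanted, placing the default before the spread would let the caller's value take precedence; the ordering shown on this page instead pins every request to the client's configuration.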

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/scmdr/sourcesyncai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.