webCrawl
Crawls and ingests web pages into a knowledge base, using sitemap.xml to discover the pages. Accepts a URL and an optional read limit, executes asynchronously, and returns a feed identifier.
Instructions
Crawls web pages from a website into a Graphlit knowledge base. Accepts a URL and an optional read limit for the number of pages to crawl. Uses sitemap.xml to discover the pages to crawl. Executes asynchronously and returns the feed identifier.
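To make the sitemap-based discovery and the read limit concrete, here is a minimal Python sketch of how page URLs could be pulled from a sitemap.xml document and capped at a limit. It is an illustration only, not Graphlit's implementation: the function name `discover_pages`, the `read_limit` parameter name, and the sample sitemap are all hypothetical, and nested sitemap index files are not handled.

```python
import itertools
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def discover_pages(sitemap_xml: str, read_limit: int = 100) -> list:
    """Return at most read_limit page URLs listed in a sitemap.xml document.

    Hypothetical helper: it mirrors the behaviour described above
    (sitemap discovery plus a page cap), not the server's actual code.
    """
    if read_limit < 0:
        raise ValueError("read_limit must not be negative")
    root = ET.fromstring(sitemap_xml)
    # Each <url><loc>...</loc></url> entry names one page.
    locations = (
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    )
    return list(itertools.islice(locations, read_limit))


# Made-up sitemap used only for illustration.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
  <url><loc>https://example.com/blog</loc></url>
</urlset>"""

if __name__ == "__main__":
    for page in discover_pages(EXAMPLE_SITEMAP, read_limit=2):
        print(page)
```

With a limit of 2, only the first two of the three listed pages are selected, which is the effect the read limit is described as having on a crawl.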
Input Schema
Name | Required | Description | Default |
---|---|---|---|
readLimit | No | Number of web pages to ingest. | 100 |
url | Yes | URL of the website to crawl. | |
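As a sketch of how a client might invoke this tool, the snippet below builds a Model Context Protocol `tools/call` request carrying the two parameters from the table. The helper name `build_web_crawl_call` is hypothetical, and the validation rules (http/https URLs only, a positive integer `readLimit`) are assumptions made for illustration; the server's own checks may differ.

```python
import json
from typing import Optional
from urllib.parse import urlparse


def build_web_crawl_call(url: str, read_limit: Optional[int] = None,
                         request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request that calls the webCrawl tool.

    Hypothetical helper; the validation below is an assumption,
    not the server's documented behaviour.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be an absolute http(s) URL")

    arguments = {"url": url}
    if read_limit is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(read_limit, bool) or not isinstance(read_limit, int) or read_limit < 1:
            raise ValueError("readLimit must be a positive integer")
        arguments["readLimit"] = read_limit  # omitted entirely when not given

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "webCrawl", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_web_crawl_call("https://example.com", read_limit=25)
    print(json.dumps(request, indent=2))
```

Leaving `readLimit` out of the arguments, rather than sending a null value, lets the default listed in the table apply.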
Other Tools from Graphlit MCP Server
- addContentsToCollection
- createCollection
- deleteCollection
- deleteContent
- deleteContents
- deleteFeed
- deleteFeeds
- describeContent
- describeImage
- extractText
- ingestBoxFiles
- ingestDiscordMessages
- ingestDropboxFiles
- ingestFile
- ingestGitHubFiles
- ingestGitHubIssues
- ingestGoogleDriveFiles
- ingestGoogleEmail
- ingestJiraIssues
- ingestLinearIssues
- ingestMicrosoftEmail
- ingestMicrosoftTeamsMessages
- ingestNotionPages
- ingestOneDriveFiles
- ingestRedditPosts
- ingestRSS
- ingestSharePointFiles
- ingestSlackMessages
- ingestText
- ingestUrl
- isContentDone
- isFeedDone
- listMicrosoftTeamsChannels
- listMicrosoftTeamsTeams
- listSharePointFolders
- listSharePointLibraries
- listSlackChannels
- removeContentsFromCollection
- retrieveSources
- screenshotPage
- webCrawl
- webMap
- webSearch
Related Tools
- @mugoosse/sitemap-mcp-server
- @scmdr/sourcesyncai-mcp
- @codyde/mcp-firecrawl-tool