by graphlit
Crawls web pages from a website into the Graphlit knowledge base. Accepts a URL and an optional read limit for the number of pages to crawl. Uses the site's sitemap.xml to discover the pages to crawl. Executes asynchronously and returns the feed identifier.
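Because the crawl runs asynchronously, a caller typically keeps the returned feed identifier and checks later whether the feed has finished. The loop below is a minimal sketch of that polling pattern. It assumes the caller supplies a function `is_feed_done(feed_id)` that returns `True` once the feed completes, for instance a wrapper around the `isFeedDone` tool listed on this page, whose exact input and output shape is not documented here. All names in the sketch are hypothetical.

```python
import time


def wait_for_feed(feed_id, is_feed_done, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll is_feed_done(feed_id) until it returns True or the timeout elapses.

    Hypothetical helper: is_feed_done is supplied by the caller and is an
    assumption, not a documented Graphlit function.
    The timeout counts accumulated sleep time, not wall-clock time.
    Returns True if the feed finished, False on timeout.
    """
    waited = 0.0
    while True:
        if is_feed_done(feed_id):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays; in normal use the default `time.sleep` applies.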
Input Schema
Name | Required | Description | Default |
readLimit | No | Number of web pages to ingest. | 100 |
url | Yes | URL of the website to crawl. | |
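The two inputs above can be packaged as an MCP `tools/call` request. The following is a minimal sketch, assuming the tool is registered under the name `webCrawl` (as in the tool list on this page) and is invoked over MCP's JSON-RPC 2.0 messages. The helper `build_web_crawl_request` and its validation rules are illustrative assumptions, not part of Graphlit's API.

```python
import json
from urllib.parse import urlparse


def build_web_crawl_request(url, read_limit=None, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' payload for the webCrawl tool.

    Hypothetical helper: the argument names (url, readLimit) come from the
    input schema table; the validation rules here are assumptions.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")

    arguments = {"url": url}
    if read_limit is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(read_limit, bool) or not isinstance(read_limit, int) or read_limit < 1:
            raise ValueError("readLimit must be a positive integer")
        arguments["readLimit"] = read_limit

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "webCrawl", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_web_crawl_request("https://example.com", read_limit=25)
    print(json.dumps(request, indent=2))
```

When `read_limit` is omitted, the sketch leaves `readLimit` out of the arguments so that the documented default applies on the server side.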
Other Tools
- addContentsToCollection
- createCollection
- deleteCollection
- deleteContent
- deleteContents
- deleteFeed
- deleteFeeds
- describeContent
- describeImage
- extractText
- ingestBoxFiles
- ingestDiscordMessages
- ingestDropboxFiles
- ingestFile
- ingestGitHubFiles
- ingestGitHubIssues
- ingestGoogleDriveFiles
- ingestGoogleEmail
- ingestJiraIssues
- ingestLinearIssues
- ingestMicrosoftEmail
- ingestMicrosoftTeamsMessages
- ingestNotionPages
- ingestOneDriveFiles
- ingestRedditPosts
- ingestRSS
- ingestSharePointFiles
- ingestSlackMessages
- ingestText
- ingestUrl
- isContentDone
- isFeedDone
- listMicrosoftTeamsChannels
- listMicrosoftTeamsTeams
- listSharePointFolders
- listSharePointLibraries
- listSlackChannels
- removeContentsFromCollection
- retrieveSources
- screenshotPage
- webCrawl
- webMap
- webSearch