scrape_channel
Extract Telegram channel posts in markdown format for content analysis or archiving. Specify URL and post limit to gather channel data with optional reaction information.
Instructions
Scrape a Telegram channel and return posts in markdown format. Uses authenticated session if logged in.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The Telegram channel URL (e.g., https://t.me/channelname) | |
| max_posts | No | Maximum number of posts to scrape (default: 100) | |
| include_reactions | No | Include reaction data in the output |
Implementation Reference
- src/server.ts:385-415 (handler)The primary handler function for the 'scrape_channel' tool. It checks authentication status, selects the appropriate scraper (authenticated or unauthenticated), prepares scrape options from tool arguments, invokes the scraper's scrape method, formats the result as markdown, and returns the MCP-standard content response with an authentication indicator if applicable.private async handleScrapeChannel(args: any): Promise<any> { // Check if authenticated and use authenticated scraper by default const isAuthenticated = await this.auth.isAuthenticated(); const scraperToUse = isAuthenticated ? this.authScraper : this.scraper; if (isAuthenticated) { logger.info('Using authenticated scraper (logged in)'); } else { logger.info('Using unauthenticated scraper (not logged in)'); } const options: ScrapeOptions = { url: args.url, maxPosts: args.max_posts === undefined ? 0 : args.max_posts, // 0 means no limit includeReactions: args.include_reactions !== false }; const result = await scraperToUse.scrape(options); const markdown = this.formatter.format(result); return { content: [ { type: 'text', text: isAuthenticated ? `${markdown}\n\n✅ *Scraped using authenticated session*` : markdown } ] }; }
- src/server.ts:124-146 (registration)Registers the 'scrape_channel' tool with the MCP server via the getTools() method, providing name, description, and input schema for validation.name: 'scrape_channel', description: 'Scrape a Telegram channel and return posts in markdown format. Uses authenticated session if logged in.', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The Telegram channel URL (e.g., https://t.me/channelname)' }, max_posts: { type: 'number', description: 'Maximum number of posts to scrape (default: 100)', default: 100 }, include_reactions: { type: 'boolean', description: 'Include reaction data in the output', default: true } }, required: ['url'] } },
- src/server.ts:126-144 (schema)Input schema definition for the 'scrape_channel' tool, specifying parameters like url (required), max_posts, and include_reactions with types, descriptions, and defaults.inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The Telegram channel URL (e.g., https://t.me/channelname)' }, max_posts: { type: 'number', description: 'Maximum number of posts to scrape (default: 100)', default: 100 }, include_reactions: { type: 'boolean', description: 'Include reaction data in the output', default: true } }, required: ['url']
- Core helper function implementing the scraping logic called by the handler. Handles browser page creation, URL validation/navigation (auth/unauth modes), channel info parsing, infinite scrolling to collect posts with deduplication and limits, error handling with screenshots, file saving, and returns structured ScrapeResult.async scrape(options: ScrapeOptions): Promise<ScrapeResult> { logger.info(`Starting scrape for: ${options.url}`); let page: Page | null = null; try { // Validate URL if (!this.isValidTelegramUrl(options.url)) { throw new Error('Invalid Telegram URL. Must be a t.me link.'); } // Create page page = await this.browserManager.createPage(); // Navigate to channel/group await this.navigateToChannel(page, options.url); // Get channel info BEFORE scrolling const channelHtml = await page.content(); const parser = new DataParser(channelHtml); let channel = parser.parseChannelInfo(); // Try to get channel name and username from URL if parsing failed const urlMatch = options.url.match(/t\.me\/s?\/([^/?]+)/); if (urlMatch && urlMatch[1]) { if (channel.username === 'unknown') { channel.username = urlMatch[1]; } if (channel.name === 'Unknown Channel') { channel.name = urlMatch[1]; } } // Scroll and collect posts const posts = await this.scrollAndCollectPosts(page, options); // Get total post count from collected posts const totalPosts = posts.length; logger.info(`Scraping complete. Total posts: ${totalPosts}`); const result = { channel, posts, scrapedAt: new Date(), totalPosts }; // Save to file await this.saveToFile(result, channel.username); return result; } catch (error) { logger.error('Scraping failed:', error); // Take screenshot on error if (page && config.debug.saveScreenshots) { await this.browserManager.screenshot(page, 'error'); } return { channel: { name: 'Unknown', username: 'unknown', description: '' }, posts: [], scrapedAt: new Date(), totalPosts: 0, error: error instanceof Error ? error.message : 'Unknown error' }; } finally { if (page) { await page.close(); } } }