# scrape_manual

Manually scrape Telegram channel content: opens a browser for login and navigation, then extracts posts, with optional saving to file.

## Instructions

Manual scraping mode: opens a browser for you to log in and navigate to any channel, then scrapes it.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of posts to scrape (optional) | |
| save_to_file | No | Save results to MD and JSON files | true |
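Both properties are optional, so the tool can be called with no arguments at all. A hypothetical arguments object matching the schema above (the values are illustrative, not from the source):

```typescript
// Hypothetical arguments for a scrape_manual call.
// Property names come from the input schema above; values are examples.
const args = {
  limit: 50,          // stop after 50 posts (optional)
  save_to_file: true, // write both .md and .json outputs (the default)
};

// Since neither property is required, an empty object is also valid:
const minimalArgs = {};
```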
## Implementation Reference
- **src/server.ts:272-290 (registration)** — Registration of the `scrape_manual` tool in `getTools()`, including name, description, and input schema definition.

  ```typescript
  {
    name: 'scrape_manual',
    description: 'Manual scraping mode: Opens browser for you to login and navigate to any channel, then scrapes it',
    inputSchema: {
      type: 'object',
      properties: {
        limit: {
          type: 'number',
          description: 'Maximum number of posts to scrape (optional)'
        },
        save_to_file: {
          type: 'boolean',
          description: 'Save results to MD and JSON files',
          default: true
        }
      },
      required: []
    }
  },
  ```
- **src/server.ts:275-290 (schema)** — Input schema definition for the `scrape_manual` tool.

  ```typescript
  inputSchema: {
    type: 'object',
    properties: {
      limit: { type: 'number', description: 'Maximum number of posts to scrape (optional)' },
      save_to_file: { type: 'boolean', description: 'Save results to MD and JSON files', default: true }
    },
    required: []
  }
  ```
- **src/server.ts:678-761 (handler)** — The primary handler function for the `scrape_manual` tool. It dynamically imports `ManualTelegramScraper`, launches a browser for manual user navigation and login, performs the scrape, optionally saves results to files, and returns a formatted response with a sample.

  ```typescript
  private async handleManualScrape(args: any): Promise<any> {
    try {
      logger.info('Starting manual scrape mode...');

      // Import manual scraper and required modules
      const { ManualTelegramScraper } = await import('./scraper/manual-scraper.js');
      const { join } = await import('path');
      const { writeFile, mkdir } = await import('fs/promises');

      const manualScraper = new ManualTelegramScraper();

      // Open browser and wait for user to navigate
      const { browser, page } = await manualScraper.loginAndWaitForChannel();

      // Scrape the current channel
      const options = {
        maxPosts: args.limit || args.max_posts || 0
      };

      const result = await manualScraper.scrapeCurrentChannel(page, options);

      // Save to file
      if (args.save_to_file !== false) {
        const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, -5);
        const filename = `${result.channel.username}_${timestamp}_manual.md`;

        const formatter = new MarkdownFormatter();
        const markdown = formatter.format(result);

        const basePath = 'C:\\Users\\User\\AppData\\Roaming\\Claude\\telegram_scraped_data';
        const filepath = join(basePath, filename);

        await mkdir(basePath, { recursive: true });
        await writeFile(filepath, markdown, 'utf8');

        // Also save JSON
        const jsonFilename = `${result.channel.username}_${timestamp}_manual.json`;
        const jsonFilepath = join(basePath, jsonFilename);
        await writeFile(jsonFilepath, JSON.stringify(result, null, 2), 'utf8');

        logger.info(`Saved to: ${filepath}`);
      }

      // Close browser
      await manualScraper.close(browser);

      // Format response
      const summary = result.posts.slice(0, 5).map(post => ({
        date: post.date.toISOString(),
        content: post.content.substring(0, 100) + (post.content.length > 100 ? '...' : ''),
        views: post.views
      }));

      return {
        content: [
          {
            type: 'text',
            text: `✅ Successfully scraped ${result.posts.length} posts from ${result.channel.name}

  Channel: @${result.channel.username}
  Total posts scraped: ${result.posts.length}

  ${args.save_to_file !== false ? `Files saved to:
  - Markdown: C:\\Users\\User\\AppData\\Roaming\\Claude\\telegram_scraped_data\\${result.channel.username}_*_manual.md
  - JSON: C:\\Users\\User\\AppData\\Roaming\\Claude\\telegram_scraped_data\\${result.channel.username}_*_manual.json
  ` : ''}Sample of first 5 posts:
  ${summary.map(post => `\n📅 ${post.date}\n${post.content}\n👁 ${post.views} views`).join('\n---\n')}`
          }
        ]
      };
    } catch (error) {
      logger.error('Manual scrape failed:', error);
      return {
        content: [
          {
            type: 'text',
            text: `❌ Manual scrape failed: ${error instanceof Error ? error.message : 'Unknown error'}`
          }
        ]
      };
    }
  }
  ```
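The handler derives its output filenames from the channel username plus a filesystem-safe timestamp. A minimal sketch of that naming scheme in isolation (the `buildFilename` helper is ours, not a function in the source):

```typescript
// Sketch of the handler's output-file naming scheme.
// Produces e.g. "channelname_2024-01-15T10-30-00_manual.md":
// - toISOString() yields "2024-01-15T10:30:00.123Z"
// - replace(/[:.]/g, '-') swaps characters that are invalid in Windows filenames
// - slice(0, -5) trims the trailing "-123Z" (milliseconds and zone marker)
function buildFilename(username: string, ext: 'md' | 'json'): string {
  const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, -5);
  return `${username}_${timestamp}_manual.${ext}`;
}
```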
- **src/scraper/manual-scraper.ts:8-206 (helper)** — Helper class `ManualTelegramScraper` providing the core implementation for manual scraping: browser launch, user-guided navigation, channel scraping via scrolling and parsing, post collection, and cleanup.

  ```typescript
  export class ManualTelegramScraper {
    private browserManager: BrowserManager;
    private cookieManager: CookieManager;

    constructor() {
      this.browserManager = new BrowserManager();
      this.cookieManager = new CookieManager();
    }

    async loginAndWaitForChannel(): Promise<{ browser: Browser; page: Page }> {
      logger.info('Opening browser for manual login and navigation...');

      // Launch browser in non-headless mode
      const browser = await this.browserManager.launch(false);
      const page = await browser.newPage();

      // Set viewport
      await page.setViewport({ width: 1280, height: 800 });

      // Load cookies if available
      await this.cookieManager.loadCookies(page);

      // Navigate to Telegram Web
      logger.info('Navigating to Telegram Web...');
      await page.goto('https://web.telegram.org/a/', {
        waitUntil: 'networkidle2',
        timeout: 60000
      });

      // Instructions for user
      logger.info('='.repeat(60));
      logger.info('MANUAL NAVIGATION REQUIRED');
      logger.info('1. Log in to Telegram if needed');
      logger.info('2. Navigate to the channel you want to scrape');
      logger.info('3. Make sure the channel messages are visible');
      logger.info('4. Press Enter here when ready to start scraping');
      logger.info('='.repeat(60));

      // Wait for user to press Enter
      await new Promise<void>(resolve => {
        process.stdin.once('data', () => {
          resolve();
        });
      });

      // Save cookies for future use
      await this.cookieManager.saveCookies(page);

      return { browser, page };
    }

    async scrapeCurrentChannel(page: Page, options: Partial<ScrapeOptions> = {}): Promise<ScrapeResult> {
      logger.info('Starting to scrape current channel...');

      try {
        // Get current URL to extract channel info
        const currentUrl = page.url();
        logger.info(`Current URL: ${currentUrl}`);

        // Get channel info from the page
        const channelHtml = await page.content();
        const parser = new DataParser(channelHtml);
        let channel = parser.parseChannelInfo();

        // Try to extract channel name from URL or page title
        if (channel.name === 'Unknown Channel') {
          const pageTitle = await page.title();
          if (pageTitle && pageTitle !== 'Telegram') {
            channel.name = pageTitle;
          }
        }

        // Extract username from URL if possible
        const urlMatch = currentUrl.match(/#@?([^/?]+)$/);
        if (urlMatch && urlMatch[1] && channel.username === 'unknown') {
          channel.username = urlMatch[1].replace('-', '');
        }

        logger.info(`Scraping channel: ${channel.name} (@${channel.username})`);

        // Scroll and collect posts
        const posts = await this.scrollAndCollectPosts(page, options);

        logger.info(`Scraping complete. Total posts: ${posts.length}`);

        return {
          channel,
          posts,
          scrapedAt: new Date(),
          totalPosts: posts.length
        };
      } catch (error) {
        logger.error('Scraping failed:', error);
        return {
          channel: { name: 'Unknown', username: 'unknown', description: '' },
          posts: [],
          scrapedAt: new Date(),
          totalPosts: 0,
          error: error instanceof Error ? error.message : 'Unknown error'
        };
      }
    }

    private async scrollAndCollectPosts(page: Page, options: Partial<ScrapeOptions>): Promise<any[]> {
      logger.info('Starting to scroll and collect posts');

      const posts: Map<string, any> = new Map();
      let scrollAttempts = 0;
      let lastPostCount = 0;
      let noNewPostsCount = 0;
      const maxScrollAttempts = options.maxPosts
        ? Math.min(50, Math.ceil(options.maxPosts / 20))
        : 50;

      while (scrollAttempts < maxScrollAttempts) {
        // Parse current posts
        const html = await page.content();
        const parser = new DataParser(html);
        const currentPosts = parser.parsePosts();

        // Add new posts to map (deduplication)
        for (const post of currentPosts) {
          if (!posts.has(post.id)) {
            posts.set(post.id, post);

            // Log progress
            if (posts.size % 20 === 0) {
              logger.info(`Collected ${posts.size} posts so far...`);
            }
          }
        }

        // Check if we've reached max posts
        if (options.maxPosts && posts.size >= options.maxPosts) {
          logger.info(`Reached maxPosts limit: ${options.maxPosts}`);
          break;
        }

        // Check if we're getting new posts
        if (posts.size === lastPostCount) {
          noNewPostsCount++;
          if (noNewPostsCount >= 3) {
            logger.info('No new posts found after 3 attempts, stopping');
            break;
          }
        } else {
          noNewPostsCount = 0;
          lastPostCount = posts.size;
        }

        // Scroll to load more messages
        await this.scrollUp(page);

        // Wait for new content
        await new Promise(resolve => setTimeout(resolve, 2000));

        scrollAttempts++;
      }

      logger.info(`Scrolling complete. Total posts collected: ${posts.size}`);

      // Sort posts by date (newest first) and limit if needed
      let sortedPosts = Array.from(posts.values()).sort(
        (a, b) => b.date.getTime() - a.date.getTime()
      );

      if (options.maxPosts && sortedPosts.length > options.maxPosts) {
        sortedPosts = sortedPosts.slice(0, options.maxPosts);
      }

      return sortedPosts;
    }

    private async scrollUp(page: Page): Promise<void> {
      // Scroll within the messages container to load older messages
      await page.evaluate(() => {
        const container = document.querySelector(
          '.bubbles-inner, .messages-container, .bubbles, .im_history_scrollable'
        );
        if (container) {
          // Scroll to top of the container to load older messages
          container.scrollTop = 0;
        } else {
          // Try to find any scrollable container
          const scrollables = document.querySelectorAll(
            '[class*="scroll"], [class*="messages"], [class*="chat"]'
          );
          for (let i = 0; i < scrollables.length; i++) {
            const el = scrollables[i] as HTMLElement;
            if (el.scrollHeight > el.clientHeight) {
              el.scrollTop = 0;
              break;
            }
          }
        }
      });
    }

    async close(browser: Browser): Promise<void> {
      await browser.close();
    }
  }
  ```
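The username extraction in `scrapeCurrentChannel` relies on the fragment portion of the Telegram Web URL (e.g. `#@channelname`, or `#-100…` for numeric peer IDs). A standalone sketch of that logic, with the regex copied from the source (the `usernameFromUrl` wrapper is ours):

```typescript
// Extracts a channel identifier from a Telegram Web URL fragment.
// Mirrors the regex used in scrapeCurrentChannel; the replace('-', '')
// appears intended to strip the leading minus of numeric peer IDs.
function usernameFromUrl(url: string): string {
  const match = url.match(/#@?([^/?]+)$/);
  return match && match[1] ? match[1].replace('-', '') : 'unknown';
}
```

Note that `replace('-', '')` only removes the first hyphen, so a username that legitimately contains a hyphen elsewhere would be altered too; channels without a usable fragment fall back to `'unknown'`.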