telegram-mcp-server
by DLHellMe

scrape_manual

Manually scrape a Telegram channel: a browser window opens for you to log in and navigate to the channel, then the tool extracts its posts and optionally saves them to files.

Instructions

Manual scraping mode: Opens browser for you to login and navigate to any channel, then scrapes it

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of posts to scrape (optional) | |
| save_to_file | No | Save results to MD and JSON files | true |
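As a sketch of how a client might construct arguments for this tool (the `withDefaults` helper is illustrative, not part of the server; it just mirrors the schema's optional fields and the `save_to_file: true` default):

```typescript
// Hypothetical argument object for the scrape_manual tool.
// Both fields are optional per the schema above.
const args = { limit: 100, save_to_file: true };

// Applying the schema's defaults when the caller omits fields
// (0 for limit means "no cap", matching the handler's fallback).
function withDefaults(input: { limit?: number; save_to_file?: boolean }) {
  return { limit: input.limit ?? 0, save_to_file: input.save_to_file ?? true };
}
```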

Implementation Reference

  • src/server.ts:272-290 (registration)
    Registration of the 'scrape_manual' tool in getTools(), including name, description, and input schema definition.
    {
      name: 'scrape_manual',
      description: 'Manual scraping mode: Opens browser for you to login and navigate to any channel, then scrapes it',
      inputSchema: {
        type: 'object',
        properties: {
          limit: {
            type: 'number',
            description: 'Maximum number of posts to scrape (optional)'
          },
          save_to_file: {
            type: 'boolean',
            description: 'Save results to MD and JSON files',
            default: true
          }
        },
        required: []
      }
    },
  • The primary handler function for the 'scrape_manual' tool. It dynamically imports ManualTelegramScraper, launches a browser for manual user navigation and login, performs the scrape, optionally saves results to files, and returns a formatted response with a sample.
      private async handleManualScrape(args: any): Promise<any> {
        try {
          logger.info('Starting manual scrape mode...');
          
          // Import manual scraper and required modules
          const { ManualTelegramScraper } = await import('./scraper/manual-scraper.js');
          const { join } = await import('path');
          const { writeFile, mkdir } = await import('fs/promises');
          
          const manualScraper = new ManualTelegramScraper();
          
          // Open browser and wait for user to navigate
          const { browser, page } = await manualScraper.loginAndWaitForChannel();
          
          // Scrape the current channel
          const options = {
            maxPosts: args.limit || args.max_posts || 0
          };
          
          const result = await manualScraper.scrapeCurrentChannel(page, options);
          
          // Save to file
          if (args.save_to_file !== false) {
            const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, -5);
            const filename = `${result.channel.username}_${timestamp}_manual.md`;
            
            const formatter = new MarkdownFormatter();
            const markdown = formatter.format(result);
            
            const basePath = 'C:\\Users\\User\\AppData\\Roaming\\Claude\\telegram_scraped_data';
            const filepath = join(basePath, filename);
            
            await mkdir(basePath, { recursive: true });
            await writeFile(filepath, markdown, 'utf8');
            
            // Also save JSON
            const jsonFilename = `${result.channel.username}_${timestamp}_manual.json`;
            const jsonFilepath = join(basePath, jsonFilename);
            await writeFile(jsonFilepath, JSON.stringify(result, null, 2), 'utf8');
            
            logger.info(`Saved to: ${filepath}`);
          }
          
          // Close browser
          await manualScraper.close(browser);
          
          // Format response
          const summary = result.posts.slice(0, 5).map(post => ({
            date: post.date.toISOString(),
            content: post.content.substring(0, 100) + (post.content.length > 100 ? '...' : ''),
            views: post.views
          }));
          
          return {
            content: [
              {
                type: 'text',
                text: `βœ… Successfully scraped ${result.posts.length} posts from ${result.channel.name}
    
    Channel: @${result.channel.username}
    Total posts scraped: ${result.posts.length}
    
    ${args.save_to_file !== false ? `Files saved to:
    - Markdown: C:\\Users\\User\\AppData\\Roaming\\Claude\\telegram_scraped_data\\${result.channel.username}_*_manual.md
    - JSON: C:\\Users\\User\\AppData\\Roaming\\Claude\\telegram_scraped_data\\${result.channel.username}_*_manual.json
    
    ` : ''}Sample of first 5 posts:
    ${summary.map(post => `\nπŸ“… ${post.date}\n${post.content}\nπŸ‘ ${post.views} views`).join('\n---\n')}`
              }
            ]
          };
          
        } catch (error) {
          logger.error('Manual scrape failed:', error);
          return {
            content: [
              {
                type: 'text',
                text: `❌ Manual scrape failed: ${error instanceof Error ? error.message : 'Unknown error'}`
              }
            ]
          };
        }
      }
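The handler's file-naming scheme can be isolated as a small pure function (a sketch; `buildFilename` is not a helper in the source, where the logic is inlined in `handleManualScrape`):

```typescript
// Mirrors the handler's naming: <username>_<timestamp>_manual.<ext>,
// where the timestamp is an ISO string with ':' and '.' replaced by '-'
// and the last five characters (milliseconds plus 'Z') sliced off.
function buildFilename(username: string, ext: 'md' | 'json', now: Date = new Date()): string {
  const timestamp = now.toISOString().replace(/[:.]/g, '-').slice(0, -5);
  return `${username}_${timestamp}_manual.${ext}`;
}
```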
  • Helper class ManualTelegramScraper providing the core implementation for manual scraping: browser launch, user-guided navigation, channel scraping via scrolling and parsing, post collection, and cleanup.
    export class ManualTelegramScraper {
      private browserManager: BrowserManager;
      private cookieManager: CookieManager;
    
      constructor() {
        this.browserManager = new BrowserManager();
        this.cookieManager = new CookieManager();
      }
    
      async loginAndWaitForChannel(): Promise<{ browser: Browser; page: Page }> {
        logger.info('Opening browser for manual login and navigation...');
        
        // Launch browser in non-headless mode
        const browser = await this.browserManager.launch(false);
        const page = await browser.newPage();
        
        // Set viewport
        await page.setViewport({ width: 1280, height: 800 });
        
        // Load cookies if available
        await this.cookieManager.loadCookies(page);
        
        // Navigate to Telegram Web
        logger.info('Navigating to Telegram Web...');
        await page.goto('https://web.telegram.org/a/', {
          waitUntil: 'networkidle2',
          timeout: 60000
        });
        
        // Instructions for user
        logger.info('='.repeat(60));
        logger.info('MANUAL NAVIGATION REQUIRED');
        logger.info('1. Log in to Telegram if needed');
        logger.info('2. Navigate to the channel you want to scrape');
        logger.info('3. Make sure the channel messages are visible');
        logger.info('4. Press Enter here when ready to start scraping');
        logger.info('='.repeat(60));
        
        // Wait for user to press Enter
        await new Promise<void>(resolve => {
          process.stdin.once('data', () => {
            resolve();
          });
        });
        
        // Save cookies for future use
        await this.cookieManager.saveCookies(page);
        
        return { browser, page };
      }
    
      async scrapeCurrentChannel(page: Page, options: Partial<ScrapeOptions> = {}): Promise<ScrapeResult> {
        logger.info('Starting to scrape current channel...');
        
        try {
          // Get current URL to extract channel info
          const currentUrl = page.url();
          logger.info(`Current URL: ${currentUrl}`);
          
          // Get channel info from the page
          const channelHtml = await page.content();
          const parser = new DataParser(channelHtml);
          let channel = parser.parseChannelInfo();
          
          // Try to extract channel name from URL or page title
          if (channel.name === 'Unknown Channel') {
            const pageTitle = await page.title();
            if (pageTitle && pageTitle !== 'Telegram') {
              channel.name = pageTitle;
            }
          }
          
          // Extract username from URL if possible
          const urlMatch = currentUrl.match(/#@?([^/?]+)$/);
          if (urlMatch && urlMatch[1] && channel.username === 'unknown') {
            channel.username = urlMatch[1].replace('-', '');
          }
          
          logger.info(`Scraping channel: ${channel.name} (@${channel.username})`);
          
          // Scroll and collect posts
          const posts = await this.scrollAndCollectPosts(page, options);
          
          logger.info(`Scraping complete. Total posts: ${posts.length}`);
          
          return {
            channel,
            posts,
            scrapedAt: new Date(),
            totalPosts: posts.length
          };
          
        } catch (error) {
          logger.error('Scraping failed:', error);
          return {
            channel: {
              name: 'Unknown',
              username: 'unknown',
              description: ''
            },
            posts: [],
            scrapedAt: new Date(),
            totalPosts: 0,
            error: error instanceof Error ? error.message : 'Unknown error'
          };
        }
      }
    
      private async scrollAndCollectPosts(page: Page, options: Partial<ScrapeOptions>): Promise<any[]> {
        logger.info('Starting to scroll and collect posts');
        
        const posts: Map<string, any> = new Map();
        let scrollAttempts = 0;
        let lastPostCount = 0;
        let noNewPostsCount = 0;
        const maxScrollAttempts = options.maxPosts ? Math.min(50, Math.ceil(options.maxPosts / 20)) : 50;
        
        while (scrollAttempts < maxScrollAttempts) {
          // Parse current posts
          const html = await page.content();
          const parser = new DataParser(html);
          const currentPosts = parser.parsePosts();
          
          // Add new posts to map (deduplication)
          for (const post of currentPosts) {
            if (!posts.has(post.id)) {
              posts.set(post.id, post);
              
              // Log progress
              if (posts.size % 20 === 0) {
                logger.info(`Collected ${posts.size} posts so far...`);
              }
            }
          }
          
          // Check if we've reached max posts
          if (options.maxPosts && posts.size >= options.maxPosts) {
            logger.info(`Reached maxPosts limit: ${options.maxPosts}`);
            break;
          }
          
          // Check if we're getting new posts
          if (posts.size === lastPostCount) {
            noNewPostsCount++;
            if (noNewPostsCount >= 3) {
              logger.info('No new posts found after 3 attempts, stopping');
              break;
            }
          } else {
            noNewPostsCount = 0;
            lastPostCount = posts.size;
          }
          
          // Scroll to load more messages
          await this.scrollUp(page);
          
          // Wait for new content
          await new Promise(resolve => setTimeout(resolve, 2000));
          
          scrollAttempts++;
        }
        
        logger.info(`Scrolling complete. Total posts collected: ${posts.size}`);
        
        // Sort posts by date (newest first) and limit if needed
        let sortedPosts = Array.from(posts.values()).sort((a, b) => b.date.getTime() - a.date.getTime());
        
        if (options.maxPosts && sortedPosts.length > options.maxPosts) {
          sortedPosts = sortedPosts.slice(0, options.maxPosts);
        }
        
        return sortedPosts;
      }
    
      private async scrollUp(page: Page): Promise<void> {
        // Scroll within the messages container to load older messages
        await page.evaluate(() => {
          const container = document.querySelector('.bubbles-inner, .messages-container, .bubbles, .im_history_scrollable');
          if (container) {
            // Scroll to top of the container to load older messages
            container.scrollTop = 0;
          } else {
            // Try to find any scrollable container
            const scrollables = document.querySelectorAll('[class*="scroll"], [class*="messages"], [class*="chat"]');
            for (let i = 0; i < scrollables.length; i++) {
              const el = scrollables[i] as HTMLElement;
              if (el.scrollHeight > el.clientHeight) {
                el.scrollTop = 0;
                break;
              }
            }
          }
        });
      }
    
      async close(browser: Browser): Promise<void> {
        await browser.close();
      }
    }
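The post-processing that `scrollAndCollectPosts` performs once scrolling stops (deduplication by post id, newest-first sort, optional truncation) reduces to a pure function that can be sketched and tested in isolation (`finalizePosts` is illustrative, not a method on the class):

```typescript
interface CollectedPost { id: string; date: Date; }

// Deduplicates by post id (first occurrence wins), sorts newest-first,
// then applies the optional maxPosts cap -- the same steps the scraper
// runs over the posts gathered across successive scroll passes.
function finalizePosts<T extends CollectedPost>(batches: T[][], maxPosts?: number): T[] {
  const seen = new Map<string, T>();
  for (const batch of batches) {
    for (const post of batch) {
      if (!seen.has(post.id)) seen.set(post.id, post);
    }
  }
  let sorted = Array.from(seen.values()).sort((a, b) => b.date.getTime() - a.date.getTime());
  if (maxPosts && sorted.length > maxPosts) sorted = sorted.slice(0, maxPosts);
  return sorted;
}
```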
