
scrape_manual

Manually scrape Telegram channel content by opening a browser for login and navigation, then extracting posts with optional file saving.

Instructions

Manual scraping mode: Opens browser for you to login and navigate to any channel, then scrapes it

Input Schema

Name          Required  Description                                   Default
limit         No        Maximum number of posts to scrape (optional)
save_to_file  No        Save results to MD and JSON files             true
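As a sketch of how these parameters travel over the wire, the following shows the shape of the arguments an MCP client might send in a `tools/call` request for this tool (the interface and variable names are illustrative, not part of the server's source):

```typescript
// Hypothetical argument shape for a tools/call request to 'scrape_manual'.
// Both fields are optional; save_to_file defaults to true per the schema.
interface ScrapeManualArgs {
  limit?: number;         // maximum number of posts to scrape
  save_to_file?: boolean; // write MD + JSON files (default true)
}

const args: ScrapeManualArgs = {
  limit: 100,
  save_to_file: true,
};

// Shape of the JSON-RPC params an MCP client would send.
const params = {
  name: "scrape_manual",
  arguments: args,
};

console.log(JSON.stringify(params));
```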

Implementation Reference

  • src/server.ts:272-290 (registration)
    Registration of the 'scrape_manual' tool in getTools(), including name, description, and input schema definition.
    {
      name: 'scrape_manual',
      description: 'Manual scraping mode: Opens browser for you to login and navigate to any channel, then scrapes it',
      inputSchema: {
        type: 'object',
        properties: {
          limit: {
            type: 'number',
            description: 'Maximum number of posts to scrape (optional)'
          },
          save_to_file: {
            type: 'boolean',
            description: 'Save results to MD and JSON files',
            default: true
          }
        },
        required: []
      }
    },
  • The primary handler function for the 'scrape_manual' tool. It dynamically imports ManualTelegramScraper, launches a browser for manual user navigation and login, performs the scrape, optionally saves results to files, and returns a formatted response with a sample.
      private async handleManualScrape(args: any): Promise<any> {
        try {
          logger.info('Starting manual scrape mode...');
          
          // Import manual scraper and required modules
          const { ManualTelegramScraper } = await import('./scraper/manual-scraper.js');
          const { join } = await import('path');
          const { writeFile, mkdir } = await import('fs/promises');
          
          const manualScraper = new ManualTelegramScraper();
          
          // Open browser and wait for user to navigate
          const { browser, page } = await manualScraper.loginAndWaitForChannel();
          
          // Scrape the current channel
          const options = {
            maxPosts: args.limit || args.max_posts || 0
          };
          
          const result = await manualScraper.scrapeCurrentChannel(page, options);
          
          // Save to file
          if (args.save_to_file !== false) {
            const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, -5);
            const filename = `${result.channel.username}_${timestamp}_manual.md`;
            
            const formatter = new MarkdownFormatter();
            const markdown = formatter.format(result);
            
            const basePath = 'C:\\Users\\User\\AppData\\Roaming\\Claude\\telegram_scraped_data';
            const filepath = join(basePath, filename);
            
            await mkdir(basePath, { recursive: true });
            await writeFile(filepath, markdown, 'utf8');
            
            // Also save JSON
            const jsonFilename = `${result.channel.username}_${timestamp}_manual.json`;
            const jsonFilepath = join(basePath, jsonFilename);
            await writeFile(jsonFilepath, JSON.stringify(result, null, 2), 'utf8');
            
            logger.info(`Saved to: ${filepath}`);
          }
          
          // Close browser
          await manualScraper.close(browser);
          
          // Format response
          const summary = result.posts.slice(0, 5).map(post => ({
            date: post.date.toISOString(),
            content: post.content.substring(0, 100) + (post.content.length > 100 ? '...' : ''),
            views: post.views
          }));
          
          return {
            content: [
              {
                type: 'text',
            text: `✅ Successfully scraped ${result.posts.length} posts from ${result.channel.name}
    
    Channel: @${result.channel.username}
    Total posts scraped: ${result.posts.length}
    
    ${args.save_to_file !== false ? `Files saved to:
    - Markdown: C:\\Users\\User\\AppData\\Roaming\\Claude\\telegram_scraped_data\\${result.channel.username}_*_manual.md
    - JSON: C:\\Users\\User\\AppData\\Roaming\\Claude\\telegram_scraped_data\\${result.channel.username}_*_manual.json
    
    ` : ''}Sample of first 5 posts:
    ${summary.map(post => `\n📅 ${post.date}\n${post.content}\n👁 ${post.views} views`).join('\n---\n')}`
              }
            ]
          };
          
        } catch (error) {
          logger.error('Manual scrape failed:', error);
          return {
            content: [
              {
                type: 'text',
                text: `āŒ Manual scrape failed: ${error instanceof Error ? error.message : 'Unknown error'}`
              }
            ]
          };
        }
      }
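The handler's file-naming logic can be exercised in isolation. This sketch (the channel username is illustrative) reproduces the timestamp transform: the ISO string has `:` and `.` replaced with `-`, and the trailing milliseconds-plus-`Z` trimmed off:

```typescript
// Reproduces the handler's filename construction for the saved Markdown file,
// e.g. 2024-01-15T10:30:45.123Z -> 2024-01-15T10-30-45.
function manualScrapeFilename(username: string, now: Date): string {
  const timestamp = now.toISOString().replace(/[:.]/g, '-').slice(0, -5);
  return `${username}_${timestamp}_manual.md`;
}

const name = manualScrapeFilename(
  'examplechannel',
  new Date('2024-01-15T10:30:45.123Z')
);
console.log(name); // examplechannel_2024-01-15T10-30-45_manual.md
```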
  • Helper class ManualTelegramScraper providing the core implementation for manual scraping: browser launch, user-guided navigation, channel scraping via scrolling and parsing, post collection, and cleanup.
    export class ManualTelegramScraper {
      private browserManager: BrowserManager;
      private cookieManager: CookieManager;
    
      constructor() {
        this.browserManager = new BrowserManager();
        this.cookieManager = new CookieManager();
      }
    
      async loginAndWaitForChannel(): Promise<{ browser: Browser; page: Page }> {
        logger.info('Opening browser for manual login and navigation...');
        
        // Launch browser in non-headless mode
        const browser = await this.browserManager.launch(false);
        const page = await browser.newPage();
        
        // Set viewport
        await page.setViewport({ width: 1280, height: 800 });
        
        // Load cookies if available
        await this.cookieManager.loadCookies(page);
        
        // Navigate to Telegram Web
        logger.info('Navigating to Telegram Web...');
        await page.goto('https://web.telegram.org/a/', {
          waitUntil: 'networkidle2',
          timeout: 60000
        });
        
        // Instructions for user
        logger.info('='.repeat(60));
        logger.info('MANUAL NAVIGATION REQUIRED');
        logger.info('1. Log in to Telegram if needed');
        logger.info('2. Navigate to the channel you want to scrape');
        logger.info('3. Make sure the channel messages are visible');
        logger.info('4. Press Enter here when ready to start scraping');
        logger.info('='.repeat(60));
        
        // Wait for user to press Enter
        await new Promise<void>(resolve => {
          process.stdin.once('data', () => {
            resolve();
          });
        });
        
        // Save cookies for future use
        await this.cookieManager.saveCookies(page);
        
        return { browser, page };
      }
    
      async scrapeCurrentChannel(page: Page, options: Partial<ScrapeOptions> = {}): Promise<ScrapeResult> {
        logger.info('Starting to scrape current channel...');
        
        try {
          // Get current URL to extract channel info
          const currentUrl = page.url();
          logger.info(`Current URL: ${currentUrl}`);
          
          // Get channel info from the page
          const channelHtml = await page.content();
          const parser = new DataParser(channelHtml);
          let channel = parser.parseChannelInfo();
          
          // Try to extract channel name from URL or page title
          if (channel.name === 'Unknown Channel') {
            const pageTitle = await page.title();
            if (pageTitle && pageTitle !== 'Telegram') {
              channel.name = pageTitle;
            }
          }
          
          // Extract username from URL if possible
          const urlMatch = currentUrl.match(/#@?([^/?]+)$/);
          if (urlMatch && urlMatch[1] && channel.username === 'unknown') {
            channel.username = urlMatch[1].replace('-', '');
          }
          
          logger.info(`Scraping channel: ${channel.name} (@${channel.username})`);
          
          // Scroll and collect posts
          const posts = await this.scrollAndCollectPosts(page, options);
          
          logger.info(`Scraping complete. Total posts: ${posts.length}`);
          
          return {
            channel,
            posts,
            scrapedAt: new Date(),
            totalPosts: posts.length
          };
          
        } catch (error) {
          logger.error('Scraping failed:', error);
          return {
            channel: {
              name: 'Unknown',
              username: 'unknown',
              description: ''
            },
            posts: [],
            scrapedAt: new Date(),
            totalPosts: 0,
            error: error instanceof Error ? error.message : 'Unknown error'
          };
        }
      }
    
      private async scrollAndCollectPosts(page: Page, options: Partial<ScrapeOptions>): Promise<any[]> {
        logger.info('Starting to scroll and collect posts');
        
        const posts: Map<string, any> = new Map();
        let scrollAttempts = 0;
        let lastPostCount = 0;
        let noNewPostsCount = 0;
        const maxScrollAttempts = options.maxPosts ? Math.min(50, Math.ceil(options.maxPosts / 20)) : 50;
        
        while (scrollAttempts < maxScrollAttempts) {
          // Parse current posts
          const html = await page.content();
          const parser = new DataParser(html);
          const currentPosts = parser.parsePosts();
          
          // Add new posts to map (deduplication)
          for (const post of currentPosts) {
            if (!posts.has(post.id)) {
              posts.set(post.id, post);
              
              // Log progress
              if (posts.size % 20 === 0) {
                logger.info(`Collected ${posts.size} posts so far...`);
              }
            }
          }
          
          // Check if we've reached max posts
          if (options.maxPosts && posts.size >= options.maxPosts) {
            logger.info(`Reached maxPosts limit: ${options.maxPosts}`);
            break;
          }
          
          // Check if we're getting new posts
          if (posts.size === lastPostCount) {
            noNewPostsCount++;
            if (noNewPostsCount >= 3) {
              logger.info('No new posts found after 3 attempts, stopping');
              break;
            }
          } else {
            noNewPostsCount = 0;
            lastPostCount = posts.size;
          }
          
          // Scroll to load more messages
          await this.scrollUp(page);
          
          // Wait for new content
          await new Promise(resolve => setTimeout(resolve, 2000));
          
          scrollAttempts++;
        }
        
        logger.info(`Scrolling complete. Total posts collected: ${posts.size}`);
        
        // Sort posts by date (newest first) and limit if needed
        let sortedPosts = Array.from(posts.values()).sort((a, b) => b.date.getTime() - a.date.getTime());
        
        if (options.maxPosts && sortedPosts.length > options.maxPosts) {
          sortedPosts = sortedPosts.slice(0, options.maxPosts);
        }
        
        return sortedPosts;
      }
    
      private async scrollUp(page: Page): Promise<void> {
        // Scroll within the messages container to load older messages
        await page.evaluate(() => {
          const container = document.querySelector('.bubbles-inner, .messages-container, .bubbles, .im_history_scrollable');
          if (container) {
            // Scroll to top of the container to load older messages
            container.scrollTop = 0;
          } else {
            // Try to find any scrollable container
            const scrollables = document.querySelectorAll('[class*="scroll"], [class*="messages"], [class*="chat"]');
            for (let i = 0; i < scrollables.length; i++) {
              const el = scrollables[i] as HTMLElement;
              if (el.scrollHeight > el.clientHeight) {
                el.scrollTop = 0;
                break;
              }
            }
          }
        });
      }
    
      async close(browser: Browser): Promise<void> {
        await browser.close();
      }
    }
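The username extraction in `scrapeCurrentChannel` hinges on a regex over the page URL fragment. A quick sketch (the URLs are illustrative) shows what that pattern captures:

```typescript
// Same pattern as scrapeCurrentChannel: capture everything after the '#'
// (and an optional '@') up to the end of the URL.
function usernameFromUrl(url: string): string | null {
  const m = url.match(/#@?([^/?]+)$/);
  return m ? m[1] : null;
}

console.log(usernameFromUrl('https://web.telegram.org/a/#@examplechannel')); // examplechannel
console.log(usernameFromUrl('https://web.telegram.org/a/#-1001234567890'));  // -1001234567890
console.log(usernameFromUrl('https://web.telegram.org/a/'));                 // null
```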
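The scroll loop deduplicates posts by id using a Map, then sorts newest-first and applies the optional cap. A minimal sketch of that accumulation step (the post shape is simplified for illustration):

```typescript
interface PostLike { id: string; date: Date; }

// Mirrors scrollAndCollectPosts: keep the first occurrence of each post id
// across scroll batches, sort newest-first, then cap at maxPosts if set.
function collect(batches: PostLike[][], maxPosts?: number): PostLike[] {
  const posts = new Map<string, PostLike>();
  for (const batch of batches) {
    for (const post of batch) {
      if (!posts.has(post.id)) posts.set(post.id, post);
    }
  }
  let sorted = [...posts.values()].sort(
    (a, b) => b.date.getTime() - a.date.getTime()
  );
  if (maxPosts && sorted.length > maxPosts) sorted = sorted.slice(0, maxPosts);
  return sorted;
}

const older = { id: '1', date: new Date('2024-01-01') };
const newer = { id: '2', date: new Date('2024-01-02') };
// Second batch re-sees post '1'; the duplicate is ignored, then cap = 1.
const result = collect([[older, newer], [older]], 1);
console.log(result.map(p => p.id)); // ['2']
```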
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'Opens browser' and 'login,' hinting at interactive behavior, but it doesn't disclose critical traits: whether the tool is read-only or destructive, that it requires user input at runtime, or how it handles rate limits and errors. For a tool with manual interaction and scraping, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence: 'Manual scraping mode: Opens browser for you to login and navigate to any channel, then scrapes it.' It's front-loaded with the key concept ('Manual scraping mode') and wastes no words, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a manual scraping tool with browser interaction and no annotations or output schema, the description is incomplete. It doesn't explain what 'scrapes' entails (e.g., data types returned, success/failure states), prerequisites, or behavioral details. This leaves the agent with insufficient information for reliable tool invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('limit' and 'save_to_file') with clear descriptions. The tool description adds no parameter-specific information beyond what's in the schema, such as default values or usage context. Baseline 3 is appropriate as the schema handles the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Opens browser for you to login and navigate to any channel, then scrapes it.' It specifies the verb ('scrapes') and resource ('any channel'), and distinguishes itself from siblings like 'api_scrape_channel' by emphasizing manual browser interaction. However, it doesn't explicitly differentiate from 'scrape_channel_authenticated' or 'scrape_channel_full' in terms of scope or depth.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by mentioning 'Opens browser for you to login,' suggesting it's for manual authentication scenarios. It doesn't provide explicit when-to-use vs. when-not-to-use guidance or name alternatives like 'api_scrape_channel' for automated cases. The context is clear but lacks detailed exclusions or comparative advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/DLHellMe/telegram-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server