PROJECT_PLAN.md•5.36 kB
# Telegram MCP Server Project Plan
## Project Overview
Build an MCP (Model Context Protocol) server that scrapes Telegram public channels and groups using Puppeteer/Chromium and provides the data to Claude in markdown format for analysis.
## Key Requirements
- No Telegram API usage (web scraping only)
- Use Chromium browser automation
- Scroll to parse historical messages
- Extract comprehensive data (posts, reactions, views, metadata)
- Return data in markdown format
- Follow MCP protocol standards
## Proposed Task List
### Phase 1: Project Setup and Architecture
1. **Initialize TypeScript project**
- Set up package.json with dependencies
- Configure TypeScript (tsconfig.json)
- Set up build scripts
- Configure linting and formatting
2. **Install core dependencies**
- @modelcontextprotocol/sdk
- puppeteer (includes Chromium)
- cheerio (for HTML parsing)
- date-fns (for date handling)
- dotenv (for configuration)
3. **Create project structure**
```
tgmcp/
├── src/
│ ├── index.ts (MCP server entry)
│ ├── server.ts (MCP server implementation)
│ ├── scraper/
│ │ ├── telegram-scraper.ts
│ │ ├── browser-manager.ts
│ │ └── data-parser.ts
│ ├── formatters/
│ │ └── markdown-formatter.ts
│ ├── types/
│ │ └── telegram.types.ts
│ └── utils/
│ ├── logger.ts
│ └── config.ts
├── dist/
├── tests/
├── .env.example
├── README.md
└── package.json
```
### Phase 2: MCP Server Implementation
4. **Implement base MCP server**
- Create server class extending MCP SDK
- Set up JSON-RPC message handling
- Implement initialization handshake
- Configure stdio transport
5. **Define MCP tools**
- `scrape_channel`: Scrape a Telegram channel
- `scrape_group`: Scrape a public Telegram group
- `get_channel_info`: Get channel metadata only
- `scrape_date_range`: Scrape posts within date range
6. **Implement error handling**
- MCP protocol error responses
- Scraping failure handling
- Rate limiting errors
- Network timeout handling
### Phase 3: Telegram Scraper Implementation
7. **Browser automation setup**
- Puppeteer configuration
- Headless/headful mode options
- User agent and viewport settings
- Cookie and session management
8. **Navigation logic**
- URL validation for Telegram links
- Page load waiting strategies
- Dynamic content detection
- Error page handling
9. **Scrolling mechanism**
- Implement infinite scroll detection
- Date-based stopping condition
- Memory-efficient batch processing
- Progress tracking
10. **Data extraction**
- Channel/group metadata (name, description, member count)
- Post content (text, media indicators)
- Post metadata (date, views, forwards)
- Reactions (emoji types and counts)
- Reply/comment counts
- Media presence indicators
### Phase 4: Data Processing and Formatting
11. **Data parsing**
- HTML element selectors
- Text content extraction
- Date parsing and normalization
- Number formatting (views, reactions)
12. **Markdown formatter**
- Channel/group header section
- Posts organized by date
- Reaction summaries
- Media indicators
- Structured metadata tables
13. **Data models**
- TypeScript interfaces for all data types
- Validation schemas
- Error types
### Phase 5: Testing and Optimization
14. **Unit tests**
- Parser functions
- Formatter logic
- Data validation
15. **Integration tests**
- MCP server communication
- Scraper functionality
- Error scenarios
16. **Performance optimization**
- Memory usage monitoring
- Batch processing for large channels
- Caching strategies
- Resource cleanup
### Phase 6: Documentation and Deployment
17. **Documentation**
- README with installation guide
- API documentation
- Usage examples
- Troubleshooting guide
18. **Deployment preparation**
- Build scripts
- Distribution packaging
- Claude Desktop configuration example
- Security guidelines
## Technical Decisions
### Architecture
- **Language**: TypeScript for type safety and MCP SDK compatibility
- **Browser Automation**: Puppeteer (includes Chromium, well-documented)
- **Transport**: stdio (standard for local MCP servers)
- **Data Format**: Markdown for Claude compatibility
### Security Considerations
- No credential storage
- Read-only operations
- Rate limiting implementation
- Input validation for URLs
- Sandboxed browser execution
### Limitations
- Public channels/groups only
- No member list access
- Subject to Telegram's anti-bot measures
- Performance depends on channel size
## Success Criteria
- Successfully connects to Claude Desktop
- Can scrape public Telegram channels/groups
- Returns well-formatted markdown
- Handles errors gracefully
- Performs within reasonable time limits
- Follows MCP protocol standards
## Estimated Timeline
- Phase 1-2: 2-3 days (Setup and MCP implementation)
- Phase 3-4: 3-4 days (Scraper and formatting)
- Phase 5-6: 2-3 days (Testing and documentation)
- Total: ~8-10 days for production-ready implementation