Reddit MCP Server

reddit-api-deep-dive.md•6.42 KiB

# Reddit API Deep Dive - Technical Research

## Executive Summary

The Reddit API provides programmatic access via OAuth2, with a free tier of 100 queries per minute (QPM). Key limitations: listings are capped at about 1,000 items, NSFW content has been blocked since 2023, and the Pushshift shutdown limits access to historical data.

## Authentication & Rate Limits

### OAuth2 Flow

- **Free Tier**: 100 queries per minute (QPM) per OAuth client ID
- **Paid Tier**: $0.24 per 1,000 API calls (commercial use)
- **Authentication**: OAuth2 required for the 100 QPM tier; unauthenticated requests are limited to 10 QPM
- **Rate Limit Window**: 10-minute rolling average

### Best Libraries

- **Python**: PRAW (Python Reddit API Wrapper) - mature, handles auth and rate limits automatically
- **JavaScript**: Snoowrap - async/promise-based (last update 4 years ago)

## Key Endpoints

### Posts/Submissions

- `/r/{subreddit}/hot.json` - Hot posts
- `/r/{subreddit}/new.json` - New posts
- `/r/{subreddit}/top.json?t={time}` - Top posts (hour/day/week/month/year/all)
- `/r/{subreddit}/rising.json` - Rising posts
- `/search.json?q={query}` - Site-wide search

### Comments

- `/r/{subreddit}/comments/{id}.json` - Post with comments
- `/api/morechildren` - Fetch additional nested comments

### Users & Subreddits

- `/user/{username}/about.json` - User profile
- `/r/{subreddit}/about.json` - Subreddit metadata

## Critical Limitations

### 1,000 Item Cap

- Cannot retrieve more than ~1,000 items per listing
- Pagination stops after 1,000 items regardless of the actual count
- **Workaround**: Time-based filtering, multiple queries

### NSFW Content Blocked

- All NSFW content inaccessible via API since mid-2023
- No workaround available

### Pushshift Shutdown (May 2023)

- Historical data archive no longer operational
- Impacted 1,700+ academic research papers
- **Alternative**: Static Pushshift dumps (outdated)

## Rate Limit Management

### Sliding-Window Rate Limiter

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: at most max_calls per period (seconds)."""

    def __init__(self, max_calls=100, period=60):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # monotonic timestamps of recent calls

    def wait_if_needed(self):
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call leaves the window, then record a fresh timestamp.
            time.sleep(max(0, self.calls[0] + self.period - now))
            self.calls.popleft()
            now = time.monotonic()
        self.calls.append(now)
```

### PRAW Built-in Rate Limiting

- Automatic 30-second caching
- Exponential backoff on 429 errors
- Respects `X-Ratelimit-*` headers

## Caching Strategy

### Recommended TTLs

| Content Type | Cache Duration | Reasoning |
|-------------|----------------|-----------|
| Hot posts | 5 minutes | Changes rapidly |
| New posts | 2 minutes | Real-time monitoring |
| Top posts (historical) | 1 hour | Stable over time |
| Comments | 15 minutes | Relatively static after initial activity |
| User profiles | 10 minutes | Changes slowly |
| Subreddit info | 1 hour | Very stable |

### Cache Key Pattern

```
reddit:{endpoint}:{params_hash}:{version}
```

## Cost Analysis

### Free Tier Economics

- 100 QPM = 6,000 requests/hour = 144,000 requests/day
- Sufficient for **500-1,000 moderate users**
- With 75% cache hit rate: **2,000-4,000 users** sustainable

### Paid Tier Costs

- $0.24 per 1,000 calls
- 1M calls/month = $240/month
- Typical user: 50 calls/day = 1,500 calls/month
- 1,000 users = 1.5M calls/month = $360/month

### Optimization Strategy

1. Aggressive caching (75%+ hit rate target)
2. Request deduplication (coalesce identical queries)
3. Batch operations where possible
4. Use `.json` endpoints for public data (no auth needed, 10 QPM limit)

## Error Handling

### Common Errors

- **429 Too Many Requests**: Rate limit exceeded - exponential backoff
- **401 Unauthorized**: Token expired - refresh OAuth token
- **403 Forbidden**: Insufficient permissions - check scopes
- **404 Not Found**: Deleted/removed content - return cached copy if available
- **500/502/503**: Reddit server errors - retry with backoff

### Retry Strategy

```python
import random
import time


class RateLimitError(Exception):
    retry_after = 1  # seconds; placeholder for the client library's 429 exception


class ServerError(Exception):
    pass  # placeholder for a 500/502/503 response


def request_with_retry(api_request, max_retries=5):
    for attempt in range(max_retries):
        try:
            return api_request()
        except RateLimitError as e:
            time.sleep(int(e.retry_after))  # 429: honor the server-provided delay
        except ServerError:
            time.sleep((2 ** attempt) + random.uniform(0, 1))  # 5xx: backoff with jitter
    raise RuntimeError(f"request failed after {max_retries} attempts")
```

## Data Structure

### Post Object (Key Fields)

```json
{
  "id": "abc123",
  "name": "t3_abc123",
  "title": "Post title",
  "author": "username",
  "subreddit": "technology",
  "created_utc": 1699123456,
  "score": 1234,
  "upvote_ratio": 0.94,
  "num_comments": 567,
  "url": "https://...",
  "selftext": "Post content...",
  "permalink": "/r/technology/comments/..."
}
```

### Comment Object (Key Fields)

```json
{
  "id": "def456",
  "name": "t1_def456",
  "author": "commenter",
  "body": "Comment text",
  "score": 89,
  "created_utc": 1699123789,
  "depth": 0,
  "parent_id": "t3_abc123",
  "replies": []
}
```

## Best Practices

1. **User-Agent**: Always set a unique, descriptive User-Agent
2. **OAuth Tokens**: Store securely, refresh before expiration
3. **Respect robots.txt**: Reddit updates it regularly
4. **Cache Aggressively**: 30+ second minimum for all responses
5. **Batch Requests**: Group operations to minimize API calls
6. **Monitor Headers**: Track `X-Ratelimit-Remaining` proactively
7. **Handle Deletion**: Detect and purge deleted content from cache
8. **Comply with ToS**: No model training without permission

## Technical Constraints for MCP Server

### Must Address

- ✅ Rate limit management (100 QPM free tier)
- ✅ 1,000 item pagination cap
- ✅ OAuth token lifecycle management
- ✅ Intelligent caching (target 75%+ hit rate)
- ✅ Error handling with user-friendly messages
- ✅ Deleted content detection

### Cannot Solve

- ❌ NSFW content access (blocked by Reddit)
- ❌ Historical data beyond 1,000 items (Pushshift shutdown)
- ❌ Rate limit increases (fixed by Reddit)

## Implementation Recommendations

1. Use **PRAW** for Python (battle-tested, handles complexity)
2. Implement **Redis caching** with TTL-based invalidation
3. Build a **request queue** with priority levels
4. Add a **circuit breaker** for Reddit API failures
5. Monitor **cache hit rates** and optimize TTLs
6. Provide **transparent rate limit feedback** to users

## References

- Reddit API Docs: https://www.reddit.com/dev/api/
- PRAW Documentation: https://praw.readthedocs.io/
- OAuth2 Spec: https://github.com/reddit-archive/reddit/wiki/OAuth2
- Rate Limits: https://support.reddithelp.com/hc/en-us/articles/16160319875092
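
As a rough illustration of the Caching Strategy section, the sketch below builds keys in the `reddit:{endpoint}:{params_hash}:{version}` pattern and looks up the recommended TTLs. The names `cache_key`, `ttl_for`, `TTL_SECONDS`, and the `"v1"` version tag are hypothetical, and hashing the sorted parameters with a truncated SHA-256 is only one possible choice; the source does not specify how `params_hash` is computed.

```python
import hashlib
import json

# TTLs in seconds, taken from the "Recommended TTLs" table.
# The short content-type names used as keys are assumptions for this sketch.
TTL_SECONDS = {
    "hot": 5 * 60,
    "new": 2 * 60,
    "top": 60 * 60,
    "comments": 15 * 60,
    "user": 10 * 60,
    "subreddit": 60 * 60,
}

CACHE_VERSION = "v1"  # hypothetical version tag


def cache_key(endpoint: str, params: dict, version: str = CACHE_VERSION) -> str:
    """Build a key of the form reddit:{endpoint}:{params_hash}:{version}."""
    # Sort keys so logically identical queries produce the same hash.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    params_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"reddit:{endpoint}:{params_hash}:{version}"


def ttl_for(content_type: str, default: int = 30) -> int:
    """Return the TTL for a content type, falling back to the 30-second minimum."""
    return TTL_SECONDS.get(content_type, default)
```

Because the parameters are serialized with sorted keys, `cache_key("hot", {"subreddit": "technology", "limit": 25})` and the same call with the dictionary keys in the opposite order map to one cache entry, which also supports the request deduplication goal.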

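The arithmetic in the Cost Analysis section can be reproduced with a small helper. This is a sketch under stated assumptions: a 30-day month, every cache hit avoiding an upstream call entirely, and the hypothetical function names `monthly_cost` and `free_tier_daily_budget`.

```python
PRICE_PER_1000_CALLS = 0.24  # USD, paid-tier figure from the Cost Analysis section
FREE_TIER_QPM = 100          # free-tier queries per minute


def monthly_cost(users: int, calls_per_user_per_day: int = 50,
                 days: int = 30, cache_hit_rate: float = 0.0) -> float:
    """Estimated monthly paid-tier cost in USD, counting only cache misses."""
    total_calls = users * calls_per_user_per_day * days
    # Assumption: cache hits are served locally and never reach Reddit.
    upstream_calls = total_calls * (1.0 - cache_hit_rate)
    return round(upstream_calls / 1000 * PRICE_PER_1000_CALLS, 2)


def free_tier_daily_budget(qpm: int = FREE_TIER_QPM) -> int:
    """Requests per day available at a sustained QPM rate."""
    return qpm * 60 * 24
```

With the defaults, `monthly_cost(1000)` gives the $360/month figure quoted above, and `free_tier_daily_budget()` gives 144,000 requests/day. Under the same assumptions, a 75% cache hit rate would cut the paid-tier estimate for 1,000 users to a quarter of that amount.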

