Research Powerpack MCP

CLAUDE.md•9.06 KiB

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What This Is

MCP server providing 5 research tools to AI assistants: web search (Google via Serper), Reddit search (also via Serper with `site:reddit.com`) & post extraction (via Reddit OAuth), URL scraping with optional AI extraction, and AI-powered deep research synthesis. Supports STDIO, HTTP Streamable, and Cloudflare Workers transports. Node.js >= 20.0.0.

## Build & Run

```bash
npm run build           # tsc + copy src/config/yaml/ to dist/config/
npm run dev             # tsx live TypeScript
npm start               # node dist/index.js
npm run typecheck       # type check without emitting (does NOT check worker.ts)
```

```bash
npm run test:web-search     # test web search tool
npm run test:reddit-search  # test Reddit search
npm run test:scrape-links   # test scraping
npm run test:deep-research  # test deep research
npm run test:all            # run all tests
npm run test:check          # check environment setup
```

Transport modes:
```bash
npx mcp-researchpowerpack                      # STDIO (default)
MCP_TRANSPORT=http MCP_PORT=3000 npx mcp-researchpowerpack  # HTTP
```

Binary names: `mcp-researchpowerpack`, `research-powerpack-mcp`.

## Environment Variables

Server starts with any configuration — tools are silently disabled if their API keys are missing.

**API keys (each enables a capability):**
| Variable | Enables | Free Tier |
|----------|---------|-----------|
| `SERPER_API_KEY` | `web_search`, `search_reddit` | 2,500 queries/mo |
| `REDDIT_CLIENT_ID` + `REDDIT_CLIENT_SECRET` | `get_reddit_post` | unlimited |
| `SCRAPEDO_API_KEY` | `scrape_links` | 1,000 credits/mo |
| `OPENROUTER_API_KEY` | `deep_research`, `use_llm` in scrape_links | pay-as-you-go |

Note: `search_reddit` uses Google Serper (`site:reddit.com`), NOT the Reddit API. Only `get_reddit_post` uses Reddit OAuth credentials.

**Optional tuning:**
- `RESEARCH_MODEL` — deep research primary model (default: `x-ai/grok-4-fast`)
- `RESEARCH_FALLBACK_MODEL` — fallback when primary fails (default: `google/gemini-2.5-flash`)
- `LLM_EXTRACTION_MODEL` — extraction model (default: `openai/gpt-oss-120b:nitro`)
- `API_TIMEOUT_MS` — request timeout (default: 1,800,000 / 30min)
- `DEFAULT_REASONING_EFFORT` — `low|medium|high` (default: `high`)
- `DEFAULT_MAX_URLS` — max search results per research question (default: 100, range: 10-200)
- `LLM_ENABLE_REASONING` — enable reasoning in LLM extraction (default: `true`, set `false` to disable)
- `OPENROUTER_BASE_URL` — override OpenRouter endpoint
- `DEBUG_REDDIT` — set `true` for Reddit token cache debug logging

## Architecture

```
src/
├── index.ts                    # STDIO + HTTP server entry point, graceful shutdown, stdin disconnect detection
├── worker.ts                   # Cloudflare Workers entry point (excluded from tsconfig.json, compiled by Wrangler)
├── version.ts                  # Version string (hardcoded fallback 3.6.9 for Workers where package.json unavailable)
├── config/
│   ├── index.ts                # Env parsing, capability detection, lazy Proxy config objects with resetEnvCache()
│   ├── loader.ts               # YAML tool config loader (readFileSync — incompatible with Workers runtime)
│   ├── types.ts                # Config type definitions
│   └── yaml/tools.yaml         # Complete tool specifications (single source of truth for tool metadata)
├── clients/                    # External API integrations
│   ├── search.ts               # Google Serper API (8 concurrent calls)
│   ├── reddit.ts               # Reddit OAuth API (5 concurrent calls, module-level token cache with 60s expiry)
│   ├── scraper.ts              # Scrape.do with 3-mode fallback: basic → JS rendering → JS + US geo
│   └── research.ts             # OpenRouter LLM client (tries primary model, falls back to RESEARCH_FALLBACK_MODEL)
├── tools/
│   ├── definitions.ts          # Tool metadata generated from YAML
│   ├── registry.ts             # Central handler registry & execution pipeline
│   ├── search.ts               # web_search handler (3-100 parallel keywords)
│   ├── reddit.ts               # search_reddit (10-50 queries via Serper) + get_reddit_post (via Reddit API)
│   ├── scrape.ts               # scrape_links handler (1-50 URLs, 10 concurrent, batches of 30)
│   ├── research.ts             # deep_research handler (1-10 questions, 32K token budget, file attachments)
│   └── utils.ts                # Tool-specific utilities
├── schemas/                    # Zod input validation
│   ├── web-search.ts
│   ├── scrape-links.ts
│   └── deep-research.ts
├── services/
│   ├── llm-processor.ts        # OpenRouter API integration (3 concurrent extractions, maxRetries: 0)
│   ├── markdown-cleaner.ts     # HTML → Markdown (turndown), truncates at 512K chars, linear-time comment removal
│   └── file-attachment.ts      # File attachment handling for deep_research (reads filesystem, supports line ranges)
└── utils/
    ├── errors.ts               # Error classification: retryable (429, 5xx → backoff) vs non-retryable
    ├── logger.ts               # MCP SDK structured logging (falls back to stderr)
    ├── concurrency.ts          # Bounded parallel execution (pMap)
    ├── url-aggregator.ts       # CTR-weighted URL ranking with consensus detection
    ├── markdown-formatter.ts   # Markdown output formatting
    └── response.ts             # 70/20/10 response formatter (formatSuccess, formatError, formatBatchHeader)
```

**Key design decisions:**
- **Tool specs in YAML** (`config/yaml/tools.yaml`) — single source of truth for tool names, descriptions, and parameter specs. Copied into `dist/config/` during build. If you add new YAML files, they must be in this directory.
- **Never-throw pattern** — server never crashes on tool failures. All errors go through `classifyError()` which categorizes as retryable (429, 5xx → exponential backoff) or non-retryable (→ user-friendly message with setup instructions).
- **Capability-based degradation** — missing API keys disable specific tools with helpful setup instructions rather than failing.
- **Bounded concurrency** — web_search: 8, search_reddit: 8, get_reddit_post: 5 (batches of 10), scrape_links: 10 (batches of 30) + 3 LLM extractions, deep_research: 3.
- **CTR-weighted URL ranking** — search results ranked by click-through rates with consensus detection across multiple queries.
- **Smart Reddit comment allocation** — 1000 total budget, capped at 200/post: 5+ posts → budget/count each, 2 posts → 200 each (capped from 500).
- **70/20/10 response format** — all tools return: 70% summary, 20% structured data, 10% actionable next steps.
- **Model fallback** — `ResearchClient` tries primary model, then falls back to `RESEARCH_FALLBACK_MODEL` on failure.
- **Scraper fallback** — `ScraperClient.scrapeWithFallback()` tries 3 modes: basic → JavaScript rendering → JavaScript + US geo-targeting.
- **Gemini special handling** — models matching `google/gemini*` get `tools: [{type: 'google_search'}]` instead of `search_parameters`.

**Execution pipeline** (`tools/registry.ts`): lookup tool → check capability → validate with Zod → execute handler → transform response. Every step catches errors gracefully.

## Gotchas

- **ESM project** (`"type": "module"`) — all imports must use `.js` extensions.
- **worker.ts excluded from tsconfig** — `npm run typecheck` does NOT check the Workers entry point. Use `npx wrangler deploy --dry-run` to verify.
- **Build copies YAML** — the build script copies `src/config/yaml/` to `dist/config/`. New YAML files must be in this directory.
- **version.ts hardcoded fallback** — has a stale fallback version for Workers runtime where `package.json` is unavailable. Can drift from actual version.
- **All OpenAI client instances set `maxRetries: 0`** — retry logic is handled manually with custom backoff.
- **Lazy Proxy config** — `RESEARCH`, `LLM_EXTRACTION` config objects use Proxy for deferred env reads. `resetEnvCache()` clears them for Workers env bridging.
- **stdin disconnect detection** — explicit handlers for `stdin close/end` and `stdout EPIPE` to prevent 100% CPU when parent process disconnects.

## Cloudflare Workers

`src/worker.ts` bridges Cloudflare env bindings into `process.env` so config modules work in both STDIO and Workers. Avoids importing `tools/definitions.ts` (which uses filesystem YAML loading) and instead registers tools directly from the registry. Uses Durable Objects (`ResearchPowerpackMCP` class with SQLite). Custom domain: `mcp-researchpowerpack.workers.yigitkonur.com`.

## CI/CD

GitHub Actions (`.github/workflows/npmrelease.yml`): pushes to main auto-publish to npm with OIDC provenance. Auto-bumps patch version if already published. Ignores markdown/docs-only changes. `[skip ci]` in commit message skips build. Manual trigger supports patch/minor/major bump.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/yigitkonur/research-powerpack-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

CLAUDE.md•9.06 KiB

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What This Is

MCP server providing 5 research tools to AI assistants: web search (Google via Serper), Reddit search (also via Serper with `site:reddit.com`) & post extraction (via Reddit OAuth), URL scraping with optional AI extraction, and AI-powered deep research synthesis. Supports STDIO, HTTP Streamable, and Cloudflare Workers transports. Node.js >= 20.0.0.

## Build & Run

```bash
npm run build           # tsc + copy src/config/yaml/ to dist/config/
npm run dev             # tsx live TypeScript
npm start               # node dist/index.js
npm run typecheck       # type check without emitting (does NOT check worker.ts)
```

```bash
npm run test:web-search     # test web search tool
npm run test:reddit-search  # test Reddit search
npm run test:scrape-links   # test scraping
npm run test:deep-research  # test deep research
npm run test:all            # run all tests
npm run test:check          # check environment setup
```

Transport modes:
```bash
npx mcp-researchpowerpack                      # STDIO (default)
MCP_TRANSPORT=http MCP_PORT=3000 npx mcp-researchpowerpack  # HTTP
```

Binary names: `mcp-researchpowerpack`, `research-powerpack-mcp`.

## Environment Variables

Server starts with any configuration — tools are silently disabled if their API keys are missing.

**API keys (each enables a capability):**
| Variable | Enables | Free Tier |
|----------|---------|-----------|
| `SERPER_API_KEY` | `web_search`, `search_reddit` | 2,500 queries/mo |
| `REDDIT_CLIENT_ID` + `REDDIT_CLIENT_SECRET` | `get_reddit_post` | unlimited |
| `SCRAPEDO_API_KEY` | `scrape_links` | 1,000 credits/mo |
| `OPENROUTER_API_KEY` | `deep_research`, `use_llm` in scrape_links | pay-as-you-go |

Note: `search_reddit` uses Google Serper (`site:reddit.com`), NOT the Reddit API. Only `get_reddit_post` uses Reddit OAuth credentials.

**Optional tuning:**
- `RESEARCH_MODEL` — deep research primary model (default: `x-ai/grok-4-fast`)
- `RESEARCH_FALLBACK_MODEL` — fallback when primary fails (default: `google/gemini-2.5-flash`)
- `LLM_EXTRACTION_MODEL` — extraction model (default: `openai/gpt-oss-120b:nitro`)
- `API_TIMEOUT_MS` — request timeout (default: 1,800,000 / 30min)
- `DEFAULT_REASONING_EFFORT` — `low|medium|high` (default: `high`)
- `DEFAULT_MAX_URLS` — max search results per research question (default: 100, range: 10-200)
- `LLM_ENABLE_REASONING` — enable reasoning in LLM extraction (default: `true`, set `false` to disable)
- `OPENROUTER_BASE_URL` — override OpenRouter endpoint
- `DEBUG_REDDIT` — set `true` for Reddit token cache debug logging

## Architecture

```
src/
├── index.ts                    # STDIO + HTTP server entry point, graceful shutdown, stdin disconnect detection
├── worker.ts                   # Cloudflare Workers entry point (excluded from tsconfig.json, compiled by Wrangler)
├── version.ts                  # Version string (hardcoded fallback 3.6.9 for Workers where package.json unavailable)
├── config/
│   ├── index.ts                # Env parsing, capability detection, lazy Proxy config objects with resetEnvCache()
│   ├── loader.ts               # YAML tool config loader (readFileSync — incompatible with Workers runtime)
│   ├── types.ts                # Config type definitions
│   └── yaml/tools.yaml         # Complete tool specifications (single source of truth for tool metadata)
├── clients/                    # External API integrations
│   ├── search.ts               # Google Serper API (8 concurrent calls)
│   ├── reddit.ts               # Reddit OAuth API (5 concurrent calls, module-level token cache with 60s expiry)
│   ├── scraper.ts              # Scrape.do with 3-mode fallback: basic → JS rendering → JS + US geo
│   └── research.ts             # OpenRouter LLM client (tries primary model, falls back to RESEARCH_FALLBACK_MODEL)
├── tools/
│   ├── definitions.ts          # Tool metadata generated from YAML
│   ├── registry.ts             # Central handler registry & execution pipeline
│   ├── search.ts               # web_search handler (3-100 parallel keywords)
│   ├── reddit.ts               # search_reddit (10-50 queries via Serper) + get_reddit_post (via Reddit API)
│   ├── scrape.ts               # scrape_links handler (1-50 URLs, 10 concurrent, batches of 30)
│   ├── research.ts             # deep_research handler (1-10 questions, 32K token budget, file attachments)
│   └── utils.ts                # Tool-specific utilities
├── schemas/                    # Zod input validation
│   ├── web-search.ts
│   ├── scrape-links.ts
│   └── deep-research.ts
├── services/
│   ├── llm-processor.ts        # OpenRouter API integration (3 concurrent extractions, maxRetries: 0)
│   ├── markdown-cleaner.ts     # HTML → Markdown (turndown), truncates at 512K chars, linear-time comment removal
│   └── file-attachment.ts      # File attachment handling for deep_research (reads filesystem, supports line ranges)
└── utils/
    ├── errors.ts               # Error classification: retryable (429, 5xx → backoff) vs non-retryable
    ├── logger.ts               # MCP SDK structured logging (falls back to stderr)
    ├── concurrency.ts          # Bounded parallel execution (pMap)
    ├── url-aggregator.ts       # CTR-weighted URL ranking with consensus detection
    ├── markdown-formatter.ts   # Markdown output formatting
    └── response.ts             # 70/20/10 response formatter (formatSuccess, formatError, formatBatchHeader)
```

**Key design decisions:**
- **Tool specs in YAML** (`config/yaml/tools.yaml`) — single source of truth for tool names, descriptions, and parameter specs. Copied into `dist/config/` during build. If you add new YAML files, they must be in this directory.
- **Never-throw pattern** — server never crashes on tool failures. All errors go through `classifyError()` which categorizes as retryable (429, 5xx → exponential backoff) or non-retryable (→ user-friendly message with setup instructions).
- **Capability-based degradation** — missing API keys disable specific tools with helpful setup instructions rather than failing.
- **Bounded concurrency** — web_search: 8, search_reddit: 8, get_reddit_post: 5 (batches of 10), scrape_links: 10 (batches of 30) + 3 LLM extractions, deep_research: 3.
- **CTR-weighted URL ranking** — search results ranked by click-through rates with consensus detection across multiple queries.
- **Smart Reddit comment allocation** — 1000 total budget, capped at 200/post: 5+ posts → budget/count each, 2 posts → 200 each (capped from 500).
- **70/20/10 response format** — all tools return: 70% summary, 20% structured data, 10% actionable next steps.
- **Model fallback** — `ResearchClient` tries primary model, then falls back to `RESEARCH_FALLBACK_MODEL` on failure.
- **Scraper fallback** — `ScraperClient.scrapeWithFallback()` tries 3 modes: basic → JavaScript rendering → JavaScript + US geo-targeting.
- **Gemini special handling** — models matching `google/gemini*` get `tools: [{type: 'google_search'}]` instead of `search_parameters`.

**Execution pipeline** (`tools/registry.ts`): lookup tool → check capability → validate with Zod → execute handler → transform response. Every step catches errors gracefully.

## Gotchas

- **ESM project** (`"type": "module"`) — all imports must use `.js` extensions.
- **worker.ts excluded from tsconfig** — `npm run typecheck` does NOT check the Workers entry point. Use `npx wrangler deploy --dry-run` to verify.
- **Build copies YAML** — the build script copies `src/config/yaml/` to `dist/config/`. New YAML files must be in this directory.
- **version.ts hardcoded fallback** — has a stale fallback version for Workers runtime where `package.json` is unavailable. Can drift from actual version.
- **All OpenAI client instances set `maxRetries: 0`** — retry logic is handled manually with custom backoff.
- **Lazy Proxy config** — `RESEARCH`, `LLM_EXTRACTION` config objects use Proxy for deferred env reads. `resetEnvCache()` clears them for Workers env bridging.
- **stdin disconnect detection** — explicit handlers for `stdin close/end` and `stdout EPIPE` to prevent 100% CPU when parent process disconnects.

## Cloudflare Workers

`src/worker.ts` bridges Cloudflare env bindings into `process.env` so config modules work in both STDIO and Workers. Avoids importing `tools/definitions.ts` (which uses filesystem YAML loading) and instead registers tools directly from the registry. Uses Durable Objects (`ResearchPowerpackMCP` class with SQLite). Custom domain: `mcp-researchpowerpack.workers.yigitkonur.com`.

## CI/CD

GitHub Actions (`.github/workflows/npmrelease.yml`): pushes to main auto-publish to npm with OIDC provenance. Auto-bumps patch version if already published. Ignores markdown/docs-only changes. `[skip ci]` in commit message skips build. Manual trigger supports patch/minor/major bump.