LLM Researcher

A lightweight MCP (Model Context Protocol) server for LLM orchestration that provides efficient web content search and extraction capabilities. This CLI tool enables LLMs to search DuckDuckGo and GitHub code and to extract clean, LLM-friendly content from web pages.

Built with TypeScript, tsup, and vitest for a modern development experience.

Features

MCP Server Support: Provides a Model Context Protocol server for LLM integration
Free Operation: Uses DuckDuckGo HTML endpoint (no API costs)
GitHub Code Search: Search GitHub repositories for code examples and implementation patterns
Smart Content Extraction: Playwright + @mozilla/readability for clean content
LLM-Optimized Output: Sanitized Markdown (h1-h3, bold, italic, links only)
Rate Limited: Respects DuckDuckGo with a 1 request/second limit
Cross-Platform: Works on macOS, Linux, and WSL
Multiple Modes: CLI, MCP server, search, direct URL, and interactive modes
Type Safe: Full TypeScript implementation with strict typing
Modern Tooling: Built with tsup bundler and vitest testing

Installation

Prerequisites

Node.js 20.0.0 or higher
No local Chrome installation required (uses Playwright's bundled Chromium)

Setup

# Clone or download the project
cd light-research-mcp

# Install dependencies (using pnpm)
pnpm install

# Build the project
pnpm build

# Install Playwright browsers
pnpm install-browsers

# Optional: Link globally for system-wide access
pnpm link --global

Usage

MCP Server Mode

Use as a Model Context Protocol server to provide search and content extraction tools to LLMs:

# Start MCP server (stdio transport)
llmresearcher --mcp

# The server provides these tools to MCP clients:
# - github_code_search: Search GitHub repositories for code
# - duckduckgo_web_search: Search the web with DuckDuckGo
# - extract_content: Extract detailed content from URLs
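MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests, and over the stdio transport each message is one line of JSON. The sketch below builds and frames such a request. The argument names (`query`, `locale`, `url`) are assumptions taken from the examples in this README, not a confirmed schema; the server's `tools/list` response is the authoritative source.

```typescript
// Tool names exposed by this server, as listed in this README.
type ToolName = "github_code_search" | "duckduckgo_web_search" | "extract_content";

// A JSON-RPC 2.0 request for the MCP `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

// Builds a tools/call request. The argument names passed in are assumptions;
// the server's tools/list response holds the real input schemas.
function buildToolCall(id: number, name: ToolName, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport exchanges newline-delimited JSON. JSON.stringify without
// indentation escapes newlines inside strings, so the message stays on one line.
function frameForStdio(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}
```

For example, `frameForStdio(buildToolCall(1, "duckduckgo_web_search", { query: "TypeScript best practices 2024", locale: "us-en" }))` yields a line that could be written to the server's stdin. In practice, the MCP TypeScript SDK's `Client` with `StdioClientTransport` handles this framing for you.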

Setting up with Claude Code

# Add as an MCP server to Claude Code
claude mcp add light-research-mcp /path/to/light-research-mcp/dist/bin/llmresearcher.js --mcp

# Or with project scope for team sharing
claude mcp add light-research-mcp -s project /path/to/light-research-mcp/dist/bin/llmresearcher.js --mcp

# List configured servers
claude mcp list

# Check server status
claude mcp get light-research-mcp

MCP Tool Usage Examples

Once configured, you can use these tools in Claude:

> Search for React hooks examples on GitHub
Tool: github_code_search
Query: "useState useEffect hooks language:javascript"

> Search for TypeScript best practices
Tool: duckduckgo_web_search
Query: "TypeScript best practices 2024"
Locale: us-en (or wt-wt for no region)

> Extract content from a search result
Tool: extract_content
URL: https://example.com/article-from-search-results
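GitHub code-search queries mix free-text terms with `qualifier:value` filters such as `language:javascript`. The hypothetical helper below separates the two so that filters can be inspected or forwarded individually. The qualifier set is an assumption (a subset of GitHub's syntax), and the README does not document how the tool actually passes filters to `gh`.

```typescript
// Qualifiers recognized by this sketch (a hypothetical subset of GitHub's syntax).
const KNOWN_QUALIFIERS = new Set(["language", "repo", "path", "filename", "extension", "org", "user"]);

interface ParsedQuery {
  terms: string[];                      // free-text search terms
  qualifiers: Record<string, string[]>; // e.g. { language: ["javascript"] }
}

// Splits a query on whitespace and sorts tokens into terms vs. qualifiers.
// Quoted phrases are not handled; this is a simplification.
function parseCodeQuery(query: string): ParsedQuery {
  const parsed: ParsedQuery = { terms: [], qualifiers: {} };
  for (const token of query.trim().split(/\s+/)) {
    if (token === "") continue;
    const sep = token.indexOf(":");
    const key = sep > 0 ? token.slice(0, sep).toLowerCase() : "";
    const value = sep > 0 ? token.slice(sep + 1) : "";
    if (KNOWN_QUALIFIERS.has(key) && value !== "") {
      (parsed.qualifiers[key] ??= []).push(value);
    } else {
      // Unknown prefixes (e.g. "http:") are treated as ordinary terms.
      parsed.terms.push(token);
    }
  }
  return parsed;
}
```

With the first example above, `parseCodeQuery("useState useEffect hooks language:javascript")` returns the terms `useState`, `useEffect`, `hooks` and the qualifier `language: ["javascript"]`.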

Command Line Interface

# Search mode - Search DuckDuckGo and interactively browse results
llmresearcher "machine learning transformers"

# GitHub Code Search mode - Search GitHub for code
llmresearcher -g "useState hooks language:typescript"

# Direct URL mode - Extract content from specific URL
llmresearcher -u https://example.com/article

# Interactive mode - Enter interactive search session
llmresearcher

# Verbose logging - See detailed operation logs
llmresearcher -v "search query"

# MCP Server mode - Start as Model Context Protocol server
llmresearcher --mcp
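The flags above select one of five modes. A hypothetical, dependency-free sketch of how the arguments could map to a mode is shown below. The real CLI parses arguments with `commander`, and the precedence used here (`--mcp`, then `-g`, then `-u`, then a bare query) is an assumption.

```typescript
type Mode =
  | { kind: "mcp" }
  | { kind: "github"; query: string }
  | { kind: "url"; url: string }
  | { kind: "search"; query: string }
  | { kind: "interactive" };

interface Invocation { mode: Mode; verbose: boolean }

// Maps CLI arguments (without the node/script prefix) to a mode.
// Only the flags documented in this README are recognized.
function resolveMode(argv: string[]): Invocation {
  const verbose = argv.includes("-v");
  const rest = argv.filter((a) => a !== "-v");
  if (rest.includes("--mcp")) return { mode: { kind: "mcp" }, verbose };

  // Value that follows a flag, if both are present.
  const valueAfter = (flag: string): string | undefined => {
    const i = rest.indexOf(flag);
    return i >= 0 ? rest[i + 1] : undefined;
  };

  const gh = valueAfter("-g");
  if (gh !== undefined) return { mode: { kind: "github", query: gh }, verbose };
  const url = valueAfter("-u");
  if (url !== undefined) return { mode: { kind: "url", url }, verbose };

  // Anything left is a search query; no arguments means interactive mode.
  const query = rest.join(" ").trim();
  return query !== ""
    ? { mode: { kind: "search", query }, verbose }
    : { mode: { kind: "interactive" }, verbose };
}
```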

Development

Scripts

# Build the project
pnpm build

# Build in watch mode (for development)
pnpm dev

# Run tests
pnpm test

# Run tests in CI mode (single run)
pnpm test:run

# Type checking
pnpm type-check

# Clean build artifacts
pnpm clean

# Install Playwright browsers
pnpm install-browsers

Interactive Commands

When in search results view:

1-10: Select a result by number
b or back: Return to search results
open <n>: Open result #n in external browser
q or quit: Exit the program

When viewing content:

b or back: Return to search results
/<term>: Search for term within the extracted content
open: Open current page in external browser
q or quit: Exit the program
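The commands above form a small grammar. A hypothetical parser for one line of input might look like this; it does not check which view is active (for example, that `/<term>` only applies while viewing content), and `maxResults` mirrors the default of 10.

```typescript
type Command =
  | { type: "select"; index: number }  // "1".."10" in the results view
  | { type: "back" }                   // "b" or "back"
  | { type: "open"; index?: number }   // "open" or "open <n>"
  | { type: "find"; term: string }     // "/<term>" in the content view
  | { type: "quit" }                   // "q" or "quit"
  | { type: "unknown"; input: string };

function parseCommand(input: string, maxResults = 10): Command {
  const line = input.trim();
  if (line === "b" || line === "back") return { type: "back" };
  if (line === "q" || line === "quit") return { type: "quit" };
  if (line.startsWith("/") && line.length > 1) return { type: "find", term: line.slice(1) };

  const open = /^open(?:\s+(\d+))?$/.exec(line);
  if (open) return open[1] ? { type: "open", index: Number(open[1]) } : { type: "open" };

  if (/^\d+$/.test(line)) {
    const index = Number(line);
    if (index >= 1 && index <= maxResults) return { type: "select", index };
  }
  return { type: "unknown", input: line };
}
```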

Configuration

Environment Variables

Create a .env file in the project root:

USER_AGENT=Mozilla/5.0 (compatible; LLMResearcher/1.0)
TIMEOUT=30000
MAX_RETRIES=3
RATE_LIMIT_DELAY=1000
CACHE_ENABLED=true
MAX_RESULTS=10

Configuration File

Create ~/.llmresearcherrc in your home directory:

{
  "userAgent": "Mozilla/5.0 (compatible; LLMResearcher/1.0)",
  "timeout": 30000,
  "maxRetries": 3,
  "rateLimitDelay": 1000,
  "cacheEnabled": true,
  "maxResults": 10
}

Configuration Options

Option	Default	Description
`userAgent`	`Mozilla/5.0 (compatible; LLMResearcher/1.0)`	User agent for HTTP requests
`timeout`	`30000`	Request timeout in milliseconds
`maxRetries`	`3`	Maximum retry attempts for failed requests
`rateLimitDelay`	`1000`	Delay between requests in milliseconds
`cacheEnabled`	`true`	Enable/disable local caching
`maxResults`	`10`	Maximum search results to display
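Configuration can come from built-in defaults, `~/.llmresearcherrc`, and environment variables (including `.env`). The sketch below merges the three sources. The precedence (environment over RC file over defaults) is an assumption, since the README does not state it. File reading (e.g. via `dotenv` and `JSON.parse`) is left out so the merge logic stays testable.

```typescript
interface ResearcherConfig {
  userAgent: string;
  timeout: number;
  maxRetries: number;
  rateLimitDelay: number;
  cacheEnabled: boolean;
  maxResults: number;
}

// Defaults from the Configuration Options table.
const DEFAULTS: ResearcherConfig = {
  userAgent: "Mozilla/5.0 (compatible; LLMResearcher/1.0)",
  timeout: 30000,
  maxRetries: 3,
  rateLimitDelay: 1000,
  cacheEnabled: true,
  maxResults: 10,
};

type Env = Record<string, string | undefined>;
type NumericKey = "timeout" | "maxRetries" | "rateLimitDelay" | "maxResults";

// Environment variable name -> config key, for the numeric options.
const NUMERIC_ENV: ReadonlyArray<[string, NumericKey]> = [
  ["TIMEOUT", "timeout"],
  ["MAX_RETRIES", "maxRetries"],
  ["RATE_LIMIT_DELAY", "rateLimitDelay"],
  ["MAX_RESULTS", "maxResults"],
];

// Reads the documented variables, skipping missing or non-numeric values.
function fromEnv(env: Env): Partial<ResearcherConfig> {
  const out: Partial<ResearcherConfig> = {};
  if (env.USER_AGENT) out.userAgent = env.USER_AGENT;
  for (const [name, key] of NUMERIC_ENV) {
    const raw = env[name];
    const n = raw === undefined || raw.trim() === "" ? NaN : Number(raw);
    if (Number.isFinite(n)) out[key] = n;
  }
  if (env.CACHE_ENABLED !== undefined) {
    out.cacheEnabled = env.CACHE_ENABLED.trim().toLowerCase() === "true";
  }
  return out;
}

// Later sources override earlier ones: defaults < RC file < environment.
function loadConfig(rc: Partial<ResearcherConfig>, env: Env): ResearcherConfig {
  return { ...DEFAULTS, ...rc, ...fromEnv(env) };
}
```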

Architecture

Core Components

MCPResearchServer (src/mcp-server.ts)
- Model Context Protocol server implementation
- Three main tools: github_code_search, duckduckgo_web_search, extract_content
- JSON-based responses for LLM consumption
DuckDuckGoSearcher (src/search.ts)
- HTML scraping of DuckDuckGo search results with locale support
- URL decoding for /l/?uddg= format links
- Rate limiting and retry logic
GitHubCodeSearcher (src/github-code-search.ts)
- GitHub Code Search API integration via the gh CLI
- Advanced query support with language, repo, and file filters
- Authentication and rate limiting
ContentExtractor (src/extractor.ts)
- Playwright-based page rendering with resource blocking
- @mozilla/readability for main content extraction
- DOMPurify sanitization and Markdown conversion
CLIInterface (src/cli.ts)
- Interactive command-line interface
- Search result navigation
- Content viewing and text search
Configuration (src/config.ts)
- Environment and RC file configuration loading
- Verbose logging support
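`DuckDuckGoSearcher` decodes result links of the form `/l/?uddg=...`, where the real target URL is percent-encoded in the `uddg` query parameter. A minimal sketch of that step, assuming hrefs may be protocol-relative (`//duckduckgo.com/l/?...`) or root-relative; this is not the project's actual code.

```typescript
// Returns the target URL of a DuckDuckGo redirect link, or the href
// unchanged if it is not a /l/?uddg= redirect.
function decodeDuckDuckGoLink(href: string): string {
  let parsed: URL;
  try {
    // Resolve protocol- or root-relative hrefs against the DuckDuckGo origin.
    parsed = new URL(href, "https://duckduckgo.com");
  } catch {
    return href;
  }
  // URLSearchParams.get() already percent-decodes the value.
  const target = parsed.pathname === "/l/" ? parsed.searchParams.get("uddg") : null;
  return target ?? href;
}
```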

Content Processing Pipeline

MCP Server Mode

Search:
- DuckDuckGo: HTML endpoint → Parse results → JSON response with pagination
- GitHub: Code Search API → Format results → JSON response with code snippets
Extract: URL from search results → Playwright navigation → Content extraction
Process: @mozilla/readability → DOMPurify sanitization → Clean JSON output
Output: Structured JSON for LLM consumption
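The search step queries DuckDuckGo's HTML endpoint with a locale. Judging from the verbose log in this README, the request URL has the form `https://duckduckgo.com/html/?q=<query>&kl=<locale>`. A minimal sketch of building it follows; the `us-en` default and the locale format check are assumptions.

```typescript
// DuckDuckGo region codes look like "us-en", "jp-jp", or "wt-wt" (no region).
const LOCALE_PATTERN = /^[a-z]{2}-[a-z]{2}$/;

// encodeURIComponent encodes spaces as %20 (URLSearchParams would use "+"),
// which matches the URL shown in the verbose log.
function buildSearchUrl(query: string, locale = "us-en"): string {
  if (!LOCALE_PATTERN.test(locale)) {
    throw new RangeError(`Unexpected locale format: ${locale}`);
  }
  return `https://duckduckgo.com/html/?q=${encodeURIComponent(query.trim())}&kl=${locale}`;
}
```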

CLI Mode

Search: DuckDuckGo HTML endpoint → Parse results → Display numbered list
Extract: Playwright navigation → Resource blocking → JS rendering
Process: @mozilla/readability → DOMPurify sanitization → Turndown Markdown
Output: Clean Markdown with h1-h3, bold, italic, links only

Security Features

Resource Blocking: Prevents loading of images, CSS, fonts for speed and security
Content Sanitization: DOMPurify removes scripts, iframes, and dangerous elements
Limited Markdown: Only allows safe formatting elements (h1-h3, strong, em, a)
Rate Limiting: Respects DuckDuckGo's rate limits with exponential backoff
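Limiting Markdown to a few elements is only part of the picture; links that survive sanitization can still point somewhere unsafe. A hypothetical extra check that keeps only `http:` and `https:` links is sketched below. This policy is an assumption for illustration, not something the README documents.

```typescript
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

// True only for http(s) URLs. Relative hrefs are resolved against `base`
// when given, otherwise they fail to parse and are rejected.
function isSafeHref(href: string, base?: string): boolean {
  try {
    return ALLOWED_PROTOCOLS.has(new URL(href.trim(), base).protocol);
  } catch {
    return false;
  }
}
```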

Examples

MCP Server Usage with Claude Code

1. GitHub Code Search

You: "Find React hook examples for state management"

Claude uses github_code_search tool:
{
  "query": "useState useReducer state management language:javascript",
  "results": [
    {
      "title": "facebook/react/packages/react/src/ReactHooks.js",
      "url": "https://raw.githubusercontent.com/facebook/react/main/packages/react/src/ReactHooks.js",
      "snippet": "function useState(initialState) {\n return dispatcher.useState(initialState);\n}"
    }
  ],
  "pagination": {
    "currentPage": 1,
    "hasNextPage": true,
    "nextPageToken": "2"
  }
}
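The `pagination` object in this response uses the next page number as a string token. A hypothetical sketch that produces the same shape from an already-fetched result list; whether the server paginates locally or per API request is not stated in the README.

```typescript
interface Pagination { currentPage: number; hasNextPage: boolean; nextPageToken?: string }
interface Page<T> { results: T[]; pagination: Pagination }

// Returns 1-based page `currentPage` of `all`, with `pageSize` items per page.
function paginate<T>(all: readonly T[], currentPage: number, pageSize: number): Page<T> {
  if (!Number.isInteger(currentPage) || currentPage < 1 || !Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("currentPage and pageSize must be positive integers");
  }
  const start = (currentPage - 1) * pageSize;
  const hasNextPage = start + pageSize < all.length;
  const pagination: Pagination = { currentPage, hasNextPage };
  // Only emit a token when another page exists.
  if (hasNextPage) pagination.nextPageToken = String(currentPage + 1);
  return { results: all.slice(start, start + pageSize), pagination };
}
```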

2. Web Search with Locale

You: "Search for Vue.js tutorials in Japanese"

Claude uses duckduckgo_web_search tool:
{
  "query": "Vue.js チュートリアル入門",
  "locale": "jp-jp",
  "results": [
    {
      "title": "Vue.js入門ガイド",
      "url": "https://example.com/vue-tutorial",
      "snippet": "Vue.jsの基本的な使い方を学ぶチュートリアル..."
    }
  ]
}

3. Content Extraction

You: "Extract the full content from that Vue.js tutorial"

Claude uses extract_content tool:
{
  "url": "https://example.com/vue-tutorial",
  "title": "Vue.js入門ガイド",
  "extractedAt": "2024-01-15T10:30:00.000Z",
  "content": "# Vue.js入門ガイド\n\nVue.jsは...\n\n## インストール\n\n..."
}
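A TypeScript client consuming `extract_content` can validate responses with a type guard. The field list below is taken only from this example; the real response may contain more fields.

```typescript
interface ExtractResult {
  url: string;
  title: string;
  extractedAt: string; // ISO 8601 timestamp
  content: string;     // sanitized Markdown
}

// Checks the fields shown in the example, including a parseable timestamp.
function isExtractResult(value: unknown): value is ExtractResult {
  if (typeof value !== "object" || value === null) return false;
  const { url, title, extractedAt, content } = value as Record<string, unknown>;
  return (
    typeof url === "string" &&
    typeof title === "string" &&
    typeof content === "string" &&
    typeof extractedAt === "string" &&
    !Number.isNaN(Date.parse(extractedAt))
  );
}
```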

CLI Examples

Basic Search

$ llmresearcher "python web scraping"

🔍 Search Results:
══════════════════════════════════════════════════

1. Python Web Scraping Tutorial
URL: https://realpython.com/python-web-scraping-practical-introduction/
Complete guide to web scraping with Python using requests and Beautiful Soup...

2. Web Scraping with Python - BeautifulSoup and requests
URL: https://www.dataquest.io/blog/web-scraping-python-tutorial/
Learn how to scrape websites with Python, Beautiful Soup, and requests...

══════════════════════════════════════════════════
Commands: [1-10] select result | b) back | q) quit | open <n>) open in browser
> 1

📥 Extracting content from: Python Web Scraping Tutorial

📄 Content:
══════════════════════════════════════════════════
**Python Web Scraping Tutorial**
Source: https://realpython.com/python-web-scraping-practical-introduction/
Extracted: 2024-01-15T10:30:00.000Z
──────────────────────────────────────────────────

# Python Web Scraping: A Practical Introduction

Web scraping is the process of collecting and parsing raw data from the web...

## What Is Web Scraping?

Web scraping is a technique to automatically access and extract large amounts...

══════════════════════════════════════════════════
Commands: b) back to results | /<term>) search in text | q) quit | open) open in browser
> /beautiful soup

🔍 Found 3 matches for "beautiful soup":
──────────────────────────────────────────────────
Line 15: Beautiful Soup is a Python library for parsing HTML and XML documents.
Line 42: from bs4 import BeautifulSoup
Line 67: soup = BeautifulSoup(html_content, 'html.parser')
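In this session, `/beautiful soup` also matched `BeautifulSoup`, which suggests the in-text search ignores case and whitespace. A sketch under that assumption, returning 1-based line numbers like the `Line 15:` output:

```typescript
interface Match { line: number; text: string }

// Case- and whitespace-insensitive search over the lines of extracted content.
function findMatches(content: string, term: string): Match[] {
  const normalize = (s: string) => s.toLowerCase().replace(/\s+/g, "");
  const needle = normalize(term);
  if (needle === "") return [];
  return content
    .split(/\r?\n/)
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter(({ text }) => normalize(text).includes(needle));
}
```

Dropping whitespace lets `beautiful soup` find `BeautifulSoup`, at the cost of occasional false positives where two unrelated words happen to run together.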

Direct URL Mode

$ llmresearcher -u https://docs.python.org/3/tutorial/

📄 Content:
══════════════════════════════════════════════════
**The Python Tutorial**
Source: https://docs.python.org/3/tutorial/
Extracted: 2024-01-15T10:35:00.000Z
──────────────────────────────────────────────────

# The Python Tutorial

Python is an easy to learn, powerful programming language...

## An Informal Introduction to Python

In the following examples, input and output are distinguished...

Verbose Mode

$ llmresearcher -v "nodejs tutorial"
[VERBOSE] Searching: https://duckduckgo.com/html/?q=nodejs%20tutorial&kl=us-en
[VERBOSE] Response: 200 in 847ms
[VERBOSE] Parsed 10 results
[VERBOSE] Launching browser...
[VERBOSE] Blocking resource: https://example.com/style.css
[VERBOSE] Blocking resource: https://example.com/image.png
[VERBOSE] Navigating to page...
[VERBOSE] Page loaded in 1243ms
[VERBOSE] Processing content with Readability...
[VERBOSE] Readability extraction successful
[VERBOSE] Closing browser...

Testing

Running Tests

# Run tests in watch mode
pnpm test

# Run tests once (CI mode)
pnpm test:run

# Run tests with coverage
pnpm test -- --coverage

Test Coverage

The test suite includes:

Unit Tests: Individual component testing
- search.test.ts: DuckDuckGo search functionality, URL decoding, rate limiting
- extractor.test.ts: Content extraction, Markdown conversion, resource management
- config.test.ts: Configuration validation and environment handling
Integration Tests: End-to-end workflow testing
- integration.test.ts: Complete search-to-extraction workflows, error handling, cleanup

Test Features

Fast: Powered by vitest for quick feedback
Type-safe: Full TypeScript support in tests
Isolated: Each test cleans up its resources
Comprehensive: Covers search, extraction, configuration, and integration scenarios

Troubleshooting

Common Issues

"Browser not found" Error

pnpm install-browsers

Rate Limiting Issues

The tool automatically handles rate limiting with 1-second delays
If you encounter 429 errors, the tool will automatically retry with exponential backoff
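The two mechanisms described here, a fixed gap between requests (`rateLimitDelay`, 1000 ms by default) and exponential backoff on 429 responses (up to `maxRetries`), can be sketched as pure timing logic. The doubling factor and the absence of jitter are assumptions; the README does not specify the backoff curve.

```typescript
// Enforces a minimum gap between requests. Time is passed in explicitly
// so the logic can be tested without real timers.
class RateLimiter {
  private readonly minDelayMs: number;
  private lastSlot: number | null = null;

  constructor(minDelayMs: number) {
    this.minDelayMs = minDelayMs;
  }

  // Reserves the next send slot for a request made at `now` (ms) and
  // returns how long the caller should wait before sending.
  reserve(now: number): number {
    const slot = this.lastSlot === null ? now : Math.max(now, this.lastSlot + this.minDelayMs);
    this.lastSlot = slot;
    return slot - now;
  }
}

// Delay before retry attempt n (1-based): base * 2^(n - 1).
function backoffDelay(attempt: number, baseMs: number): number {
  return baseMs * 2 ** (attempt - 1);
}

// Waits for every retry allowed by maxRetries.
function backoffSchedule(maxRetries: number, baseMs: number): number[] {
  return Array.from({ length: maxRetries }, (_, i) => backoffDelay(i + 1, baseMs));
}
```

In real code the wait would be applied with something like `await new Promise((resolve) => setTimeout(resolve, limiter.reserve(Date.now())))`; with the defaults, the retry waits are 1000, 2000, and 4000 ms.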

Content Extraction Failures

Some sites may block automated access
The tool includes fallback extraction methods (main → body content)
Use verbose mode (-v) to see detailed error information
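The fallback described here (Readability first, then `<main>`, then `<body>`) is an ordered chain: try each strategy and keep the first one that yields enough text. In the sketch below, the strategy functions are hypothetical stand-ins for the real DOM work, and the 200-character threshold is an arbitrary assumption.

```typescript
type Strategy = { name: string; run: () => string | null };

interface Extraction { method: string; text: string }

// Tries strategies in order. A strategy that throws or returns too little
// text is skipped; null means every strategy failed.
function extractWithFallback(strategies: Strategy[], minLength = 200): Extraction | null {
  for (const { name, run } of strategies) {
    let text: string | null;
    try {
      text = run();
    } catch {
      text = null;
    }
    if (text !== null && text.trim().length >= minLength) {
      return { method: name, text: text.trim() };
    }
  }
  return null;
}
```

Returning the strategy name makes it easy to report in verbose mode which method produced the content.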

Permission Denied (Unix/Linux)

chmod +x dist/bin/llmresearcher.js

Performance Optimization

The tool is optimized for speed:

Resource Blocking: Automatically blocks images, CSS, fonts
Network Idle: Waits for JavaScript to complete rendering
Content Caching: Supports local caching to avoid repeated requests
Minimal Dependencies: Uses lightweight, focused libraries
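Playwright supports request interception through `page.route()`, and each request reports a `resourceType()` such as `image`, `stylesheet`, or `font`. The blocked set below mirrors the README's list (images, CSS, fonts) and is otherwise an assumption. The predicate is kept separate so it can be tested without a browser; the comment shows how it could be wired up.

```typescript
// Resource types blocked by this sketch, using Playwright's resourceType() names.
const BLOCKED_RESOURCE_TYPES: ReadonlySet<string> = new Set(["image", "stylesheet", "font"]);

function shouldBlock(resourceType: string): boolean {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Possible wiring with Playwright (not executed here):
//   await page.route("**/*", (route) =>
//     shouldBlock(route.request().resourceType()) ? route.abort() : route.continue());
```

Scripts are deliberately left unblocked, because the extraction pipeline relies on JavaScript rendering before Readability runs.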

Development

Project Structure

light-research-mcp/
├── dist/                              # Built JavaScript files (generated)
│   ├── bin/
│   │   └── llmresearcher.js           # CLI entry point (executable)
│   └── *.js                           # Compiled TypeScript modules
├── src/                               # TypeScript source files
│   ├── bin.ts                         # CLI entry point
│   ├── index.ts                       # Main LLMResearcher class
│   ├── mcp-server.ts                  # MCP server implementation
│   ├── search.ts                      # DuckDuckGo search implementation
│   ├── github-code-search.ts          # GitHub Code Search implementation
│   ├── extractor.ts                   # Content extraction with Playwright
│   ├── cli.ts                         # Interactive CLI interface
│   ├── config.ts                      # Configuration management
│   └── types.ts                       # TypeScript type definitions
├── test/                              # Test files (vitest)
│   ├── search.test.ts                 # Search functionality tests
│   ├── extractor.test.ts              # Content extraction tests
│   ├── config.test.ts                 # Configuration tests
│   ├── mcp-locale.test.ts             # MCP locale functionality tests
│   ├── mcp-content-extractor.test.ts  # MCP content extractor tests
│   └── integration.test.ts            # End-to-end integration tests
├── tsconfig.json                      # TypeScript configuration
├── tsup.config.ts                     # Build configuration
├── vitest.config.ts                   # Test configuration
├── package.json
└── README.md

Dependencies

Runtime Dependencies

@modelcontextprotocol/sdk: Model Context Protocol server implementation
@mozilla/readability: Content extraction from HTML
cheerio: HTML parsing for search results
commander: CLI argument parsing
dompurify: HTML sanitization
dotenv: Environment variable loading
jsdom: DOM manipulation for server-side processing
playwright: Browser automation for JS rendering
turndown: HTML to Markdown conversion

Development Dependencies

typescript: TypeScript compiler
tsup: Fast TypeScript bundler
vitest: Fast unit test framework
@types/*: TypeScript type definitions

License

MIT License - see LICENSE file for details.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

Roadmap

Planned Features

Enhanced MCP Tools: Additional specialized search tools for documentation, APIs, etc.
Caching Layer: SQLite-based URL → Markdown caching with 24-hour TTL
Search Engine Abstraction: Support for Brave Search, Bing, and other engines
Content Summarization: Optional AI-powered content summarization
Export Formats: JSON, plain text, and other output formats
Batch Processing: Process multiple URLs from file input
SSE Transport: Support for Server-Sent Events MCP transport

Performance Improvements

Parallel Processing: Concurrent content extraction for multiple results
Smart Caching: Intelligent cache invalidation based on content freshness
Memory Optimization: Streaming content processing for large documents
