solocrawl
Server Configuration
Environment variables that configure the server. None of them is required.
| Name | Required | Description | Default |
|---|---|---|---|
| SOLOCRAWL_LOG_FILE | No | Optional log file path (also logs to stderr) | |
| SOLOCRAWL_LOG_LEVEL | No | Log level: DEBUG, INFO, WARNING, ERROR | |
| SOLOCRAWL_PROXY_LIST | No | Comma-separated proxy URLs | |
| SOLOCRAWL_PROXY_MODE | No | Proxy mode: list (rotate a pool) or endpoint (single rotating endpoint) | |
| SOLOCRAWL_USER_AGENT | No | Override HTTP User-Agent for API requests | |
| SOLOCRAWL_MAX_RETRIES | No | Number of retries on network errors or rate limits | |
| SOLOCRAWL_SEARXNG_URL | No | Base URL of a self-hosted SearXNG instance (enables the searxng provider) | |
| SOLOCRAWL_PROXY_ENABLED | No | Enable the optional proxy layer | |
| SOLOCRAWL_PROXY_ENDPOINT | No | Single rotating proxy endpoint | |
| SOLOCRAWL_PROXY_PASSWORD | No | Proxy auth password | |
| SOLOCRAWL_PROXY_USERNAME | No | Proxy auth username | |
| SOLOCRAWL_RESPECT_ROBOTS | No | Honour robots.txt when scraping (fail-open: scraping proceeds if robots.txt cannot be fetched); set to false to skip the check | |
| SOLOCRAWL_BROWSER_ALLOWED | No | Allow Playwright fallback when installed | |
| SOLOCRAWL_MAX_CONCURRENCY | No | Global fetch concurrency limit | |
| SOLOCRAWL_TIMEOUT_SECONDS | No | Per-request timeout in seconds | |
| SOLOCRAWL_ENABLE_PROVIDERS | No | Comma-separated opt-in provider names | |
| SOLOCRAWL_PER_DOMAIN_LIMIT | No | Per-domain concurrency limit | |
| SOLOCRAWL_CACHE_TTL_SECONDS | No | In-memory fetch cache TTL in seconds (0 = disabled) | |
| SOLOCRAWL_MAX_RESPONSE_BYTES | No | Cap on fetched response body size, in bytes | 10 MiB |
| SOLOCRAWL_ALLOW_INTERNAL_URLS | No | Allow scraping localhost/private IPs (dev only) | |
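To illustrate how the comma-separated, boolean and enumerated variables above fit together, here is a minimal sketch of a settings loader covering a subset of them. It is not SoloCrawl's own code: the names `Settings` and `load_settings`, the accepted boolean spellings (`1/true/yes/on`, `0/false/no/off`), and treating 10 MiB as the default response cap are assumptions made for illustration. Unset values are left as `None` so the server can apply its own defaults.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Hypothetical parsing conventions; SoloCrawl's real loader may accept other spellings.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}
_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}  # levels listed in the table
_PROXY_MODES = {"list", "endpoint"}  # modes listed in the table


def _get(env: Mapping[str, str], name: str) -> Optional[str]:
    """Return a stripped value, treating unset and empty variables the same way."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return None
    return raw.strip()


def _bool(raw: Optional[str]) -> Optional[bool]:
    """Parse a boolean flag; None means 'not set, use the server default'."""
    if raw is None:
        return None
    value = raw.lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def _csv(raw: Optional[str]) -> list[str]:
    """Split a comma-separated value, trimming spaces and dropping empty items."""
    if raw is None:
        return []
    return [item.strip() for item in raw.split(",") if item.strip()]


@dataclass
class Settings:
    # None means "not set here; let the server apply its own default".
    log_level: Optional[str] = None
    proxy_enabled: Optional[bool] = None
    proxy_mode: Optional[str] = None
    proxy_list: list[str] = field(default_factory=list)
    respect_robots: Optional[bool] = None
    enable_providers: list[str] = field(default_factory=list)
    timeout_seconds: Optional[float] = None
    max_response_bytes: int = 10 * 1024 * 1024  # assumed default: 10 MiB, as in the table


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    s = Settings()
    level = _get(env, "SOLOCRAWL_LOG_LEVEL")
    if level is not None:
        if level.upper() not in _LOG_LEVELS:
            raise ValueError(f"unsupported log level: {level!r}")
        s.log_level = level.upper()
    s.proxy_enabled = _bool(_get(env, "SOLOCRAWL_PROXY_ENABLED"))
    mode = _get(env, "SOLOCRAWL_PROXY_MODE")
    if mode is not None:
        if mode.lower() not in _PROXY_MODES:
            raise ValueError(f"proxy mode must be 'list' or 'endpoint', got {mode!r}")
        s.proxy_mode = mode.lower()
    s.proxy_list = _csv(_get(env, "SOLOCRAWL_PROXY_LIST"))
    s.respect_robots = _bool(_get(env, "SOLOCRAWL_RESPECT_ROBOTS"))
    s.enable_providers = _csv(_get(env, "SOLOCRAWL_ENABLE_PROVIDERS"))
    timeout = _get(env, "SOLOCRAWL_TIMEOUT_SECONDS")
    if timeout is not None:
        s.timeout_seconds = float(timeout)
    cap = _get(env, "SOLOCRAWL_MAX_RESPONSE_BYTES")
    if cap is not None:
        s.max_response_bytes = int(cap)
    return s


# Usage: a rotating proxy pool with robots.txt checks turned off.
settings = load_settings({
    "SOLOCRAWL_PROXY_ENABLED": "true",
    "SOLOCRAWL_PROXY_MODE": "list",
    "SOLOCRAWL_PROXY_LIST": "http://p1:8080, http://p2:8080",
    "SOLOCRAWL_RESPECT_ROBOTS": "false",
})
print(settings.proxy_list)  # ['http://p1:8080', 'http://p2:8080']
```

Passing the environment as a `Mapping` rather than reading `os.environ` directly keeps the loader easy to exercise with a plain dictionary.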
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {"listChanged": true} |
| logging | {} |
| prompts | {"listChanged": false} |
| resources | {"subscribe": false, "listChanged": false} |
| extensions | {"io.modelcontextprotocol/ui": {}} |
| experimental | {} |
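In MCP, a capability is advertised by the presence of its key (an empty object such as logging: {} still counts), and listChanged: true on tools means the server may send notifications/tools/list_changed, after which a client should re-fetch tools/list. The sketch below restates the table as a Python dictionary and derives those facts from it. The helper names `supports` and `list_changed_notifications` are hypothetical, and nesting "extensions" inside the capabilities object simply mirrors the table rather than a confirmed wire format.

```python
import json

# The capabilities from the table above, restated as a dictionary.
SERVER_CAPABILITIES = {
    "tools": {"listChanged": True},
    "logging": {},
    "prompts": {"listChanged": False},
    "resources": {"subscribe": False, "listChanged": False},
    "extensions": {"io.modelcontextprotocol/ui": {}},
    "experimental": {},
}


def supports(caps: dict, name: str) -> bool:
    """A capability is advertised if its key is present, even with an empty object."""
    return name in caps


def list_changed_notifications(caps: dict) -> list[str]:
    """Notification methods the server may send when a feature's list changes."""
    return [
        f"notifications/{name}/list_changed"
        for name in ("tools", "prompts", "resources")
        if caps.get(name, {}).get("listChanged", False)
    ]


print(list_changed_notifications(SERVER_CAPABILITIES))  # ['notifications/tools/list_changed']
print(json.dumps(SERVER_CAPABILITIES["tools"]))  # {"listChanged": true}
```

For this server, only the tool list is declared as changeable at runtime; prompts and resources are advertised without change notifications or subscriptions.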
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| web_search | Search the web across SoloCrawl's configured providers and return unified results. |
| scrape | Fetch a URL and return the main page content as markdown suitable for LLM context. |
| research | Search the web, scrape the top results, and return an aggregated cited report. |
| package_version | Look up the latest or constraint-satisfying version of a package from a registry. |
| list_providers | List the registered search and package providers (default vs. opt-in). |
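A client invokes these tools with MCP's JSON-RPC 2.0 tools/call method, passing the tool name and an arguments object. The sketch below builds such requests. The argument names (`url`, `registry`, `package`) are hypothetical placeholders; the actual names come from each tool's inputSchema as returned by tools/list.

```python
import itertools
import json
from typing import Any

# Monotonic request ids, as JSON-RPC requires each request id to be unique per session.
_ids = itertools.count(1)


def tools_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument names; check each tool's inputSchema before relying on them.
scrape_req = tools_call("scrape", {"url": "https://example.com"})
version_req = tools_call("package_version", {"registry": "pypi", "package": "requests"})

print(json.dumps(scrape_req))
```

Building the message as a plain dictionary keeps the sketch transport-agnostic: the same payload can be written to the server's stdio stream or sent over an HTTP transport.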
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/hlavacm/solocrawl'
```