docs-mcp-server

Overview Schema Related Servers Score Discussions

configuration.md•8.41 KiB

# Configuration The Docs MCP Server uses a unified configuration system that aggregates settings from multiple sources, validating them against a strict schema. This ensures consistency whether you are running the server via CLI, Docker, or as a library. ## Configuration File By default, configuration is stored in your system's preferences directory: - **macOS**: `~/Library/Preferences/docs-mcp-server/config.yaml` - **Linux**: `~/.config/docs-mcp-server/config.yaml` - **Windows**: `%APPDATA%\docs-mcp-server\config.yaml` Example `config.yaml`: ```yaml app: storePath: ~/.docs-mcp-server telemetryEnabled: true embeddingModel: text-embedding-3-small scraper: maxPages: 1000 maxDepth: 3 document: maxSize: 10485760 # 10MB splitter: preferredChunkSize: 1500 maxChunkSize: 5000 ``` The server **automatically updates** this file on startup with new defaults. ### Using an Explicit Config File You can specify a custom config file with `--config` or `DOCS_MCP_CONFIG`: ```bash docs-mcp-server --config /path/to/config.yaml ``` **Note:** Explicit config files are treated as **read-only**. The server will not modify them. ## Overriding Configuration Configuration values are merged from multiple sources, with **later sources taking precedence**: 1. **Defaults** (lowest priority) 2. **Config File** 3. **Environment Variables** 4. **CLI Arguments** (highest priority) ### Environment Variables Any configuration setting can be overridden via environment variables using the naming convention: ``` DOCS_MCP_<SECTION>_<SETTING> ``` Rules: - Convert `camelCase` to `UPPER_SNAKE_CASE` - Join nested paths with underscores **Examples:** ```bash # Override scraper settings export DOCS_MCP_SCRAPER_MAX_PAGES=2000 export DOCS_MCP_SCRAPER_DOCUMENT_MAX_SIZE=52428800 # Override splitter settings export DOCS_MCP_SPLITTER_PREFERRED_CHUNK_SIZE=2000 # Override app settings export DOCS_MCP_APP_TELEMETRY_ENABLED=false ``` Some settings also have **legacy aliases** for convenience: | Setting | Alias | |---------|-------| | `server.ports.default` | `PORT` | | `server.host` | `HOST` | ### CLI Arguments Common settings have dedicated CLI flags: ```bash docs-mcp-server --port 8080 --host 0.0.0.0 docs-mcp-server --store-path /data/docs --read-only ``` ## CLI Configuration Commands Manage configuration directly from the command line: ```bash # View current configuration (JSON format) docs-mcp-server config # View current configuration (YAML format) docs-mcp-server config --yaml # Get a specific value docs-mcp-server config get scraper.maxPages # Output: 1000 # Get a nested object docs-mcp-server config get scraper.fetcher # Output: { "maxRetries": 6, ... } # Set a value (persists to config file) docs-mcp-server config set scraper.maxPages 500 # Output: Updated scraper.maxPages = 500 ``` **Note:** `config set` only modifies the system default configuration file. If you specify `--config`, the file is treated as read-only. --- ## Configuration Reference ### App (`app`) General application settings. | Option | Default | Description | |:-------|:--------|:------------| | `storePath` | `~/.docs-mcp-server` | Directory for storing databases and logs. | | `telemetryEnabled` | `true` | Enable anonymous usage telemetry. | | `readOnly` | `false` | Prevent modification of data (scraping/indexing). | | `embeddingModel` | `text-embedding-3-small` | Model to use for vector embeddings. | ### Server (`server`) Settings for the API and MCP servers. | Option | Default | Description | |:-------|:--------|:------------| | `protocol` | `auto` | Server protocol (`stdio`, `http`, or `auto`). | | `host` | `127.0.0.1` | Host interface to bind to. | | `heartbeatMs` | `30000` | MCP protocol heartbeat interval (ms). | | `ports.default` | `6280` | Default port for the main server. | | `ports.worker` | `8080` | Port for the background worker service. | | `ports.mcp` | `6280` | Port for the specific MCP interface. | | `ports.web` | `6281` | Port for the web dashboard. | ### Authentication (`auth`) Security settings for the HTTP server. | Option | Default | Description | |:-------|:--------|:------------| | `enabled` | `false` | Enable JWT authentication. | | `issuerUrl` | - | OIDC Issuer URL (e.g., Clerk, Auth0). | | `audience` | - | Expected JWT audience claim. | ### Scraper (`scraper`) Settings controlling the web scraping behavior. | Option | Default | Description | |:-------|:--------|:------------| | `maxPages` | `1000` | Maximum number of pages to crawl per job. | | `maxDepth` | `3` | Maximum link depth to traverse. | | `maxConcurrency` | `3` | Number of concurrent page fetches. | | `pageTimeoutMs` | `5000` | Timeout for a single page load (ms). | | `browserTimeoutMs` | `30000` | Timeout for the browser instance (ms). | | `fetcher.maxRetries` | `6` | Number of retries for failed requests. | | `fetcher.baseDelayMs` | `1000` | Initial delay for exponential backoff (ms). | | `document.maxSize` | `10485760` | Maximum size (bytes) for PDF/Office documents. | _Note: Scraper settings are often overridden per-job via CLI arguments like `--max-pages`._ > **Migration Note:** In versions prior to 1.37, `document.maxSize` was a top-level setting. It has been moved to `scraper.document.maxSize`. Update your config files accordingly. ### GitHub Authentication Environment variables for authenticating with GitHub when scraping private repositories. | Env Var | Description | | :------------- | :------------------------------------------------------------------------------------------------ | | `GITHUB_TOKEN` | GitHub personal access token or fine-grained token. Used for private repo access and higher rate limits. | | `GH_TOKEN` | Alternative to `GITHUB_TOKEN`. Used if `GITHUB_TOKEN` is not set. | **Authentication Resolution Order:** 1. Explicit `Authorization` header passed in scraper options 2. `GITHUB_TOKEN` environment variable 3. `GH_TOKEN` environment variable 4. Local `gh` CLI authentication (via `gh auth token`) If no authentication is available, public repositories are still accessible but with lower rate limits (60 requests/hour vs 5,000 authenticated). ### Splitter (`splitter`) Settings for chunking text for vector search. | Option | Default | Description | |:-------|:--------|:------------| | `minChunkSize` | `500` | Minimum characters per chunk body. Chunks below this threshold are merged with adjacent chunks by the greedy optimizer. | | `preferredChunkSize` | `1500` | Soft target for chunk body size in characters. The greedy optimizer splits when combining two chunks would exceed this value, provided both sides are already above `minChunkSize`. | | `maxChunkSize` | `5000` | Hard upper limit for chunk body size in characters. No chunk body will exceed this value. | > **Note:** These size limits apply to the **text body** of each chunk. Before embedding, > a small metadata header (page title, URL, section path) is prepended to each chunk, > adding to the total character count sent to the embedding model. Because characters are > not tokens, the actual token count depends on your embedding model's tokenizer. If your > model has a small context window (e.g., some local models), consider lowering > `maxChunkSize` to leave headroom for metadata and token expansion. ### Embeddings (`embeddings`) Settings for the vector embedding generation. > **Detailed Guide:** See [Embedding Model Configuration](../guides/embedding-models.md) for provider-specific setup (OpenAI, Ollama, Gemini, etc.). | Option | Default | Description | |:-------|:--------|:------------| | `batchSize` | `100` | Number of chunks to embed in one request. | | `vectorDimension` | `1536` | Dimension of the vector space (must match model). | ### Database (`db`) Internal database settings. | Option | Default | Description | |:-------|:--------|:------------| | `migrationMaxRetries` | `5` | Retries for database migrations on startup. | ### Assembly (`assembly`) Settings for reassembling search results. | Option | Default | Description | |:-------|:--------|:------------| | `maxChunkDistance` | `3` | Maximum sort_order difference to merge chunks. | | `maxParentChainDepth` | `10` | Maximum depth for parent context traversal. | | `childLimit` | `3` | Maximum number of child chunks to include. | | `precedingSiblingsLimit` | `1` | Number of preceding sibling chunks to include. | | `subsequentSiblingsLimit` | `2` | Number of subsequent sibling chunks to include. |

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/arabold/docs-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

configuration.md•8.41 KiB