Doclea MCP

Official

Overview Schema Related Servers Score Discussions

doclea-mcp
documentation

live-e2e-benchmark-plan-2026-02-16.md•2.76 KiB

# Live E2E Benchmark Plan (2026-02-16) ## Goal Replace modeled LLM timing with real wall-clock, live-inference measurements so results reflect agent-like production behavior. ## Scope - Keep current retrieval quality metrics (recall, precision, wrong-path ratio, hallucinated/nonexistent path ratio). - Replace estimated timing with measured end-to-end latency per query and per mode. - Keep token accounting (input/output) from real requests. - Regenerate the HTML deck with explicit labels for what is measured vs modeled. ## Decision (Doc Drift) For **Doc Drift Detection Workload**, we standardize on: - `Doclea Guardrail` as the Doclea mode. - No `Doclea Full` row in doc-drift comparison charts/tables. Reason: doc-drift is a high-precision triage workflow where guardrail constraints are the intended production mode. ## Tomorrow Execution Steps 1. Add live-inference runner - Add a mode in benchmark scripts to call the selected LLM provider for actual completion. - Capture timestamps around full request lifecycle: - retrieval start/end - prompt build end - model request start/end - total end-to-end 2. Enforce apples-to-apples prompts - Same task framing and answer format across all non-Doclea baselines and Doclea mode. - Same max output tokens and temperature for all modes. - Store raw prompt token count and output token count per run. 3. Update doc-drift benchmark wiring - Filter doc-drift comparison to `Doclea Guardrail` vs non-Doclea methods only. - Remove `Doclea Full` from doc-drift tables/charts in the presentation layer. - Keep `Doclea Full` in other sections where relevant. 4. Add realism controls - Run each query with cold/warm cache variants. - Include concurrency setting (serial and small parallel batch) and report both. - Record p50/p95, not just averages. 5. Expand hard multi-file tasks - Ensure each benchmark query requires cross-app/package traversal. - Keep expected-file ground truth for correctness scoring. - Add at least one code+docs drift triage task and one dependency-chain task. 6. Output + report updates - Regenerate JSON + HTML. - Add clear “Measured live E2E” badges where applicable. - Keep a separate “Modeled” badge only where unavoidable, and isolate those charts. ## Acceptance Criteria - No sub-ms “LLM traversal” claims in live sections. - For every reported timing chart: source is explicit (`measured` vs `modeled`). - Doc-drift section shows only `Doclea Guardrail` for Doclea side. - Per-mode report includes: recall, precision, wrong-path ratio, input tokens, output tokens, p50/p95 end-to-end. ## Run Checklist - Confirm active model and region. - Confirm cache state (cold/warm). - Confirm token caps and output caps. - Run benchmark suite. - Regenerate presentation. - Sanity-check one query trace end-to-end with raw logs.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/docleaai/doclea-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

live-e2e-benchmark-plan-2026-02-16.md•2.76 KiB