97% of MCP tool descriptions have quality defects

NOTE

We renamed "Tool Description Quality Score" to "Tool Definition Quality Score" to more accurately reflect what is being measured.

NOTE

We have since made the TDQS framework open source.

This work draws on two studies:

Paper 1: "MCP Tool Descriptions Are Smelly" (SAIL Research, arXiv 2602.14878, Feb 2026)

A study of 856 tools across 103 MCP servers found that 97% of tool descriptions contain at least one quality defect. 56% don't clearly state what the tool does. 89% fail to mention when you should or shouldn't use them.

Why does this matter? Because the description, together with the tool's name and parameter schema, is all an AI agent has to go on when deciding which tool to call. A vague description means wrong tool selection, wasted steps, and failed tasks.

Paper 2: "From Docs to Descriptions" (arXiv 2602.18914, Feb 2026)

A second study of 10,831 MCP servers quantified the impact: tools with well-written descriptions get selected 260% more often in competitive settings where multiple servers offer similar functionality.

The good news is that fixing descriptions works: improving them raises task success rates by about 6 percentage points. The fixes are also specific and actionable. Not "write better docs", but "your description doesn't distinguish this tool from three siblings on the same server."
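To make that kind of fix concrete, here is a hypothetical before/after for a single tool, written as Python dicts shaped like entries in an MCP `tools/list` result (`name`, `description`, `inputSchema`). The repository tools `search_issues`, `get_issue`, and `list_issues` are invented for illustration and are not drawn from either study. The improved version states what the tool does, when to pick it over its siblings, and how it behaves, and it documents its parameter.

```python
# Hypothetical tool definitions, shaped like entries in an MCP `tools/list`
# result. The tool names and behavior are invented for illustration only.

VAGUE = {
    "name": "search_issues",
    "description": "Searches issues.",
    "inputSchema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q"],
    },
}

IMPROVED = {
    "name": "search_issues",
    # Purpose, usage guidance relative to sibling tools, and behavior.
    "description": (
        "Full-text search over issue titles and bodies in one repository. "
        "Use this when you only know keywords; use get_issue when you already "
        "have an issue number, and list_issues to page through issues by state "
        "without a query. Read-only. Returns at most 20 matches, newest first."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            # Parameter semantics: say what the argument means, with an example.
            "q": {
                "type": "string",
                "description": "Keywords to match, e.g. 'login timeout'.",
            }
        },
        "required": ["q"],
    },
}

if __name__ == "__main__":
    for label, tool in (("vague", VAGUE), ("improved", IMPROVED)):
        words = len(tool["description"].split())
        print(f"{label} {tool['name']}: {words} words")
```

The improved description is longer, but each added sentence answers a question the agent would otherwise have to guess at when several tools on the same server look alike.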

We built a Tool Description Quality Score (TDQS) that evaluates every tool across six research-backed dimensions:

Purpose Clarity
Usage Guidelines
Behavioral Transparency
Parameter Semantics
Conciseness
Contextual Completeness

Each tool gets a 1–5 score per dimension, rolled up into an overall tier. Server authors see exactly what's broken and how to fix it. Users see which servers will actually work well with their agents.
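The post does not spell out how the per-dimension scores are combined, so the sketch below is only one plausible reading. It assumes an unweighted mean of the six 1–5 scores and maps that mean to hypothetical letter tiers. The snake_case dimension keys, the cutoffs, and the `overall_tier` function are all assumptions for illustration, not the TDQS implementation.

```python
# Hypothetical keys for the six TDQS dimensions named in the post.
DIMENSIONS = (
    "purpose_clarity",
    "usage_guidelines",
    "behavioral_transparency",
    "parameter_semantics",
    "conciseness",
    "contextual_completeness",
)

# Hypothetical tier cutoffs on the mean score, highest first (not TDQS's real tiers).
TIERS = ((4.5, "A"), (3.5, "B"), (2.5, "C"), (1.5, "D"), (1.0, "F"))


def overall_tier(scores: dict[str, int]) -> tuple[float, str]:
    """Roll six 1-5 dimension scores into (mean, tier), assuming an unweighted mean."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim in DIMENSIONS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} must be between 1 and 5, got {scores[dim]}")
    avg = sum(scores[dim] for dim in DIMENSIONS) / len(DIMENSIONS)
    # Return the first tier whose cutoff the mean reaches.
    for cutoff, label in TIERS:
        if avg >= cutoff:
            return avg, label
    return avg, TIERS[-1][1]  # not reached for valid scores, since avg >= 1.0


if __name__ == "__main__":
    example = {dim: 4 for dim in DIMENSIONS}
    example["conciseness"] = 5
    print(overall_tier(example))
```

An unweighted mean lets a strong dimension offset a weak one; a real scorer might instead weight dimensions or cap the tier by the lowest score so that, for example, a missing purpose statement cannot be hidden by good conciseness.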

We are making the TDQS framework open source so that anyone can evaluate MCP servers as part of their development workflow.

We are also rolling this out to all Glama-hosted MCP servers and MCP connectors. It will complement our existing MCP server scoring system, which identifies well-crafted servers.

If you are a researcher and would like to collaborate, get in touch.

The data is still populating, but you can already see a preview for some of the tools.

TDQS Authors: Frank Fiegel and Om Shree