ReftrixMCP
Server Quality Checklist
This repository includes a README.md file.
This repository includes a LICENSE file.
Latest release: v0.3.0
No tool usage detected in the last 30 days. Usage tracking helps demonstrate server value.
Tip: use the "Try in Browser" feature on the server page to seed initial usage.
This repository includes a glama.json configuration file.
This server provides 35 tools.
No known security issues or vulnerabilities reported.
This server has been verified by its author.
Tool Scores
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true and idempotentHint=true, establishing the safety profile. The description adds context about the evaluation methodology (three axes, cliche detection) but omits significant behavioral details like Playwright vs JSDOM execution, responsive evaluation capabilities, pattern comparison features, and the summary mode's truncation behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
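For concreteness, the annotations referenced throughout these scores are plain structured fields that sit alongside the description in the MCP tool definition. A minimal sketch of that shape (the field names follow the MCP tool-annotations spec; the tool name and description text are illustrative assumptions, not the server's actual definition):

```python
# Minimal sketch of an MCP tool definition carrying behavior annotations.
# The name and description are illustrative, not the server's actual text.
tool = {
    "name": "quality.evaluate",
    "description": (
        "Evaluate web design quality on three axes "
        "(originality, craftsmanship, contextuality) and detect AI cliches."
    ),
    "annotations": {
        "readOnlyHint": True,    # declares no state mutation
        "idempotentHint": True,  # repeated calls give the same result
    },
}
```

The annotations carry the machine-readable safety profile, but they cannot express execution details like Playwright vs JSDOM or summary-mode truncation; that burden stays on the description.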
Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The single sentence is information-dense with zero waste, clearly listing the three evaluation axes and cliche detection feature. However, given the tool's complexity (12 parameters, nested objects, multiple evaluation modes), it may be excessively concise rather than appropriately sized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 2/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex tool with 12 parameters including responsive evaluation, accessibility testing via Playwright, pattern comparison, and contextual analysis capabilities, the description is incomplete. It mentions only the core 3-axis evaluation and cliche detection, missing major functional areas that would help an agent understand the full scope.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description implicitly references the 'weights' parameter via the three axes mention but provides no additional parameter guidance, syntax examples, or explanations beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
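As a hypothetical illustration of the parameter guidance this score asks for: the 'weights' parameter presumably balances the three named axes, and one plausible combination rule can be sketched in a few lines (the normalization scheme is an assumption, not the server's documented behavior):

```python
def combined_score(scores, weights):
    """Weighted average over the three evaluation axes.

    scores  -- per-axis results in [0, 1], e.g. {"originality": 0.8, ...}
    weights -- relative importance; normalized here so they sum to 1
    """
    total = sum(weights.values())
    return sum(scores[axis] * weights[axis] / total for axis in scores)

score = combined_score(
    {"originality": 0.8, "craftsmanship": 0.6, "contextuality": 0.9},
    {"originality": 1.0, "craftsmanship": 1.0, "contextuality": 2.0},
)
# (0.8 + 0.6 + 1.8) / 4 = 0.8
```

Even two sentences of this kind of semantics in the description (valid ranges, whether weights are normalized) would lift the score above the schema-coverage baseline.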
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool evaluates web design quality using three specific axes (originality, craftsmanship, contextuality) and mentions AI cliche detection. However, it fails to distinguish this from siblings like accessibility.audit, design.compare, or page.analyze, which could confuse tool selection.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like accessibility.audit or design.compare, nor does it clarify input requirements (e.g., when to use pageId vs html) or prerequisites. No alternatives or exclusions are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable security context beyond annotations: SSRF protection scope and HTML sanitization. However, despite readOnlyHint: false indicating state mutation, the description fails to disclose the tool's persistence behavior (save_to_db, auto_analyze features) or what computational resources it consumes (browser instances, WebGL processes).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste. The description front-loads the core action, follows with security constraints, and ends with data processing guarantees. Every sentence earns its place with high information density.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 2/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (4 top-level parameters with deeply nested options including responsive analysis, external CSS fetching, browser process management, and DB persistence), the description is insufficient. It omits major capabilities like responsive viewport analysis, database integration, and timeout/performance behaviors that would help an agent configure the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description implicitly references the url parameter and screenshot/html retrieval options but does not add syntax guidance, format details, or semantic explanations beyond what the schema already provides for the 25+ nested option fields.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb (Fetch), resource (HTML/screenshot from URL), and purpose (layout analysis). However, it does not distinguish this tool from siblings like layout.batch_ingest or layout.inspect, leaving ambiguity about which layout tool to use for single vs. multiple pages or external vs. internal analysis.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions SSRF protection constraints (blocks private IPs), which implies limitations, but provides no positive guidance on when to prefer this tool over layout.batch_ingest for bulk operations or layout.inspect for existing data. No workflow context or prerequisites are provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
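The SSRF protection the description does mention (blocking private IPs) usually comes down to an address check before any fetch; a minimal sketch with the Python standard library (the server's actual policy may differ):

```python
import ipaddress
from urllib.parse import urlparse

def is_blocked(url):
    """Reject URLs whose host is a private, loopback, or link-local IP literal."""
    host = urlparse(url).hostname or ""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; a real check would resolve DNS first
    return addr.is_private or addr.is_loopback or addr.is_link_local

assert is_blocked("http://192.168.1.10/page")
assert not is_blocked("http://8.8.8.8/page")
```

A production filter must also resolve hostnames and re-check the resolved addresses, since DNS rebinding can slip a private target past a literal-only check.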
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate read-only and idempotent operations. The description adds what gets extracted (sections, grid, typography) but omits significant behavioral details present in the schema, such as the optional Vision API integration for visual analysis and CPU-intensive processing options.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence, front-loaded with the core action. Efficiently worded without filler, though arguably too terse given the tool's complexity and multiple input modes.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 100% schema coverage, the description does not need to enumerate parameters, but it should mention the dual input methods (ID vs HTML) and Vision API capability given their functional significance. No output schema exists, yet the description does not hint at return structure.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is adequately met. The description maps loosely to the extraction options (detectSections, detectGrid, analyzeTypography) but does not clarify the mutual exclusivity relationship between 'id' and 'html' inputs or the nested visionOptions configuration.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
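The 'id'-vs-'html' exclusivity flagged above is cheap both to document and to enforce; a hypothetical sketch of the check (parameter names mirror the review; the page store is a stand-in):

```python
_PAGES = {"page-1": "<html>stored markup</html>"}  # stand-in for the WebPage store

def resolve_input(id=None, html=None):
    """Exactly one of `id` (a stored WebPage) or `html` (raw markup) is required."""
    if (id is None) == (html is None):
        raise ValueError("provide exactly one of 'id' or 'html'")
    return html if html is not None else _PAGES[id]

assert resolve_input(html="<p>x</p>") == "<p>x</p>"
assert resolve_input(id="page-1") == "<html>stored markup</html>"
```

One sentence in the description ("pass either a stored page id or raw HTML, never both") would cover what JSON Schema's structural constraints express less readably.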
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool parses HTML and extracts specific layout elements (sections, grid, typography). However, it does not differentiate from sibling tools like layout.ingest or layout.search, which may also process HTML but for different purposes (storage vs. analysis).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus layout.ingest, layout.search, or page.analyze. No mention of prerequisites for the 'id' parameter (WebPage from DB) versus direct 'html' input, or when to enable the Vision API option.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds valuable context about return values (a score and suggestions) that annotations lack, but does not disclose validation criteria, scoring ranges, or specific checks performed.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
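A "completeness score with improvement suggestions," as this tool's description promises, can be sketched as a simple field checklist; the brief field names below are assumptions for illustration, not the actual schema:

```python
REQUIRED = ["audience", "goal", "tone"]           # assumed brief fields
OPTIONAL = ["palette", "references", "deadline"]  # assumed brief fields

def validate_brief(brief):
    """Return (completeness score, improvement suggestions)."""
    fields = REQUIRED + OPTIONAL
    present = [f for f in fields if brief.get(f)]
    suggestions = [f"Add a '{f}' section" for f in REQUIRED if not brief.get(f)]
    return round(len(present) / len(fields), 2), suggestions

score, tips = validate_brief({"audience": "developers", "goal": "sign-ups"})
# 2 of 6 fields present; 'tone' is the only missing required field
```

Disclosing even this much (which fields count as required, the score's range) in the description would address the validation-scope gap noted under Completeness.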
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficiently structured sentence of 11 words. It is front-loaded with the verb and contains no redundant or filler text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complex nested input schema and lack of output schema, the description adequately mentions return values (score + suggestions). However, it omits validation scope details (e.g., required vs optional field checking) that would help an agent predict the tool's utility for specific brief states.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description implicitly references the 'brief' parameter by mentioning 'design brief', but adds no semantic details about 'strictMode' or parameter interdependencies beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Validate') and resource ('design brief'), and specifies the outputs ('completeness score with improvement suggestions'). However, it does not explicitly differentiate from similar validation tools like 'quality.evaluate' or 'audit.query'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, nor does it mention prerequisites (e.g., brief completion requirements) or when to avoid using it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only and idempotent behavior. The description adds valuable context by specifying the exact detection types (coverage, anomaly, drift) and target models (DINOv2/e5-base), but does not disclose what data structure or format the tool returns.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
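Of the three operations named (coverage, anomaly, drift), drift detection is the least self-explanatory; one common approach compares batch centroids by cosine distance, sketched here in pure Python (real DINOv2/e5-base vectors are 768-dimensional; tiny vectors are used for readability):

```python
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def drifted(baseline, current, threshold=0.1):
    """Flag drift when the two batches' centroids diverge beyond a threshold."""
    return cosine_distance(centroid(baseline), centroid(current)) > threshold

baseline = [[1.0, 0.0], [0.9, 0.1]]
assert not drifted(baseline, [[1.0, 0.05]])           # same region of the space
assert drifted(baseline, [[0.0, 1.0], [0.1, 0.9]])    # distribution has moved
```

Stating which of these (or comparable) metrics the tool actually returns would resolve the output-structure gap noted under Completeness.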
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The bilingual description is efficiently structured with zero waste: the first sentence defines the action, the second enumerates specific operations and models. Every clause serves a purpose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
While parameters are fully documented in the schema and annotations cover safety, the description lacks completeness due to the absence of an output schema. It should describe what quality metrics or report structure users can expect as a return value.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description does not add semantic meaning beyond the schema (e.g., explaining the trade-offs between 'sections' vs 'parts' scope, or when to enable distribution statistics).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool monitors embedding quality and performs specific operations (coverage, anomaly detection, drift detection) on specific models (DINOv2/e5-base). However, it does not explicitly differentiate this from sibling tools like `quality.evaluate` or `page.analyze`.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives such as `quality.evaluate` or `audit.query`, nor does it mention prerequisites like requiring existing embeddings to analyze.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnly/idempotent hints, so the description is relieved of full burden. It adds value by disclosing warning generation ('Warns about performance/accessibility issues') and analysis methodology. However, it omits key behavioral traits: database persistence ('save_to_db' defaults to true), graceful degradation on timeout, and the conditional parameter requirements across different detection modes.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Three short sentences with zero waste. Information is front-loaded with the core purpose in sentence one, methodology in sentence two, and output characteristics in sentence three. Every sentence earns its place with no redundant or filler text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 19 parameters with 100% schema coverage and high complexity (nested objects, enums, conditional requirements), the description is minimally viable but incomplete. It lacks output schema guidance (only hints at warnings) and fails to explain the complex mode-dependent parameter logic that agents need to invoke the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is appropriately 3. The description implies CSS-focused analysis but does not add parameter-specific semantics (e.g., explaining that 'detection_mode' determines which other parameters become effectively required, or detailing the interaction between 'fetchExternalCss' and 'baseUrl').
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description states a clear specific action ('Detect/classify motion patterns') and resource ('web page'), distinguishing from sibling 'motion.search'. However, it significantly under-represents scope by only mentioning CSS parsing ('Parses CSS animations...') while omitting the video, runtime, and hybrid detection modes described in the schema, which could mislead agents about capabilities.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'accessibility.audit' (general accessibility) or 'motion.search' (querying existing patterns). It also fails to clarify that certain detection modes require specific parameter combinations (e.g., 'url' for video/runtime modes vs 'html/pageId' for CSS mode).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
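The mode-dependent requirements noted above (a URL for video/runtime modes, html/pageId for CSS mode) are exactly the rules a description should spell out; a hypothetical validation sketch (mode names come from the review, the parameter grouping is assumed):

```python
REQUIRED_BY_MODE = {
    "css":     ["html_or_page"],  # needs html or pageId
    "video":   ["url"],
    "runtime": ["url"],
    "hybrid":  ["url", "html_or_page"],
}

def check_args(detection_mode, url=None, html=None, page_id=None):
    """Raise if the supplied inputs do not satisfy the chosen detection mode."""
    supplied = set()
    if url:
        supplied.add("url")
    if html or page_id:
        supplied.add("html_or_page")
    missing = [need for need in REQUIRED_BY_MODE[detection_mode]
               if need not in supplied]
    if missing:
        raise ValueError(f"mode '{detection_mode}' requires: {', '.join(missing)}")

check_args("css", html="<div class='fade'></div>")  # ok
check_args("video", url="https://example.com")      # ok
```

A single description sentence encoding this table ("CSS mode takes html/pageId; video and runtime modes take url") would prevent the first-call failures the review anticipates.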
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnly/idempotent hints. Description adds valuable search methodology context (Hybrid Search, Vector + Full-text, RRF integration) and embedding dimension requirements. However, it omits reranking behavior with profile_id, result scoring explanation, and pagination limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
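RRF, named in the description as the integration step, merges the vector and full-text result lists by rank rather than by raw score; the standard formula in a short sketch (k=60 is the conventional constant from the original RRF paper; the page ids are hypothetical):

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    score(d) = sum over lists of 1 / (k + rank of d in that list)
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits   = ["p3", "p1", "p7"]
fulltext_hits = ["p1", "p9", "p3"]
fused = rrf_fuse([vector_hits, fulltext_hits])
# p1 and p3 appear in both lists, so they outrank the single-list hits
```

Because RRF ignores raw scores, it needs no calibration between the 768-dim vector similarities and the full-text relevance scores, which is presumably why it was chosen here.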
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Two dense sentences with zero waste. Front-loaded with resource domain (worldview/layout), input flexibility, and technical implementation details. Every phrase earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a search tool with annotations present, covering search domain and methodology. However, lacks explanation of result structure (no output schema exists), ranking behavior, or guidance on interpreting minConfidence vs minSimilarity parameters.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is moderate (60%). The description provides a concrete natural-language example ('サイバーセキュリティ感のあるダークなデザイン', "a dark design with a cybersecurity feel") that clarifies expected query semantics, and emphasizes the 768-dim embedding constraint. It does not explain filter combinations, the interplay of options weighting, or the purpose of profile_id reranking.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
Specifies semantic search on '世界観・レイアウト構成' (worldview/layout composition) with clear input methods (natural language or 768-dim embedding). Distinguishes domain from siblings like layout.search (structural) or design.search_by_image (visual), though it could explicitly name contrasting tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides technical input guidance (query vs embedding) but lacks strategic advice on when to use this versus layout.search, search.unified, or design.search_by_image. No mention of prerequisites or when to prefer hybrid vs vector-only mode.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true and idempotentHint=true, establishing this is a safe, non-destructive operation. The description adds context that the tool has two distinct behavioral modes (searching vs. generating code) and implies the output format (CSS/JS). It does not disclose rate limits, caching behavior, or whether generated code is persisted, but this is acceptable given the annotations cover the safety profile.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely compact (two sentences) yet conveys the essential dual-mode nature of the tool. There is no redundant text or repetition of the tool name. Every sentence earns its place by defining either the purpose or the action parameter's effect.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 2/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having 15 parameters, complex nested filtering objects (filters, js_animation_filters, webgl_animation_filters), and dual operating modes, the description is minimal. With no output schema provided, the description fails to explain what the search returns (pattern metadata? code snippets?) or what the generate action returns, leaving significant gaps for an AI agent trying to understand the complete contract.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema carries the full burden of parameter documentation. The description repeats the action parameter's purpose ('action: search...で検索', i.e. "search via 'action: search'") without adding syntax details, validation rules, or semantic relationships between parameters (e.g., that 'query' and 'samplePattern' are mutually exclusive for search). Baseline 3 is appropriate as the description adds minimal value beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the dual purpose: 'モーションパターンを類似検索' (similarity search for motion patterns) and '実装コードを生成' (generate implementation code). It specifies the resource (motion patterns/CSS/JS code) and uses specific verbs. However, it does not distinguish from the sibling tool 'motion.detect', leaving ambiguity about when to use search vs. detection.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides internal usage guidance by explaining the 'action' parameter ('searchで検索、generateでCSS/JS実装コードを生成', i.e. "'search' to query, 'generate' to produce CSS/JS implementation code"), clarifying when to use each mode. However, it lacks external guidance comparing this tool to alternatives like 'motion.detect' or 'layout.generate_code', and provides no 'when not to use' constraints.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds the GDPR compliance framework context, which explains the nature of the audit logs, but doesn't disclose additional behavioral traits like rate limits, retention periods, or what specific processing activities are captured beyond the high-level GDPR reference.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely efficient bilingual format (Japanese/English) with zero waste. The two-sentence structure front-loads the action (search/query) and immediately qualifies the regulatory scope, delivering maximum information density without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
While the GDPR context is valuable, the description remains minimal for a compliance-focused tool with 5 optional parameters. It doesn't address the all-optional parameter design, expected result format, or pagination behavior beyond the limit constraint. Given the lack of output schema, additional context about return values would strengthen completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage including examples (e.g., 'data.delete, page.analyze'), the schema carries the full burden of parameter documentation. The description provides no additional parameter semantics beyond the schema, warranting the baseline score of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
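A query tool whose five parameters are all optional typically intersects whichever filters are supplied; a minimal sketch (the record fields are assumed; the 'data.delete' and 'page.analyze' action values echo the schema examples quoted above):

```python
def query_audit(records, action=None, since=None, until=None, limit=100):
    """Return records matching every supplied filter, capped at `limit`.

    Timestamps are compared as ISO-8601 strings, which sort chronologically.
    """
    out = []
    for rec in records:
        if action and rec["action"] != action:
            continue
        if since and rec["ts"] < since:
            continue
        if until and rec["ts"] > until:
            continue
        out.append(rec)
    return out[:limit]

log = [
    {"ts": "2025-01-02T10:00:00Z", "action": "page.analyze"},
    {"ts": "2025-01-03T11:00:00Z", "action": "data.delete"},
]
hits = query_audit(log, action="data.delete")
# one matching record
```

Stating this intersect-all-supplied-filters semantics, plus the shape of a returned record, would address the all-optional-parameter ambiguity noted under Completeness.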
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool queries/searches audit logs (specific verb + resource) and adds valuable regulatory context by specifying GDPR Article 30 compliance records. However, it doesn't explicitly differentiate from the sibling 'accessibility.audit' tool, though the GDPR mention implicitly signals this is for data processing records rather than accessibility testing.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The GDPR Art.30 reference provides implied usage context (use when needing records of processing activities for compliance), but lacks explicit when-to-use guidance, exclusions, or comparisons to alternatives like 'data.export' for data retrieval scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds value by clarifying this is 'semantic' search (vector-based) rather than keyword matching, and lists specific background pattern examples. However, it omits details about the reranking behavior with profile_id or result pagination patterns.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Three well-structured sentences with zero waste: (1) core function declaration, (2) searchable content examples, (3) filtering capabilities. Information is front-loaded and appropriately sized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the comprehensive schema coverage (100%) and complex nested filter structure, the description successfully provides the conceptual framing for the domain. It appropriately omits return value details (no output schema exists) but could briefly mention pagination behavior.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
While the schema has 100% coverage (baseline 3), the description adds concrete examples of the abstract 'BackgroundDesign' concept (gradients, glassmorphism, SVG) and explicitly mentions the 14 designType options, helping agents formulate appropriate natural language queries.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the tool performs semantic search on BackgroundDesign resources and provides concrete examples of searchable patterns (gradients, glassmorphism, SVG backgrounds). However, it does not explicitly differentiate from sibling search tools like layout.search or design.search_by_image.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides no guidance on when to use this tool versus the numerous alternative search tools available (layout.search, narrative.search, design.search_by_image, etc.). No exclusions or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare read-only/idempotent safety. Description adds valuable content context (OKLCH color values, gradient definitions) but omits behavioral details about gradient auto-generation logic, pagination for list results, or how brand_name partial matching behaves.
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste. Front-loaded with action verb, followed by usage pattern, then content specifics. Each sentence earns its place with no redundancy.
- Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for basic retrieval but underserves the tool's complexity. With 6 parameters including nested gradient configuration objects, the description should highlight the auto-generation capabilities and search functionality more explicitly. No output schema exists, increasing the description's burden.
- Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, baseline is 3. Description adds the 'no params for list' usage pattern which clarifies the optional ID behavior, but does not explain relationships between parameters (e.g., that gradient_options requires auto_generate_gradients) or add semantic context beyond the schema.
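The gradient_options / auto_generate_gradients dependency noted above is exactly the kind of parameter interaction a description should spell out. A minimal sketch of the implied validation, assuming those parameter names and a simple dict-based argument payload:

```python
# Hedged sketch: parameter names mirror the review's wording, not a
# confirmed schema. gradient_options only takes effect when
# auto_generate_gradients is enabled.
def validate_palette_args(args: dict) -> list:
    errors = []
    if "gradient_options" in args and not args.get("auto_generate_gradients"):
        errors.append("gradient_options requires auto_generate_gradients=true")
    return errors
```

A description stating this rule up front would save an agent one failed call.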
- Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
States specific action ('Get') and resource ('brand palette') clearly. Explains the dual-mode behavior (ID for details vs no params for list) effectively. Distinguishes from design/search siblings implicitly through resource specificity, though explicit differentiation is absent.
- Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear guidance on the primary list-vs-detail usage pattern ('Specify ID for details or no params for list'). However, offers no guidance on when to use brand_name search, mode filtering, or the complex auto-generate gradient features versus standard retrieval.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds valuable context about parallel processing and error handling modes not covered by annotations. However, it fails to disclose significant side effects (database persistence to WebPage table, embedding generation) implied by the schema options and write annotations (readOnlyHint: false).
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly constructed sentences with zero waste. Front-loaded with purpose, followed by processing behavior and error handling. Every sentence earns its place.
- Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers core batch processing mechanics adequately given the rich schema, but omits critical persistence details (save_to_db, auto_analyze side effects) that would help an agent understand the full scope of the mutation. Acceptable but incomplete for a state-changing operation.
- Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, baseline is 3. The description adds semantic value by explaining the behavioral impact of key options: 'configurable concurrency' and 'skip/abort modes' for error handling, helping the agent understand why these parameters matter.
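The 'configurable concurrency' and 'skip/abort' behaviors praised here follow a common pattern. A minimal sketch, assuming an async per-URL ingest callable; all function and parameter names are illustrative, not the server's API:

```python
import asyncio

# Hedged sketch of batch ingestion: a semaphore caps concurrency, and
# on_error chooses between "skip" (record the failure, continue) and
# "abort" (fail the whole batch on the first error).
async def batch_ingest(urls, ingest_one, concurrency=4, on_error="skip"):
    sem = asyncio.Semaphore(concurrency)
    results = {}

    async def worker(url):
        async with sem:
            try:
                results[url] = await ingest_one(url)
            except Exception:
                if on_error == "abort":
                    raise  # propagate: the whole batch fails
                results[url] = None  # skip mode: note the failure, keep going

    await asyncio.gather(*(worker(u) for u in urls))
    return results

async def _demo_ingest(url):
    # Stand-in for real page fetching and analysis.
    if "bad" in url:
        raise ValueError(url)
    return "analyzed"

results = asyncio.run(
    batch_ingest(["https://a.example", "https://bad.example"], _demo_ingest)
)
```

In skip mode a failed URL simply yields no result, which is the behavior the description should warn agents to check for.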
- Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
Clear specific verb ('batch ingest') and resource ('URLs for layout analysis'). Implicitly distinguishes from sibling 'layout.ingest' by emphasizing 'batch' and 'multiple', though explicit sibling contrast is absent.
- Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides guidance on error handling modes ('skip/abort') but lacks explicit direction on when to use this tool versus the single-URL 'layout.ingest' alternative or when to adjust concurrency settings.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds valuable behavioral context about default profile returns when profile_id is omitted. However, it fails to explain what 'signals' are (mentioned in schema) or elaborate on the GDPR data portability implications beyond the schema definition.
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The bilingual description (Japanese/English) is efficiently structured with zero waste. It front-loads the primary action ('Get current preference profile') and immediately follows with the critical default behavior constraint. Every sentence earns its place.
- Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (2 optional parameters, read-only operation) and excellent schema/annotation coverage, the description provides sufficient context for invocation. It appropriately delegates GDPR details to the schema, though briefly mentioning what 'signals' represent would improve completeness.
- Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description enhances the profile_id parameter by explaining the default profile behavior when omitted, adding semantic value. However, it completely ignores the include_signals parameter, leaving its GDPR context solely to the schema.
- Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it retrieves the current preference profile using specific verbs (取得/get). It effectively distinguishes the resource (preference profile) and mentions the default behavior when profile_id is omitted. However, it does not explicitly differentiate from siblings like 'preference.hear' or 'preference.reset'.
- Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides implicit usage guidance by explaining that omitting profile_id returns the default profile. However, it lacks explicit when-to-use guidance or comparison against alternatives like 'preference.reset' (likely for clearing) or 'preference.hear' (likely for updating/listening).
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds useful behavioral details beyond annotations: specifies file extensions generated (.svelte, .astro) and explains responsive breakpoint conversion logic (mobile-first class transformation). However, fails to clarify the side effect nature given readOnlyHint=false—specifically whether generated code is returned in the response, saved to disk, or stored elsewhere.
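The mobile-first class transformation mentioned above can be illustrated with a small sketch. The breakpoint names and the Tailwind-style md:/lg: prefixes are assumptions about how such a generator might work, not the tool's confirmed output:

```python
# Hedged sketch of mobile-first class generation: the smallest breakpoint
# emits bare utility classes; larger breakpoints are prefixed so they
# override only at or above their min-width.
BREAKPOINT_PREFIX = {"mobile": "", "tablet": "md:", "desktop": "lg:"}

def mobile_first_classes(classes_by_breakpoint):
    out = []
    for bp in ("mobile", "tablet", "desktop"):
        prefix = BREAKPOINT_PREFIX[bp]
        out.extend(prefix + c for c in classes_by_breakpoint.get(bp, []))
    return " ".join(out)
```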
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Two efficient sentences with zero waste. Front-loaded with the core action ('generates code'), followed by specific framework options and configuration variants. Every word earns its place.
- Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for the nested parameter structure, explaining the responsive and framework options. However, given the absence of an output schema and readOnlyHint=false indicating mutation, the description should clarify what happens to generated artifacts (return value vs. persistence) to be complete.
- Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is met. The description summarizes available options (frameworks, TypeScript, Tailwind) but adds no semantic depth beyond the schema's own property descriptions regarding valid values or interdependencies.
- Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the tool generates code from section patterns, specifying the exact output formats (React/Vue/Svelte/Astro/HTML) and distinguishes itself from sibling layout tools (ingest, inspect, search) by focusing specifically on code generation.
- Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies usage through capability listing (specify patternId to output code), but lacks explicit when-to-use guidance versus alternatives like layout.inspect or layout.search, and does not mention prerequisites such as having a valid patternId from prior layout operations.
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Description excellently discloses the similarity algorithm (section embeddings, mean pooling, pgvector HNSW, RRF 3-source fusion with specific weights) beyond the readOnly/idempotent annotations. However, it doesn't explicitly mention the 404 error case for unanalyzed URLs or describe the return format, which would be helpful given no output schema exists.
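The two pipeline stages named here, mean pooling of section embeddings and weighted Reciprocal Rank Fusion, can be sketched briefly. The fusion weights and the k constant below are illustrative; the review confirms the techniques but not these exact values:

```python
# Hedged sketch: pool per-section vectors into one page vector, then
# fuse several ranked candidate lists with weighted RRF.
def mean_pool(section_embeddings):
    n, dim = len(section_embeddings), len(section_embeddings[0])
    return [sum(vec[i] for vec in section_embeddings) / n for i in range(dim)]

def rrf_fuse(ranked_lists, weights, k=60):
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With three sources (for example vision, text, and structural rankings), a document ranked highly by two of them outscores one ranked first by only one, which is the robustness RRF is chosen for.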
- Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The bilingual format efficiently packs technical implementation details (embedding models, fusion weights) into two sentences per language. While dense with implementation specifics (DINOv2, pgvector, RRF), these details are front-loaded and relevant to understanding the tool's matching behavior, though slightly overwhelming for basic usage decisions.
- Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complex retrieval algorithm and lack of output schema, the description adequately explains the matching methodology but omits description of return values (e.g., similarity scores, ranked list structure). The include_details parameter hints at output capabilities (common patterns/differences), but explicit return documentation would improve completeness.
- Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents all parameters (URL existence requirement, limit range 1-20, include_details boolean). The description focuses on algorithmic internals rather than parameter semantics, which is acceptable given the comprehensive schema documentation.
- Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
Description explicitly states the tool searches for similar website designs given a URL input, using specific technical verbs (検索/searches, 生成/generates, 発見/finds). The detailed algorithm description (DINOv2 vision + e5-base text embeddings) clearly distinguishes this from siblings like design.search_by_image or design.compare.
- Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
While the description specifies URL input and implies the requirement for the URL to exist in the database (via the embedding generation explanation), it lacks explicit guidance on when to use this tool versus alternatives like design.search_by_image or layout.search. The constraint that URLs must exist in DB is only in the schema, not the main description.
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, covering safety characteristics. The description adds domain context about what gets searched (responsive analysis results) and mentions filtering capabilities, but does not disclose rate limits, pagination behavior beyond the schema, or result ranking methodology.
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: (1) defines the action and target resource, (2) specifies searchable content with examples, (3) lists filter capabilities. Information density is high and front-loaded with the core semantic search capability.
- Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 100% schema coverage and no output schema, the description adequately covers the tool's domain (responsive design analysis) and key filtering dimensions. It could improve by clarifying that it queries existing analysis data rather than creating new analyses, but otherwise provides sufficient context for invocation.
- Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, providing a baseline of 3. The description adds value by enumerating example filter dimensions (difference categories, viewport pairs, breakpoint ranges, screenshot diff rates) that help agents understand the search domain, though it doesn't document all filter options (industry, audience, tags) or the profile_id parameter.
- Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool performs 'semantic search on responsive design analysis results' with specific scope on viewport differences (layout changes, navigation changes, display switching). It distinguishes from siblings like layout.search or design.search_by_image by emphasizing responsive-specific analysis and cross-viewport comparisons.
- Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage by detailing searchable content (viewport differences) and available filters, but lacks explicit guidance on when to use this versus siblings like responsive.capture (which likely creates data) or layout.search. No prerequisites are mentioned (e.g., that analysis results must exist first).
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint/idempotentHint; description adds valuable behavioral context including the specific RRF weighting (60% vision + 40% text), multilingual support, and concrete examples of filterable section types (hero, feature, cta, etc.) beyond the structured annotations.
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Four dense sentences with zero waste: defines core function, language support, filtering options, and advanced vision features. Appropriately front-loaded with the essential semantic search purpose before detailing optional capabilities.
- Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a complex 16-parameter tool with nested objects, but gaps remain: no output schema exists, yet the description fails to explain what data structure or fields are returned (e.g., ranked section patterns with similarity scores, HTML content). The include_html/preview parameters imply returned content, but this should be explicit.
- Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% description coverage, establishing baseline 3. Description highlights key parameters like use_vision_search and section type filtering, but does not add significant semantic depth beyond what the detailed schema already provides for the 16 available parameters.
- Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool performs semantic search of section patterns using natural language queries, specifies multilingual support (Japanese/English), and distinguishes itself from siblings by highlighting the vision embedding hybrid search capability (RRF 60/40 split) specific to this layout search tool.
- Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides implicit guidance by describing filtering capabilities (section types) and when to enable vision search (use_vision_search=true), but lacks explicit guidance on when to use this versus sibling tools like design.search_by_image or search.unified, or prerequisites like existing ingested patterns.
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
While annotations already declare readOnlyHint and idempotentHint, the description adds valuable behavioral context by detailing what data is returned (styles, bounding boxes, etc.) and implicitly confirming safety through 'sanitized HTML snippet' in parameter descriptions. It does not contradict annotations.
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The bilingual format efficiently serves both Japanese and English contexts without verbosity. The structure is front-loaded: first sentence states the action, second lists specific return values. Every clause provides distinct information (scope, identifier type, data categories).
- Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of an output schema, the description adequately compensates by enumerating the specific data fields returned (styles, HTML, bounding box, etc.). Combined with strong annotations covering safety properties, the description provides sufficient context for invocation, though it could briefly note this is for detailed analysis versus overview.
- Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is appropriately 3. The description mentions HTML and Embedding in the return value list, which semantically connects to the include_html and include_embedding flags, but this is redundant with the already-comprehensive schema documentation.
- Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('取得します' / 'Inspect') and resources ('UIコンポーネントパーツ' / 'UI component part'), clearly targeting retrieval of detailed component data. It distinguishes from sibling tools like 'part.search' (which finds parts) by emphasizing inspection 'by ID' and lists specific return values (styles, HTML, bounding box, interaction info, embedding status).
- Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage through the phrase 'by ID' and the detailed return value list, suggesting it's for deep inspection of known components. However, it lacks explicit guidance on when to use this versus 'part.search' (for discovery) or 'part.compare', leaving agents to infer the workflow.
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint, idempotentHint, and openWorldHint. The description adds valuable behavioral context beyond these: the specific measurement methodology ('via Playwright PerformanceObserver API'), the grading scale ('score 0-100, grade'), and the specific metrics included. It does not contradict annotations.
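The '0-100 score plus grade' return shape described above can be sketched. The LCP (2500/4000 ms) and CLS (0.1/0.25) cut-offs are Google's published 'good'/'poor' Core Web Vitals thresholds; the linear interpolation, equal-weight averaging, and letter bands are assumptions for illustration, not the tool's documented formula:

```python
# Hedged sketch of metric grading: 100 at or below the "good" threshold,
# 0 at or beyond "poor", linear in between; metrics average equally.
THRESHOLDS = {"lcp": (2500, 4000), "cls": (0.1, 0.25)}  # (good, poor)

def metric_score(name, value):
    good, poor = THRESHOLDS[name]
    if value <= good:
        return 100.0
    if value >= poor:
        return 0.0
    return 100.0 * (poor - value) / (poor - good)

def grade(metrics):
    score = sum(metric_score(n, v) for n, v in metrics.items()) / len(metrics)
    letter = "A" if score >= 90 else "B" if score >= 75 else "C" if score >= 50 else "D"
    return round(score), letter
```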
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. First sentence front-loads the action, technology, and target metrics. Second sentence efficiently covers return value structure. Every word earns its place.
- Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite lacking an output schema, the description compensates by detailing return values (score range, grade, recommendations). Combined with 100% input schema coverage and complete annotations (readOnly, idempotent, openWorld), this provides sufficient context for invoking the 3-parameter tool, though it could note network dependency implications of openWorldHint.
- Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, fully documenting all parameters including nested budget fields. The description mentions 'optional improvement recommendations' which loosely corresponds to the include_details parameter, but adds no semantic detail, syntax guidance, or examples beyond what the schema already provides. Baseline 3 appropriate for high coverage.
- Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description states a specific action ('Evaluate') and resource ('web page performance using Core Web Vitals') and lists exact metrics (LCP, FID, CLS, INP, TTFB). This clearly distinguishes it from siblings like quality.evaluate (generic quality assessment) and accessibility.audit (accessibility focus).
- Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
While the description explains what the tool returns (score, grade, recommendations), it provides no explicit guidance on when to choose this over similar analysis tools like page.analyze or quality.evaluate. Usage is implied but not contextualized against alternatives.
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true, idempotentHint=true, and openWorldHint=true. The description adds valuable behavioral context beyond these annotations by detailing exactly what layout changes are detected (section visibility, font/grid/spacing changes) and specifying the return format (0-100 score). It does not mention rate limits or error handling scenarios.
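The 'diff score (0-100)' return value can be illustrated with a toy comparison of per-viewport layout fingerprints. The fingerprint keys and the differing-fraction formula are assumptions for illustration only:

```python
# Hedged sketch: score the fraction of layout properties that differ
# between two viewports' fingerprints (section visibility, column
# counts, font sizes, ...), scaled to 0-100.
def diff_score(a: dict, b: dict) -> int:
    keys = set(a) | set(b)
    if not keys:
        return 0
    differing = sum(1 for k in keys if a.get(k) != b.get(k))
    return round(100 * differing / len(keys))
```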
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, dense sentence with zero waste. It front-loads the core action (simultaneous capture), specifies the viewports, lists the analysis targets, and defines the return value—all essential information with no filler.
- Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite lacking an output schema, the description explicitly states the return value (diff score 0-100) and explains the analysis methodology. Combined with comprehensive annotations and 100% schema coverage, this provides sufficient context for invocation, though error handling scenarios remain undocumented.
- Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, documenting all parameters including the default viewport values. The description reinforces the default three viewports but does not add syntax details, format constraints, or usage examples beyond what the schema already provides, warranting the baseline score for high-coverage schemas.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose5/5Does the description clearly state what the tool does and how it differs from similar tools?
The description precisely states the tool captures web pages across three specific viewports (desktop 1920x1080, tablet 768x1024, mobile 375x812) simultaneously, analyzes responsive layout differences (section visibility, fonts, grids, spacing), and returns a diff score (0-100). This clearly distinguishes it from siblings like responsive.search or page.analyze.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines3/5Does the description explain when to use this tool, when not to, or what alternatives exist?
While the detailed functional description implies usage scenarios (responsive layout testing), it lacks explicit guidance on when to use this capture tool versus responsive.search (likely for retrieving existing captures) or other analysis tools. No 'when-not' exclusions or alternative recommendations are provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
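The capture behavior scored above is easy to picture as a concrete call. A minimal sketch, assuming a JSON-RPC `tools/call` envelope and a hypothetical `responsive.capture` tool name (the review names only its siblings); the viewport values are the ones the description documents:

```python
# Default viewports documented in the tool description (width, height).
DEFAULT_VIEWPORTS = {
    "desktop": (1920, 1080),
    "tablet": (768, 1024),
    "mobile": (375, 812),
}

def build_capture_request(url, viewports=None):
    """Builds a hypothetical tools/call payload for the capture tool."""
    vps = viewports or DEFAULT_VIEWPORTS
    return {
        "method": "tools/call",
        "params": {
            "name": "responsive.capture",  # assumed name, not confirmed by the review
            "arguments": {
                "url": url,
                "viewports": [
                    {"label": label, "width": w, "height": h}
                    for label, (w, h) in vps.items()
                ],
            },
        },
    }
```

Omitting `viewports` reproduces the documented default of three simultaneous captures.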
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already establish read-only, idempotent, closed-world behavior. The description adds valuable scope context by listing the specific diagnostic categories checked (metrics, cache stats, initialization status, pattern services) and noting that it 'returns diagnostics', though it omits rate limits and return-format details.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Two tightly constructed sentences with zero redundancy. The first establishes the core action; the second enumerates specific check categories. Every word serves to define scope or functionality.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich parameter schema and clear annotations, the description is nearly complete. The only gap is the lack of an output schema paired with a minimal description of the return value (just 'returns diagnostics'), leaving the agent uncertain about response structure.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents all 7 boolean parameters. The description provides a high-level summary of parameter groups (metrics, cache, etc.) but does not add syntax details, default-value implications, or cross-parameter relationships beyond the schema.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description states a specific action ('Run MCP server health check') and enumerates the exact subsystems checked (tool metrics, embedding cache, initialization status, pattern services). It clearly distinguishes this diagnostic tool from operational siblings like 'data.delete' or 'layout.inspect' by focusing on monitoring rather than mutation.
Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies this is for diagnostics and monitoring through the term 'health check', but provides no explicit guidance on when to invoke it versus other tools, nor does it mention prerequisites or conditions that would trigger its use.
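The seven boolean switches this card mentions suggest a simple defaults-merging pattern on the client side. A sketch under stated assumptions: the review names the parameter *groups* (metrics, cache, initialization status, pattern services) but not the actual flag names, so the four names below are hypothetical stand-ins:

```python
# Hypothetical flag names; the review only names the groups they cover.
HEALTH_CHECK_DEFAULTS = {
    "include_metrics": True,
    "include_cache_stats": True,
    "include_init_status": True,
    "include_pattern_services": True,
}

def health_check_args(**overrides):
    """Merges caller overrides onto the defaults, rejecting unknown flags."""
    args = dict(HEALTH_CHECK_DEFAULTS)
    for flag, value in overrides.items():
        if flag not in args:
            raise KeyError(f"unknown flag: {flag}")
        args[flag] = bool(value)
    return args
```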
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond the annotations (readOnlyHint: false, indicating state mutation), the description adds valuable behavioral context: it discloses the embedding-diff methodology, explains the 0-1 change-score semantics (0 = identical, 1 = completely different), and details the four section-level change categories (added/removed/modified/unchanged). This effectively compensates for the lack of an output schema by explaining what the user can expect from the tool's analysis.
Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The bilingual format (Japanese/English) efficiently serves dual locales without redundancy. The description is front-loaded with the core purpose, followed by action enumeration and output semantics. Every sentence conveys essential information about capabilities, methodology, or return values, though the density of the bilingual format slightly impacts readability.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having no output schema, the description comprehensively explains the return semantics through the change score (0-1) and the categorical change classifications (added/removed/modified/unchanged). It adequately covers the complexity of the four-mode tool (snapshot/compare/history/detect) and their distinct behaviors, providing sufficient context for an agent to select appropriate actions based on user intent.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema itself fully documents all five parameters, including the action enum and the conditional fields (snapshot_ids for compare, limit for history). The description reinforces the four action types but does not add significant semantic detail beyond what the structured schema already provides, which is appropriate given the high schema coverage.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states that the tool tracks design changes for the same URL over time. It explicitly enumerates the four distinct actions provided (snapshot, compare, history, detect) and distinguishes itself from siblings like design.compare by emphasizing temporal tracking of identical URLs versus cross-site comparison.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear workflow context by referencing page.analyze in the auto_snapshot parameter description ('automatic snapshot after page.analyze'), suggesting integration with that sibling tool. While it doesn't explicitly list when *not* to use the tool, the specific focus on same-URL tracking provides sufficient implicit guidance to differentiate it from alternatives like design.compare or design.search_by_image.
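The four-mode design and its conditional fields (snapshot_ids for compare, limit for history) lend themselves to a small argument-validation sketch. The action names and field names come from this card; the validation rules are otherwise assumed:

```python
VALID_ACTIONS = {"snapshot", "compare", "history", "detect"}

def track_changes_args(action, snapshot_ids=None, limit=None):
    """Builds arguments for the four-mode tool, enforcing the
    conditional fields the review mentions: snapshot_ids applies to
    'compare', limit applies to 'history'."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    args = {"action": action}
    if action == "compare":
        if not snapshot_ids:
            raise ValueError("'compare' requires snapshot_ids")
        args["snapshot_ids"] = list(snapshot_ids)
    elif action == "history" and limit is not None:
        args["limit"] = int(limit)
    return args
```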
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint, idempotentHint, and openWorldHint. The description adds valuable behavioral context: the parallel execution model, unified result aggregation, and real-time streaming capabilities. However, it omits mention of database persistence side effects (evident in the schema's saveToDb defaults), which is relevant given the read-only annotation.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero redundancy: (1) core capability, (2) implementation architecture, (3) protocol feature. It is front-loaded with specific actions and sibling references, and every clause earns its place in guiding tool selection.
Completeness 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 15 complex nested parameters and no output schema, the description is minimally viable. It mentions 'returns unified results' but provides no structure, fields, or shape guidance. For a tool of this complexity with no output schema, additional guidance on return-value structure would significantly improve completeness.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, establishing a baseline of 3. The description adds invocation context by referencing '_meta.progressToken' for streaming, which is not a schema parameter but protocol metadata relevant to usage. No additional parameter-specific guidance is provided, but none is needed given the comprehensive schema documentation.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool performs 'layout detection, motion pattern extraction, and quality evaluation' on a URL, and crucially distinguishes itself from siblings by stating it 'Executes layout.ingest, motion.detect, and quality.evaluate in parallel.' This composite positioning makes the orchestration purpose unambiguous.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies this is the convenience tool for comprehensive analysis by mentioning parallel execution of three specific sibling tools, and it notes MCP streaming support via _meta.progressToken. However, it doesn't explicitly state when to prefer the individual tools (e.g., for single-domain analysis or under resource constraints) over this composite tool.
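The fan-out/merge pattern this card describes (three sibling tools run in parallel, results unified) can be sketched with `asyncio.gather`. This is an illustrative client-side view, not the server's implementation; `call_tool` stands in for an assumed async MCP client callable, and the result shape is hypothetical:

```python
import asyncio

async def analyze_page(url, call_tool, progress_token=None):
    """Runs the three sibling tools concurrently and merges their
    results under one key per tool, as the description promises."""
    meta = {"progressToken": progress_token} if progress_token else {}
    names = ["layout.ingest", "motion.detect", "quality.evaluate"]
    results = await asyncio.gather(
        *(call_tool(name, {"url": url}, _meta=meta) for name in names)
    )
    return dict(zip(names, results))

async def _stub_call(name, arguments, _meta=None):
    # Stand-in for a real MCP client call; returns a marker per tool.
    return {"tool": name, "ok": True}

unified = asyncio.run(analyze_page("https://example.com", _stub_call))
```

Passing a `progress_token` would let the real client stream intermediate progress, per the `_meta.progressToken` note above.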
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
While annotations declare read-only/idempotent status, the description adds critical behavioral details: 'JSON format' output, explicit marking of PII fields, and 'all related data' scope, though it omits export-file retention and delivery mechanics.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The bilingual format efficiently serves both Japanese and English users without redundancy. Three sentences cover legal basis, technical scope/format, and PII handling, each earning its place with zero waste.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the high-stakes GDPR compliance context, the description adequately covers legal basis, output format, and PII handling. No output schema exists, but 'JSON format' partially compensates. A minor gap remains around export-file security and retention policies.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents the 'page' vs 'profile' distinction and the UUIDv7 format. The description references 'page/profile' generally but adds no parameter-specific semantics beyond what the schema already provides.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states 'GDPR Art.20 Right to Data Portability' as the legal basis and function, uses the specific verb 'Exports' with the resource 'all related data for the specified target', and distinguishes itself from the sibling 'data.delete' by emphasizing export/portability versus deletion.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
It provides a clear contextual trigger via 'GDPR Art.20', indicating when to use the tool (data portability requests), but lacks explicit 'when not to use' guidance or named alternatives like 'preference.get' for simpler data retrieval.
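The 'PII fields are explicitly marked' behavior this card praises can be pictured as a simple JSON envelope. A sketch under stated assumptions: the envelope shape and the PII field names below are hypothetical; only the page/profile targets and the JSON format come from the description:

```python
import json

# Hypothetical PII field names; the real server decides which fields
# count as personal data.
PII_FIELDS = {"email", "ip_address"}

def export_record(target_type, target_id, data):
    """Wraps raw data in a JSON export envelope, marking each field
    as PII or not, as the description promises."""
    if target_type not in {"page", "profile"}:
        raise ValueError("target must be 'page' or 'profile'")
    return json.dumps(
        {
            "target": {"type": target_type, "id": target_id},
            "data": {
                key: {"value": value, "pii": key in PII_FIELDS}
                for key, value in data.items()
            },
        },
        ensure_ascii=False,
    )
```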
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true and idempotentHint=true. The description adds valuable behavioral context beyond these: it specifies the output scale (0-1 scores) and explains what include_details actually returns (common patterns and key differences). There are no contradictions with the annotations.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
A perfectly structured bilingual description. The first sentence delivers the core functionality (comparison dimensions and scoring); the second explains the optional details flag. Zero waste, front-loaded with essential information, and an appropriate length for the complexity.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
No output schema exists, but the description adequately covers the return-value concept (pairwise scores 0-1, optional pattern details). Given the tool's focused scope (comparison only) and strong annotations, the description provides sufficient context for invocation, though the exact response structure could be detailed further.
Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, the baseline is 3. The description adds meaningful semantics by explaining the functional impact of include_details (what 'details' means: common patterns and differences) and reinforcing the cardinality constraint (2-5 pages). This goes beyond the schema's boolean flag description.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action (compare), resource (2-5 web pages), methodology (4 axes: layout, visual, quality, color), and output (pairwise similarity scores 0-1). It distinguishes itself from siblings like design.search_by_image (search by image) and design.track_changes (temporal tracking) by specifying direct pairwise comparison of specific page IDs.
Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies prerequisites by specifying the UUID format for page_ids, suggesting pages must be ingested first. However, it lacks explicit when-not-to-use guidance or named alternatives (e.g., 'use design.similar_site for finding similar sites by URL instead'). Usage context is present but implied rather than explicit.
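The pairwise output this card describes scales quadratically with the input: 2-5 page IDs yield C(n, 2) score entries. A small sketch of that enumeration, using the 2-5 cardinality constraint the description documents (the pair ordering is assumed):

```python
from itertools import combinations

def compare_pairs(page_ids):
    """Enumerates the pairwise comparisons implied by the description:
    2-5 page IDs yield C(n, 2) score entries."""
    if not 2 <= len(page_ids) <= 5:
        raise ValueError("design.compare takes 2-5 page_ids")
    return list(combinations(page_ids, 2))
```

Five pages, the documented maximum, would produce ten pairwise scores.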
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond the read-only/idempotent annotations, the description adds valuable behavioral context: it clarifies the comparison is 'parallel', specifies that it 'reports property-level identity' (indicating granular, diff-style output), and enumerates the four comparable aspects. This helps agents understand the depth of analysis performed.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is tightly structured with zero redundancy: each clause conveys distinct information (scope, aspects, default, output type). The bilingual format efficiently serves international contexts without duplication of meaning, and the information is front-loaded with the core action.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter comparison tool with complete schema coverage and safety annotations, the description adequately covers the essential behavioral traits and hints at output characteristics ('property-level identity'). It would benefit from an explicit mention of the return-value structure, but given that the annotations cover safety and the schema covers inputs, the description provides sufficient context for invocation.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents both parameters (part_ids constraints and the compare_aspects enum/default). The description reinforces the 2-5 range and default aspects but adds no additional syntax, format details, or semantic nuances beyond what the schema already provides, meeting the baseline for high-coverage schemas.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool performs parallel comparison of 2-5 UI parts across four specific dimensions (styles, layout, interaction, accessibility), clearly distinguishing it from siblings like design.compare (which handles designs) and part.inspect (which analyzes single parts). The bilingual text maintains specificity in both languages.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description effectively communicates the default behavior (styles + layout) and the specific comparison scope, helping agents understand what happens when compare_aspects is omitted. However, it lacks explicit guidance on when to prefer this tool over design.compare or part.inspect, though the 'UI parts' specificity provides implicit differentiation.
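The default-aspects behavior discussed above is worth pinning down: when compare_aspects is omitted, the documented default is styles + layout. A sketch of that defaulting logic, using the enum values and 2-5 cardinality this card names (the validation itself is assumed):

```python
VALID_ASPECTS = {"styles", "layout", "interaction", "accessibility"}

def part_compare_args(part_ids, compare_aspects=None):
    """Applies the documented default (styles + layout) when
    compare_aspects is omitted."""
    if not 2 <= len(part_ids) <= 5:
        raise ValueError("part.compare takes 2-5 part_ids")
    aspects = list(compare_aspects) if compare_aspects else ["styles", "layout"]
    unknown = set(aspects) - VALID_ASPECTS
    if unknown:
        raise ValueError(f"unknown aspects: {sorted(unknown)}")
    return {"part_ids": list(part_ids), "compare_aspects": aspects}
```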
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable technical context beyond the readOnlyHint/idempotentHint annotations: it discloses the e5-base embedding model, the hybrid full-text approach, and the future-support limitation on image_url. There are no contradictions with the safety annotations.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The bilingual format is efficient, with zero waste. It front-loads the core purpose (semantic search for UI parts), followed by the technical methodology (e5-base) and filtering capabilities. Two dense sentences cover all essential information.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description adequately covers the 11 parameters (100% schema coverage) by highlighting the key filtering dimensions (partType, searchMode) and the search methodology. No output schema exists, so an explanation of return values isn't expected. It could briefly note that all parameters are optional (0 required).
Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage (baseline 3), the description adds that the query uses 'e5-base + full-text' hybrid logic and clarifies that image_url is not yet functional. It explicitly enumerates the partType count (16) and the searchMode options, reinforcing the schema semantics.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The specific verb 'Search' pairs with a clear resource, 'UI component parts', and examples (buttons, cards, links). The description distinguishes itself from siblings like layout.search or design.search_by_image by specifying 'parts' and a 'semantic/hybrid text search' methodology.
Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage through resource naming (parts vs layouts) and notes image_url is 'future support' (a useful when-not-to-use signal). However, it lacks explicit guidance on when to use text vs hybrid vs visual modes, or when to prefer this tool over design.search_by_image or part.inspect.
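The 'e5-base + full-text' hybrid this card mentions typically means blending two per-document scores into one ranking key. A toy version of that blend; the real weighting is internal to the server, so `alpha` here is purely illustrative:

```python
def hybrid_score(semantic, fulltext, alpha=0.5):
    """Blends a semantic-similarity score with a full-text score,
    both assumed to be normalized to [0, 1]."""
    if not (0.0 <= semantic <= 1.0 and 0.0 <= fulltext <= 1.0):
        raise ValueError("scores must be in [0, 1]")
    return alpha * semantic + (1 - alpha) * fulltext
```

With `alpha=1.0` this degenerates to pure semantic ranking, which is roughly what a text-only searchMode would do.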
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint and idempotentHint. The description adds valuable behavioral context not in the annotations: it discloses the internal parallel execution model and the similarity-score merging logic. It does not contradict the annotations (searching/merging aligns with read-only). It could improve by mentioning the latency implications of parallel execution or cache behavior.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Despite being bilingual (Japanese/English), the description is extremely efficient: two sentences per language with zero waste. The first sentence establishes scope (five domains); the second explains the mechanism (parallel execution + merge). Information is front-loaded and every clause earns its place.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex 12-parameter cross-domain tool, the description provides sufficient high-level context given the rich annotations and complete schema, and it explains the merging behavior adequately. Minor gap: no output schema exists, and the description doesn't detail the return structure (e.g., whether it returns unified objects or typed collections), though it implies similarity-scored results.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents all 12 parameters including the five type enums. The description mentions the five component types (layout, part, motion, background, narrative), which reinforces the 'types' parameter semantics, but it does not need to duplicate the comprehensive schema documentation. The baseline of 3 is appropriate given the schema completeness.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description precisely defines the tool's scope ('cross-component semantic search') and enumerates all five searchable domains (Layout, Part, Motion, Background, Narrative). It clearly positions this as an aggregator that 'executes individual search tools in parallel', differentiating it from sibling single-domain tools like layout.search or part.search.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by stating it executes individual search tools in parallel and merges results, suggesting it is for cross-cutting queries rather than single-domain searches. However, it lacks explicit 'when to use unified vs. specific tools' guidance (e.g., 'use this when searching across multiple component types; use layout.search for layout-only queries').
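The merge step this card highlights (flatten per-domain hits, order by similarity) can be sketched in a few lines. The hit shape (`id`, `score`) is an assumption; only the per-domain fan-out and similarity-score ordering come from the description:

```python
def merge_results(per_type_results):
    """Flattens per-domain hits and orders them by similarity score,
    descending, tagging each hit with its source domain."""
    merged = [
        {"type": domain, **hit}
        for domain, hits in per_type_results.items()
        for hit in hits
    ]
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)
```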
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate destructive (readOnlyHint: false) and non-idempotent (idempotentHint: false) behavior. The description adds valuable technical specifics about deletion behavior: CASCADE DELETE for pages, hard delete for profiles, and bulk delete for all_user_data, which helps the agent understand scope and impact beyond the annotation flags.
Conciseness 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The bilingual presentation is efficient and front-loaded with the legal basis and action. The structure clearly separates the legal context, the target-type explanations, and the confirmation requirement. The minor redundancy between Japanese and English is necessary for the intended audience and does not detract from clarity.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a high-stakes destructive operation with legal compliance requirements, the description adequately covers the deletion modes, UUID requirements, and audit-trail needs (the reason parameter). It appropriately delegates specific parameter formats to the 100%-covered schema. The missing output schema is acceptable for a deletion operation, though restating the non-idempotent behavior from the annotations in the description would strengthen it further.
Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, the baseline is 3. The description adds meaningful technical semantics by specifying 'CASCADE DELETE', 'hard delete', and 'bulk delete' for the target parameter options, providing database-level context not present in the schema descriptions. It also emphasizes the confirm flag requirement.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states it performs permanent data deletion under GDPR Art.17 'Right to Erasure' and distinguishes the three specific deletion modes (page CASCADE DELETE, profile hard delete, all_user_data bulk delete), clearly differentiating it from sibling tools like data.export or preference.reset.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The GDPR Art.17 reference provides clear legal context for when to invoke this tool, and it explicitly states the confirm: true requirement as a safety gate. However, it lacks an explicit comparison to alternatives (e.g., preference.reset for non-permanent removal) and warnings about irreversibility beyond 'permanently deletes'.
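The confirm: true safety gate this card calls out is the kind of check a cautious client can also enforce locally. A sketch under stated assumptions: the target names, reason parameter, and confirm flag come from the description; the guard function itself is hypothetical:

```python
def delete_args(target, target_id=None, reason="", confirm=False):
    """Refuses to build the request unless confirm is explicitly True,
    mirroring the safety gate the description requires."""
    if target not in {"page", "profile", "all_user_data"}:
        raise ValueError(f"unknown target: {target}")
    if confirm is not True:
        raise PermissionError("data.delete requires confirm: true")
    args = {"target": target, "reason": reason, "confirm": True}
    if target_id is not None:
        args["target_id"] = target_id
    return args
```

Requiring `confirm is True` (rather than any truthy value) keeps accidental invocations from slipping through.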
- Behavior4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds substantial context beyond annotations: specifies scoring methodology (0-100 scale), violation classification (severity levels), contrast algorithm (implied by 'checks text/background'), and engine (axe-core). No contradictions with readOnlyHint/idempotentHint annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Two dense sentences with zero redundancy. Front-loads the standard (WCAG 2.1) and engine, then details capabilities (violations, scoring, contrast). Every clause conveys distinct functionality (compliance levels, severity classification, score range).
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 5 parameters with complete schema coverage and safety annotations, the description adequately covers the tool's behavior. Mentions key outputs (score, violations, contrast) despite absence of output schema. Minor gap: could explicitly state that url and html are mutually exclusive inputs in the description text itself.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, baseline is 3. Description enriches parameter understanding by mapping 'url/html' to the audit target, 'level' to WCAG conformance tiers, and 'include_contrast' to text/background validation. Clarifies the mutual exclusivity concept between url and html through 'HTML or URL' phrasing.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
States specific action ('WCAG 2.1 accessibility audit'), engine ('axe-core'), and scope (contrast ratio checking, compliance levels). Clearly distinguishes from general 'page.analyze' or 'quality.evaluate' siblings by specifying accessibility-specific methodology and outputs.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context (use when checking WCAG A/AA/AAA compliance, need accessibility scores, or contrast validation). While it doesn't explicitly name sibling alternatives like 'page.analyze', the specificity of 'WCAG 2.1' and 'axe-core' makes the appropriate use case unambiguous.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
While annotations indicate read-only/idempotent status, the description adds valuable algorithmic transparency: DINOv2 visual embeddings, HNSW search methodology, and specific RRF fusion weights (text 40% + vision 30% + fulltext 30%). This discloses how results are ranked and combined, though it omits rate limits or result format details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste: purpose declaration, input specification, pure visual search mechanics, and hybrid search behavior. Technical details (DINOv2, RRF weights) are densely packed but clearly presented. Front-loaded with the core function.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich schema (5 parameters, 100% coverage) and complex hybrid search behavior, the description adequately covers input requirements and search logic. However, without an output schema, it could benefit from describing what constitutes a 'design section' result or return structure.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, the baseline is 3. The description adds semantic value by explaining that the image parameter drives DINOv2 visual embedding generation and that the query parameter triggers multi-source RRF fusion, providing behavioral context beyond the schema's format specifications.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool searches for visually similar design sections from images, using specific technical mechanisms (DINOv2, HNSW) that distinguish it from text-based siblings like search.unified or layout.search. It precisely identifies the resource (design sections) and operation (visual similarity search).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description effectively explains when to use the optional text query parameter (triggering hybrid RRF fusion) versus image-only search. However, it lacks explicit comparison to sibling alternatives like design.compare or design.similar_site for determining when this specific tool is preferred.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
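The weighted RRF fusion described above (text 40% + vision 30% + fulltext 30%) can be sketched as follows. This is a minimal illustration, not the server's implementation: the smoothing constant k=60 and the exact per-source scoring are assumptions, since the page discloses only the fusion weights.

```python
def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked result lists with weighted Reciprocal Rank Fusion.

    rankings: {source: [doc_id, ...]} ordered best-first per source.
    weights:  {source: float}, e.g. text 0.4, vision 0.3, fulltext 0.3.
    """
    scores = {}
    for source, ranked_ids in rankings.items():
        w = weights[source]
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Standard RRF contribution 1/(k + rank), scaled by source weight.
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A design section ranked highly by two of three sources outranks one that
# only the text source prefers.
fused = weighted_rrf(
    rankings={
        "text": ["hero-a", "hero-b"],
        "vision": ["hero-b", "hero-a"],
        "fulltext": ["hero-b", "hero-c"],
    },
    weights={"text": 0.4, "vision": 0.3, "fulltext": 0.3},
)
```

When no text query is supplied, only the vision source would contribute, reducing this to a pure DINOv2 visual ranking.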
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint: false and idempotentHint: false. The description adds valuable behavioral context not found in annotations: it clarifies the conditional mutation behavior (only updates when feedback is present) and distinguishes between sample retrieval vs profile mutation modes. Does not mention side effects like profile creation when profile_id is omitted.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely efficient bilingual format (Japanese/English) with zero redundancy. Two sentences cover concept and mechanics. First sentence establishes the domain (preference hearing), second explains the conditional logic. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 6-parameter mutation tool with no output schema, the description adequately covers the core interaction model (dual modes) and hints at return behavior ('present samples'). However, it could explicitly describe the return structure or pagination behavior given the limit/offset parameters exist.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description adds conceptual meaning beyond the schema by explaining the semantic relationship between the feedback parameter and the tool's behavior mode (feedback presence triggers profile updates). It provides the 'why' for the feedback array parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the dual-purpose nature: 'present samples without feedback, update profile with feedback.' It uses specific verbs (present/update) and clearly distinguishes from sibling tools like preference.get (retrieval) and preference.reset by defining this as an interactive 'hearing session' that modifies state.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context for the two operational modes (Mode A: sampling without feedback, Mode B: updating with feedback), which effectively guides when to include the feedback parameter. Lacks explicit naming of alternatives like preference.get for read-only access, though the distinction is implied by the 'hearing' concept.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds critical behavioral context beyond annotations by disclosing CASCADE deletion of preference_signals (side effect not in annotations). Confirms the confirmation-gate pattern. Annotations already establish idempotency and non-read-only status; description complements this with data impact details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Bilingual format is efficiently compressed into three high-value sentences: action definition, safety requirement, and side-effect warning. Front-loaded with the verb, no filler content. Critical safety information (confirm, cascade) is prominently placed.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Excellent coverage for a destructive operation with annotations present. Addresses confirmation requirements, cascade effects, and distinguishes from related preference tools. No output schema exists, but description adequately covers the operation's scope and risks.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage (baseline 3), the description adds value by emphasizing the confirm requirement and explaining the cascade behavior implications. The hard_delete parameter's GDPR context is well-documented in the schema, but the description reinforces the deletion semantics through the cascade warning.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose5/5Does the description clearly state what the tool does and how it differs from similar tools?
Explicitly states the tool resets the preference profile (嗜好プロファイルをリセットします) using a specific verb+resource combination. Clearly distinguishes from siblings preference.get (retrieval) and preference.hear (recording) by specifying the destructive reset action.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states the safety requirement that confirm: true is mandatory (必須です). Warns about CASCADE deletion of preference_signals, implying data loss risks. Could improve by explicitly contrasting with data.delete for full erasure vs standard reset.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate read-only/idempotent operations, which the description supports by emphasizing 'counts' and 'classification' rather than mutation. The description adds valuable behavioral context about the return format (counts per value) and intended UX pattern (refinement UI) not present in annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Exceptionally well-structured: deprecation warning front-loaded, followed by concise bilingual explanation of functionality, use case, and exact replacement syntax. Despite supporting two languages, there is no redundant information—every clause serves a distinct purpose (status, function, migration).
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Without an output schema, the description adequately explains the return values conceptually ('counts per value'). Given the tool's deprecated status, the detailed migration path provided is more valuable than exhaustive return value documentation, though explicit return structure details would have warranted a 5.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents all 7 parameters including field descriptions and enums. The description references the facet fields conceptually but does not add syntax details or validation rules beyond what the schema already provides, meeting the baseline for high-coverage schemas.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool's purpose using specific verbs ('classifies', 'returning counts') and resources (search results by sectionType, industry, audience, tags). It effectively distinguishes itself from sibling search tools by explicitly identifying search.unified as the replacement.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Excellent guidance provided: explicitly marked as DEPRECATED with a clear 'when-not-to-use' directive, and provides exact parameter mappings for the alternative (search.unified with include_facets: true, enable_reranking: false, etc.). Also specifies the UI use case ('refinement UI and filter selection').
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
While annotations declare readOnlyHint and idempotentHint, the description adds significant behavioral context: the specific job states (waiting/active/completed/failed), infrastructure dependency ('Requires Redis'), and data retention policy ('24 hours'). It documents the return structure which is absent from annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Information is efficiently front-loaded and structured: one-sentence purpose, explicit usage instruction, bulleted return values, and critical operational notes. No redundancy with the schema or annotations.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness 5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Comprehensively covers the async job lifecycle (states, progress percentage, completion/failure outcomes) since no output schema exists. Includes operational constraints (Redis, retention) essential for correct deployment.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing a baseline of 3. The description adds workflow context by framing the job_id within the async polling pattern ('poll for the status'), implying repeated invocation patterns that the schema alone doesn't convey.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a specific verb ('Check') and resource ('status of an async page analysis job'), clearly distinguishing this as the status-polling counterpart to the sibling tool 'page.analyze'. It explicitly scopes the tool to async jobs, preventing confusion with synchronous analysis.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines 5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use ('poll for the status') and the prerequisite workflow ('submitted with page.analyze(async=true)'). It names the sibling tool directly as the required antecedent, providing clear guidance on the tool chain.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
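The poll-until-settled workflow described above can be sketched as a small client-side loop. This is an illustration under assumptions: the status tool's name and the response field `state` are not stated on this page, which documents only the states waiting/active/completed/failed, the Redis dependency, and the 24-hour retention window.

```python
import time

def wait_for_analysis(call_tool, job_id, interval=2.0, timeout=300.0):
    """Poll the async-status tool until a page-analysis job settles.

    call_tool: a callable bridging to the MCP client (hypothetical signature:
    tool name plus an arguments dict, returning the tool's JSON result).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Tool name and response shape are assumptions for illustration.
        status = call_tool("page.analyze.status", {"job_id": job_id})
        if status["state"] in ("completed", "failed"):
            return status
        # Still waiting/active: back off before polling again.
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not settle within {timeout}s")
```

Because results are retained for only 24 hours, callers should fetch the completed payload promptly rather than storing the job_id for later.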
GitHub Badge
Glama performs regular codebase and documentation scans to:
- Confirm that the MCP server is working as expected.
- Confirm that there are no obvious security issues.
- Evaluate tool definition quality.
Our badge communicates server capabilities, safety, and installation instructions.
Card Badge
Copy to your README.md:
Score Badge
Copy to your README.md:
How to claim the server?
If you are the author of the server, you simply need to authenticate using GitHub.
However, if the MCP server belongs to an organization, you need to first add glama.json to the root of your repository.
{
"$schema": "https://glama.ai/mcp/schemas/server.json",
"maintainers": [
"your-github-username"
]
}
Then, authenticate using GitHub.
Browse examples.
How to make a release?
A "release" on Glama is not the same as a GitHub release. To create a Glama release:
- Claim the server if you haven't already.
- Go to the Dockerfile admin page, configure the build spec, and click Deploy.
- Once the build test succeeds, click Make Release, enter a version, and publish.
This process allows Glama to run security checks on your server and enables users to deploy it.
How to add a LICENSE?
Please follow the instructions in the GitHub documentation.
Once GitHub recognizes the license, the system will automatically detect it within a few hours.
If the license does not appear on the server after some time, you can manually trigger a new scan using the MCP server admin interface.
How to sync the server with GitHub?
Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.
To manually sync the server, click the "Sync Server" button in the MCP server admin interface.
How is the quality score calculated?
The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).
Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool is scored 1–5 across six dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The server-level definition quality score is calculated as 60% mean TDQS + 40% minimum TDQS, so a single poorly described tool pulls the score down.
Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).
Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
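The aggregation above can be expressed as a short sketch. The weights and tier cutoffs come directly from the description; the dimension key names are illustrative:

```python
# Per-dimension weights for Tool Definition Quality (scores are 1-5).
DIMENSION_WEIGHTS = {
    "purpose": 0.25, "usage": 0.20, "behavior": 0.20,
    "parameters": 0.15, "conciseness": 0.10, "completeness": 0.10,
}

def tool_tdqs(dims):
    """Weighted score for one tool across the six dimensions."""
    return sum(DIMENSION_WEIGHTS[name] * score for name, score in dims.items())

def overall_score(per_tool_dims, coherence):
    """Combine Tool Definition Quality (70%) and Server Coherence (30%)."""
    tdqs = [tool_tdqs(d) for d in per_tool_dims]
    # 60% mean + 40% minimum, so one poorly described tool drags the score down.
    definition_quality = 0.6 * (sum(tdqs) / len(tdqs)) + 0.4 * min(tdqs)
    return 0.7 * definition_quality + 0.3 * coherence

def tier(score):
    """Map an overall score to its letter tier (B and above is passing)."""
    for grade, cutoff in (("A", 3.5), ("B", 3.0), ("C", 2.0), ("D", 1.0)):
        if score >= cutoff:
            return grade
    return "F"
```

For example, a server whose tools all score 4 on every dimension, with a coherence of 4.0, gets an overall score of 4.0 and tier A.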
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/TKMD/reftrix-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server.