# Evaluate Tool — Token Efficiency and UX Improvements

This note proposes targeted changes to reduce overuse of `evaluate`, improve LLM guidance, and lower token costs while preserving flexibility.

## Goals

- Encourage specialized tools over `evaluate` when appropriate
- Cut output token usage for large/structured results
- Avoid schema churn and friction (no new required params)
- Support repeated custom logic without resending scripts

## Quick Fixes (Applied)

- Correct suggestion names in `evaluate` to match the actual tools:
  - `get_visible_text()` → `get_text`
  - `element_visibility` → `check_visibility`
  - `compare_positions` → `compare_element_alignment`

## Large‑Output Guard (Preview + Confirm)

- Add a preview/truncation guard to `evaluate` (the same pattern as planned for console logs):
  - Defaults: show up to ~2,000 chars; include counts (`totalLength`, `shownLength`), a `truncated` flag, and a one‑time `confirmToken` to fetch the full output.
  - Tips in the preview: suggest refining via specialized tools or narrowing the script.
- Update the metadata to enumerate conditional outputs (preview counts, truncation flags, token, exact line formats).
- Summarize bulky structures by default (see the sketch after this list):
  - Large arrays: `Array(n) [first 3 items…]`
  - Large objects: list top‑level keys only
  - DOM nodes: compact tag/id/class with a one‑line position: `"<tag id=#id class=.a.b> @ (x,y) WxH"`
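A minimal sketch of the guard and the default summarization, assuming an in‑memory token map and illustrative thresholds (`guardOutput`, `PREVIEW_LIMIT`, and the other names are placeholders, not the final API):

```typescript
import { randomUUID } from "node:crypto";

const PREVIEW_LIMIT = 2000;      // ~2,000 chars shown by default
const SUMMARY_THRESHOLD = 10;    // illustrative cutoff for "bulky"
const pendingResults = new Map<string, string>(); // confirmToken -> full output

interface GuardedOutput {
  text: string;
  totalLength: number;
  shownLength: number;
  truncated: boolean;
  confirmToken?: string;
}

// Summarize bulky structures before stringifying.
function summarize(value: unknown): string {
  if (Array.isArray(value) && value.length > SUMMARY_THRESHOLD) {
    const head = value.slice(0, 3).map((v) => JSON.stringify(v)).join(", ");
    return `Array(${value.length}) [${head}, …]`;
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value);
    if (keys.length > SUMMARY_THRESHOLD) return `Object { keys: ${keys.join(", ")} }`;
  }
  return JSON.stringify(value) ?? String(value);
}

// Truncate long output and stash the full text behind a one-time token.
function guardOutput(value: unknown): GuardedOutput {
  const full = summarize(value);
  if (full.length <= PREVIEW_LIMIT) {
    return { text: full, totalLength: full.length, shownLength: full.length, truncated: false };
  }
  const confirmToken = randomUUID();
  pendingResults.set(confirmToken, full);
  return {
    text: full.slice(0, PREVIEW_LIMIT),
    totalLength: full.length,
    shownLength: PREVIEW_LIMIT,
    truncated: true,
    confirmToken,
  };
}
```

DOM‑node summarization would have to happen in‑page before serialization, so it is left out of this sketch.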
## Function Registry (Define Once, Call Many)

Introduce two atomic tools to avoid resending the same script repeatedly:

- `eval_define_function({ name: string, source: string })`
  - Defines `window.__mcpFns[name]` immediately via `page.evaluate` and persists it across navigations via `page.addInitScript`.
  - Output: `✓ Function defined: <name>` or a concise error.
- `eval_call_function({ name: string, argsJson?: string })`
  - Calls `window.__mcpFns[name](...JSON.parse(argsJson || '[]'))` within the page.
  - Reuses the same preview/truncation guard as `evaluate` for returned values.
- (Optional) `eval_list_functions()` to inspect registered names.

Rationale: this honors Atomic Operations and Minimal Parameters, and it significantly reduces repeated token cost when LLMs re‑run nearly identical scripts.

## Intent & Redundancy Nudges (No New Params)

- Do not add a required “explain why” parameter (it adds friction and violates Parameter Discipline).
- Instead, embed lightweight nudges in `evaluate`:
  - On first use in a session: a 1–2 line reminder of specialized tools with concrete examples.
  - On repeated similar calls (e.g., identical or >80% similar script hashes), a single‑line hint: “You’ve run a similar script ×N; define it once via `eval_define_function` and call it next.”
- Implementation: maintain a tiny in‑memory circular buffer of recent script hashes per tool instance.

## Metadata Improvements

- `evaluate.getMetadata()` should explicitly list all outputs:
  - Success with a result (preview or full)
  - When suggestions are produced
  - When truncated: counts, `truncated: true`, `confirmToken`
  - The error path with `isError: true`
- Provide example calls and outputs (including the preview/confirm flow).
- Optionally lower `priority` to make `evaluate` less attractive in pickers that respect it.

## Safety/UX Enhancements (Optional)

- Set a conservative default timeout for script execution and return a short timeout message with suggestions to narrow the script or use specialized tools.

## Next Steps

1. Implement the preview/confirm guard and summarization in `evaluate` and update the metadata.
2. Add the `eval_define_function` and `eval_call_function` tools; register them and add tests (see the first reference sketch below).
3. Add small de‑duplication nudges for repeated scripts (see the second reference sketch below).
4. Extend the tests to cover the guard, the function lifecycle, and suggestion rendering.
5. Regenerate the README via `npm run generate:readme`.
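As a reference for step 2, a minimal sketch of the two registry tools against Playwright’s `page.evaluate`/`page.addInitScript`, assuming `source` is a function expression; tool registration, error formatting, and reuse of the output guard are omitted:

```typescript
import type { Page } from "playwright";

// eval_define_function: define window.__mcpFns[name] now and after navigations.
async function evalDefineFunction(page: Page, name: string, source: string): Promise<string> {
  // Assumes `source` is a function expression,
  // e.g. "(sel) => document.querySelectorAll(sel).length".
  const install = `(() => {
    window.__mcpFns = window.__mcpFns || {};
    window.__mcpFns[${JSON.stringify(name)}] = (${source});
  })()`;
  await page.evaluate(install);                   // define in the current document
  await page.addInitScript({ content: install }); // re-define after each navigation
  return `✓ Function defined: ${name}`;
}

// eval_call_function: invoke a registered function with JSON-encoded args.
async function evalCallFunction(page: Page, name: string, argsJson = "[]"): Promise<unknown> {
  return page.evaluate(
    ({ n, json }) => {
      const fns = (window as any).__mcpFns ?? {};
      if (typeof fns[n] !== "function") throw new Error(`No registered function: ${n}`);
      return fns[n](...JSON.parse(json));
    },
    { n: name, json: argsJson },
  );
}
```

Usage would look like `evalDefineFunction(page, "countLinks", "() => document.links.length")` followed by `evalCallFunction(page, "countLinks")`, with the returned value passed through the same guard as `evaluate`.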

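And as a reference for step 3, a sketch of the de‑duplication nudge; it only catches exact repeats via SHA‑256 hashes (the “>80% similar” heuristic would need a fuzzier comparison), and the buffer size is arbitrary:

```typescript
import { createHash } from "node:crypto";

const RECENT_LIMIT = 20;           // arbitrary buffer size
const recentHashes: string[] = []; // tiny circular buffer, per tool instance

// Returns a one-line hint when the same script has been run before.
function dedupNudge(script: string): string | undefined {
  const hash = createHash("sha256").update(script).digest("hex");
  const repeats = recentHashes.filter((h) => h === hash).length;
  recentHashes.push(hash);
  if (recentHashes.length > RECENT_LIMIT) recentHashes.shift();
  return repeats > 0
    ? `You’ve run a similar script ×${repeats + 1}; define it once via eval_define_function and call it next.`
    : undefined;
}
```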