Valuein — SEC EDGAR Fundamentals & Smart-Money Data
Server Details
Point-in-time, survivorship-free SEC EDGAR fundamentals + smart-money signals for AI agents.
- Status
- Healthy
- Last Tested
- Transport
- Streamable HTTP
- URL
- Repository
- valuein/valuein
- GitHub Stars
- 0
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.5/5 across 69 of 69 tools scored. Lowest: 3.2/5.
Each tool has a distinct purpose with detailed descriptions that clarify differences. Overlaps like get_peer_comparables vs screen_universe are well-differentiated by scope (single company vs cross-sectional). Similarly, get_insider_sentiment vs get_smart_money_flow are clearly distinguished by data sources and methodology.
All tool names follow a consistent verb_noun snake_case pattern (e.g., create_report, get_financial_ratios, delete_alert). No mixing of conventions or inconsistent verbs.
With 69 tools, the count far exceeds the 25+ threshold for 'too many'. While the domain is broad, the sheer volume likely overwhelms agents and increases selection complexity.
The tool set covers a wide range of SEC filings, ratios, smart-money data, alerts, reports, and more. Minor gaps exist (e.g., no options or detailed debt data), but most analyst workflows are supported.
Available Tools
95 toolsapprove_staged_actionApprove Staged ActionADestructiveIdempotentInspect
Approve a staged action by id and RUN the underlying tool call it proposed, using the caller's own current credentials — never the original proposer's. Idempotent and race-safe: an action already decided (approved by a concurrent call, rejected, executed, or failed) is NEVER re-executed — this returns the action's current state with executed_now: false instead. On a fresh approval, executed_now is true and tool_result carries the underlying tool's own structured result, exactly what a direct call to that tool would have returned. If the underlying tool itself fails, the staged action transitions to 'failed' with a reason — this call still succeeds (the approval + execution ATTEMPT is what it promises; a failed underlying write is a normal, inspectable outcome, not a tool error). An id belonging to a different customer's token is indistinguishable from an unknown id (returns NOT_FOUND) — ownership is never leaked. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| staged_action_id | Yes | Id of the staged action to approve, from stage_action or list_pending_approvals. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| tool_result | Yes | The underlying tool's structuredContent, when available (executed now, or previously executed). |
| executed_now | Yes | True only if THIS call is the one that ran the underlying tool (won the approval race). |
| staged_action | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (idempotentHint=true, destructiveHint=true, etc.), the description provides extensive behavioral context: credential delegation, race-safety, non-re-execution, return fields (executed_now, tool_result), error handling (underlying failure leads to 'failed' state but call succeeds), and ownership privacy. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is detailed but efficient, with no redundant sentences. It front-loads the core action and then adds important behavior details. Slightly verbose due to thoroughness, but every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (action approval with side effects, idempotency, error handling), the description covers all critical aspects. It explains return values (executed_now, tool_result, state), ownership privacy, and behavior on failures. The output schema is not provided but referenced; the description sufficiently compensates.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% and the description of the single parameter ('Id of the staged action to approve, from stage_action or list_pending_approvals.') aligns with the schema's description. The description adds no additional semantics beyond identifying the parameter's source, so baseline score of 3 applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Approve a staged action by id and RUN the underlying tool call it proposed'. It specifies the verb (approve/run), resource (staged action), and distinguishes from siblings like stage_action and reject_staged_action by emphasizing execution using caller credentials.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implicitly guides when to use the tool by detailing idempotency and race-safety behavior, and mentions access control ('Tier: sp500+'). It does not explicitly list alternatives, but the sibling context and name make it clear. The description could explicitly state 'Use when you want to approve a pending action', but is otherwise effective.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
cancel_scheduled_taskCancel Scheduled TaskADestructiveIdempotentInspect
Cancel a pending scheduled task by id (from schedule_task or list_scheduled_tasks). Only a pending task can be cancelled — one that already woke (completed) cannot be un-woken. Idempotent: cancelling an already-cancelled task is a no-op. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| task_id | Yes | Identifier of the scheduled task to cancel. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| task_id | Yes | |
| cancelled | Yes | True if THIS call moved the task to cancelled; false if it was already completed/cancelled or not found. |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (readOnlyHint=false, destructiveHint=true, idempotentHint=true), description adds that only pending tasks are cancellable, idempotent behavior (already cancelled is no-op), and tier/access restriction, providing full behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences, front-loaded with main action, no redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Complete for a simple cancellation tool: specifies constraints, idempotency, and has output schema indicated. Covers all necessary behavior.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% for parameter 'task_id', and description does not add extra meaning beyond the schema's description. Baseline 3 applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description states 'Cancel a pending scheduled task by id', specifying the action and resource. It also clarifies the source of the id (from schedule_task or list_scheduled_tasks), distinguishing the tool's purpose among siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Clearly states when to use: only 'pending' tasks can be cancelled, and completed tasks cannot. Mentions idempotency. Does not explicitly name alternatives but context suffices.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
compare_periodsCompare Financial PeriodsARead-onlyIdempotentInspect
Compare a company's core financial metrics across two fiscal periods side-by-side. Shows absolute and percentage changes with significance classification (minor < 5%, notable 5–15%, significant > 15%). The response includes a material_changes count: this is the number of metrics whose significance ∈ {notable, significant} (i.e. absolute percentage change > 5%). Use it as a quick scalar to triage filings — anything > ~3 typically signals a material event worth deeper review. Use period format: 'FY2024' for annual, 'Q1-2024' for quarterly. Pass period_a as the EARLIER period and period_b as the LATER one — if you invert them the server auto-swaps and sets swapped: true in the response so deltas always carry the correct sign (rather than silently flipping). Point-in-time safe via as_of_date. Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT, BRK.B | |
| period_a | Yes | Earlier fiscal period. Format: 'FY2023' for annual or 'Q1-2023' for quarterly. | |
| period_b | Yes | Later fiscal period. Format: 'FY2024' for annual or 'Q1-2024' for quarterly. | |
| as_of_date | No | Point-in-time date (YYYY-MM-DD). Only returns facts with accepted_at on or before this date — eliminates look-ahead bias for backtesting. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| changes | Yes | Per-metric deltas: metric, label, period_a, period_b, delta, delta_pct, significance |
| swapped | Yes | True when inputs were reordered so period_b is the more recent period |
| period_a | Yes | Earlier period descriptor: label, fiscal_year, fiscal_period, period_end, filing_date |
| period_b | Yes | Later period descriptor, same shape as period_a |
| as_of_date | No | |
| company_name | No | |
| total_metrics | Yes | Count of metrics compared across the two periods |
| material_changes | Yes | Count of metrics flagged as a material change |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses auto-swap behavior with swapped flag, significance classification thresholds (<5%, 5-15%, >15%), and material_changes count. Annotations already indicate readOnly/ idempotent, but description adds rich behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single paragraph covering multiple aspects efficiently. No obvious filler, but could be broken into bullet points for easier scanning. Still highly effective.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers all aspects: purpose, usage, response structure (significance, material_changes, swapped flag), triage advice, point-in-time safety. Output schema exists but description still provides essential context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Adds significant meaning beyond the input schema: explains period format explicitly, order convention, auto-swap, and as_of_date function. Schema coverage is 100%, but description still adds value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool compares core financial metrics across two fiscal periods side-by-side, showing absolute and percentage changes with significance classification. It distinguishes itself from siblings by focusing on period comparison, which no other tool in the list explicitly does.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit format guidance ('FY2024', 'Q1-2024'), order convention (period_a earlier, period_b later), and auto-swap behavior. Offers a triage guideline using material_changes count. Includes point-in-time safety mention and plan availability.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
compute_dcfCompute Forward DCFARead-onlyIdempotentInspect
Forward discounted-cash-flow valuation (two-stage Gordon-growth model): caller provides growth + WACC + terminal assumptions, returns per-share intrinsic value (value_per_share_cents, cents USD) + 5×5 sensitivity grid. Pulls FCF base + net debt + shares from R2; caller can override any field. Definitions (consistent with get_financial_ratios / get_capital_allocation_profile): FCF base = operating_cash_flow − capex (absolute USD); net_debt = total_debt − (cash + short-term investments). Shares resolve via a fallback chain (valuation row → fact CommonSharesOutstanding → net_income/eps_diluted), reported as result.shares_source. The pulled inputs are echoed in result.inputs_echo with their source lineage so the valuation is reproducible and traceable. A null value_per_share_cents means the model is degenerate (e.g. WACC ≤ terminal growth, or FCF base ≤ 0) or a required input was unavailable — it is NOT a zero valuation; the reason field explains. Use the returned figures exactly. Use this when you want to drive the assumptions yourself; for the pipeline's pre-computed DCF/DDM value and inputs (no assumptions needed) use get_valuation_metrics instead. Does NOT persist a report — use create_report (report_type:'reverse_dcf') for that. Tier: sp500+.
| Name | Required | Description | Default |
|---|---|---|---|
| wacc | No | Discount rate. Default 0.09. | |
| ticker | Yes | Stock ticker symbol of the company to value, e.g. AAPL, MSFT, BRK.B. | |
| as_of_date | No | Point-in-time cutoff (YYYY-MM-DD) for the auto-pulled inputs. Fundamentals are filtered by SEC accepted_at (strict PIT); valuation.parquet inputs are best-effort PIT (filtered by created_at, its accepted_at proxy — no SEC acceptance timestamp exists for pipeline-computed valuations). Omit to use the latest knowable inputs. | |
| stage1_years | No | Number of explicit high-growth projection years before the terminal stage (3–15). Defaults to 5. | |
| shares_override | No | Override shares outstanding. Leave unset to use R2-derived. | |
| fcf_base_override | No | Override the auto-pulled FCF base (USD). Leave unset to use R2-derived. | |
| stage1_growth_rate | Yes | Stage-1 FCF growth rate (e.g. 0.12 = 12%/yr). | |
| terminal_growth_rate | No | Long-run growth. Default 0.025. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| result | Yes | |
| ticker | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds significant behavioral context beyond annotations: it explains the source of pulled inputs (R2), the fallback chain for shares resolution, the echo of inputs with source lineage for reproducibility, and the meaning of null value_per_share_cents (degenerate model, not zero) with a reason field. No contradiction with annotations (readOnlyHint, idempotentHint, destructiveHint).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized for the tool's complexity. It is front-loaded with the main purpose, followed by behavioral details, usage guidance, and persistence note. Every sentence serves a purpose, no redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (two-stage model, 8 parameters, sensitivity grid, input overrides, fallback chains, null handling) and the presence of an output schema, the description covers all necessary aspects: inputs, outputs, edge cases, alternatives, and persistence. It leaves no ambiguity for an AI agent to select and invoke the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so the baseline is 3. The description adds valuable context beyond schema: it defines FCF base and net debt consistently with other tools (get_financial_ratios, get_capital_allocation_profile), explains the shares resolution fallback chain, and mentions the echo behavior. This provides richer meaning for the parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool computes a forward DCF valuation using a two-stage Gordon-growth model, specifies inputs (growth, WACC, terminal assumptions) and outputs (per-share intrinsic value in cents and a sensitivity grid). It distinguishes from sibling tools by explicitly naming get_valuation_metrics as an alternative for pre-computed values and create_report for persistence.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit when-to-use (drive own assumptions) and when-not-to-use (use get_valuation_metrics instead for pre-computed values). It also clarifies that the tool does not persist a report, directing to create_report for that purpose. Alternatives are clearly named.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
create_alertCreate AlertAInspect
Persist an alert and register it with the firing pipeline. Five condition shapes:
filing_event— fire when a ticker files a chosen form type (8-K, 10-K, etc.).ratio_threshold— fire when a ticker's financial ratio crosses a threshold (e.g. interest_coverage < 1.5).watchlist_change— fire on any filing on any ticker in a named watchlist.price_move(Pro+) — fire when a ticker's close-to-close move over 1/5/21 trading days crosses a percent threshold in a given direction.fundamental_change(Pro+) — fire when a standard_concept reports a brand-new period or gets restated.
Delivery channels: email (transactional via Resend), webhook (HMAC-SHA256-signed POST), slack (hooks.slack.com incoming webhook), dashboard (in-app inbox), or agent_run (Pro+ — runs a standing agent team and delivers the finished artifact to your inbox). The cron evaluator runs every 5 minutes. Use test_alert to verify your channel is wired correctly before relying on the cron.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Human-readable label. | |
| channel | Yes | Delivery channel for a match — `email` (Resend transactional email), `webhook` (HMAC-SHA256-signed POST to your URL), `slack` (POST to a hooks.slack.com incoming-webhook URL), `dashboard` (in-app inbox, readable via list_alert_inbox), or `agent_run` (Pro+ — runs a standing agent team identified by its id, delivering the finished artifact to your inbox). | |
| condition | Yes | Condition evaluated each cron tick — a discriminated union of `filing_event` (a watched ticker files a new form), `ratio_threshold` (a financial ratio crosses a comparator/threshold), `watchlist_change` (a named watchlist's membership changes), `price_move` (Pro+ — a close-to-close move crosses a percent threshold), or `fundamental_change` (Pro+ — a standard_concept reports a new period or gets restated). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| alert | Yes | |
| cron_indexed | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate the tool is not read-only, not idempotent, and not destructive. The description elaborates on behavior: it persists an alert, registers it with a firing pipeline that runs every 5 minutes, and supports multiple channel types. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized for a complex tool, with a clear structure: opening statement, bulleted condition shapes, then delivery channels. It is front-loaded with the main action. Every sentence provides meaningful information, though it could be slightly more concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (3 required parameters with nested discriminated unions), the description covers all condition types and channels. The presence of an output schema reduces the need to discuss return values. The description is complete enough for an agent to understand usage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so baseline is 3. The description adds value by providing a narrative summary of condition shapes and channel types that enhances understanding beyond the schema's structured descriptions. For example, it explains delivery channel specifics like HMAC for webhook and slack URL requirements.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description starts with a specific verb ('Persist an alert and register it with the firing pipeline') and resource, clearly distinguishing what the tool does. It enumerates five distinct condition shapes and delivery channels, setting it apart from siblings like test_alert, list_alerts, and delete_alert.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool, including a note to verify channels with test_alert before relying on cron. It lists condition types with their requirements and Pro+ restrictions. However, it does not explicitly mention when not to use it or compare to alternative tools beyond test_alert.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
create_reportCreate Research ReportAIdempotentInspect
Synchronously generate a research report and persist it under the caller's authorship. Two subtypes:
• reverse_dcf — solves the stage-1 free-cash-flow growth rate the market price implies, with a 5×5 sensitivity grid across WACC × terminal-growth assumptions. Returns full markdown + structured JSON + every numerical claim's citation chain to the originating SEC accession.
• thesis — snapshot a saved thesis (via save_thesis) as a frozen narrative report with at-a-glance table, author notes, anchor fundamentals (latest annual), and lineage to the source filing. Later edits to the thesis do NOT propagate — generate a new report to capture new state.
Tier: sample tier rejected — reports are per-author state.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Optional human-supplied title; auto-generated when omitted. | |
| params | No | Reverse-DCF parameters — required for report_type=reverse_dcf. | |
| ticker | No | US-listed ticker — required for report_type=reverse_dcf. Case-insensitive. | |
| thesis_id | No | Id of a saved thesis owned by the caller — required for report_type=thesis. | |
| report_type | Yes | Subtype. `reverse_dcf` requires ticker + params; `thesis` requires thesis_id (from save_thesis / list_theses). | |
| idempotency_key | No | Optional key for at-most-once semantics. Same key from the same user always yields the same report id. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| report | Yes | |
| markdown | Yes | |
| sections | Yes | |
| citations | Yes | |
| structured | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds significant behavioral context beyond annotations (which indicate idempotentHint=true, destructiveHint=false, readOnlyHint=false). It explains that reports are per-author state, synchronous generation, and that the tool returns full markdown, JSON, and citation chains. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with bullet points for subtypes and clear separation of concerns. It is slightly lengthy but earns its length by explaining two distinct use cases. The main action is front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (6 parameters, two subtypes, nested objects, and an output schema), the description is fairly complete. It explains the purpose of each parameter per subtype and the output format. The mention of 'per-author state' and 'idempotency_key' adds necessary context. The output schema is not described, but that is acceptable since it exists separately.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so baseline is 3. The description adds value by grouping parameters by subtype and explaining their roles (e.g., params are for reverse_dcf, thesis_id for thesis). It does not add new syntax or formatting details beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool synchronously generates a research report and persists it under the caller's authorship, with two distinct subtypes (reverse_dcf and thesis) that are thoroughly described. It distinguishes itself from siblings like get_report, update_report, and delete_report by focusing on creation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to use each subtype and which parameters are required (reverse_dcf requires ticker+params, thesis requires thesis_id). It also notes that edits to a thesis do not propagate to an existing report, implying a new report should be generated. However, it lacks explicit advice on when not to use this tool vs alternatives like compute_dcf.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
create_ruleCreate RuleAInspect
Persist a trigger -> action rule and register it with the evaluator. Seven trigger types accepted (alert_fired, schedule_tick, inbox_item, price_threshold, filing_event, manual, scheduled_task_wake) x six action types (run_team, send_alert, create_report, score_thesis, schedule_task, post_inbox). ONLY FOUR trigger types actually dispatch today: alert_fired, inbox_item, scheduled_task_wake, schedule_tick — use one of these for a rule that will really fire. price_threshold, filing_event, and manual are accepted and persisted (forward-compatible schema) but have NO live event source wired yet, so a rule created with one of them is saved as enabled:true and simply never fires — check the returned rule's trigger_wiring_status field ("live" vs "not_yet_wired") to confirm before relying on it. condition_expr is an OPTIONAL single comparison ("field op value", op one of gt/gte/lt/lte/eq, e.g. "price_change_pct gt 5") evaluated against the trigger event's payload — omit to fire on the trigger alone. Deliberately NOT a general expression language (no AND/OR, no loops) — this is both an anti-complexity and an anti-loop guard; compose multiple rules if you need more than one comparison. Use test_rule immediately after creating to verify it fires as expected WITHOUT spending a real dispatch. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Human-readable label. | |
| action | Yes | Discriminated union — what happens when the rule fires. | |
| trigger | Yes | Discriminated union — which signal fires this rule. | |
| condition_expr | No | Optional single comparison against the trigger payload, e.g. "price_change_pct gt 5". Omit to fire on the trigger alone. |
Output Schema
| Name | Required | Description |
|---|---|---|
| rule | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| warning | No |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (which only provide hints), the description discloses that some triggers are persisted but not wired to live sources, and that condition_expr is deliberately restricted as an anti-complexity guard. Also mentions the trigger_wiring_status field in the output.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Well-structured and front-loaded with core purpose, but somewhat lengthy. Every sentence is purposeful, but could be tightened slightly without losing value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of the tool (multiple trigger/action types, condition expression, tier restriction), the description is fully comprehensive. It covers live status, testing workflow, and output field mention, making it complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds substantial meaning: explains condition_expr format with examples, clarifies trigger and action discriminated unions, and provides context for fields like hmac_secret and condition_type_filter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: persisting a trigger-action rule and registering it with the evaluator. It lists the exact trigger and action types, and distinguishes from siblings like 'test_rule' by explicitly recommending its use for verification.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit guidance on when to use: which trigger types are actually live, that condition_expr is optional and limited to single comparisons, and to avoid general expressions. Recommends using test_rule for verification. Also mentions tier restriction.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
delete_alertDelete AlertADestructiveIdempotentInspect
Soft-delete an alert by its id (from create_alert/list_alerts): status flips to deleted and it is removed from the cron evaluator index so it stops firing. Alerts are immutable — to change one, delete then create_alert. Idempotent. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| alert_id | Yes | Identifier of the alert to soft-delete, as returned by create_alert or list_alerts. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| status | Yes | |
| alert_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate destructive and idempotent. Description adds soft-delete detail, status flip, cron removal, and tier info. No contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, each packed with useful information, front-loaded with key verb and effect. No redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With output schema present, description doesn't need to cover return. Covers deletion behavior, idempotency, tier, and parameter source. Complete for this simple tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema covers 100% with clear parameter description. Description does not add new semantic meaning beyond schema, so baseline 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states 'Soft-delete an alert by its id', specifies the status change and removal from cron evaluator. Distinguishes from siblings like create_alert and list_alerts.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says when to use (to delete) and when not to use (to change, instead delete then create). Mentions idempotent and tier constraints.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
delete_citation_overrideDelete Citation OverrideADestructiveIdempotentInspect
Remove a user-authored citation correction by fact_id. Idempotent — deleting a missing override returns deleted=false without error. Once deleted, reports that previously rendered the corrected value revert to the canonical fact value on next regeneration. Tier: paid + free (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| fact_id | Yes | Fact identifier whose citation override should be removed, as returned by save_citation_override or list_citation_overrides. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| deleted | Yes | |
| fact_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (idempotentHint=true, destructiveHint=true), the description adds valuable behavioral details: returning deleted=false for missing overrides and the revert effect on reports. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise with four short sentences, each adding distinct information: purpose, idempotency, side effects, and tier. Front-loaded with the core action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists and covers return values, the description sufficiently explains behavior on missing overrides and downstream effects. The tier note adds access context. No gaps are evident.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% and the schema already describes fact_id with length constraints and a helpful description referencing save_citation_override and list_citation_overrides. The tool description adds minimal new parameter information ('by fact_id'), so the baseline of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Remove') and the resource ('citation correction') identified by fact_id. It distinguishes from siblings like save_citation_override and list_citation_overrides by specifying removal and idempotent behavior.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides context for when to use the tool: it mentions idempotency (safe to call on missing overrides) and the effect on reports. However, it does not explicitly state when not to use or mention alternatives beyond the sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
delete_claimDelete ClaimADestructiveIdempotentInspect
Soft-delete a claim by id. The row and its score history are preserved for audit (archived, not erased); the claim drops out of default list_claims results. Idempotent — deleting an already-archived claim succeeds.
Tier: all paid + free tiers (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| claim_id | Yes | Id of the claim to archive. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| archived | Yes | |
| claim_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds significant context beyond annotations: soft-delete, data preservation for audit, idempotency even if already archived. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise – two sentences plus a tier line – with no unnecessary words. Every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given a single parameter, good annotations, and presence of output schema (implied), the description covers behavior, idempotency, audit trail, and tier restrictions adequately.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% and the schema already describes the parameter. The description adds little beyond confirming 'by id', so baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states a specific verb ('soft-delete'), resource ('claim by id'), and distinguishes from other delete tools by explaining the archival behavior and idempotency.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions idempotency and the effect on list results but does not explicitly state when to use this tool vs alternatives (e.g., other delete tools). The tier info is present but not usage-specific.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
delete_reportDelete (Soft) Research ReportADestructiveIdempotentInspect
Soft-delete a report owned by the caller: status flips to delisted, visibility to private — not a hard delete, the row and R2 artifact are preserved (90-day audit window). Idempotent (deleting an already-delisted report succeeds). Sample tier rejected.
| Name | Required | Description | Default |
|---|---|---|---|
| report_id | Yes | Id from `create_report` or `list_my_reports`. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| status | Yes | |
| report_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description goes well beyond annotations by explaining the soft-delete mechanism, preservation of data for 90 days, idempotency details, and the 'Sample tier rejected' constraint. Annotations only provide hints, while the description fully discloses the non-destructive yet state-changing nature of the operation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise, using only three sentences to cover purpose, behavior, idempotency, audit window, and access notes. All information is front-loaded and necessary, with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (one parameter, no output schema needed due to existing output schema), the description covers the main aspects: what it does, how it behaves, and constraints. However, the 'Sample tier rejected' note is vague and could confuse an agent about whether the tool is available; otherwise, it is complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with the parameter already well-described as an ID from create_report or list_my_reports. The description does not add new parameter semantics beyond that, but the baseline of 3 is appropriate since the schema handles it fully.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool performs a soft delete of a report owned by the caller, specifying exact state changes (status to delisted, visibility to private). It distinguishes itself from sibling delete tools by focusing on reports and ownership, leaving no ambiguity about the resource and action.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for use: only for reports owned by the caller, with idempotent behavior. It implies when to use (soft delete) but does not explicitly compare with other delete tools or state when not to use. The 'Sample tier rejected' note hints at access restrictions, adding some guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
delete_ruleDelete RuleADestructiveIdempotentInspect
Delete a rule by id (from create_rule/list_rules) — removes it from both the catalog and the evaluator's scan index, so it stops firing immediately. Rules are immutable — to change one, delete then create_rule. Idempotent. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| rule_id | Yes | Identifier of the rule to delete. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| status | Yes | |
| rule_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate destructiveHint=true and idempotentHint=true. The description adds crucial context: the rule is removed from both the catalog and evaluator's scan index, it stops firing immediately, and it's idempotent. It also mentions the tier restriction, which is behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise, with two sentences that front-load the action and key details. Every word adds value, and the structure is clear and easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple deletion tool with a single parameter, the description is complete. It explains the effect, idempotency, tier restriction, and the immutability workaround. The presence of an output schema (not shown but noted) further supports completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description adds value beyond the input schema by specifying the source of the rule_id ('from create_rule/list_rules'), which helps the agent know where to obtain valid identifiers. The schema already covers the parameter with 100% coverage, so the description enhances usability.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Delete a rule by id'), specifies the resource ('rule', sourced from create_rule/list_rules), and describes the effects (removes from catalog and scan index, stops firing). It distinguishes from siblings by mentioning create_rule and list_rules as sources for the id.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains when to use this tool (to delete a rule) and provides guidance on immutability ('to change one, delete then create_rule'). It also notes the tier restriction (sp500+). While it doesn't explicitly list alternatives, the context is sufficient for decision-making.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
delete_thesisArchive Saved ThesisADestructiveIdempotentInspect
Soft-delete a saved thesis: status flips to archived (the row stays for audit / re-scoring). Idempotent — archiving an already-archived thesis succeeds. Hard-delete is not supported by design; future versions may expire archived theses after N years. This does not delete the claims linked to the thesis — use delete_claim for those. Tier: paid + free (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| thesis_id | Yes | Id returned by `save_thesis` or `list_theses`. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| status | Yes | |
| thesis_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Description adds context beyond annotations: soft-delete nature, idempotency, audit trail, no claim deletion, future expiration. Consistent with idempotentHint=true and destructiveHint=true. No contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences plus one about tier. Front-loaded with main action. Every sentence adds unique information, no redundancy. Highly efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With output schema present, description doesn't need return details. Covers all key aspects: behavior, idempotency, limitations (no hard delete, no claim deletion), future plans, tier. Complete for a single-param mutation tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Only one parameter thesis_id with schema coverage 100%. Description adds value by specifying its origin (from save_thesis or list_theses), which helps the agent know valid input sources. Slightly above baseline 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it soft-deletes (archives) a thesis, using specific verb 'delete_thesis' with resource 'saved thesis'. Distinguishes from hard-delete and specifies behavior (status flips to 'archived'). Sibling contrast (delete_claim) is noted.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says when to use (for archiving) and when not to (hard-delete not supported, use delete_claim for claims). Mentions tier restrictions. Provides clear guidance on alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
delete_watchlistArchive WatchlistADestructiveIdempotentInspect
Soft-delete a watchlist by its name (not id): status flips to archived (still readable via list_watchlists status=all/archived). The name is freed for reuse by a new save_watchlist. Idempotent. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Watchlist name to soft-delete (case-insensitive, 1–80 chars); frees the name for reuse. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| status | Yes | |
| watchlist_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare destructiveHint=true and idempotentHint=true. The description adds context: explains soft-delete mechanics, clarifies that the watchlist remains readable via list_watchlists with specific status filter, and mentions tier restriction. Does not contradict annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise: two sentences plus a tier note. All information is front-loaded with no fluff. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given single parameter, full schema coverage, annotations, and presence of output schema, the description is complete. It covers all necessary behavioral aspects like soft-delete, archiving, idempotency, and tier restriction.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with detailed parameter description. The description adds value by highlighting the 'by name not id' aspect and 'frees the name for reuse,' which enriches the schema's documentation. Baseline 3, slight improvement.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action (soft-delete), resource (watchlist), and key behavior (status flips to archived, name freed for reuse). It distinguishes from hard delete and mentions the alternative list_watchlists for reading archived entries.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit context: when to use (to archive a watchlist), idempotency, tier restriction (sp500+), and what happens after (name reusable). It implicitly advises against using for permanent deletion since it's a soft delete.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
describe_schemaDescribe Data SchemaARead-onlyIdempotentInspect
Returns the Parquet schema for all tables in the Valuein SEC data warehouse. Includes table descriptions, column names, types, primary keys, and foreign-key references. Use this tool to understand the data model before querying with other tools. No data reads required — schema is embedded in the manifest. Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| table | No | Filter to a single table name (e.g. 'fact', 'entity', 'references'). Omit to return the full schema for all tables. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| table | No | Single-table mode: the requested table name |
| tables | No | Full-schema mode: map of table name → { description, column_count, columns } |
| columns | No | Single-table mode: map of column name → definition |
| project | No | Full-schema mode: source project name |
| description | No | Single-table mode: the table's description |
| schema_version | Yes | Parquet schema version from the active R2 manifest |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the safety profile is already clear. The description adds valuable context that 'No data reads required — schema is embedded in the manifest', which explains the read-only nature beyond annotations. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise at three sentences, front-loaded with the core purpose, followed by details and usage advice. Every sentence adds value with no redundancy or fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool has one optional parameter and an output schema (context confirms). The description covers the essential: what it returns, when to use, and that it is non-destructive. For a simple read-only schema tool, this is fully adequate and leaves no gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% (one parameter with schema description). Baseline is 3. The description adds extra meaning by stating 'Omit to return the full schema for all tables', clarifying behavior beyond the schema. This justifies a higher score.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Returns' and the resource 'the Parquet schema for all tables in the Valuein SEC data warehouse'. It lists included elements (table descriptions, column names, types, primary keys, foreign-key references), and distinguishes from siblings by advising to use this tool before querying with others.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly says 'Use this tool to understand the data model before querying with other tools', providing clear context. It also notes that no data reads are required and it's available on all plans. However, it does not explicitly mention when not to use it or name alternative tools, but the guidance is sufficient.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
dismiss_inbox_itemDismiss Inbox ItemADestructiveIdempotentInspect
Soft-delete a single inbox item by its id (from list_alert_inbox) — not an alert id; sets dismissed_at. The row stays queryable via list_alert_inbox(include_dismissed=true) for audit. Idempotent. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| inbox_id | Yes | Identifier of the inbox item to dismiss (soft-delete), as returned by list_alert_inbox. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| inbox_id | Yes | |
| dismissed | Yes | |
| unread_count | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds crucial behavioral details beyond annotations: explains it is a soft-delete (sets dismissed_at, row remains queryable), idempotency, and tier restrictions. No contradiction with annotations (destructiveHint=true aligns with soft-delete).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Concise at ~50 words, front-loaded with the key action. Each sentence adds distinct value (soft-delete, parameter source, audit note, idempotency, tier). Optimal length with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter tool with good annotations and output schema, the description covers all necessary aspects: purpose, parameter semantics, behavioral details, and access constraints. No gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds extra context by specifying the parameter must come from list_alert_inbox and is not an alert id, which adds meaning beyond the schema's description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool soft-deletes a single inbox item, specifies the parameter source (from list_alert_inbox), and distinguishes it from an alert id. It also notes the effect (sets dismissed_at) and idempotency, making the purpose highly specific and distinct from siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context on what the tool does and how to use it (parameter from list_alert_inbox, not alert id). However, it does not explicitly compare to sibling 'mark_inbox_read' or state when not to use it, so slightly lacking in exclusion guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
forensic_auditForensic Audit (Beneish + Sloan + Solvency)ARead-onlyIdempotentInspect
Deterministic forensic-accounting scores for a single ticker: partial Beneish M-Score, Sloan accruals, and a solvency snapshot. Returns a red-flag narrative ranked by severity, with citations to source filings. Used by the forensic_earnings_brief SOP.
Note: full Beneish needs AR / current assets / PPE / SGA / current liabilities, which aren't in our fundamentals model. We compute the recoverable subset (SGI + TATA + LVGI) and flag partial=true. Tier: sp500+.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | Yes | Stock ticker symbol of the company to audit, e.g. AAPL, MSFT, BRK.B. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| result | Yes | |
| ticker | Yes | |
| sec_url | Yes | |
| period_end | Yes | |
| source_filing | Yes | |
| prior_period_end | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and idempotentHint=true. The description adds that the tool is deterministic, returns red-flag narratives with citations to source filings, and discloses limitations (partial Beneish due to missing data). This provides useful behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise with three sentences that quickly convey purpose, output, and limitations. No redundant information; every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool has an output schema (not shown but noted), so description does not need to detail return values. It covers purpose, limitations, and usage context (SOP). However, it could mention that the output includes a narrative with citations, which it does. Nearly complete for agent decision-making.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% for the single parameter (ticker). The description does not add new semantic details beyond the schema, but confirms it is for a single ticker. Baseline 3 is appropriate since schema already documents the parameter well.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it computes deterministic forensic-accounting scores (partial Beneish M-Score, Sloan accruals, solvency snapshot) for a single ticker, and returns a red-flag narrative with severity and citations. This is specific and distinct from sibling tools like get_earnings_signals or get_company_fundamentals.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions it is used by the forensic_earnings_brief SOP, providing context, but does not explicitly state when to use this tool vs alternatives or when not to use it. No direct comparison to siblings is given.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_comps_xlsxGenerate Peer Comparables Workbook (xlsx)AInspect
Render a peer comparables table into an Excel workbook. The Comps sheet is formatted as a named Excel Table (ValueinPeerComps) so the user gets one-click Insert Chart on any column — the cleanest workaround for not embedding chart objects server-side. Subject-row highlight makes side-by-side comparison instant. A Summary sheet adds subject vs peer-median deltas.
SERVER-TRUST: the ratios you pass are rendered as-supplied and are NOT re-derived by Valuein, so the workbook carries a visible 'figures supplied by caller, not verified by Valuein' watermark (response verification.status = 'unverified'). For authoritative numbers, source them from get_peer_comparables / get_financial_ratios first.
Pair with get_peer_comparables for a typical flow.
Tier: pro+.
| Name | Required | Description | Default |
|---|---|---|---|
| notes | No | Optional free-text note (≤500 chars) rendered on the Summary sheet. | |
| peers | Yes | Peer companies to tabulate against the subject (1–50 rows); each row carries the peer's ticker, name, and comparable ratio values. | |
| subject_ticker | Yes | Stock ticker symbol of the subject company the comps sheet is built around, e.g. AAPL. | |
| subject_company_name | No | Optional display name for the subject company; falls back to the ticker if omitted. |
Output Schema
| Name | Required | Description |
|---|---|---|
| url | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| r2_key | Yes | |
| filename | Yes | |
| expires_at | Yes | |
| size_bytes | Yes | |
| content_type | Yes | |
| verification | Yes | Server-trust record. Comps ratios are rendered as supplied and are NOT re-derived by Valuein, so the workbook carries a visible 'figures supplied by caller' watermark. Pull authoritative ratios via get_peer_comparables / get_financial_ratios. |
| expires_in_seconds | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses that ratios are rendered as-supplied without server-side verification, resulting in 'unverified' watermark. Explains Excel table feature and subject-row highlight. No contradiction with annotations (readOnlyHint: false, destructiveHint: false).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Concise yet informative: first sentence states purpose, then key features, then trust note, then pairing. No wasted words; good structure with section headers.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, output format, behavioral nuances, workflow integration, and limitations. Complete given 4 parameters and output schema existence.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% parameter description coverage, so baseline is 3. Description adds context like optional notes on Summary sheet and fallback for subject_company_name, but does not significantly enhance parameter understanding beyond schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States verb 'Render' with resource 'peer comparables table into an Excel workbook'. Specifies output format, subject-highlight, and pairing with get_peer_comparables, distinguishing it from siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly advises pairing with get_peer_comparables for authoritative figures, and warns about watermark for unverified data. Provides clear when-to-use and when-not-to-use guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_dcf_xlsxGenerate DCF Workbook (xlsx)AInspect
Render a forward DCF result into a professional Excel workbook (Summary + 5×5 Sensitivity heatmap + Inputs sheet). Native conditional formatting — no chart images needed. Returns a 15-minute presigned R2 download URL.
SERVER-TRUST: the DCF is re-derived in-Worker from the supplied inputs_echo (the math is pure + deterministic) and the workbook renders Valuein's recomputed figures — never the caller's claimed values. If the claimed figures disagree, the workbook is still produced but stamped with a visible correction banner and the response verification.status is 'corrected'. A fabricated per-share value can never appear as Valuein-authoritative.
Pair with compute_dcf for a typical analyst flow: agent calls compute_dcf({ticker, ...}), then passes the structured result straight to generate_dcf_xlsx({ticker, dcf_result, ...}) to materialise a shareable file.
Tier: pro+.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | Yes | Stock ticker symbol of the company the DCF workbook is built for, e.g. AAPL. | |
| dcf_result | Yes | Structured DCF result — typically the `result` field returned by `compute_dcf`. | |
| company_name | No | Optional — surfaces on the cover row. Falls back to ticker only. |
Output Schema
| Name | Required | Description |
|---|---|---|
| url | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| r2_key | Yes | |
| filename | Yes | |
| expires_at | Yes | |
| size_bytes | Yes | |
| content_type | Yes | |
| verification | Yes | Server-trust record. status='verified' when the caller's figures matched the server re-derivation; 'corrected' when they did not (the workbook shows the SERVER figures + a banner). `mismatches` lists every field that disagreed. |
| expires_in_seconds | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds significant context beyond annotations: re-derivation in-worker, correction banner for discrepancies, 15-minute presigned URL, and that fabricated values cannot appear. Annotations are non-contradictory (not read-only, not destructive).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is comprehensive but slightly long; each sentence adds value. Could be trimmed slightly, but front-loaded with key verbs and details. No wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complex nested input schema, the description covers purpose, output, trust model, correction behavior, and pairing. With output schema existing, it need not detail return structure. Fully adequate for an agent to use correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so description adds marginal value. It notes dcf_result comes from compute_dcf and company_name appears on cover row, but these are minor. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it renders a DCF result into an Excel workbook with specific sheets (Summary, Sensitivity heatmap, Inputs). It names the output format and distinguishes from sibling compute_dcf, which computes the DCF.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly pairs with compute_dcf for a typical analyst flow, describing the sequential call pattern. Also explains behavior when claimed figures disagree (correction banner) and re-derivation in-worker, guiding appropriate usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_research_brief_docxGenerate Research Brief (docx)AInspect
Render a structured research brief into a professionally-styled Word document — cover, abstract, optional snapshot table, body sections, and a citations table with clickable SEC EDGAR links. No embedded charts in v1; pair with generate_dcf_xlsx / generate_comps_xlsx for visuals the analyst pastes in.
SERVER-TRUST: prose, snapshot rows, and citations are rendered as-supplied and are NOT verified by Valuein, so the brief carries a visible 'figures supplied by caller, not verified by Valuein' watermark (response verification.status = 'unverified'). Resolve each citation via verify_fact_lineage before publishing.
Consumes the same sections + citations shape create_report emits, so the typical flow is two tool calls: create_report → generate_research_brief_docx.
Tier: pro+.
| Name | Required | Description | Default |
|---|---|---|---|
| title | Yes | Document title rendered on the cover page (1–200 chars). | |
| ticker | Yes | Stock ticker symbol the brief covers, e.g. AAPL, MSFT, BRK.B. | |
| abstract | No | Optional executive-summary paragraph (≤2000 chars) shown after the cover page. | |
| sections | Yes | Ordered body sections of the brief (1–20); each has a heading and body text. | |
| snapshot | No | Optional at-a-glance metric rows (≤20) rendered as the snapshot table. | |
| citations | No | Optional source citations (≤60) rendered as a table with clickable SEC EDGAR hyperlinks. | |
| company_name | No | Optional display name shown on the cover page; falls back to the ticker if omitted. |
Output Schema
| Name | Required | Description |
|---|---|---|
| url | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| r2_key | Yes | |
| filename | Yes | |
| expires_at | Yes | |
| size_bytes | Yes | |
| content_type | Yes | |
| verification | Yes | Server-trust record. Brief prose, snapshot rows, and citations are rendered as supplied and are NOT verified by Valuein, so the brief carries a visible 'figures supplied by caller' watermark. Resolve each citation via verify_fact_lineage for one-click SEC verification. |
| expires_in_seconds | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description clearly discloses that content is not verified by Valuein, resulting in a watermark and unverified status in the response. This goes beyond the annotations (readOnlyHint/destructiveHint) to inform the agent of trust implications.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with clear sections, front-loaded with the main purpose. It is slightly verbose (e.g., repeating 'cover, abstract...' from parameter descriptions), but every sentence adds value. Could trim redundant references to the same content.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has an output schema (not shown), the description adequately covers purpose, usage, behavioral notes, and integration with other tools. It explains the watermark and verification status, and the typical workflow with create_report. No significant gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
All 7 parameters have schema descriptions covering length, defaults, and roles. The description adds context about how parameters map to document sections (cover, snapshot, citations) and integration with create_report, but does not significantly enhance parameter meaning beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description precisely states the tool renders a structured research brief into a Word document with specific components (cover, abstract, snapshot table, body sections, citations). It distinguishes itself by specifying no embedded charts and referencing sibling tools for visuals, leaving no ambiguity.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit when-to-use guidance (pair with generate_dcf_xlsx/generate_comps_xlsx for charts), a typical flow (create_report then this tool), and a prerequisite (resolve citations via verify_fact_lineage before publishing). It also notes the tool is tier pro+.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_agent_runGet Agent RunARead-onlyIdempotentInspect
Fetch full detail for one of the caller's own standing-agent runs by id (from list_agent_runs) — status, goal, tickers, cost, artifact ids, role breakdown, and any error. A run may have been triggered by this same agent or by the customer's own Workspace; this tool works either way. Returns found: false (not an error) for an unknown id OR an id belonging to another customer — there is no distinguishing signal, by design. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| run_id | Yes | Run identifier, from list_agent_runs. |
Output Schema
| Name | Required | Description |
|---|---|---|
| run | No | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| found | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint, idempotentHint, destructiveHint. Description adds valuable behavioral info: returns 'found: false' for unknown ids or ids belonging to another customer by design (no distinguishing signal), and mentions Tier restriction. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with no redundancy. First sentence front-loads purpose and returned fields. Second sentence clarifies scope. Third sentence covers error cases and access tier. Every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter tool with an output schema (not shown but implied by field listing), the description covers all necessary aspects: what it returns, edge cases, access control. No gaps identified.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema covers 100% of parameters with description 'Run identifier, from list_agent_runs.' Description adds context about where the id comes from and edge-case behavior (found:false). Baseline 3 is exceeded due to extra usage-derived context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description uses specific verb 'Fetch full detail' and clearly identifies the resource 'standing-agent runs by id'. It distinguishes from sibling tool 'list_agent_runs' by referencing it as the source of ids and detailing the returned fields (status, goal, tickers, cost, artifact ids, role breakdown, error).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Description states when to use: to get detail for a run from list_agent_runs. It clarifies both agent-triggered and workspace-triggered runs are supported. It does not explicitly mention when not to use or name alternatives, but the context is clear enough given the sibling list.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_blockholdersBlockholders (SC 13D / 13G)ARead-onlyIdempotentInspect
Returns SC 13D / SC 13G blockholder disclosures (5%+ stakes) for a US public company. Each row carries percent_owned, sole/shared voting + dispositive split, schedule_type, and the first-class going_active flag — TRUE when the same filer flipped 13G → 13D within the lookback window (the single most actionable activist signal in this dataset). Use latest_only=true (default) to dedupe to the most recent filing per filer. Use collapse_groups=true to fold multi-person filings into one row. Institutional tier only.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | Yes | Stock ticker symbol of the issuer. | |
| as_of_date | No | PIT filter on accepted_at — only filings on or before this date. | |
| latest_only | No | When true (default), keep only the most recent filing per (filer, schedule prefix) — typically what analysts want. Set false to see the full filing history. | |
| lookback_days | No | Window for the going_active (13G → 13D) detection. Default 365 days. | |
| lineage_detail | No | Per-row provenance envelope. | compact |
| collapse_groups | No | When true, fold multi-reporting-person filings into a single row, with secondary persons in the ``persons[]`` field. Default false: each person stays as its own row. | |
| schedule_filter | No | Which schedule(s) to return. '13D' = activist (intent to influence). '13G' = passive. 'both' = no filter. | both |
Output Schema
| Name | Required | Description |
|---|---|---|
| cik | Yes | |
| rows | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| company_name | Yes | |
| data_age_days | Yes | |
| staleness_warning | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint. The description adds valuable context about the going_active flag, default parameter behaviors (latest_only, collapse_groups), and the institutional tier restriction, which enhances transparency beyond the structured annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise (about 6 sentences) with no wasted words. It uses code formatting for emphasis and is front-loaded with the core purpose. Every sentence adds value, including direct usage hints and explanation of the most important flag.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has an output schema (so no need to detail return values), the description covers the key data fields, parameter defaults, and the unique going_active detection. It explains the dedup and group collapse options, making it complete for an analyst to understand the tool's functionality.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so baseline is 3. The description adds extra meaning for key parameters like latest_only (dedup logic) and collapse_groups (folding multi-person filings), and explains the going_active flag and its significance, which is not fully covered in the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states it returns SC 13D/13G blockholder disclosures for US public companies with specific data fields. It is specific about the resource and verb but does not explicitly differentiate from sibling tools like get_institutional_holdings or get_top_holders.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context (US public companies, blockholder filings) and mentions 'Institutional tier only' as a restriction, but it does not provide explicit guidance on when to use this tool versus alternatives or when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_capital_allocation_profileCapital Allocation ProfileARead-onlyIdempotentInspect
Get a multi-year capital allocation breakdown for a US public company. Shows how management deploys cash across all six categories — capex, R&D, M&A, dividends, buybacks, and debt — plus pre-computed deployment ratios (% of operating cash flow) and over-distribution flags. Use this tool when the user asks: how does a company allocate capital, what's the buyback-vs-dividend mix, is the company over-distributing, is growth funded by R&D or M&A, what's the cash-return-ratio trend, or any 'where does the money go' question — including owner-earnings (Buffett-style) and reinvestment-rate (Damodaran-style) analysis. Data sourced from annual 10-K filings; PIT-safe via as_of_date. R&D is included as a deployment category (the primary growth-reinvestment vehicle for knowledge-economy firms), but since it's already deducted before operating cash flow, rd_pct_ocf is INFORMATIONAL and total_deployment_pct_ocf EXCLUDES R&D to preserve the cash-flow identity (OCF = capex + M&A + dividends + buybacks + debt repayment + Δcash). The flags object carries pre-computed booleans: buybacks_exceed_fcf, total_returns_exceed_fcf (buybacks + dividends > FCF), and debt_funded_distribution (over-distribution funded by leverage vs cash). Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT | |
| as_of_date | No | Point-in-time date (YYYY-MM-DD). Only returns facts with accepted_at on or before this date — eliminates look-ahead bias. | |
| lookback_years | No | Number of fiscal years to look back from the most recent filing (1–20). Defaults to 5 years for a full capital allocation cycle. |
Output Schema
| Name | Required | Description |
|---|---|---|
| data | Yes | Per-period capital-allocation rows: capex, R&D, M&A, dividends, buybacks, debt, and deployment-mix flags |
| note | No | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| as_of_date | No | |
| lookback_years | Yes | Number of fiscal years summarized |
| periods_returned | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations (readOnlyHint=true, idempotentHint=true) are complemented by detailed behavioral context: data source (10-K filings), PIT-safety via as_of_date, R&D treatment rationale, and flag definitions. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is somewhat lengthy but well-structured: purpose first, then use cases, then technical details. Every sentence adds value given the tool's complexity. Could be slightly more concise but is effective.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists (not shown but noted), the description covers all needed context: data source, scope (US public companies), annual frequency, and special handling of R&D. Complete for a data retrieval tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear parameter descriptions. The description adds minor context (e.g., 'PIT-safe', 'Defaults to 5 years') but these are already in the schema. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Get a multi-year capital allocation breakdown for a US public company' with a specific verb and resource. It distinguishes from sibling tools by listing exactly which user questions trigger this tool, making it unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Description provides explicit when-to-use guidance with example questions ('how does a company allocate capital', 'what's the buyback-vs-dividend mix', etc.) and mentions specific analysis styles. Lacks explicit 'do not use' or alternatives, but the use-case list is sufficiently clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_claimGet ClaimARead-onlyIdempotentInspect
Fetch a single claim by id, plus the ids of theses it supports/refutes and its full append-only score history. Use this to inspect a claim's evidence, current status, and how its outcome has evolved.
Tier: all paid + free tiers (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| claim_id | Yes | Id returned by save_claim or list_claims. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| claim | Yes | |
| score_events | Yes | |
| linked_thesis_ids | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds context about the append-only score history and inclusion of thesis IDs, confirming non-destructive behavior. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences: first states action and result, second gives usage guidance. Concise and front-loaded with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists, the description adequately covers purpose, usage, and parameter source. The tool is simple with a single parameter, and the description is complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Only one parameter, claim_id, with description 'Id returned by save_claim or list_claims.' Schema coverage is 100%. The description adds meaningful origin context beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Fetch' and the resource 'claim by id', and specifies what additional data is returned (thesis IDs, score history). It distinguishes from siblings like list_claims and save_claim.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states 'Use this to inspect a claim's evidence, current status, and how its outcome has evolved.' Provides clear context for when to use, though no explicit 'when not to use' is given.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_company_fundamentalsCompany FundamentalsARead-onlyIdempotentInspect
Retrieve standardized SEC EDGAR fundamental financial metrics for a US public company. Returns revenue, gross profit, operating income, net income, EPS (diluted), total assets, total liabilities, stockholders' equity, cash & equivalents, total debt, operating cash flow, and capital expenditures for one or more fiscal periods. Data sourced from 10-K (annual) and 10-Q (quarterly) filings. Point-in-time: no look-ahead bias — pass as_of_date (YYYY-MM-DD) to reconstruct exactly the information set known on that date. This returns the raw as-reported line items; for pipeline-computed ratios (margins, ROE, ROIC, leverage, per-share) use get_financial_ratios, and for margins combined with DCF/DDM model inputs use get_valuation_metrics — both derive from the figures this tool returns.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of periods to return (1–40). Defaults to 5. | |
| period | No | Filing period granularity. Annual uses 10-K; quarterly uses 10-Q. | annual |
| strict | No | When true, fail with PLAN_LIMIT_EXCEEDED if the plan cannot satisfy the requested limit. Default false: return what's available and explain the gap in _meta.truncation. | |
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT, BRK.B | |
| as_of_date | No | Point-in-time date (YYYY-MM-DD). Only returns facts with accepted_at on or before this date — eliminates look-ahead bias for backtesting. Omit for the full dataset. | |
| fiscal_year | No | Fiscal year (YYYY). Omit to return the most recent available years. | |
| lineage_detail | No | Per-period provenance envelope + per-metric availability/provenance sidecars. 'compact' (default) returns source_filing + source_url + restated flag, plus lean per-metric availability + fact_id + source_filing. 'full' adds first_filed_at + accepted_at + per-metric source_url + computed inputs[]. 'off' omits all provenance. | compact |
| response_format | No | Output shape. 'flat' (default) returns the legacy `metrics` object plus the additive `metrics_availability`/`metrics_provenance` sidecars. 'envelope' additionally attaches `metric_envelopes` — one canonical {metric,value,unit,scale,period,availability,provenance} object per metric. | flat |
Output Schema
| Name | Required | Description |
|---|---|---|
| data | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| period | Yes | |
| ticker | Yes | |
| as_of_date | Yes | |
| company_name | Yes | |
| years_returned | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds significant behavioral context beyond annotations: it explains the lack of look-ahead bias with as_of_date, the behavior when strict is false (return available data and explain truncation), and the data source (10-K/10-Q). It does not contradict the annotations (readOnlyHint, idempotentHint, destructiveHint).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise (4 sentences) and front-loaded with the primary purpose. Every sentence adds value: purpose, metrics, data source, point-in-time feature, and sibling differentiation. No redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (8 parameters, output schema exists), the description covers all essential aspects: what it returns, data source, point-in-time ability, behavior of strict parameter, and alternatives. The output schema documents return values, so no need to repeat them.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema already covers all 8 parameters with descriptions. The description adds context like 'no look-ahead bias' for as_of_date and the meaning of strict parameter behavior. While the schema is complete, the description enhances understanding of usage scenarios.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb 'Retrieve' and clearly identifies the resource: 'standardized SEC EDGAR fundamental financial metrics for a US public company.' It lists specific metrics and distinguishes from sibling tools (get_financial_ratios, get_valuation_metrics), making the purpose unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to use this tool versus alternatives: 'for pipeline-computed ratios... use get_financial_ratios, and for margins combined with DCF/DDM model inputs use get_valuation_metrics.' It also explains the point-in-time usage for backtesting, providing clear guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_compute_ready_streamCompute-Ready StreamARead-onlyIdempotentInspect
Returns a short-lived (15-min) download URL for a bulk Parquet object that can be piped directly into Python/DuckDB/Polars for high-throughput computation that exceeds the MCP context window. The URL streams the object straight from Valuein storage and supports HTTP range reads, so duckdb.read_parquet(url) / pl.read_parquet(url) work without downloading the whole file first. Datasets: fact (per-entity partition — requires ticker), ratio (all computed ratios), valuation (DCF inputs), filing (SEC filing metadata), references (company universe), index_membership (historical index composition). Scoped to the caller's tier bucket; the link is signed and cannot be used to list the bucket or read other objects.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | No | Required when dataset_type is 'fact'. Resolves to the per-entity fact/{CIK}.parquet partition for that company. | |
| dataset_type | Yes | Dataset to access. 'fact' requires ticker (per-entity partition). All others are full-universe tables. |
Output Schema
| Name | Required | Description |
|---|---|---|
| url | No | Signed, time-limited (15-min) download URL for the Parquet object (Range-enabled) |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| scope | No | What the presigned URL is scoped to (method, object_key_only, etc.) |
| usage | No | Ready-to-run DuckDB / Polars snippets |
| bucket | No | |
| format | No | |
| ticker | No | |
| url_hash | No | |
| expires_at | No | |
| object_key | No | |
| dataset_type | Yes | |
| expires_in_seconds | No |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description goes beyond annotations by disclosing the 15-minute URL expiration, HTTP range read support, scope to caller's tier bucket, and security constraints (signed link prevents listing or reading other objects). It also explains dataset-specific behavior (ticker required for 'fact'). No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, information-dense paragraph that front-loads the core action (returns a short-lived URL) and then efficiently covers all relevant details. Every sentence earns its place, with no fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 2 parameters, 100% schema coverage, and an output schema (implied by the URL return), the description covers the tool's purpose, usage, behavioral details, dataset options, and prerequisites. It is sufficiently complete for an agent to invoke correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, the description adds significant value by explaining that 'ticker' is required only for dataset_type='fact' and resolves to per-entity partitions, while other datasets are full-universe tables. It also clarifies the partition naming convention, which is absent from the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns a short-lived download URL for bulk Parquet objects, specifying the exact use case (high-throughput computation beyond MCP context). It lists the available datasets, making the resource unambiguous and distinguishing it from sibling tools like 'get_company_fundamentals' which return structured data directly.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context on when to use the tool (for high-throughput computation exceeding context window) and how it works (streams from storage, supports range reads). However, it does not explicitly mention when not to use it or list alternative tools for smaller data.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_earnings_signalsEarnings SignalsARead-onlyIdempotentInspect
Reported earnings results and a model-derived earnings-trend signal for a company, by fiscal period: actual reported EPS, a trailing-trend EPS estimate (eps_trend_est), the deviation of actual vs that trend (eps_surprise_pct), reported revenue, and year-over-year revenue growth. IMPORTANT: eps_trend_est is NOT Wall Street analyst consensus — Valuein is sourced purely from SEC EDGAR and carries no consensus feed. It is a deterministic estimate computed from the company's own prior reported EPS, so eps_surprise_pct measures how far the print landed from its own trailing trend, not whether it 'beat the Street'. Use it to track earnings/revenue trajectory and momentum, not to claim a consensus beat or miss. Point-in-time safe — pass as_of_date to filter by SEC acceptance (accepted_at) for look-ahead-free backtests. Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of periods to return (1–40), most recent first. Defaults to 8 — covers 2 years of quarterly signals plus their TTM equivalents. earnings_signals.parquet currently emits one row per (entity, period_end); older rows surface here as more historical periods are published. | |
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT | |
| as_of_date | No | Point-in-time filter: only return signals with accepted_at on or before this date. Use for backtesting to avoid look-ahead bias. |
Output Schema
| Name | Required | Description |
|---|---|---|
| data | Yes | |
| note | Yes | |
| plan | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| as_of_date | No | |
| estimate_basis | Yes | |
| periods_returned | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint, idempotentHint, destructiveHint, and the description aligns, adding that the tool is safe for point-in-time use. It transparently explains that eps_trend_est is not consensus but a deterministic estimate from the company's own prior data, and that eps_surprise_pct measures deviation from the company's trend. This adds valuable context beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is relatively long but each sentence provides important information, such as the caveats about consensus and point-in-time safety. It front-loads the purpose and key outputs. While slightly verbose, it remains focused and structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (earnings signals with derived estimates), the description adequately covers the return values, key usage caveats, and data source. An output schema exists to handle the return format. The description is complete enough for an agent to understand the tool's behavior and limitations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, and the description adds meaning by explaining the significance of eps_trend_est and eps_surprise_pct, and the default limit covers 2 years. The as_of_date parameter is reinforced for backtesting. This adds value beyond the schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool reports earnings results and a model-derived earnings-trend signal, specifying actual EPS, trailing-trend estimate, surprise percentage, revenue, and growth. It uniquely identifies the resource (company earnings by fiscal period) and distinguishes itself from siblings by highlighting its deterministic estimation from SEC EDGAR rather than analyst consensus.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use the tool (track earnings/revenue trajectory and momentum) and warns against using it to claim consensus beats or misses. It also notes point-in-time safety for backtesting with as_of_date. However, it does not explicitly compare to sibling tools or state when not to use it, but the context from the sibling list shows no direct competitors.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_financial_ratiosFinancial RatiosARead-onlyIdempotentInspect
Get pipeline-computed financial ratios from ratio.parquet. Served categories: profitability (margins, ROE, ROA, ROIC), liquidity (current ratio, quick ratio), leverage (D/E, interest coverage, net debt/EBITDA), efficiency (asset turnover, inventory days), per_share (EPS, BVPS, FCF/share), owner_earnings (Buffett FCF, owner yield), valuation (pe_ratio, pb_ratio, ev_ebitda, market_cap, dividend_yield), and the pipeline-emitted forensic, growth, and rank (cross-sectional *_sector_pctile) categories. NOT every category exists for every ticker — omit categories to get whatever this ticker has, or read available_categories in the CATEGORY_NOT_AVAILABLE envelope. valuation is LIVE (schema 2.18.0): price-derived multiples from EOD prices period-end-aligned — pipeline-derived, NOT strictly PIT (no accepted_at column on these rows). Includes TTM rows alongside annual; each row's is_calendar_aligned is TRUE only when period_end sits on the fiscal-year boundary (±7 days) — filter to TRUE when joining ratios to fact-table fundamentals on (entity, fiscal_year). For historical cuts use as_of_date (PIT by accepted_at when present, else by period_end — see the param). Use this instead of get_valuation_metrics when you only need ratios (no DCF wiring); use get_valuation_metrics when you also need DCF/DDM. Each ratio is a {value, unit, category, reason} entry with a response-level lineage (DerivedLineage) pointing to get_company_fundamentals / verify_fact_lineage for filing-level provenance; a null value carries a reason (e.g. INPUT_MISSING) so missing is never a real zero. Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Number of distinct period_end dates to return (1–20). Defaults to 5. Within each period, all matching ratio_names are included. | |
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT | |
| as_of_date | No | Historical cutoff (canonical cross-tool date param). PIT by SEC accepted_at when the ratio data carries it (latest value knowable on/before the date, zero look-ahead, _meta.pit_safe=true), else by ratio.period_end (pit_safe=false). For guaranteed accepted_at PIT use get_company_fundamentals. | |
| categories | No | Ratio categories to include (see the enum). Omit to return every category this ticker has. `valuation` (pe_ratio, pb_ratio, ev_ebitda, market_cap, dividend_yield) is LIVE since schema 2.18.0 — price-derived, period-end-aligned, not strictly PIT. Availability is per-ticker (the envelope lists this ticker's available_categories). | |
| fiscal_period | No | Filter to a specific fiscal period type. Use 'TTM' for trailing twelve months. Omit to return both annual (FY) and TTM rows. | |
| period_end_before | No | Alias of as_of_date (as_of_date preferred — the canonical name). Returns ratios with period_end on or before this date. |
Output Schema
| Name | Required | Description |
|---|---|---|
| data | Yes | |
| note | Yes | |
| plan | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| lineage | No | Provenance for pipeline-derived values (ratio.parquet / factor_scores.parquet): source table + pipeline computed_at, plus a pointer to the tools that return filing-level lineage. NOT point-in-time (recomputed on each pipeline run). |
| periods_returned | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. Description adds rich behavioral context: return structure with value/unit/category/reason, lineage pointing to get_company_fundamentals, null value handling with reasons, PIT behavior for as_of_date, and LIVE nature of valuation categories. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is long but front-loads core purpose and sibling differentiation. Every sentence adds necessary nuance. Could be slightly trimmed but remains focused.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 6 parameters, output schema exists, and tool complexity, description is comprehensive: covers return format, missing values, lineage, PIT behavior, category handling, alignment field, plan availability. Leaves minimal gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions for all 6 parameters. Description adds extra meaning: explains categories behavior in detail (availability, omission), as_of_date PIT logic, period_end_before as alias, limit range. Adds value beyond schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool retrieves pipeline-computed financial ratios and lists categories. It distinguishes from sibling tool get_valuation_metrics with explicit guidance on when to use each.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use and when-not-to-use advice, including differentiation from get_valuation_metrics. Explains category availability and suggests omitting categories to get all available. Discusses as_of_date behavior and is_calendar_aligned for joins.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_insider_sentimentInsider Sentiment (composite)ARead-onlyIdempotentInspect
Role-weighted insider sentiment score on a fixed [-100, +100] scale for a single issuer over a lookback window. Role weights: CEO/CFO = 3.0 (via officer_title pattern), other NEO Officer = 2.0, 10%-Owner = 1.5, Director = 1.0. P = +1, S = -1; option exercises, grants, and tax withholdings are neutralised. Cluster flag = TRUE when ≥3 distinct insiders transacted within any 30-day window inside the lookback. Institutional tier only.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | Yes | Issuer ticker symbol. | |
| lookback_days | No | Days back from today to scan transactions for. Default 180. | |
| cluster_window_days | No | Sliding window for the cluster_flag detection. Default 30 days. |
Output Schema
| Name | Required | Description |
|---|---|---|
| cik | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| buy_count | Yes | |
| sell_count | Yes | |
| cluster_flag | Yes | |
| company_name | Yes | |
| lookback_days | Yes | |
| total_buy_usd | Yes | |
| total_sell_usd | Yes | |
| sentiment_score | Yes | |
| top_contributors | Yes | |
| total_buy_shares | Yes | |
| total_sell_shares | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations (readOnlyHint, idempotentHint, destructiveHint) already indicate safe read behavior. The description adds significant algorithmic detail: role weights, P/S coding with neutralization of certain transaction types, and cluster detection logic, providing context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single paragraph that efficiently conveys the core purpose first, then algorithmic details. It is front-loaded and reasonably concise, though could be slightly shorter.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema and annotations, the description fully covers the tool's behavior including formula details, constraints (institutional tier), and cluster flag logic. No gaps remain for agent invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions for each parameter. The description adds context on lookback window and cluster detection but does not add new meaning beyond what the schema already provides, thus baseline score applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool computes a role-weighted insider sentiment score on a fixed [-100, +100] scale for a single issuer over a lookback window. It details role weights, transaction coding, and cluster flag, clearly distinguishing it from siblings like get_insider_transactions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description does not explicitly state when to use this tool vs. alternatives like get_insider_transactions. It mentions 'Institutional tier only' implying access restrictions, but lacks exclusions or comparative guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_insider_transactionsInsider TransactionsARead-onlyIdempotentInspect
Form 3 / 4 / 5 / 144 line items for a US public company. Returns each transaction (or initial holding / proposed sale) with the insider's name, role, transaction code, share count, price, and notional. Filters by lookback window, transaction code (P=purchase, S=sale, A=grant, M=option exercise, F=tax withholding, etc.), insider role, and minimum share threshold. Institutional tier only — sample / sp500 / pro return ENTITLEMENT_DENIED with an upgrade link.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum rows to return. Default 100, max 500. | |
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT, BRK.B | |
| roles_in | No | Insider roles to keep. Omit to include any role. | |
| as_of_date | No | Point-in-time date (YYYY-MM-DD). Only returns transactions with accepted_at <= this date — eliminates look-ahead bias. When set, lookback_days is ignored. | |
| min_shares | No | Minimum |shares| per transaction. Omit for no floor. | |
| lookback_days | No | How many days back from today to scan transactions for. Ignored when as_of_date is set. | |
| lineage_detail | No | Per-row provenance envelope. 'compact' (default) returns source_filing + source_url. 'full' adds accepted_at. 'off' omits lineage. | compact |
| transaction_codes | No | SEC transaction codes to keep (uppercase, single-letter): P=purchase, S=sale, A=grant, M=option exercise, F=tax withholding, G=gift, J=other. Unknown codes are rejected. Omit to include all codes. |
Output Schema
| Name | Required | Description |
|---|---|---|
| cik | Yes | |
| rows | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| company_name | Yes | |
| data_age_days | Yes | |
| staleness_warning | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only and idempotent behavior. Description adds important behavioral details: ENTITLEMENT_DENIED for insufficient tier, as_of_date eliminating look-ahead bias, and lineage_detail options. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single dense paragraph with front-loaded purpose sentence. Every sentence adds value: core function, returned fields, filters, tier restriction. No fluff or redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 8 parameters with full schema coverage and an output schema, the description covers all necessary aspects: purpose, filters, behavior, tier constraints. Agent has sufficient information to correctly select and invoke the tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% description coverage, so baseline is 3. Description adds meaningful context beyond schema, such as explaining that as_of_date overrides lookback_days and clarifying transaction code meanings. It does not duplicate schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states tool returns insider transactions (Form 3/4/5/144 line items) for a US public company. It specifies the data fields and filtering options, distinguishing it from sibling tools like get_insider_sentiment.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Description explicitly mentions tier restriction (Institutional tier only) and explains that lower tiers return ENTITLEMENT_DENIED. Provides filtering guidance but does not directly compare with alternatives or state when not to use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_institutional_holdingsInstitutional Holdings (by issuer)ARead-onlyIdempotentInspect
Returns top-N institutional holders of a US public company at a specific period_end (latest by default), with aggregate institutional shares, total market value, holder count, and HHI concentration (sum of squared share-of-total percentages). Sourced from Form 13F-HR via the by-issuer partition. Institutional tier only. 13F filings carry a ~45-day reporting lag — staleness_warning fires when latest data is older than 90 days.
| Name | Required | Description | Default |
|---|---|---|---|
| top_n | No | Maximum holders to return, ranked by market_value_usd. Default 25, max 200. | |
| ticker | Yes | Stock ticker symbol of the issuer. | |
| as_of_date | No | Point-in-time cutoff (YYYY-MM-DD): only 13F filings ACCEPTED by SEC on or before this date are considered, applied BEFORE the latest period is resolved. A 13F/A amendment or late filing accepted after this date is excluded (zero look-ahead) — use this for survivorship-free backtests. Omit for the latest knowable book. | |
| period_end | No | Quarter-end of the 13F reporting period (YYYY-MM-DD). Omit to use the latest period available. This is a REPORTING period, NOT a point-in-time cutoff — use as_of_date for that. | |
| lineage_detail | No | Per-row provenance envelope. compact / full / off. | compact |
Output Schema
| Name | Required | Description |
|---|---|---|
| cik | Yes | |
| rows | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| aggregate | Yes | |
| as_of_date | Yes | The point-in-time cutoff actually applied (echo of the as_of_date input). Null when no PIT cut was requested — never confuse this with the reporting period_end. |
| period_end | Yes | The 13F REPORTING period the rows belong to — NOT a point-in-time cutoff. |
| company_name | Yes | |
| data_age_days | Yes | |
| holders_count | Yes | |
| hhi_concentration | Yes | |
| staleness_warning | Yes | |
| total_market_value_usd | Yes | |
| options_positions_count | Yes | Option positions (put_call set) excluded from totals/HHI/rows. rows[] are common-stock 13F holdings only. |
| total_institutional_shares | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses important behavioral traits beyond annotations: ~45-day reporting lag, staleness_warning at 90 days, use of as_of_date for backtesting with zero look-ahead, and data sourcing from 'by-issuer partition'. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with the main output, followed by source, tier, and lag details. Every sentence adds value without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers all key aspects: what is returned, data source, lag, tier, and staleness warning. Output schema handles return details, so description is complete for selection and invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with detailed descriptions for each parameter. The description adds context such as 'latest by default' for period_end and the meaning of as_of_date for backtesting, which enhances understanding beyond what the schema provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the tool returns top-N institutional holders for a US public company with specific metrics (aggregate shares, market value, holder count, HHI concentration). Distinguishes from siblings by specifying 'Institutional tier only' and 'Form 13F-HR via the by-issuer partition', making its scope unique among related tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context for use by describing data source (13F filings) and the staleness warning. Implicitly differentiates from other holdings tools (e.g., insider, blockholders) but does not explicitly name alternatives or provide when-not-to-use guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_manager_portfolioManager Portfolio (13F by filer)ARead-onlyIdempotentInspect
Returns a 13F filer's full portfolio at a specific period_end (latest by default), with QoQ deltas vs the prior quarter (new / increased / decreased / exited / unchanged). Specify the filer either by filer_cik (preferred) or filer_name (fuzzy match against entity.name; multiple matches raise an ambiguity error so you can disambiguate by CIK). Institutional tier only.
| Name | Required | Description | Default |
|---|---|---|---|
| top_n | No | Maximum positions to return, ranked by market_value_usd. Default 25. | |
| filer_cik | No | CIK of the 13F filer (1-10 digits; will be zero-padded to 10). Preferred over filer_name when known. | |
| as_of_date | No | Point-in-time cutoff (YYYY-MM-DD): only 13F filings ACCEPTED by SEC on or before this date are considered, applied BEFORE the latest + prior periods (and the QoQ basis) are resolved. A 13F/A amendment or late filing accepted after this date is excluded (zero look-ahead). Omit for the latest knowable portfolio. | |
| filer_name | No | Filer name to fuzzy-match against entity.name. Case-insensitive substring match. Multiple matches raise INVALID_ARGUMENT — use filer_cik in that case. | |
| period_end | No | Quarter-end (YYYY-MM-DD). Omit to use latest available. This is a REPORTING period, NOT a point-in-time cutoff — use as_of_date for that. | |
| lineage_detail | No | Per-row provenance envelope. | compact |
Output Schema
| Name | Required | Description |
|---|---|---|
| rows | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| aggregate | Yes | |
| filer_cik | Yes | |
| as_of_date | Yes | The point-in-time cutoff actually applied (echo of the as_of_date input). Null when no PIT cut was requested — never confuse this with the reporting period_end. |
| filer_name | Yes | |
| period_end | Yes | The 13F REPORTING period the positions belong to — NOT a point-in-time cutoff. |
| data_age_days | Yes | |
| positions_count | Yes | |
| prior_period_end | Yes | |
| staleness_warning | Yes | |
| total_market_value_usd | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Aligns with annotations (readOnly, idempotent, non-destructive) and adds details about fuzzy matching, error handling, and the role of as_of_date vs period_end, which annotations do not cover.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences covering purpose and key parameter guidance. Front-loaded with primary action. No redundancy, but could be slightly more structured with bullet points or separated sections.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of output schema, the description adequately covers input parameters and constraints. Additional context like 'Institutional tier only' is useful. No missing critical information for a read-only tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage, the description adds value by explaining preferred parameter usage, fuzzy match behavior, and the distinction between as_of_date and period_end.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it returns a 13F filer's full portfolio with QoQ deltas, using specific verb 'Returns' and resource. It distinguishes from siblings like get_institutional_holdings by specifying 'by filer' and including deltas.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear guidance on specifying filer via CIK or name, disambiguation, and parameter defaults. Lacks explicit when-not-to-use or alternatives, but the context is sufficient for correct invocation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_morning_briefGet Morning BriefARead-onlyIdempotentInspect
Read the caller's Morning Brief — a daily BYO-LLM-generated market digest covering overnight moves across the customer's own watchlists and theses, produced by the Workspace. Omit day to get the most recent brief available (not necessarily today's); pass a specific day (YYYY-MM-DD) to fetch that day's brief. It is normal for no brief to exist yet if the customer hasn't set up or recently generated one — that returns found: false, not an error. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| day | No | Specific day to fetch (YYYY-MM-DD). Omit to get the most recent brief available for this customer. |
Output Schema
| Name | Required | Description |
|---|---|---|
| day | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| found | Yes | |
| model | No | |
| status | No | |
| provider | No | |
| created_at | No | |
| body_markdown | No |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds context beyond annotations: explains that omitting day returns most recent (not necessarily today's), that no brief returns found:false (not error), and mentions tier restriction. Annotations already indicate idempotent/read-only, so this adds value without contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with clear structure. Front-loaded with purpose, then parameter logic, then edge cases. No wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, parameter behavior, expected output for edge case, and access tier. With output schema present, return values need not be detailed. Complete for a read-only retrieval tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with parameter description, but the tool description adds nuance: explains the behavior of omitting vs providing day, and clarifies that no brief is a normal state. This extends the schema's meaning.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool reads the caller's Morning Brief, a daily market digest. It specifies the content (overnight moves across watchlists and theses) and distinguishes it from other read/get tools like get_report or get_company_fundamentals.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit guidance on parameter usage: omit day for most recent, pass specific day with format. Also explains expected behavior when no brief exists. Does not explicitly compare to alternative tools but covers the primary use case well.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_peer_comparablesPeer ComparablesARead-onlyIdempotentInspect
Get ratio-based peer comparison for a company and its closest competitors. Peers are selected by matching 2-digit SIC industry code. Returns pipeline-computed ratios from up to 10 peers alongside the subject company for direct benchmarking. Ratio categories: profitability, liquidity, leverage, efficiency, per_share, owner_earnings, valuation. TTM (trailing twelve months) ratios are used when available for the most current view. Use as_of_date to compare peers at a specific historical date. PIT semantics for the figure leg are data-driven: when the ratio data carries an SEC accepted_at timestamp, as_of_date filters point-in-time by accepted_at (zero look-ahead, _meta.pit_safe=true); when it does not (today's data), the cut is by ratio.period_end (_meta.pit_safe=false). NOTE: peer SELECTION still uses CURRENT S&P 500 membership as a size/relevance ranking proxy regardless of as_of_date (W3-G2). Available on every plan — sample returns the subset covered by the sample bucket.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of peers to return alongside the subject company (1–10). Defaults to 5. | |
| ticker | Yes | Subject company ticker, e.g. AAPL. Peers are auto-selected by SIC code. | |
| as_of_date | No | Historical cutoff (canonical cross-tool date param) for the FIGURE leg: PIT by ratio accepted_at when present (latest-knowable, zero look-ahead, _meta.pit_safe=true), else by ratio.period_end (pit_safe=false). Peer SELECTION still uses current S&P 500 membership as a ranking proxy regardless of as_of_date (W3-G2). | |
| categories | No | Ratio categories to include in the comparison. Defaults to profitability, valuation, and leverage. | |
| period_end_before | No | Alias of as_of_date (as_of_date preferred — the canonical name). Only include ratios with period_end on or before this date. |
Output Schema
| Name | Required | Description |
|---|---|---|
| data | Yes | One row per company (subject + peers): ticker, cik, name, sector, industry, is_subject, ratios |
| note | No | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| lineage | No | Provenance for pipeline-derived values (ratio.parquet / factor_scores.parquet): source table + pipeline computed_at, plus a pointer to the tools that return filing-level lineage. NOT point-in-time (recomputed on each pipeline run). |
| subject | Yes | Subject ticker the peer set is built around |
| as_of_date | No | |
| categories | Yes | Ratio categories included in each peer panel |
| peers_returned | Yes | |
| subject_ratios | No | The subject company's ratio panel |
| period_end_before | No |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Describes detailed behavioral traits beyond annotations: peer selection methodology (SIC code, S&P 500 proxy), TTM ratios preference, PIT semantics with pit_safe flags, and as_of_date impact. Annotations already declare read-only, but description adds critical nuance.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Concise paragraph of ~100 words with front-loaded purpose. Every sentence adds non-redundant information: peer selection, categories, TTM, as_of_date behavior, PIT semantics, and plan availability. No waste.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Comprehensive for a read-only comparison tool with 5 parameters and output schema. Covers peer count, ratio categories, historical option, data freshness methodology, and a notable limitation (S&P 500 membership). Output schema handles return structure.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, baseline is 3. Description adds meaning by explaining as_of_date's dual role (PIT for figure leg, but not for peer selection), listing ratio categories, and noting limit bounds. Does not fully cover period_end_before alias nuance but adds value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states 'Get ratio-based peer comparison for a company and its closest competitors', with specific verb and resource. Explains peer selection via SIC code, distinguishing from sibling tools like get_financial_ratios which focus on single company ratios.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context for usage: when to use (benchmarking with peers), how to set historical date via as_of_date, and availability mention. Lacks explicit exclusions or direct alternative naming, but context is strong.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_pit_universePoint-in-Time UniverseARead-onlyIdempotentInspect
Use this tool to answer questions about historical index membership — e.g. "Was Company X in the S&P 500 on date Y?" or "Which companies were in the Russell 2000 on 2010-01-01?" Use this INSTEAD OF search_companies when the question involves a specific historical date or whether a company was an index member in the past — search_companies only returns current membership and cannot answer historical questions.
Returns a survivorship-free universe valid on a given as_of_date (only companies that existed and were members on that exact date — no hindsight). Supports SP500, RUSSELL1000, RUSSELL2000, RUSSELL3000 via index_membership.parquet (accurate join/leave dates, [) interval semantics). To check one company, pass its ticker + the target date: present = was a member, absent = was not.
Returns per company: CIK, ticker, name, sector, industry, SIC code, and per-row confidence (high/medium/low). _meta.pit_safe is true only when every matched row is high-confidence — treat low-confidence rows with caution. sector is SIC-derived (GICS-aligned, not licensed GICS) — a screening bucket, not an authoritative label.
Use as the first step of a quantitative backtest before get_compute_ready_stream. Returns an empty array (with error detail) if the date is out of range or has no coverage. Available on every plan — sample returns the subset covered by the sample bucket.
| Name | Required | Description | Default |
|---|---|---|---|
| index | No | Index filter. 'sp500' (~500 large caps), 'russell1000' (~1000 large/mid), 'russell2000' (~2000 small caps), 'russell3000' (~3000 broad market). Omit for no index filter (sector-only or full universe queries). | |
| limit | No | Maximum companies to return (1–3500). Defaults to 100. Universe is deduped to one row per CIK, so set near the index size (SP500 ~505, Russell 3000 ~3050). | |
| offset | No | Zero-based row offset for paging a large universe. At most 250 rows are inlined per call; when more match, the response carries a `truncation` envelope — pass its `next_offset` here (keeping the same `limit`) to fetch the next page. Defaults to 0. | |
| sector | No | Sector filter (case-insensitive substring) over the SIC-derived, GICS-aligned label (not licensed GICS — see tool description). E.g. 'Technology', 'Energy'. | |
| is_active | No | Filter to active (currently trading) companies only. Omit to include all. WARNING: setting this to true on a HISTORICAL query reintroduces survivorship bias — companies that were active on as_of_date but later went bankrupt or got acquired will be filtered out. Leave unset for true PIT backtests. | |
| as_of_date | No | Historical date (YYYY-MM-DD) for survivorship-free construction. Index queries use index_membership join/leave dates (entrants after the date excluded, later-removed members kept); sector queries use security valid_from/valid_to. Omit for the current universe. | |
| as_of_basis | No | Which date column drives historical construction. 'effective' (default) = effective_date/removal_date (first trading day; passive replication). 'announcement' = announcement_date/removal_announcement_date (S&P's public-announcement day; for inclusion-arb backtests) — rows with NULL announcement_date (mostly pre-2015) are skipped. | effective |
| include_share_classes | No | false (default) collapses to one row per CIK (index-provider convention — BRK counts once, not BRK-A + BRK-B). true returns every share-class row (GOOG and GOOGL separately) — for security-level analysis only. |
Output Schema
| Name | Required | Description |
|---|---|---|
| note | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| index | Yes | |
| sector | Yes | |
| coverage | Yes | |
| companies | Yes | |
| as_of_date | Yes | |
| truncation | No | Present only when the inline-row cap withheld rows. Page with `next_offset` (keep the same `limit`) or pull the full set via get_compute_ready_stream. |
| as_of_basis | Yes | |
| coverage_gap | Yes | |
| universe_size | Yes | |
| survivorship_free | Yes | |
| confidence_summary | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=true and destructiveHint=false, which align with the read-only nature. The description adds substantial behavioral detail: it returns a survivorship-free universe with accurate join/leave dates using `[)` interval semantics, provides per-row confidence and `_meta.pit_safe` flag, explains sector is SIC-derived (not licensed GICS), and documents edge cases like out-of-range dates returning empty arrays. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is quite long but well-structured and front-loaded with the main use case. Every sentence adds value, and the information is logically organized (use case, behavioral details, parameter caveats, integration hints). Could be slightly more concise, but it remains efficient given the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has 8 optional parameters, a output schema exists, and the topic is complex (PIT universe with survivorship-free indexing, multiple indices, and hidden bias risks), the description covers all critical aspects: what is returned, how to use, caveats (survivorship bias with `is_active`, date range limitations), and integration with other tools. Nothing important is omitted.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, but the description adds meaningful context beyond the schema. For example, it explains the survivorship bias introduced by setting `is_active=true` on historical queries, the difference between 'effective' and 'announcement' for `as_of_basis`, and the rationale for `include_share_classes` defaulting to false. These enrich the understanding of parameters, though the schema already covers basic descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool answers historical index membership questions with specific examples (e.g., 'Was Company X in the S&P 500 on date Y?'). It distinguishes itself from the sibling tool `search_companies` by explicitly noting that `search_companies` only returns current membership. The verb 'answer questions about historical index membership' and additional use cases are specific and unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly instructs when to use this tool ('Use this INSTEAD OF `search_companies` when the question involves a specific historical date') and provides clear exclusions (e.g., 'WARNING: setting this to true on a HISTORICAL query reintroduces survivorship bias'). It also states its role as a first step in a quantitative backtest before `get_compute_ready_stream`, giving clear context and alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_pit_valuation_ratiosPoint-in-Time Valuation RatiosARead-onlyIdempotentInspect
Returns a full snapshot of valuation multiples for a company on a specific historical date — zero look-ahead bias (the 'Compustat + CRSP merge' pattern). The EOD close is sourced from stock_price_daily.parquet at as_of_date (or the nearest prior trading day), and all financial figures come from SEC filings with accepted_at ≤ as_of_date so no future information is used. TTM financials are computed by summing the four most recent standalone-quarter values (or using the most recent FY filing when no quarterly series is available). Returns: price snapshot (close, price_date, is_exact_date_match), TTM P&L (revenue, gross_profit, operating_income, EBITDA, net_income, OCF, CapEx, FCF), balance sheet snapshot (shares, cash, debt, book equity), derived market values (market_cap, enterprise_value), valuation multiples (P/E, P/S, P/B, EV/EBITDA, EV/Revenue, FCF yield %), and TTM margins (gross, operating, net). Use for: historical valuation screens, backtesting entry-point multiples, forensic audit of peak / trough valuations, comparing a company's current multiples to its own history. Coverage follows your plan tier: full = all companies & full history, pro = all companies & last 15 years, sp500 = S&P 500 only, sample = S&P 500 & last 5 years. Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT | |
| as_of_date | Yes | The historical date for the valuation snapshot (YYYY-MM-DD). The EOD close on the nearest prior trading day will be used. All financials are PIT-filtered to filings accepted on or before this date. Use a date in the recent past (within the last year) to get current-ish multiples; use any historical date back to 1993 (subject to your plan's history window) to get the multiples as they would have been observable on that date. |
Output Schema
| Name | Required | Description |
|---|---|---|
| cik | Yes | |
| cash | Yes | |
| note | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| close | Yes | |
| ticker | Yes | |
| ttm_fcf | Yes | |
| ttm_ocf | Yes | |
| currency | Yes | |
| net_debt | Yes | |
| pb_ratio | Yes | |
| pe_ratio | Yes | |
| ps_ratio | Yes | |
| ev_ebitda | Yes | |
| ttm_capex | Yes | |
| as_of_date | Yes | |
| ev_revenue | Yes | |
| market_cap | Yes | |
| price_date | Yes | |
| total_debt | Yes | |
| ttm_ebitda | Yes | |
| book_equity | Yes | |
| ttm_revenue | Yes | |
| company_name | Yes | |
| fcf_yield_pct | Yes | |
| net_margin_pct | Yes | |
| shares_diluted | Yes | |
| ttm_net_income | Yes | |
| ttm_period_end | Yes | |
| enterprise_value | Yes | |
| gross_margin_pct | Yes | |
| ttm_gross_profit | Yes | |
| is_exact_date_match | Yes | |
| operating_margin_pct | Yes | |
| ttm_operating_income | Yes | |
| financials_accepted_at | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description goes well beyond annotations by explaining the zero look-ahead bias, data sourcing (stock_price_daily.parquet and SEC filings filtered by accepted_at), TTM computation method, and the resulting output. It adds significant behavioral context beyond readOnlyHint and idempotentHint.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured, starting with the main function, then methodology, then detailed returns, then use cases, and finally plan coverage. It is front-loaded with the most important information. Every sentence is informative and no redundant verbosity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description includes all necessary context: purpose, methodology, output details, use cases, and plan-dependent coverage. Given the presence of an output schema, the description sufficiently explains what the tool returns without needing to document return values.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds meaningful context to the 'as_of_date' parameter, explaining how the EOD close is determined (nearest prior trading day) and the history range. 'ticker' has examples. This adds value beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns a full snapshot of valuation multiples for a company on a specific historical date with zero look-ahead bias. It distinguishes itself from sibling tools like get_valuation_metrics by emphasizing the point-in-time nature and lack of bias.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit use cases (historical valuation screens, backtesting, forensic audit) and mentions plan coverage. It does not explicitly state when not to use the tool or list alternatives, but the use cases are clear and contextual.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_price_historyPrice History (date range)ARead-onlyIdempotentInspect
Daily EOD bar series (OHLCV + adjusted_close) for a company over a date range. Returns up to 252 trading-day bars oldest-first. Each bar carries: open / high / low / close (raw, unadjusted), adjusted_close (vendor split/dividend-adjusted — retroactively restated on each corporate action; use for total-return backtests; NOT PIT-immutable), volume (shares traded), div_cash (ex-dividend cash per share on that date, 0 on non-dividend days), and split_factor (1.0 on non-split days). Omit start_date for the trailing year before end_date. Omit end_date for the latest available close. Coverage follows your plan's tier slice: full = all companies & all history, pro = all companies & last 15 years, sp500 = S&P 500 only, sample = S&P 500 & last 5 years. Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of bars to return (1–252; default 252 ≈ 1 trading year). When the range contains more bars than `limit`, the most recent `limit` bars within the range are returned. | |
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT | |
| end_date | No | Inclusive end of the date range (YYYY-MM-DD). Defaults to today (the latest available close). Weekends and holidays resolve to the last trading close on or before this date. | |
| start_date | No | Inclusive start of the date range (YYYY-MM-DD). Bars on or after this date are returned (up to `limit`). Omit to receive the `limit` most-recent bars before end_date. |
Output Schema
| Name | Required | Description |
|---|---|---|
| cik | Yes | |
| bars | Yes | |
| note | Yes | |
| plan | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| end_date | Yes | |
| bar_count | Yes | |
| start_date | Yes | |
| company_name | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already mark readOnly and idempotent. Description adds crucial behavioral details: adjusted_close is retroactively restated (NOT PIT-immutable), returns oldest-first, limit behavior when range exceeds limit. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is well-structured, front-loaded with key information, uses bullet points for return fields. Every sentence adds value without unnecessary verbosity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers all necessary aspects: return values, edge cases for omitted dates, coverage tiers, and includes an output schema. Complete for a tool of moderate complexity.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with detailed parameter descriptions. Description adds value with extra context (e.g., tie-breaking for limit, default date resolution) that goes beyond schema, justifying a 4.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns daily EOD bar series (OHLCV + adjusted_close) over a date range. It distinguishes from siblings like get_stock_price (single point) and get_pit_universe, making the purpose unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear usage context: omit start_date for trailing year, omit end_date for latest close, explains coverage tiers. Lacks explicit alternatives for when not to use, but the description sufficiently guides typical usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_reportGet Research ReportARead-onlyIdempotentInspect
Fetch the current HEAD of a report by id. format=markdown returns the rendered body, format=json returns the full structured payload (sections + citations + report-type-specific data), format=preview returns abstract-only. Authors see any of their own reports; non-authors only get preview of listed reports and need the report's required tier for full bodies. Sample-tier non-authors are downgraded to preview regardless of input. For an archived prior version use get_report_version, not this tool.
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Response shape. Defaults to markdown. | markdown |
| report_id | Yes | Id from `create_report` or `list_my_reports`. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| format | Yes | |
| report | Yes | |
| markdown | Yes | |
| sections | Yes | |
| citations | Yes | |
| structured | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds significant behavioral context: the permission model (authors vs non-authors, sample-tier downgrades), which is crucial and not covered by annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is dense with information but every sentence serves a purpose. It is front-loaded with the core action, and no unnecessary words are present.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite the complexity of the tool (multiple format modes and permission rules), the description covers all necessary details: what it does, when to use alternatives, parameter semantics, and behavioral constraints. An output schema exists, so return values need not be described further.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description enriches meaning: it explains each format enum value ('rendered body', 'full structured payload', 'abstract-only') and specifies the origin of `report_id` (from `create_report` or `list_my_reports`). This goes beyond the schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with 'Fetch the current HEAD of a report by id' – a clear verb+resource statement. It distinguishes itself from sibling `get_report_version` by explicitly noting that this tool is for the current version only.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool versus `get_report_version` for archived versions. It also details permission rules, making it easy for the agent to decide when this tool is appropriate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_report_versionGet Report VersionARead-onlyIdempotentInspect
Author-only fetch of a specific archived version of one of your reports, by positive-integer version. Returns metadata + the full payload (sections, citations, structured, markdown) — enough to render a diff against the current HEAD in the workspace editor. Use after list_report_versions identifies the version number you want; for the current HEAD use get_report instead.
| Name | Required | Description | Default |
|---|---|---|---|
| version | Yes | Version number to fetch (from list_report_versions). | |
| report_id | Yes | Identifier of the report whose archived version to fetch, as returned by create_report or list_my_reports. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| payload | Yes | |
| version | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds context beyond annotations: 'Author-only' (auth requirement), 'Returns metadata + the full payload... to render a diff'. Annotations already provide readOnlyHint, idempotentHint, destructiveHint.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, zero waste. Front-loaded with purpose, then usage guidance, then return value. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Complete for a 2-param tool with output schema. Covers purpose, usage sequence, auth, and return contents. No gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% with good descriptions. Description adds minimal extra: 'by positive-integer version' and 'from list_report_versions', but baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb+resource: 'fetch of a specific archived version of one of your reports'. Distinguishes from siblings: 'for the current HEAD use get_report instead' and mentions list_report_versions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'Use after list_report_versions identifies the version number you want', and when not to: 'for the current HEAD use get_report instead'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_sec_filing_linksSEC Filing LinksARead-onlyIdempotentInspect
Get direct links to original SEC EDGAR filings for any US public company. Returns four per-filing deep links: sec_url (the EDGAR filing-index page listing every document), viewer_url (the cgi-bin Financial-Report viewer for the specific accession), inline_viewer_url (the SEC Inline-XBRL viewer opened on the rendered primary document — the strongest provenance link, null when the filing is not Inline-XBRL), and document_url (a direct link to the rendered primary document itself — opens the actual filing, never the index page, null only when primary_document is unknown). Prefer inline_viewer_url ?? document_url ?? viewer_url ?? sec_url. Supported form_types (enum): 10-K, 10-Q, 8-K, 20-F, 40-F, 10-K/A, 10-Q/A, 20-F/A, 40-F/A. Other forms (6-K, DEF 14A, Form 4, 13F) are NOT yet exposed by this tool — use describe_schema to confirm the parquet has them, then read raw via the SDK. 8-K item codes are filterable via event_types (e.g. ['2.02'] for earnings, ['1.01'] for material agreements, ['5.02'] for officer changes). PIT-safe — filings are filtered by accepted_at, never by report_date alone. Use this instead of verify_fact_lineage when you want a list of filings; use verify_fact_lineage when you want one specific fact-to-filing trace. Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of filings to return (1–50). Defaults to 10. | |
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT | |
| end_date | No | Inclusive upper bound on filing_date (YYYY-MM-DD). E.g. '2023-12-31'. | |
| form_types | No | Filing form types to include. Defaults to 10-K and 10-Q. | |
| start_date | No | Inclusive lower bound on filing_date (YYYY-MM-DD). E.g. '2023-01-01'. | |
| event_types | No | 8-K item codes to filter by. E.g. ['1.01'] for material agreements, ['2.01'] for asset acquisitions, ['5.02'] for director/officer changes. Only relevant when form_types includes '8-K'. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| filings | Yes | |
| filings_returned | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description provides extensive behavioral details beyond annotations: it explains the four returned links with null conditions, the preference order for links, and that the tool is PIT-safe. It does not contradict any annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and front-loaded with the core purpose. Every sentence adds value, covering return types, fallback logic, supported forms, and usage notes, without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (multiple parameters, output schema), the description is complete: it explains return values, fallback logic, PIT-safety, and unsupported forms. The output schema exists but the description enriches understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds context beyond schema: it explains the preference order for URLs, enumerates form_types with notes, and clarifies event_types usage. This adds meaningful value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description starts with a clear verb+resource: 'Get direct links to original SEC EDGAR filings for any US public company.' It explicitly distinguishes from sibling tools like verify_fact_lineage, and provides specific details on the output links.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description gives explicit when-to-use guidance: 'Use this *instead of* verify_fact_lineage when you want a list of filings; use verify_fact_lineage when you want one specific fact-to-filing trace.' It also notes unsupported form types and suggests alternative approaches.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_smart_money_flowSmart Money Flow (composite)ARead-onlyIdempotentInspect
Composite flow score on [-100, +100] aggregating insider transactions, 13F institutional Δ-shares vs the prior quarter, and SC 13D/13G blockholder changes over a lookback window. Each component normalised independently, then combined with configurable weights (default: institutional 0.4, blockholder 0.4, insider 0.2). Returns per-component attribution so an agent can see WHY the score is what it is — not just the headline number. NOTE: the institutional component is a QoQ share-change signal computed over the top-5 13F filers on a MATCHED current-vs-prior basis (a filer only counts when its prior-quarter book is observable), NOT the issuer's complete institutional book — treat the score as a directional signal, not an exact flow. coverage.coverage_confidence (0–1) reports how much of that basis had a real prior quarter; when it is 0 the institutional component is forced to 0 so a 13F ingestion gap can never surface as a false max-conviction buy. See the coverage block for holder coverage + staleness. The score is a unitless composite, not a dollar figure. Institutional tier only.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | Yes | Issuer ticker symbol. | |
| as_of_date | No | Point-in-time cutoff (YYYY-MM-DD) applied to all three legs (institutional, insider, blockholder) via SEC accepted_at — filings accepted after this date are excluded so the composite is computed with zero look-ahead. Omit for the latest knowable signal. | |
| lookback_days | No | Lookback window for insider + blockholder components. Default 90. | |
| weight_insider | No | Weight applied to the insider component (0–1). | |
| weight_blockholder | No | Weight applied to the blockholder component (0–1). | |
| weight_institutional | No | Weight applied to the institutional component (0–1). |
Output Schema
| Name | Required | Description |
|---|---|---|
| cik | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| weights | Yes | |
| coverage | Yes | Honesty block: the institutional signal is computed from a top-N 13F slice with a top-5-filer matched basis. Surfaces holder coverage + staleness so the composite is never read as the issuer's complete book. |
| as_of_date | Yes | The point-in-time cutoff actually applied (echo of the as_of_date input). Null when no PIT cut was requested — never the reporting period_end fabricated as a cutoff. |
| components | Yes | |
| period_end | Yes | The institutional 13F REPORTING period — NOT a point-in-time cutoff. |
| company_name | Yes | |
| composite_score | Yes | |
| insider_component | Yes | |
| blockholder_component | Yes | |
| institutional_component | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (readOnly, idempotent, not destructive), the description reveals how each component is normalized and weighted, the lookback window, and crucially warns about the institutional component being top-5 matched filers. It also explains coverage confidence and staleness, adding significant behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is dense but well-structured, starting with the core purpose and then drilling into details. Every sentence adds necessary information; however, the note about institutional component is quite long but essential.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a composite tool with output schema present, the description covers the score range, component attribution, coverage metrics (confidence, staleness), limitations (directional rather than exact), and unitless nature. It is complete and self-contained.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with detailed parameter descriptions. The description adds value by explaining the role of weights (defaults) and the interplay of aspects like lookback_days applying only to insider and blockholder components. Some redundancy exists but overall enriches understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it produces a composite flow score within [-100, +100] by aggregating insider, institutional, and blockholder data. It explicitly lists the three components and their normalization, distinguishing it from sibling tools like get_insider_sentiment, get_institutional_holdings, etc.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains when to use this composite (e.g., for a holistic smart money flow signal) and notes critical caveats (e.g., directional signal, not exact flow, institutional component limitations). It lacks explicit 'when not to use' or direct alternatives, but the context is clear enough.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_stock_priceStock Price (as-of date)ARead-onlyIdempotentInspect
End-of-day closing price for a company AS OF any calendar date. Pass date to get the close on that day; if the date falls on a weekend or market holiday, it resolves backward to the most recent prior trading day's close (the price_date field tells you which day was actually used, and resolved_backward flags when it stepped back). Omit date for the latest available close. Closes are RAW (not split/dividend-adjusted); div_cash and split_factor carry the corporate-action factors for query-time total-return adjustment. This is EOD market data (not a SEC filing fact), so it carries a price_date rather than a fact_id. Coverage follows your plan's tier slice: full = all companies & all history, pro = all companies & last 15 years, sp500 = S&P 500 only, sample = S&P 500 & last 5 years. Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| date | No | As-of calendar date (YYYY-MM-DD). Returns the close of the most recent trading day on or before this date — a weekend/holiday resolves to the prior trading close. Omit to get the latest available close. | |
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT |
Output Schema
| Name | Required | Description |
|---|---|---|
| cik | Yes | |
| note | Yes | |
| plan | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| close | Yes | |
| ticker | Yes | |
| currency | Yes | |
| div_cash | Yes | |
| price_date | Yes | |
| company_name | Yes | |
| split_factor | Yes | |
| requested_date | Yes | |
| resolved_backward | Yes | |
| is_exact_date_match | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (readOnlyHint, etc.), the description adds critical behaviors: backward date resolution for weekends/holidays, raw (unadjusted) closing prices, div_cash and split_factor for total-return adjustment, price_date vs fact_id, and plan-based coverage tiers. This fully informs the agent of edge cases and constraints.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and concise. It front-loads the core purpose, then efficiently covers behaviors, adjustments, and coverage in a few sentences with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (as indicated by context), the description needn't explain return values. It fully covers input semantics, edge cases, and plan limitations, making it complete for agent usage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds some value: explains backward resolution for date, and omission behavior. However, this is relatively minor beyond the schema's own descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it provides 'end-of-day closing price for a company AS OF any calendar date,' with a specific verb (get) and resource (stock price). It distinguishes itself from siblings by focusing on EOD market data, unlike other tools for fundamentals, alerts, or reports.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains when to use it: for historical or latest closing prices. It provides context on date resolution and plan coverage but does not explicitly contrast with sibling tools like get_company_fundamentals, though the unique purpose is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_thesisGet Saved ThesisARead-onlyIdempotentInspect
Fetch a single saved thesis by its id. Returns the full record including outcome (if scored). Returns NOT_FOUND if the id is unknown or belongs to another user. For the claims composing a thesis use list_claims_for_thesis; for an individual claim use get_claim. Tier: paid + free (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| thesis_id | Yes | Id returned by `save_thesis` or `list_theses`. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| thesis | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds valuable context about NOT_FOUND behavior and tier restriction (paid + free sample rejected), which goes beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, front-loaded with core action, no superfluous words. Every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple fetch tool with one parameter and an output schema, the description covers purpose, error handling, alternatives, and constraints, making it fully informative.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% and schema describes thesis_id as 'Id returned by save_thesis or list_theses.' Description implies validity but doesn't add significant new meaning beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states 'Fetch a single saved thesis by its id', specifies the resource, and distinguishes from siblings by mentioning list_claims_for_thesis and get_claim for related operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly provides when to use (fetch a saved thesis), when not (for claims use alternatives), and includes error conditions (NOT_FOUND for unknown or other user's id) and tier information.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_top_holdersTop Holders (composite, classified)ARead-onlyIdempotentInspect
Classification-aware UNION across insider transactions (latest post_transaction_shares per insider), 13F institutional holdings, and SC 13D / 13G blockholder filings for one issuer. Each row carries holder_class ∈ {insider, institutional, blockholder_13D, blockholder_13G}. Dedupes overlapping filers by precedence (13D > 13G > institutional > insider). One call, classified cap table — Bloomberg charges separately for INSIDER, OWNER, and HDS; this consolidates them.
| Name | Required | Description | Default |
|---|---|---|---|
| top_n | No | Maximum holders to return, ranked by shares. Default 25. | |
| ticker | Yes | Issuer ticker symbol. | |
| as_of_date | No | Point-in-time cutoff (YYYY-MM-DD): only filings ACCEPTED by SEC on or before this date are considered across all three sources (institutional via accepted_at, insider via accepted_at, blockholders via accepted_at). Excludes amendments/late filings accepted after this date (zero look-ahead). Omit for the latest knowable cap table. | |
| period_end | No | 13F REPORTING period_end. Omit for latest. NOT a point-in-time cutoff — use as_of_date. |
Output Schema
| Name | Required | Description |
|---|---|---|
| cik | Yes | |
| rows | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | Yes | |
| staleness | Yes | Each source has its own as-of date and lag (13F ~45-day lag; 13D/G snapshots can be years old). Percentages from different-dated denominators are NOT directly comparable. |
| as_of_date | Yes | The point-in-time cutoff actually applied (echo of the as_of_date input). Null when no PIT cut was requested. NEVER equal to period_end unless explicitly supplied — a reporting period is not a knowable-as-of date. |
| period_end | Yes | The institutional 13F REPORTING period — NOT a point-in-time cutoff. |
| company_name | Yes | |
| sources_breakdown | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=true and destructiveHint=false, consistent with a read-only query. The description adds significant behavioral details: deduping by precedence (13D > 13G > institutional > insider), the meaning of as_of_date (cutoff based on accepted_at, excludes amendments after date), and that period_end is for 13F reporting period. This goes well beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is dense but not overly long: three sentences covering purpose, output format, deduping, and Bloomberg comparison. It is front-loaded with the core definition. Minor redundancy (e.g., 'Classification-aware' and 'classified cap table' could be merged) but generally efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (4 parameters, 1 required, output schema present), the description covers the union logic, classification, deduping, and parameter semantics. It does not need to explain the output schema because that is provided separately. The description is complete for an AI agent to understand the tool's behavior and when to invoke it.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
All four parameters have descriptions in the schema (100% coverage). The description adds value by explaining the behavioral semantics of as_of_date (point-in-time cutoff, zero look-ahead) and period_end (13F reporting period, not a cutoff). This enhances the schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it is a 'Classification-aware UNION' across three distinct sources (insider transactions, 13F institutional holdings, SC 13D/13G blockholder filings) for one issuer, with output rows carrying a 'holder_class' field. This distinguishes it from siblings like get_insider_transactions, get_institutional_holdings, and get_blockholders.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains that this tool consolidates three separate data sources into one call, contrasting with Bloomberg's separate charges. However, it does not explicitly state when to use this tool versus its sibling tools (e.g., using get_blockholders for only blockholders). The guidance is strong but lacks direct exclusion statements.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_valuation_metricsValuation MetricsARead-onlyIdempotentInspect
Get comprehensive valuation and profitability metrics for a US public company. Returns per-period data combining computed ratios (gross_margin, operating_margin, net_margin, ROE, ROA, ROIC, debt_to_equity, FCF, FCF margin), price-derived valuation_multiples (current_price, market_cap, pe_ratio, pb_ratio, ev_ebitda, dividend_yield), and optional pre-computed DCF model inputs (WACC, fcf_base_per_share, stage1_growth_rate, terminal_growth_rate, dcf_value_per_share, ddm_value_per_share). Profitability/cash-flow/leverage fields come from fact.parquet (PIT-safe via accepted_at). valuation_multiples are LIVE (schema 2.18.0): they come from ratio.parquet's valuation category + stock_price.parquet period-end close (per-period current_price for every fiscal year), derived from EOD prices period-end-aligned. Each multiple is a {value, unit} pair (unit varies: x / USD / percent); a null value carries a null_reasons[field] PRICE_NOT_AVAILABLE code (no period-end-aligned close). DCF/DDM fields come from valuation.parquet (pipeline-computed, recomputed each run — NOT strictly PIT-safe) and are commonly null (newer tickers, transition periods, or before the valuation pipeline runs). Each null carries a null_reasons[field] code — ALWAYS check it before assuming zero (null != 0). For strict-PIT DCF, use the SDK or compute from get_company_fundamentals. Use this instead of get_financial_ratios when DCF/intrinsic value or price multiples matter; use get_financial_ratios when you only need the raw ratio table. Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of periods to return (1–40). Defaults to 5. | |
| period | No | Filing period granularity. Annual uses 10-K; quarterly uses 10-Q. | annual |
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT | |
| as_of_date | No | Point-in-time date (YYYY-MM-DD). Only returns data with accepted_at on or before this date. Eliminates look-ahead bias for backtesting. | |
| fiscal_year | No | Fiscal year (YYYY). Omit to return most recent periods. |
Output Schema
| Name | Required | Description |
|---|---|---|
| data | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| period | Yes | |
| ticker | Yes | |
| dcf_pit | No | Present only when as_of_date is supplied. The DCF/DDM leg comes from valuation.parquet, which is filtered by created_at (the pipeline computation timestamp), NOT the SEC accepted_at — so even with an as_of_date cut the DCF figures are BEST-EFFORT point-in-time, not strictly look-ahead-free. The profitability/cash-flow/leverage legs ARE strictly PIT-safe (fact.parquet, accepted_at). |
| as_of_date | Yes | |
| periods_returned | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint, idempotentHint, destructiveHint=false. The description adds critical behavioral details: data sources (fact.parquet PIT-safe, valuation_multiples LIVE, DCF fields not strictly PIT-safe), null handling with null_reasons codes, and recomputation behavior of DCF fields. This exceeds annotation coverage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is thorough but somewhat long; however, it is well-structured with clear sections and front-loaded purpose. Every sentence adds necessary detail given the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's high complexity (multiple data sources, PIT considerations, null handling, sibling differentiation), the description covers all essential context. The presence of an output schema further reduces the burden.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage for all 5 parameters, each described. The description adds value by explaining the meaning of returned data (per-period, live vs PIT) and null handling, providing context beyond schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get comprehensive valuation and profitability metrics for a US public company.' It lists the categories of returned data and explicitly distinguishes from sibling tools like get_financial_ratios and compute_dcf.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance: 'Use this instead of get_financial_ratios when DCF/intrinsic value or price multiples matter; use get_financial_ratios when you only need the raw ratio table.' It also notes availability on all plans.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_watchlistGet WatchlistARead-onlyIdempotentInspect
Fetch a single watchlist (full ticker set + criteria) by its name, not an id (case-insensitive). NOT_FOUND if the name is unknown to this user. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Watchlist name to fetch (case-insensitive, 1–80 chars). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| watchlist | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate the tool is read-only and idempotent. The description adds extra transparency by detailing the case-insensitive lookup, error handling (NOT_FOUND), and a tier restriction ('sp500+ (sample rejected)'), which are not covered by annotations. No contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise, consisting of three short sentences that cover the main action, error case, and tier restriction. Every sentence adds value and the most important information is front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (1 parameter, full schema coverage, output schema exists, annotations present), the description is complete. It covers the lookup key, error condition, and access restrictions, providing sufficient context for an AI agent to use the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The single parameter 'name' is fully described in the schema (100% coverage). The description reiterates case-insensitivity and adds context about error when name is unknown, but does not add new semantic meaning about the parameter itself beyond the schema. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool fetches a single watchlist by name, specifies the return includes full ticker set and criteria, and distinguishes from sibling tools like list_watchlists by emphasizing the lookup by name rather than id. It also notes case-insensitivity and an error condition for unknown names.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides guidance by stating the input (name, not id) and an error condition when the name is unknown. However, it does not explicitly contrast with sibling tools (e.g., list_watchlists) to guide when to use this tool over alternatives, though the purpose implicitly differentiates.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
link_claim_to_thesisLink Claim to ThesisAIdempotentInspect
Attach a claim to a thesis with a role: 'supports' (the claim, if true, strengthens the thesis), 'refutes' (if true, weakens it — track disconfirming evidence first-class), or 'context' (relevant but not directional). Idempotent — re-linking updates the role. A claim can support one thesis and refute another.
This composes theses from claims; it does NOT make the thesis score a function of claim scores (they're scored independently). Tier: paid + free (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| role | Yes | Relational role of the claim toward the thesis. | |
| claim_id | Yes | Id of the claim (from save_claim/list_claims). | |
| thesis_id | Yes | Id of the thesis (from save_thesis/list_theses). |
Output Schema
| Name | Required | Description |
|---|---|---|
| link | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description expands on annotations by confirming idempotency (updates role on re-link), clarifying that claims can have multiple relationships (support one thesis, refute another), and stating the tool is non-destructive. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise yet informative, with three clear paragraphs. The first sentence front-loads the core action. Every sentence adds value, and the structure logically groups purpose, behavior, and constraints.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (three roles, idempotency, relational constraints), the description fully covers its functionality, side effects, and tier restrictions. The presence of an output schema means return values need no further explanation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, the description adds meaning by explaining the role semantics, idempotent behavior for parameters, and the source of claim and thesis IDs. This goes beyond the schema's basic descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: attaching a claim to a thesis with one of three roles. It uses a specific verb ('attach') and resource ('claim to thesis'), and distinguishes it from the sibling 'unlink_claim_from_thesis'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains when to use each role ('supports', 'refutes', 'context') and clarifies what the tool does not do (thesis scoring independence). It also mentions idempotency and tier restrictions, providing clear guidance for appropriate usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_agent_runsList Agent RunsARead-onlyIdempotentInspect
List the caller's own standing-agent runs, newest first — status, goal, cost, and timing for each. A run may have been kicked off by this same agent (e.g. via create_rule's run_team action or a schedule_task wake) OR by the customer's own Workspace UI; this tool lets any MCP client check on ANY run belonging to the authenticated customer regardless of what triggered it. Filter by an exact status match (e.g. "completed", "failed", "running"). Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max runs to return (1-50, default 10). | |
| status | No | Filter to an exact status match. |
Output Schema
| Name | Required | Description |
|---|---|---|
| runs | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint, idempotentHint, and destructiveHint as safe. The description adds valuable context: the run list is sorted newest first, includes specific fields, and runs can be triggered by various sources. It also mentions a tier restriction. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise (about 80 words) and front-loaded with the core purpose. It delivers key information in a single paragraph without unnecessary details, making it easy for an AI agent to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has an output schema (mentioned in context signals), the description does not need to detail return values. It covers the main semantics: list content, filtering, and ownership. The annotations and schema already handle safety and parameters, so the description is complete for its role.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% with both parameters described. The description provides examples for the status parameter (e.g., 'completed', 'failed', 'running'), adding value beyond the schema. For the limit parameter, no additional context is given, but the schema already specifies min, max, and default. Overall, the description augments the schema well.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists the caller's own standing-agent runs, newest first, with status, goal, cost, and timing. It distinguishes itself by specifying it covers any run belonging to the authenticated customer regardless of trigger, and mentions filtering by exact status match. This is specific and differentiates from sibling tools like get_agent_run.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context: it lists the caller's own runs, allows filtering by status, and can be used by any MCP client to check runs. However, it does not explicitly state when to avoid using this tool or mention alternatives like get_agent_run for a single run. The guidance is clear but lacks exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_alert_inboxList Alert InboxARead-onlyIdempotentInspect
Newest-first listing of the caller's in-app inbox. Items are alert FIRES with a dashboard channel — written by the cron evaluator (or test_alert) — plus platform notifications written by the edge-gateway (agent run completions, morning briefs, skipped runs); use list_alerts instead for the alert definitions themselves. By default dismissed items are hidden and read items are included. Cursor-paginated by fired_at. Sample tier rejected — alerts are a paid-tier feature (sp500+).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of inbox items to return (1–100). Defaults to 20. | |
| cursor | No | Pagination cursor — the `fired_at` of the last item on the previous page. | |
| unread_only | No | When true, return only items where read_at IS NULL. | |
| include_dismissed | No | When true, also return items the caller previously dismissed. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| items | Yes | |
| next_cursor | Yes | |
| unread_count | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint, idempotentHint, destructiveHint. The description adds valuable behavioral context beyond annotations: pagination by fired_at, default filter behavior, and sample tier restriction, making tool behavior transparent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Every sentence provides unique, essential information. The description is front-loaded with the main purpose, then details sources, default filters, pagination, and tier restriction without any redundant or wasteful wording.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (4 parameters, output schema, rich annotations), the description fully covers what the tool does, when to use it, behavioral traits, and default filters. No gaps remain for an agent to misunderstand.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear descriptions for all 4 parameters. The description adds value by explaining that cursor is paginated by fired_at, which helps understand the cursor parameter's semantics beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists the caller's in-app inbox items in newest-first order, specifies the sources (alert fires with dashboard channel, platform notifications), and explicitly distinguishes from sibling tool 'list_alerts' which deals with alert definitions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use (viewing inbox items) and when-not-to-use (for alert definitions, use list_alerts). Also describes default behavior (dismissed hidden, read included) and tier restriction, giving clear context for appropriate invocation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_alertsList AlertsARead-onlyIdempotentInspect
Paginated newest-first listing of the caller's alerts (id, condition, channel, status, trigger_count, evaluator health). Filter by status (active/paused/deleted/all). Use the returned alert id with delete_alert or test_alert. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of alerts to return (1–100). Defaults to 20. | |
| cursor | No | Opaque pagination cursor from a previous response; omit for the first page. | |
| status | No | Filter by lifecycle state; defaults to `active`. Use `all` to include paused and soft-deleted alerts. | active |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| alerts | Yes | |
| next_cursor | Yes | |
| total_count | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds pagination order (newest-first), filter behavior, and tier info, providing useful context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, front-loaded with key info, no filler. Every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With output schema present, description covers purpose, usage, filtering, pagination, and downstream actions. Fully adequate for a list tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema descriptions cover all 3 parameters (100% coverage). Description repeats filter options but adds no significant new meaning beyond schema, warranting baseline 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description explicitly states 'Paginated newest-first listing of the caller's alerts' with specific fields, clearly distinguishing it from sibling tools like list_alert_inbox and list_rules.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides filter options by status, directs to use returned alert id with delete_alert or test_alert, and mentions tier restriction. Lacks explicit when-not-to-use but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_citation_overridesList Citation OverridesARead-onlyIdempotentInspect
Author-only newest-first listing of the caller's citation corrections. Filterable by ticker (e.g. all AAPL corrections) or by a single fact_id (returns 0 or 1 row). Pair with save_citation_override and delete_citation_override. Sample tier rejected.
Agent use: call with ticker to introspect what corrections the user has previously applied on that ticker — useful for system prompts that respect prior corrections during regeneration.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of citation overrides to return (1–100). Defaults to 20. | |
| cursor | No | Cursor from the previous response's `next_cursor` — the updated_at of the last row on that page. Omit for first page. | |
| ticker | No | Optional ticker filter, case-insensitive. Uppercased internally. | |
| fact_id | No | Optional fact_id filter — returns at most one row. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| overrides | Yes | |
| next_cursor | Yes | |
| total_count | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds behavioral details beyond this: 'Author-only newest-first listing' and 'returns 0 or 1 row' for fact_id. This provides useful context without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is fairly concise, with a clear two-paragraph structure: core functionality in the first, agent guidance in the second. The phrase 'Sample tier rejected' is unclear and could be removed, but overall the description is well-organized and front-loads key information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists and schema coverage is 100%, the description adequately covers purpose, filtering, and use case. It does not explicitly explain pagination (cursor/limit) but this is already in the schema. The tool is simple enough that the description is nearly complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds minimal extra meaning beyond schema descriptions: it provides an example for ticker ('e.g. all AAPL corrections') and repeats the fact_id behavior. This meets the baseline but does not significantly enhance understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists citation overrides, with author-only and newest-first ordering. It specifies filtering by ticker or fact_id, and explicitly references sibling tools save_citation_override and delete_citation_override, making its purpose distinct.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance: 'Agent use: call with ticker to introspect what corrections the user has previously applied on that ticker'. It also mentions pairing with save/delete tools. While it doesn't enumerate when not to use, the context is clear given the filtering options and sibling tool names.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_claimsList ClaimsARead-onlyIdempotentInspect
List the caller's saved claims, most-recent-first, with AND-composed filters and cursor pagination. Filter by ticker, claim_type (assertion/prediction/judgment), tag, or lifecycle status (open/confirmed/refuted/expired/stale/needs_review). Archived claims are excluded unless include_archived is set.
Tier: all paid + free tiers (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| tag | No | Filter to claims carrying this topical tag. | |
| limit | No | Page size (max 100). | |
| cursor | No | Pagination cursor from a previous page's next_cursor. | |
| status | No | Filter by lifecycle status, or 'all'. | all |
| ticker | No | Filter to claims referencing this ticker. | |
| claim_type | No | Filter by epistemic type. | |
| include_archived | No | Include soft-deleted claims. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| claims | Yes | |
| next_cursor | Yes | |
| total_count | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds ordering (most-recent-first), AND-composed filters, cursor pagination, archived exclusion, and tier restrictions. This provides meaningful behavioral context beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is three sentences, front-loaded with the core purpose and key features. Every sentence adds value without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists and annotations cover safety, the description sufficiently explains ordering, filtering, pagination, archived behavior, and tier constraints. No critical gaps remain for a list tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with each parameter described. The description summarizes filtering parameters but does not add new meaning or syntax details beyond what the schema already provides. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'List the caller's saved claims, most-recent-first, with AND-composed filters and cursor pagination.' This provides a specific verb and resource, and the scope ('caller's saved') differentiates it from sibling tools like list_claims_for_thesis and list_public_claims_by_user.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage by detailing filtering and pagination capabilities, but it does not explicitly contrast with sibling tools or provide when-to-use vs when-not-to-use guidance. The tier restriction is mentioned but no direct alternatives are given.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_claims_for_thesisList Claims for ThesisARead-onlyIdempotentInspect
List the claims composing a thesis, each with its role (supports/refutes/context). This is how you read a thesis as the structured argument it is — its supporting and disconfirming claims with their current statuses. Archived claims are omitted. Tier: paid + free (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| thesis_id | Yes | Id of the thesis whose claims to list. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| items | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate read-only, idempotent, non-destructive; description adds that archived claims are omitted and provides tier constraints, providing useful behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences conveying purpose, behavior, and constraints; no wasted words, but tier info might be placed elsewhere.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given simple tool with good annotations and output schema, description covers main behavior, exclusions, and usage constraints adequately.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Only one parameter 'thesis_id', which has a clear schema description (100% coverage). The tool description does not add additional semantic value beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states verb 'list' and resource 'claims composing a thesis', specifies roles and statuses, and distinguishes from sibling 'list_claims' by focusing on a specific thesis.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Describes when to use the tool (to read a thesis as a structured argument) and mentions omission of archived claims; no explicit alternatives but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_my_reportsList My Research ReportsARead-onlyIdempotentInspect
Cursor-paginated newest-first listing of the caller's own reports (owner-scoped). Filters compose with AND; status defaults to 'ready' so pass status='draft' or 'all' to see drafts. Use cursor from the previous response's next_cursor to fetch the next page (limit max 100). Sample tier rejected (no per-author state).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Page size. | |
| cursor | No | Cursor from previous `next_cursor`. | |
| status | No | Filter by status. Default 'ready' (excludes drafts + delisted). | ready |
| ticker | No | Filter to a single ticker (case-insensitive). | |
| report_type | No | Filter by report type. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| reports | Yes | |
| next_cursor | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds rich behavioral details beyond annotations: cursor-pagination, filtering with AND, status default, limit max, and limitation about per-author state. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three concise sentences front-loaded with key purpose, each sentence adds essential information with no redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Complete for a listing tool: covers pagination, filtering, ordering, default behavior, and limitations. Output schema exists, so return format is covered.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but description adds value by explaining status defaults, cursor usage, and filter composition, enhancing understanding beyond schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'list reports' with specific scoping ('caller's own', 'owner-scoped') and ordering ('newest-first'). It distinguishes from sibling tools like list_alerts or get_report.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context: owner-scoped, status default behavior, pagination with cursor, and filter composition. Does not explicitly name alternatives or when-not-to-use, but the purpose implicitly differentiates.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_pending_approvalsList Pending ApprovalsARead-onlyIdempotentInspect
List the caller's own staged actions still awaiting a human decision (status='proposed'), newest-first. Use this to check what an autonomous run has queued up before you approve or reject it with approve_staged_action / reject_staged_action. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Page size (max 100). | |
| cursor | No | Pagination cursor from a previous page's next_cursor. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| next_cursor | Yes | |
| total_count | Yes | |
| staged_actions | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true. The description adds behavioral details: only lists staged actions with status 'proposed', newest-first order, and the tier restriction. No contradictions, and it supplements the annotations well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences: first states purpose and ordering, second provides usage guidance with tool references and tier restriction. No extraneous information, very concise and front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists and annotations cover safety, the description is complete. It covers what the tool does, when to use it, and access restrictions. Pagination is handled by schema parameters, so no need to elaborate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so the baseline is 3. The description does not add parameter-specific details beyond what the schema already provides (limit and cursor). The implicit filtering by caller and status is not parameter-related.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'list' and the resource 'pending approvals' (staged actions awaiting human decision). It specifies the caller's own actions, status 'proposed', and ordering 'newest-first', making the tool's purpose distinct from siblings like approve_staged_action.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly tells when to use: 'check what an autonomous run has queued up before you approve or reject it'. Mentions the related tools approve_staged_action and reject_staged_action, and notes the tier restriction 'Tier: sp500+ (sample rejected)', providing clear usage guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_public_claims_by_userList Public Claims by UserARead-onlyIdempotentInspect
Return the PUBLIC claims + claim-accuracy reputation for a user identified by Stripe customer_id. Used by the /[handle] profile to render an analyst's claim-level track record — a separate signal from thesis-outcome accuracy. Only visibility='public' claims surface; private state never leaks. Accuracy is confirmed/(confirmed+refuted) over resolved claims; null when n < 5. Sample tier rejected; sp500+ only.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max public claims to return. Defaults to 20. | |
| customer_id | Yes | Target user's Stripe customer_id (resolved by the frontend from the handle). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| claims | Yes | |
| reputation | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint, idempotentHint, destructiveHint. Description adds behavioral details: only public claims surface, accuracy formula (confirmed/(confirmed+refuted) over resolved claims), null when n<5, and tier restriction. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences, front-loaded with purpose and key constraints. Every sentence adds value: purpose, usage, behavioral guarantee, accuracy calculation, and tier restriction. No redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given complexity (accuracy calculation, tier restriction, public-only filtering) and presence of output schema, description covers essential behavioral and usage aspects. Sufficient for an AI agent to understand when and how to invoke.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema covers 100% of parameters. Description adds context: customer_id is 'resolved by the frontend from the handle,' clarifying how the parameter is used. Limit parameter also clearly explained. Adds meaning beyond schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states it returns public claims and claim-accuracy reputation for a user by Stripe customer_id, used for profile rendering. Distinct from siblings like list_claims (generic) and list_claims_for_thesis (thesis-specific) by focusing on per-user public claims with accuracy metric.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly mentions when to use: for rendering analyst's claim-level track record. Also includes a restriction: 'Sample tier rejected; sp500+ only,' guiding appropriate context and excluding non-qualifying users.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_public_theses_by_userList Public Theses by UserARead-onlyIdempotentInspect
Return the PUBLIC theses + reputation aggregate for a user identified by Stripe customer_id. Used by the /[handle] profile page to render an analyst's track record. Only entries with visibility='public' are surfaced — private theses never leak. Reputation is correct/(correct+wrong) over graded theses; null when n < 5 (sample too small). Sample tier rejected; sp500+ only.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max public theses to return. Defaults to 20. | |
| customer_id | Yes | Target user's Stripe customer_id (resolved by the frontend from the handle). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| theses | Yes | |
| reputation | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so safe. Description adds valuable behavioral details: reputation calculation formula (correct/(correct+wrong)), null condition (n<5), and scope restriction (sp500+ only). No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four well-structured sentences, no fluff. First sentence states purpose, second gives usage context, third clarifies privacy, fourth explains reputation calculation. Front-loaded and efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity and existence of output schema, the description fully covers the necessary context: purpose, usage scenario, security (public only), reputation logic, and sample scope. No gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so baseline is 3. The description does not add new semantic meaning beyond what is in the schema. It mentions 'Stripe customer_id' but schema already describes it. No extra guidance on values or constraints.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it returns public theses and reputation aggregate for a user, with explicit verb 'Return' and resource 'public theses + reputation aggregate'. It distinguishes from siblings like list_theses (likely all) and list_public_claims_by_user (different resource).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states it is used for the /[handle] profile page to render an analyst's track record. Provides exclusion criteria: private theses never leak. Does not explicitly name alternative tools for private data, but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_report_versionsList Report VersionsARead-onlyIdempotentInspect
Author-only newest-first listing of a report's archived version history. Each entry summarises what changed (sections edited, etc.) so the workspace UI can render a clickable history without loading every artifact. Pair with get_report_version to fetch a specific version's content for diffing against HEAD.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of archived versions to return (1–100). Defaults to 20. | |
| cursor | No | Cursor from the previous response's `next_cursor` — the smallest version number on the previous page. Omit for the first page. | |
| report_id | Yes | Identifier of the report whose version history to list, as returned by create_report or list_my_reports. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| versions | Yes | |
| next_cursor | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only, idempotent, non-destructive. The description adds that it is author-only and newest-first, and that entries summarize changes. This adds behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, front-loaded with the core purpose, and contains no unnecessary words. It efficiently conveys the tool's function and companion tool.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema and clear annotations, the description provides sufficient context: it explains what entries contain (change summaries), how it is used (UI clickable history), and how to get more detail (pair with get_report_version).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with good descriptions for all parameters. The description does not add significant extra meaning to parameter usage beyond what the schema provides, so baseline of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'list' and resource 'archived version history', clearly distinguishing from siblings like list_my_reports (lists reports) and get_report_version (fetches specific version). It also states ordering ('newest-first') and audience ('author-only').
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description states it is for author-only use and suggests pairing with get_report_version for diffing, giving clear context. It does not explicitly mention when not to use, but the purpose is well-defined.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_rulesList RulesARead-onlyIdempotentInspect
Paginated newest-first listing of the caller's own rules. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | ||
| cursor | No | Pagination cursor from a previous response's next_cursor. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| rules | Yes | |
| next_cursor | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds pagination behavior (newest-first) and tier restriction (sp500+), which are useful beyond annotations. No contradictions found.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, each adding distinct value: purpose/ordering and tier restriction. No fluff or redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple read-only list tool with an output schema, the description covers the key aspects: scope, ordering, and tier. It does not detail error handling or permissions, but these are minor given the tool's simplicity and annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 50% (cursor has description, limit does not). The description only mentions 'Paginated' but does not elaborate on parameter meaning, defaults, or constraints. It adds no value beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'listing of the caller's own rules' with specific verb (listing) and resource (rules), and adds ordering (newest-first). It distinguishes from sibling tools like create_rule or delete_rule by focusing on listing personal rules.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for the caller's own rules but does not explicitly state when to use this tool versus alternatives (e.g., create_rule, test_rule). No exclusions or guidance on when not to use are provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_scheduled_tasksList Scheduled TasksARead-onlyIdempotentInspect
Paginated newest-first listing of the caller's own scheduled (deferred) tasks — transparency into what an agent has queued for the future. Filter by status (pending/completed/cancelled/cancelled_owner_inactive/all). Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | ||
| cursor | No | Pagination cursor from a previous response's next_cursor. | |
| status | No | Filter by lifecycle state; defaults to `pending`. | pending |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| tasks | Yes | |
| next_cursor | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint, idempotentHint, destructiveHint. Description adds meaningful behavioral context: newest-first ordering, filtering by status, and a tier restriction ('sp500+'). No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences plus a tier note. No wasted words; key information is front-loaded and easily parsed.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Output schema is present, so return values need not be described. The description covers filtering, ordering, and access tier. It lacks explicit mention of pagination behavior beyond 'paginated', but schema covers cursor. Overall sufficient for a read-only list tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 67% (2 of 3 parameters described). Description mentions the status filter and enumerates values, which duplicates schema but does not add new meaning. The 'limit' and 'cursor' parameters are not elaborated beyond schema defaults.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states a specific verb ('list') and resource ('scheduled tasks'), and adds important qualifiers: caller's own tasks, newest-first ordering, pagination. It distinguishes from sibling tools like cancel_scheduled_task and schedule_task.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Description implies usage for inspecting queued future tasks, but does not explicitly state when to use this vs alternatives (e.g., cancel_scheduled_task). Filter options are mentioned, but no explicit when-not guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_thesesList Saved ThesesARead-onlyIdempotentInspect
Return the caller's saved theses, newest-first. Filters: ticker (exact), view, status. Cursor-based pagination — pass next_cursor from the previous response to fetch the next page. Sample tier rejected (no per-user state).
| Name | Required | Description | Default |
|---|---|---|---|
| view | No | Filter to a single view. | |
| limit | No | Page size, 1–100. Defaults to 20. | |
| cursor | No | Pagination cursor returned by the previous `list_theses` call's `next_cursor`. | |
| status | No | 'active' (default) hides archived theses; pass 'all' to include them. | active |
| ticker | No | Filter to theses on this ticker (case-insensitive). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| theses | Yes | |
| next_cursor | Yes | |
| total_count | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only and idempotent. Description adds pagination behavior (cursor-based, newest-first) and sample tier limitation, which are beyond the annotations. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences plus a brief note. No wasted words. Main purpose is front-loaded. Efficient and easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers all essential aspects: purpose, ordering, filters, pagination mechanics, and a key limitation (sample tier). With an output schema present, no need to detail return fields. Complete for a list tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. Description summarizes filter parameters and clarifies pagination cursor usage, adding value over the schema's individual parameter descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it returns the caller's saved theses newest-first, with filters and pagination. Distinguishes from sibling list tools by specifying the resource (theses) and user scope. The sample tier note adds clarity.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides usage context (filters, pagination) but lacks explicit guidance on when not to use it or alternatives like get_thesis for a single thesis. The sample tier rejection is useful but not a comprehensive usage guide.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_watchlistsList WatchlistsARead-onlyIdempotentInspect
Paginated newest-first listing of the caller's watchlists (id, name, tickers, status, counts). Filter by status (active/archived/all). Returns metadata only — use get_watchlist for one list's full ticker set, or watchlist_diff for new filings across a list. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of watchlists to return (1–100). Defaults to 20. | |
| cursor | No | Opaque pagination cursor from a previous response; omit for the first page. | |
| status | No | Filter by state; defaults to `active`. Use `all` to include archived watchlists. | active |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| watchlists | Yes | |
| next_cursor | Yes | |
| total_count | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnly, idempotent, non-destructive. Description adds pagination behavior, newest-first ordering, and metadata-only return. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, no wasted words. Front-loaded with key verb and resource, then filtering, then alternatives and tier.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, usage, alternatives, pagination, filtering, tier constraint. Annotations and output schema (implied) cover the rest. Complete for a listing tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with detailed parameter descriptions. Description summarizes pagination and filtering but adds no new semantics beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Paginated newest-first listing of the caller's watchlists' with specific fields, and distinguishes from siblings like get_watchlist and watchlist_diff.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly specifies filter by status, and provides alternatives: 'use get_watchlist for one list's full ticker set, or watchlist_diff for new filings across a list.' Also mentions tier restriction.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
mark_inbox_readMark Inbox Item ReadAIdempotentInspect
Set read_at on a single inbox item by its id (from list_alert_inbox or the alerts feed resource) — not an alert id. Idempotent — re-marking does NOT reset the first-read timestamp; there is no unmark. Returns the new unread_count so the agent/UI can update its badge without a follow-up call. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| inbox_id | Yes | Identifier of the inbox item to mark read, as returned by list_alert_inbox or the alerts feed resource. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| inbox_id | Yes | |
| marked_read | Yes | |
| unread_count | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds behavioral details beyond annotations, like re-marking not resetting timestamp and no unmark. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Very concise; every sentence provides essential information without waste, including idempotency, return value, and tier.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers key aspects: action, id source, idempotency, return value. With output schema existing, return value explanation is sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but description adds meaningful context about the parameter source ('from list_alert_inbox or alerts feed resource') and clarifies it is not an alert ID.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it sets 'read_at' on a single inbox item and distinguishes from alert IDs, but does not explicitly contrast with similar sibling tools like 'dismiss_inbox_item'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides context such as idempotency and return value, but lacks explicit when-to-use or alternatives for other inbox-related tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
publish_claimPublish ClaimAIdempotentInspect
Make a saved claim discoverable by flipping its visibility: public (default) surfaces it on the author's /[handle] profile and counts toward their claim-accuracy reputation; unlisted makes it reachable at a known direct link but keeps it off the profile. Use AFTER save_claim to promote an existing claim. Idempotent. Pair with unpublish_claim to revert to private. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| claim_id | Yes | Id returned by `save_claim` or `list_claims`. | |
| visibility | No | `public` (default) → profile + reputation; `unlisted` → direct-link-only, off the profile. To revert to private, use unpublish_claim. | public |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| claim | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Describes idempotency explicitly ('Idempotent'), explains visibility effects (profile vs direct-link, reputation impact). Aligns with annotations (idempotentHint=true) and adds valuable context beyond structured fields.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with main action, then details. Every sentence adds value with no redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple tool (2 params, output schema exists), the description is fully complete. It covers purpose, usage sequence, visibility effects, idempotency, and tier restriction. No gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Adds meaning beyond schema: explains that `claim_id` comes from specific tools, and details the behavioral difference between `public` and `unlisted`. Schema coverage is 100%, but description enriches understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the verb ('make discoverable') and resource ('saved claim'), and distinguishes from sibling tools like 'unpublish_claim' and 'save_claim'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says to use after 'save_claim' and pairs with 'unpublish_claim'. Also notes tier restriction ('sp500+'). Provides clear when-to-use and when-not-to-use guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
publish_reportPublish Report (free)AIdempotentInspect
Publish a report for FREE at listed or unlisted visibility to build your public author profile. listed makes it discoverable via search_reports (keyword catalog search); unlisted keeps it out of the catalog but accessible by direct id (shareable link). Author can set a tier_required no higher than their own plan. All listings are free today (omit price_cents or set it to 0); paid listings are a future capability.
| Name | Required | Description | Default |
|---|---|---|---|
| report_id | Yes | ||
| visibility | No | listed | |
| price_cents | No | Currently must be omitted or 0 — all listings are free. A non-zero value is rejected until paid listings ship. | |
| tier_required | No | Minimum subscriber tier to read the full body. Defaults to the author's plan. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| report | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate idempotentHint=true and non-destructive. The description adds that publishing is free, visibility affects discoverability, and tier constraints. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with no waste. Front-loaded with main action, followed by parameter details. Every sentence serves a purpose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With an output schema present, the description does not need to detail return values. It covers purpose, parameter behavior, and usage context. Could mention that publishing makes the report accessible, but overall complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 50% (only price_cents and tier_required have descriptions). The description compensates by explaining visibility options, price_cents constraint (must be 0), and tier_required defaulting to author's plan, adding significant value beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it publishes a report for free with visibility options (listed/unlisted), distinguishing it from siblings like create_report or update_report. The verb 'publish' is explicit and matches the tool's function.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains when to use listed vs unlisted, that tier_required cannot exceed the author's plan, and that all listings are currently free. However, it does not explicitly compare to related tools like update_report or unpublish_report.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
publish_thesisPublish ThesisAIdempotentInspect
Make a saved thesis discoverable by flipping its visibility: public (default) surfaces it on the author's /[handle] profile and counts toward their reputation aggregate; unlisted makes it reachable at a known direct link but keeps it off the profile. Use AFTER save_thesis to promote an existing thesis (save_thesis sets visibility only at creation). Idempotent. Pair with unpublish_thesis to revert to private. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| thesis_id | Yes | Id returned by `save_thesis` or `list_theses`. | |
| visibility | No | `public` (default) → profile + reputation; `unlisted` → direct-link-only, off the profile. To revert to private, use unpublish_thesis. | public |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| thesis | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses idempotency (matching the idempotentHint annotation), explains the behavioral effects of visibility options (public vs unlisted), and adds context beyond annotations. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is three well-structured sentences, front-loaded with the main action, no redundant information, and every sentence adds essential context.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool has few parameters, good schema coverage, and an output schema. The description covers purpose, usage context, effects, idempotency, and sister tools, providing a complete picture for correct invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, and the description adds value by specifying the source of thesis_id (returned by save_thesis or list_theses) and elaborating on the visibility parameter's implications, which goes beyond the schema's enum descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool makes a saved thesis discoverable by flipping visibility, distinguishes between public (profile + reputation) and unlisted (direct link only), and explicitly mentions sibling tools save_thesis and unpublish_thesis, making the purpose highly specific and differentiated.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance: use after save_thesis to promote an existing thesis, and pair with unpublish_thesis to revert to private. It also explains the two visibility options and their effects, leaving no ambiguity about when to use this tool versus alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
reject_staged_actionReject Staged ActionAIdempotentInspect
Reject a staged action by id. Terminal — the underlying tool is NEVER called, and a rejected (or otherwise already-decided) action can never be flipped back by a later approve/reject call; transitioned tells you whether THIS call is what moved it to 'rejected' or whether it was already decided. An id belonging to a different customer's token is indistinguishable from an unknown id (returns NOT_FOUND). Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| reason | No | Optional free-text reason recorded on the staged action. | |
| staged_action_id | Yes | Id of the staged action to reject. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| transitioned | Yes | True if this call moved the action proposed→rejected; false if it was already decided. |
| staged_action | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses beyond annotations: terminal nature, no underlying tool call, impossibility of flipping back, and indistinguishable ID for other customers, all of which are critical behavioral traits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Every sentence is essential, no redundancy. Information is front-loaded with the main purpose, followed by critical behavioral details.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers terminality, idempotency, security, and state management. Given the presence of an output schema, no need to explain return format.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% and schema descriptions exist; description adds no extra semantic value beyond noting optionality of 'reason' and security behavior of 'staged_action_id', but baseline is adequate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Reject' and the resource 'staged action by id', distinguishing it from the sibling 'approve_staged_action' and other actions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states terminality, irreversibility, and the behavior for already-decided actions via the 'transitioned' flag, providing clear when-to-use and what-to-expect guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
render_reportRender Report Download URLAIdempotentInspect
Return a 15-minute presigned download URL for a report in the requested binary format.
format=md presigns the cached markdown — instant, no compute. format=docx returns a branded Word document with a cover page (logo, title, ticker, tier badge), the report body (abstract, sections, citations table with clickable SEC EDGAR links), and a back page (methodology, sources, disclaimer). The DOCX is cached in R2 alongside the markdown after first build so repeat downloads are instant; pass force_regenerate: true to bust the cache (e.g. right after update_report).
Tier gate mirrors get_report: authors always see their own reports; non-authors below the report's required tier get an upgrade prompt.
| Name | Required | Description | Default |
|---|---|---|---|
| format | Yes | md = raw markdown (the same body the editor renders). docx = branded Word document with cover + back page. | |
| report_id | Yes | Id from create_report or list_my_reports. | |
| force_regenerate | No | If true, ignore the cached DOCX and re-render. No effect on md (markdown is canonical). |
Output Schema
| Name | Required | Description |
|---|---|---|
| url | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| format | Yes | |
| filename | Yes | |
| expires_at | Yes | |
| from_cache | Yes | |
| size_bytes | Yes | |
| content_type | Yes | |
| expires_in_seconds | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already mark the tool as idempotent and non-destructive. The description adds critical behavioral details: the 15-minute URL expiry, R2 caching of DOCX, the effect of force_regenerate, and that markdown is canonical. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured: front-loaded with the core function, then details on formats and caching. Every sentence adds value, and the length is appropriate for the complexity (3 parameters, caching, tier gate). No fluff or repetition.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool has an output schema, so the description does not need to explain return values. It covers all necessary aspects: parameter behavior, caching, tier gate, and idempotency implications. Given the complexity, the description is fully complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema covers 100% of parameters with descriptions. The description reinforces these and adds extra context, such as the precise content of the DOCX (cover page with logo, title, etc.) and the effect of force_regenerate. This goes beyond the schema's defaults but does not introduce new parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The first sentence clearly states the verb ('Return'), resource ('report'), and output ('presigned download URL'). It distinguishes from siblings like get_report by focusing on download URL generation and format options. The description explicitly covers two formats and their behaviors, leaving no ambiguity.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains when to use each format (md vs docx) and the caching mechanism. It also notes the tier gate behavior inherited from get_report. However, it does not explicitly compare to siblings like generate_research_brief_docx or state when not to use this tool, missing some contextual exclusion guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_backtestRun Bounded Factor BacktestARead-onlyIdempotentInspect
A SMALL, BOUNDED, in-Worker sanity-check backtest — NOT a full-universe backtesting engine. Answers a quick question like 'does this factor actually work on these 5 names over the last year' inline, mid-conversation, without leaving MCP. Composes two existing tools (get_pit_universe + get_pit_valuation_ratios) across up to 10 tickers x 12 rebalance dates (120 cells): for each rebalance date, checks which requested tickers were in the survivorship-free PIT universe on that date (dropping — never erroring on — a ticker not yet listed or already delisted), then pulls each surviving ticker's point-in-time valuation multiples and computes the forward return to the NEXT rebalance date from the raw (unadjusted) close. Returns a flat {rebalance_date, ticker, factor_values, forward_return_pct} grid plus a small factor<->forward-return correlation per requested factor — a quick cross-sectional signal check, NOT a transaction-cost-aware portfolio simulation or a statistically validated backtest result. If the requested grid exceeds 120 cells, this tool does NOT silently truncate — it returns a stream_fallback response (signed Parquet download URLs, same shape as get_compute_ready_stream) and tells you to use those URLs. For a REAL full-universe, multi-date, survivorship-free backtest, use the Python SDK's AlphaEngine (pip install valuein-sdk) looped over as_of dates client-side — this tool is explicitly the small complement to that, not a replacement for it. Available on every plan; coverage follows your plan tier same as the two tools it composes.
| Name | Required | Description | Default |
|---|---|---|---|
| factors | No | Which of get_pit_valuation_ratios's own output fields to include as factor_values. One or more of: pe_ratio, ps_ratio, pb_ratio, ev_ebitda, ev_revenue, fcf_yield_pct, gross_margin_pct, operating_margin_pct, net_margin_pct. Omit to include all of them. | |
| tickers | Yes | 1-10 stock ticker symbols, e.g. ["AAPL","MSFT"]. | |
| rebalance_dates | Yes | 1-12 historical dates (YYYY-MM-DD) to snapshot valuation multiples on. Order does not matter — the tool sorts them chronologically. Forward return is computed from each date to the NEXT one in the sorted list. |
Output Schema
| Name | Required | Description |
|---|---|---|
| note | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| cells | No | |
| capped | Yes | |
| method | Yes | |
| caveats | Yes | |
| dropped | Yes | |
| factors | Yes | |
| streams | No | |
| summary | No | |
| tickers | Yes | |
| pit_safe | Yes | |
| next_step | No | |
| grid_cells | Yes | |
| cells_computed | Yes | |
| rebalance_dates | Yes | |
| source_tools_used | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotations declare readOnlyHint, idempotentHint, and destructiveHint, which the description aligns with and extends. It discloses key behaviors: no silent truncation (returns stream_fallback), drops tickers not in PIT universe without error, sorts rebalance dates, and computes forward returns to next date. This adds substantial value beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is lengthy but each sentence contributes value. It front-loads the core purpose and constraints, then details behavior and edge cases. While well-organized, it could be slightly more concise without losing completeness, so a 4 is appropriate.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity and the existence of an output schema, the description thoroughly covers all necessary aspects: the return format (flat grid plus correlation), the stream_fallback for large grids, error handling (dropping tickers), and the composability with other tools. No gaps remain for an agent to misinterpret.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds meaningful context: explains that factors come from get_pit_valuation_ratios, tickers are limited to 10, rebalance dates are sorted automatically, and forward return is computed to the next date. This enhances understanding beyond the schema alone, earning a 4.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it is a 'SMALL, BOUNDED, in-Worker sanity-check backtest' and explicitly distinguishes it from a full-universe backtesting engine. It specifies the tool's precise operation: composing two existing tools across limited tickers and dates, performing survivorship-free checks, and returning a grid with forward returns and correlations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use ('inline, mid-conversation, without leaving MCP') and when not to use ('For a REAL full-universe... use Python SDK's AlphaEngine'). It also mentions alternatives (get_pit_universe + get_pit_valuation_ratios) and the stream_fallback for large grids, giving clear context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_workflowRun Saved WorkflowARead-onlyIdempotentInspect
Resolve a saved workflow by id and return a structured execution plan for a single ticker. Each plan entry names a real MCP tool or SOP plus its ticker-substituted arguments; the calling agent invokes them in order, applying any skip_if predicate against the previous step's output.
This tool does NOT execute the steps server-side. It plans; the agent runs. Iterate through plan[] in order, call the named tool/SOP with args, accumulate outputs, and apply each step's skip_if (skip the step when the previous output's path equals equals).
Workflows are private state owned by the calling user. Sample-tier callers are rejected. Pair with list_workflows (frontend) to discover available workflow_ids.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | No | US-listed ticker the workflow should run against, e.g. 'AAPL'. Pass either `ticker` (single) or `tickers` (batch up to 50). Exactly one is required. | |
| tickers | No | Batch mode — array of US-listed tickers, up to 50. When provided, the response has `plans[]` (one plan per ticker) instead of `plan`. Parity with the frontend batch-runner so agents can request 'plan over my watchlist' in a single call. | |
| workflow_id | Yes | Workflow id returned by the frontend workflow builder. |
Output Schema
| Name | Required | Description |
|---|---|---|
| meta | Yes | Provenance envelope — data lineage for every MCP response |
| plan | Yes | |
| plans | Yes | |
| ticker | Yes | |
| workflow | Yes | |
| instructions | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already mark it as read-only, idempotent, non-destructive. The description adds crucial behavioral details: the tool does not execute steps, returns a plan for the agent to execute, includes skip_if logic, and notes state and tier restrictions. These add value beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise and well-structured. The first sentence conveys purpose, the bolded section highlights the critical non-execution fact, and the rest adds necessary context. Every sentence serves a purpose without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (planning with batch mode and skip_if logic), the description covers the returned plan structure, agent instructions, state ownership, tier restriction, and pairing with list_workflows. Output schema exists but is not shown; the description provides enough context for effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage, so baseline is 3. The description adds minimal extra meaning beyond the schema, such as mentioning 'single ticker' vs batch mode and parity with frontend batch-runner, but these are not major enhancements.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Resolve a saved workflow by id and return a structured execution plan for a single ticker.' It explicitly distinguishes from execution by emphasizing 'This tool does NOT execute the steps server-side. It plans; the agent runs.' It also differentiates from siblings by mentioning pairing with list_workflows and contrasts with direct execution tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear usage context: it plans, not executes; workflows are private state; sample-tier callers are rejected; and it suggests pairing with list_workflows. However, it does not explicitly state when not to use this tool versus alternatives, though its unique planning function makes that implicit.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
save_citation_overrideSave Citation OverrideAIdempotentInspect
Persist a correction of a citation value. The correction is keyed on the canonical fact_id (a stable hash of CIK + accession + concept + period) so it applies to every report that references that same fact — including agent-regenerated reports. Re-saving the same fact_id replaces the prior correction in place (no duplicate row).
The fact_id is VERIFIED against live SEC data (scoped to ticker) before the correction is stored — a fact_id that doesn't resolve to a real fact is rejected with FACT_NOT_FOUND and nothing is persisted. You therefore must supply the ticker the fact belongs to.
Use this when the user notices an inaccuracy in an AI-generated report and wants the fix to persist. Provide notes for the rationale (≤500 chars) and source_report_id for provenance. Tier caps: sp500=500, pro=5000, full=50000.
| Name | Required | Description | Default |
|---|---|---|---|
| notes | No | Optional free-form rationale, ≤500 chars. | |
| ticker | Yes | Ticker the fact belongs to (REQUIRED) — scopes fact_id resolution against live SEC data and denormalises the row for fast filtering (the workspace UI's 'my corrections on AAPL' view). | |
| fact_id | Yes | Canonical fact identifier — usually returned in a citation's `fact_ids` array by get_report or any compute tool. Stable across report regenerations. Verified against live SEC data (scoped to `ticker`) before persistence — a fabricated or unresolvable fact_id is rejected. | |
| corrected_value | Yes | User-corrected value, stringified. The frontend interprets it based on the fact's known datatype (number, string, ISO date). | |
| source_report_id | No | Optional report id the user was viewing when they applied the correction (provenance). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| created | Yes | |
| capacity | Yes | |
| override | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Disclosures beyond annotations include verification of fact_id, rejection on failure, idempotency explained, tier caps. No contradiction with annotations (readOnlyHint=false, idempotentHint=true).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Well-structured, front-loaded with purpose, then behavior, then usage. Slightly long but every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers all aspects: purpose, behavior, parameter roles, limits (tier caps), and alternative tool implied. Output schema exists, so return details are not needed in description.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but description adds context: fact_id as canonical hash, verification scoped by ticker, purpose of notes and source_report_id, denormalization for filtering.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'persist' and the resource 'correction of a citation value', and distinguishes it from siblings like delete_citation_override by explaining re-saving behavior and key fact_id.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'when the user notices an inaccuracy...' and implies when not (e.g., if fact_id is invalid). No explicit when-not statement, but the context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
save_claimSave ClaimAIdempotentInspect
Persist a single falsifiable, evidence-backed CLAIM — the atomic unit of the research graph. Use this for each discrete assertion an analysis produces (e.g. 'NVDA gross margin stays above 70% through FY2026'), then compose claims into a thesis with link_claim_to_thesis. Claims are scored independently of theses, so claim accuracy is tracked as its own track record.
Pick claim_type by HOW it's judged, not what it's about: assertion = true now, checked against data; prediction = resolves at horizon_days via verifiable_condition; judgment = qualitative, not auto-scored. Use tags for the topic (financial, valuation, macro, …). Set eval_mode: 'auto' + a verifiable_condition for deterministic grading, else 'agent'/'manual'.
Tier: all paid + free tiers (sample rejected — guest has no customerId). Verifiable claims must cite evidence.
| Name | Required | Description | Default |
|---|---|---|---|
| tags | No | Topical labels (controlled vocab). Multi-valued; drives filtering + learning segmentation, not scoring. | |
| tickers | Yes | Entities referenced (uppercased). 1 for most claims; 2+ for a comparison claim. | |
| evidence | No | Evidence grounding the claim. | |
| direction | Yes | Directional polarity of the claim. | |
| eval_mode | No | How the outcome is resolved: auto (deterministic grade vs data via verifiable_condition), agent (an LLM judges at resolution), manual (a human marks it). | agent |
| statement | Yes | The atomic, falsifiable statement. One claim, not a paragraph. | |
| antecedent | No | Scenario precondition — the claim only resolves when this holds. Null = unconditional. | |
| claim_type | Yes | Epistemic type — drives scoring. assertion=true now (verified vs data); prediction=future (resolves at horizon via verifiable_condition); judgment=qualitative (not auto-scored). | |
| confidence | Yes | Author confidence in [0,1]. Used as the Brier/log-loss weight when scored. | |
| visibility | No | 'private' (default) owner-only; 'unlisted' visible at a direct URL; 'public' surfaces on the author's profile and contributes to the claim-accuracy reputation. | private |
| horizon_days | No | Resolution horizon in days (predictions). Null for assertions/judgments. | |
| idempotency_key | No | Optional client key for at-most-once semantics from a retrying agent. | |
| source_report_id | No | Optional id of a report that contains the supporting analysis. | |
| verifiable_condition | No | Machine-evaluable condition for eval_mode='auto'. Null otherwise. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| claim | Yes | |
| capacity | Yes | |
| deduplicated | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate idempotentHint=true and destructiveHint=false. The description adds that fact_ids are 'verified against live SEC data at save time (rejected if unresolvable)' and that source_links are enriched 'on-read.' These details go beyond annotations, though the core safety profile is already clear.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficient relative to the 14-parameter complexity, but some sections (e.g., claim_type explanation) are slightly repetitive. The key purpose is front-loaded, and every sentence adds value, though minor trimming could improve conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complex input schema (14 parameters, nested objects, enums) and the presence of an output schema, the description covers all necessary aspects: purpose, usage, parameter selection, behavioral notes, and tier restrictions. It is complete enough for an agent to invoke the tool correctly without additional context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3, but the description adds significant meaning beyond the schema. It explains how to choose claim_type by 'HOW it's judged, not what it's about,' clarifies the role of tags ('drives filtering + learning segmentation, not scoring'), and provides concrete examples (e.g., 'NVDA gross margin stays above 70% through FY2026'). This substantially aids parameter selection.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Persist a single falsifiable, evidence-backed CLAIM — the atomic unit of the research graph.' It distinguishes from sibling tools like 'link_claim_to_thesis' by explaining that claims are composed into theses separately. This is specific and unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly says when to use this tool: 'Use this for each discrete assertion an analysis produces... then compose claims into a thesis with link_claim_to_thesis.' It provides guidance on selecting claim_type and eval_mode, and notes tier restrictions ('all paid + free tiers — sample rejected — guest has no customerId'). This is thorough and actionable.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
save_freeform_reportSave Markdown as a Draft ReportAIdempotentInspect
Save free-form markdown (e.g. a chat synthesis) as a DRAFT report you can refine in the editor and export to Word/PDF. Unlike create_report (which computes a structured reverse_dcf or thesis report), this accepts raw markdown and splits it into sections. No compute, so no citations/lineage — add citations later via update_report. Tier: sample rejected (reports are per-author state). Idempotency-key → stable report id.
| Name | Required | Description | Default |
|---|---|---|---|
| title | Yes | Report title. | |
| ticker | No | Optional ticker for context/catalog. Case-insensitive. | |
| abstract | No | Optional 1–2 sentence summary. | |
| markdown | Yes | Free-form markdown body (≤100k chars). Headings become sections. | |
| idempotency_key | No | Optional key for at-most-once semantics. Same key from the same user always yields the same report id. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| report | Yes | |
| status | Yes | |
| version | Yes | |
| report_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate idempotent and non-destructive behavior. The description adds useful context: no compute, no citations/lineage, per-author state ('Tier: sample rejected'), and idempotency-key mapping. It does not contradict annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is only 4 sentences, all information is front-loaded, and every sentence adds value (purpose, comparison, limitations, idempotency). No redundancy or verbosity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the description adequately covers tool behavior, constraints, and relationship to siblings. It lacks mention of error conditions or format requirements but is sufficient for intended use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds that headings in markdown become sections and idempotency_key ensures stable report IDs, which is marginally helpful but not essential beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it saves free-form markdown as a draft report, contrasting with the sibling `create_report` which produces structured reports. The verb 'save' and resource 'freeform markdown draft report' are specific and distinct.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly defines when to use (raw markdown, e.g., chat synthesis) and when not (structured reports, use `create_report`). It also notes that no citations/lineage are generated, directing to `update_report` for later additions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
save_thesisSave Investment ThesisAIdempotentInspect
Persist a directional investment thesis (bull / bear / neutral) on a ticker. The thesis becomes part of the caller's private research diary; pair with list_theses + score_thesis_outcome to track conviction-vs-outcome over time. Pass idempotency_key for at-most-once semantics from a retrying agent.
Use this AFTER the agent has finished its analysis, not before — the thesis records the conclusion, not the question. Pair with source_report_id to link the thesis back to a published report so the buyer's thesis-tracking carries provenance.
Tier: all paid + free tiers (sample tier rejected — sample is guest access with no customerId binding). Per-tier cap on # of stored theses: sp500=100, pro=500, full=10,000.
| Name | Required | Description | Default |
|---|---|---|---|
| view | Yes | Directional view: bull (expect outperformance), bear (under), neutral (mean-revert). | |
| notes | No | Free-form rationale, ≤4000 chars. Stored verbatim; trim before submitting. | |
| ticker | Yes | US-listed ticker. Case-insensitive — normalised to upper. E.g. 'AAPL'. | |
| conviction | Yes | 1 = low conviction (gut feel) → 5 = high conviction (deep analysis). | |
| visibility | No | Phase 3: 'private' (default) is owner-only; 'unlisted' is visible at a known direct URL; 'public' surfaces on the author's /[handle] profile and contributes to their reputation score. | private |
| horizon_days | Yes | Investment horizon in days. 1 day–5 years (1825d). The grader uses this to pick the as-of period. | |
| idempotency_key | No | Optional client-supplied key. If a previous `save_thesis` from the same user used this key, the existing thesis is returned instead of creating a duplicate. | |
| source_report_id | No | Optional id of a report (from `create_report` / `publish_report`) that contains the supporting analysis. | |
| thesis_at_price_cents | No | Optional snapshot of the ticker's market price (integer cents) at thesis creation. Used by future versions of the grader that mix in price returns; null for now is fine. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| thesis | Yes | |
| capacity | Yes | |
| deduplicated | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds significant behavioral context beyond annotations. It discloses idempotency via 'idempotency_key', tier-based storage caps, and that 'horizon_days' is used by the grader. It also mentions that the thesis is part of a private research diary, which is not captured in annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured, starting with the core purpose, followed by usage guidance, pairing suggestions, and tier information. Every sentence adds value, and it is concise given the amount of information conveyed.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description covers all essential aspects: purpose, when to use, pairing with other tools, tier limitations, idempotency, parameter semantics, and behavioral constraints. It is complete for a tool with 9 parameters and a complex usage scenario.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Although input schema descriptions cover all parameters, the tool description adds meaning beyond the schema. For example, it explains the purpose of 'idempotency_key' for at-most-once semantics, the role of 'source_report_id' for linking to reports, and the visibility options with additional context about phase 3 and reputation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Persist a directional investment thesis (bull / bear / neutral) on a ticker.' It also distinguishes itself from siblings by specifying that the thesis becomes part of a private research diary and pairing with 'list_theses' and 'score_thesis_outcome', providing a clear usage context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidance: 'Use this AFTER the agent has finished its analysis, not before.' It also suggests pairing with 'source_report_id' for provenance. However, it does not explicitly state when not to use the tool or list alternatives, but the guidance is clear and context-rich.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
save_watchlistSave WatchlistAIdempotentInspect
Upsert a named watchlist with a list of tickers. Replace semantics — the full ticker list is the source of truth for that name. Use this for both creation AND modification (delete + recreate is not required for edits). 500-ticker cap per list. Names are case-insensitive uniqueness.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Unique-per-user display name. | |
| tickers | Yes | US tickers. Normalised to uppercase, deduped. | |
| criteria | No | Optional free-form screening criteria description. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| created | Yes | |
| watchlist | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses replace semantics (full ticker list is source of truth), 500-ticker cap, and case-insensitive uniqueness beyond annotations. No contradiction with idempotentHint=true or destructiveHint=false.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences, each adding necessary information. Front-loaded with main action and key behavioral traits. No redundant or filler content.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, usage, constraints, and behavioral nuances. Output schema exists, so return value details are unnecessary. Complete for a 3-parameter upsert tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptive parameter descriptions. Description adds value by explaining replace semantics and case-insensitive uniqueness, which are not in schema. Baseline 3, but extra context raises to 4.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states 'Upsert a named watchlist with a list of tickers' with clear verb and resource, and distinguishes from siblings like delete_watchlist, get_watchlist, and list_watchlists by specifying it handles both creation and modification with replace semantics.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says 'Use this for both creation AND modification' and that delete+recreate is not needed. Provides constraints (500-ticker cap, case-insensitive uniqueness). Could be more explicit about when not to use, but context from sibling names implies alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
schedule_taskSchedule TaskAInspect
Defer a follow-up task ("re-check AAPL margin compression in 30 days") for up to 90 days. This is an AGENT-facing primitive — call it mid-conversation/mid-run when you decide something is worth re-checking later; it is NOT a human-authorable "new task" form (use the Workspace's standing-agent scheduler for recurring, human-configured monitoring instead). On wake, an inbox item ALWAYS lands for the owner ("scheduled task due: …"). Optionally pass context: {managed: true, team_id: "<standing_agent id>"} to ALSO attempt a managed re-run at wake time — this integration is not yet fully wired end-to-end (see list_scheduled_tasks's status field to confirm what actually happened) and always degrades gracefully to the inbox notice alone. Persisted durably in D1 — never lost on a Worker recycle. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| task | Yes | Human-readable description of the deferred work. | |
| context | No | Saved thesis/claim/report ids and any other state needed to reconstitute a fresh prompt at wake time. Set `managed: true` + `team_id: "<standing_agent id>"` to also attempt a managed re-run (best-effort — see description). | |
| wake_in_days | Yes | How many days from now this task becomes due (0 < n <= 90). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| status | Yes | |
| task_id | Yes | |
| wake_at | Yes | ISO 8601 timestamp when this task becomes due. |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (readOnlyHint=false, etc.), description discloses durable persistence in D1, wake behavior (inbox item always lands), best-effort managed re-run, and tier restrictions. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise (5-6 sentences) and well-structured, front-loading purpose and usage. While dense, every sentence adds value without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity and the presence of an output schema (not shown), the description adequately covers usage, parameters, persistence, tier, and integrations. Minor gaps like exact return format are handled by the output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, and description adds value for the 'context' parameter by explaining its structure and managed re-run usage. For other parameters, schema descriptions suffice, so description provides supplementary but not essential depth.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool defers a follow-up task for up to 90 days, with specific verb and resource. It distinguishes itself from human-authorable task forms and recurring monitoring tools, showing clear differentiation from siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly specifies when to use (mid-conversation/mid-run for re-checking) and when not to use (not for human-authorable tasks, use standing-agent scheduler for recurring). Also provides guidance on the optional managed re-run feature and its limitations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
score_claimScore ClaimAIdempotentInspect
Resolve a claim's outcome. By default auto-grades an auto claim by evaluating its verifiable_condition against SEC fundamentals (confirmed/refuted), or marks it needs_review when it can't be resolved deterministically (judgment, antecedent, or missing data). To record a human/agent judgment instead, pass manual_status (+ optional score/reason). Idempotent — re-scoring the same resolution is a no-op.
Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| as_of | No | Snapshot date for the fundamentals window (auto mode). Defaults to today UTC. | |
| claim_id | Yes | Id of the claim to resolve. | |
| manual_score | No | Outcome score in [-1,1] for a manual resolution. Null for non-scored statuses. | |
| manual_reason | No | Explanation for a manual resolution. | |
| manual_status | No | Provide to record a human/agent outcome instead of auto-grading. |
Output Schema
| Name | Required | Description |
|---|---|---|
| mode | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| basis | Yes | |
| claim | Yes | |
| score | Yes | |
| reason | Yes | |
| deduplicated | Yes | |
| resolved_status | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate idempotent and non-destructive behavior. The description adds context on auto-grading behavior (evaluating verifiable_condition against SEC fundamentals, falling back to needs_review) and manual mode details, going beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences plus a tier note, highly concise. Every sentence adds value: main action, auto vs manual, idempotency, tier restriction. No redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the description covers key scenarios (auto/manual, idempotency, tier, fallback conditions). It could briefly mention what is returned but is sufficiently complete for an agent to use correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% so baseline is 3. Description adds meaningful context: explains that manual_status triggers manual mode, as_of is snapshot date, and manual_score/reason are for manual resolution. This enhances understanding beyond schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly specifies that the tool resolves a claim's outcome, differentiating between auto-grading and manual resolution. It contrasts with sibling tools like score_due_claims by focusing on a single claim and providing explicit modes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Guidelines are provided for when to use auto versus manual mode, including conditions for manual (manual_status) and auto defaults. However, it does not explicitly exclude alternatives like score_due_claims for batching, though the tier restriction (sp500+) hints at access limitations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
score_due_claimsScore Due Claims (bulk auto-grader)BInspect
Find every auto-gradable claim that is due (assertions in open/needs_review/stale; predictions whose horizon has passed) and resolve each against fundamentals. Operates on the caller's own claims by default; can target any user when called with customer_id from a full-tier (admin) token. Returns a summary + per-claim results. Idempotent — re-calling only re-resolves what changed.
| Name | Required | Description | Default |
|---|---|---|---|
| max | No | Soft cap on claims scored per call (default 100). | |
| as_of | No | Snapshot date for the fundamentals window. Defaults to today UTC. | |
| customer_id | No | Target user's Stripe customer_id. Defaults to caller; requires `full` tier to target others. |
Output Schema
| Name | Required | Description |
|---|---|---|
| due | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| errors | Yes | |
| scored | Yes | Resolved to confirmed/refuted. |
| results | Yes | |
| scanned | Yes | |
| skipped | Yes | |
| needs_review | Yes | Could not be auto-resolved; flagged for review. |
| target_customer_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description claims the tool is idempotent ('Idempotent — re-calling only re-resolves what changed.'), but the annotation sets idempotentHint to false, creating a direct contradiction. This undermines trust in the behavior description. No other behavioral traits are disclosed beyond idempotency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is four sentences with no wasted words. It front-loads the core purpose. However, it could be slightly better structured by separating the idempotency claim from usage notes.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool has an output schema (not shown) and annotations (though contradictory). The description covers operation, scope, and return format (summary + per-claim results). The contradiction with idempotentHint creates a gap, and the description does not explain 'resolve against fundamentals' in detail. Overall, adequate but incomplete due to contradiction.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% and parameter descriptions in the schema are identical to those in the tool description. The description adds no new meaning or usage guidance beyond what is already in the schema. Baseline 3 applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool finds auto-gradable due claims and resolves them, with scope specification (caller's own or any user via admin token). The verb 'score due claims' matches the title and name. However, it does not explicitly differentiate from the sibling tool 'score_claim' (single claim), so it lacks full sibling distinction.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context on when to use the tool (auto-gradable due claims) and who can target which users (admin token for others). It mentions idempotency but does not state when not to use or list alternatives. The guidance is adequate but not exhaustive.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
score_due_thesesScore Due Theses (bulk auto-grader)AInspect
Find every thesis past its horizon with no outcome yet, and grade each via score_thesis_outcome. Operates on the caller's own theses by default; can target any user when called with customer_id from a full-tier (admin) token. Returns a summary + per-thesis results. Idempotent — a re-call only re-grades anything not already graded.
| Name | Required | Description | Default |
|---|---|---|---|
| max | No | Soft cap on theses scored per call. Defaults to 100. The frontend cron walks users serially so a low cap per user keeps each MCP request bounded. | |
| as_of | No | Snapshot date for the 'current' fundamentals window. Defaults to today UTC. | |
| customer_id | No | Stripe customer_id of the target user. Defaults to the caller's own; required to be the caller's own unless the caller's plan is `full` (admin / internal cron). |
Output Schema
| Name | Required | Description |
|---|---|---|
| due | Yes | Subset that were past their horizon AND ungraded. |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| errors | Yes | Per-thesis errors caught + logged. |
| scored | Yes | Successfully scored + persisted. |
| results | Yes | |
| scanned | Yes | Total active theses inspected. |
| skipped | Yes | Skipped because already graded or not yet due. |
| target_customer_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Description states 'Idempotent — a re-call only re-grades anything not already graded,' but annotation idempotentHint is false. This is a direct contradiction, rendering the behavioral disclosure unreliable.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is three sentences with no extraneous text. Each sentence adds value, but the idempotency line is factually wrong, which reduces trust in the structure.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists and parameters are well-documented, the description adequately covers the bulk operation scope, defaults, and admin override. Missing explicit mention of failure handling or rate limits, but overall complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds default values and authorization conditions for all three parameters, exceeding schema-only information.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb-resource pair ('find and grade due theses') and clearly distinguishes from siblings like score_claim and score_thesis_outcome by describing the bulk, horizon-based scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit context for default behavior (caller's own theses) and admin override via customer_id, plus notes on cron usage. Lacks explicit when-not-to-use directives but is clear enough.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
score_thesis_outcomeScore Thesis OutcomeAInspect
Grade a saved thesis against fundamental momentum since its creation. Pulls revenue / operating-margin / EPS / OCF deltas and aggregates into a score in [-1, +1]. Bull theses are graded by directional alignment, bear by inverse, neutral by closeness-to-flat. The grade is persisted back to the thesis row; re-call to refresh once new fundamentals land.
Note (PR 2): scoring is fundamental-only — does NOT yet include market-price returns. Phase 2 will mix in price data via a partner feed; the response shape is stable.
| Name | Required | Description | Default |
|---|---|---|---|
| as_of | No | Snapshot date for the 'current' fundamentals window. Defaults to today UTC. The scorer picks the fiscal period closest to this date. | |
| thesis_id | Yes | Id returned by `save_thesis` or `list_theses`. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| thesis | Yes | |
| outcome | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses that the grade is persisted back to the thesis row, that it is not read-only (annotations indicate readOnlyHint=false), and that it can be re-called for updates. The note about future phases adds transparency without contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two paragraphs, front-loaded with core action. The note about Phase 2 is slightly extraneous but does not detract significantly. Could be more concise by omitting future plans.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers what, how, and persistence; score range and refresh behavior are explained. Output schema exists, so return details are not needed. Slight gap: no mention of error conditions or prerequisites.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%; the description adds context for thesis_id (source) and as_of (defaults to today, picks closest fiscal period), providing usage nuance beyond schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Describes grading a saved thesis against fundamental momentum since creation, specifying the metrics used (revenue, operating-margin, EPS, OCF deltas) and the output range [-1,+1]. Clearly distinguishes from general scoring tools by focusing on individual thesis evaluation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies use for an individual saved thesis and suggests re-calling for refresh, but lacks explicit comparisons to sibling tools like score_claim or score_due_theses. No when-not-to-use guidance is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
screen_universeScreen Universe by Factor ScoresARead-onlyIdempotentInspect
Rank companies by cross-sectional factor scores from factor_scores.parquet. Returns the underlying factors (roe, gross_margin, operating_margin, net_profit_margin, revenue_growth_yoy, fcf_to_assets, debt_to_equity, asset_turnover, current_ratio, piotroski_f_score) plus their percentile ranks (1.0 = best in universe, 0.0 = worst). composite_rank (the default sort) is a one-number multi-factor shortcut; sort by a specific *_rank column for a single factor. Two modes: full-universe (omit ticker) or single-entity (ticker set — spot-check ONE company's factor profile). Sector filter is SIC-derived (GICS-aligned, not licensed GICS — see get_pit_universe). Use this instead of get_financial_ratios when you want CROSS-SECTIONAL comparison (rank vs peers); use get_financial_ratios when you want one company's ratios over time. Supports survivorship-free POINT-IN-TIME screening via as_of_date (see the param). Full-universe screens omit rows that don't join to a company (null symbol); pass exclude_outliers=true to also drop shell-company rows with implausible factors. Available on every plan — sample returns the subset covered by the sample bucket.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Number of results to return (1-100). Defaults to 25. | |
| offset | No | Zero-based row offset for paging within the requested `limit` window. At most 250 rows are inlined per call; if the response carries a `truncation` envelope, pass its `next_offset` here. Defaults to 0. | |
| sector | No | Filter to a specific sector (case-insensitive partial match). E.g. 'Technology', 'Healthcare'. | |
| ticker | No | If provided, show only this ticker's factor scores (single-entity mode). Omit to screen the full universe. | |
| sort_by | No | Which factor rank to sort by (see the enum). Defaults to composite_rank. An unrecognized column is rejected with INVALID_ARGUMENT (no silent fallback). | composite_rank |
| as_of_date | No | Point-in-time cutoff (YYYY-MM-DD). When set, the screen is reconstructed as of this date via factor_scores.accepted_at — each entity ranked at its latest-knowable period, zero look-ahead, survivorship-free. Omit for the latest snapshot. | |
| exclude_outliers | No | Optional data-quality guard (default false). When true, additionally drops rows with implausible raw factor values (non-finite, or e.g. asset_turnover > 50x, |FCF/assets| > 10) from shell companies with near-zero denominators. Rows that do not join to a company (null symbol) are ALWAYS omitted in full-universe mode, regardless of this flag. |
Output Schema
| Name | Required | Description |
|---|---|---|
| data | Yes | Ranked factor-score rows for the screened universe |
| note | No | |
| plan | Yes | Caller's data plan used to scope the screen |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| ticker | No | Present only when a single-ticker lookup was requested |
| lineage | No | Provenance for pipeline-derived values (ratio.parquet / factor_scores.parquet): source table + pipeline computed_at, plus a pointer to the tools that return filing-level lineage. NOT point-in-time (recomputed on each pipeline run). |
| sort_by | Yes | |
| pit_safe | No | Present (and true) only when as_of_date was supplied — the screen was filtered by factor_scores.accepted_at with zero look-ahead |
| as_of_date | No | Present only when a point-in-time as_of_date was supplied |
| truncation | No | Present only when the inline-row cap withheld rows. Page with `next_offset` (keep the same `limit`) or pull the full set via get_compute_ready_stream. |
| sector_filter | No | Present only when a sector filter was applied |
| results_returned | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Description supplements annotations (readOnlyHint, idempotentHint, destructiveHint) with behavioral details: pagination via truncation envelope, omission of null symbol rows, exclude_outliers logic for shell companies, and sort column rejection handling. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is well-structured, front-loading key purpose and output, then covering modes, sorting, filtering, and usage comparisons. Every sentence adds value, though it is somewhat lengthy; could be slightly more concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex tool with 7 parameters, multiple modes, point-in-time screening, and output schema (inferred), the description is thorough. It covers edge cases like null symbols, outlier exclusion, pagination, and sort behavior.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions for all 7 parameters. The description adds value beyond schema by explaining the modes (ticker presence triggers single-entity), default sorting, point-in-time feature, and outlier filtering logic.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool ranks companies by cross-sectional factor scores, describes output factors and percentile ranks, and distinguishes modes (full-universe vs single-entity). It explicitly differentiates from sibling get_financial_ratios.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use and when-not-to-use guidance: 'Use this instead of get_financial_ratios when you want CROSS-SECTIONAL comparison...' Also explains sector filter limitations and survivorship-free screening via as_of_date.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
search_companiesSearch CompaniesARead-onlyIdempotentInspect
Search for US public companies by name, ticker symbol, CIK (SEC identifier), or SIC industry code. Returns ticker, company name, sector, industry, exchange, and current S&P 500 membership status. Use this tool to resolve a company name to ticker/CIK before calling get_company_fundamentals, get_valuation_metrics, or other tools that require a ticker — they do not fuzzy-match company names.
Use this tool — NOT get_pit_universe — when the user asks about CURRENT S&P 500 members. To list current S&P 500 members, call search_companies({ is_sp500: true }) (the is_sp500 filter is itself a valid search parameter, so no other input is required). This returns the live snapshot as of query time. Example: "List 5 current S&P 500 members" → call search_companies({ is_sp500: true, limit: 5 }).
Use get_pit_universe ONLY when the user explicitly needs a survivorship-free historical universe as of a specific past date (e.g. "S&P 500 members as of March 2018"). If the user says "current," "today," "now," or gives no date, use search_companies instead.
Data details: sic_code is the 4-digit SIC; industry is the human-readable label. sector is SIC-derived with GICS-style labels — NOT licensed GICS, so industrial conglomerates may map differently from official GICS (e.g. 3M → 'Health Care' by SIC vs Industrials by GICS). S&P 500 membership is sourced from index_membership.parquet (current SP500 = index_name='SP500' AND removal_date IS NULL). Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| cik | No | SEC CIK identifier (exact match). E.g. '0000320193' for Apple. | |
| limit | No | Maximum number of results to return (1–50). Defaults to 25. | |
| query | No | Free-text search over company name and ticker. Case-insensitive. E.g. 'Apple', 'AAPL', 'Microsoft', 'semiconductor'. | |
| is_sp500 | No | Filter to current S&P 500 members only. | |
| sic_code | No | 4-digit SIC industry code. E.g. '7372' for Prepackaged Software. | |
| is_active | No | Filter to active (currently trading) companies only. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| query | Yes | |
| companies | Yes | |
| results_returned | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare read-only, idempotent, and non-destructive. The description adds valuable context about return fields, data sourcing (SIC, S&P 500 membership), and schema nuances (e.g., sector mapping differences).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is long but well-organized with clear sections (purpose, usage, data details). It is front-loaded and every sentence adds value, though slightly verbose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (6 params, diverse use cases, overlapping siblings), the description is thorough, covering all aspects including examples, caveats, and data provenance.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, and the description supplements with usage examples (e.g., is_sp500 usage) and clarifies search behavior (e.g., case-insensitive query).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it searches for US public companies by name, ticker, CIK, or SIC code, and explicitly distinguishes from sibling tools like get_company_fundamentals and get_pit_universe.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool (e.g., resolving company names to tickers before other calls, listing current S&P 500 members) and when not to (e.g., using get_pit_universe for historical universes), with concrete examples.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
search_reportsSearch Published ReportsARead-onlyIdempotentInspect
Search the catalog of published research reports. All listings are free to read. Filters: free-text (matches title + abstract), ticker, report_type. Sort: newest (default) or oldest. Tier-gated: callers only see reports their plan tier can read.
| Name | Required | Description | Default |
|---|---|---|---|
| sort | No | newest | |
| limit | No | ||
| query | No | Free-text query over title + abstract (case-insensitive). | |
| cursor | No | ||
| ticker | No | Filter to a single subject ticker. | |
| report_type | No | Filter by report type. | |
| price_max_cents | No | Reserved for future paid listings; currently ignored (all reports are free). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| reports | Yes | |
| next_cursor | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (readOnlyHint, idempotentHint), the description adds that all reports are free and tier-gated access, disclosing important behavioral traits. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with purpose, then key details. No wasted words; every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description covers main features (filters, sort, tier-gating) but omits pagination behavior (cursor) and limit parameter details. Given the presence of an output schema and annotations, it is largely sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 57% schema description coverage, the description adds meaning by explaining free-text matches title+abstract, default sort order, and tier-gating. It does not explain limit, cursor, or price_max_cents (though the latter is marked as reserved). Overall, moderate added value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it searches the catalog of published research reports, differentiating from sibling tools like list_my_reports or get_report. The verb 'Search' and resource 'catalog of published research reports' are specific and unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description lists available filters (free-text, ticker, report_type) and sort options, providing clear context for use. However, it lacks explicit guidance on when to use this tool versus similar siblings like find_similar_reports, and does not mention when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
stage_actionStage ActionAInspect
Propose an MCP tool call for human approval BEFORE running it. Call this — instead of calling the tool directly — whenever an autonomous or unattended caller (a scheduled standing agent, an unattended agent-runner run, or any MCP client operating without a human watching) is about to perform a write it knows or suspects is risky. The target tool's OWN registered risk hints (readOnlyHint/destructiveHint) decide the tier: GREEN (read-only) tools are never staged — this call is then a no-op passthrough (result: 'not_required') and the caller should just invoke the tool directly. AMBER (reversible write to the caller's own state) and RED (destructive or outward-facing) tools ARE staged: this call does NOT execute anything — it only records the proposal and returns a staged_action_id. A human (or any client acting on the human's behalf) later calls approve_staged_action or reject_staged_action to decide it. Tier: sp500+ (sample rejected — guest has no saved state).
| Name | Required | Description | Default |
|---|---|---|---|
| origin | Yes | Free-form label identifying who/what is proposing this action — e.g. 'agent-runner:managed', 'claude-connector', 'cursor', or any caller-supplied identifier. Lets a human distinguish which session/agent proposed a given write. | |
| tool_args | No | The exact arguments to replay through that tool if/when a human approves. | |
| tool_name | Yes | The MCP tool this action would call once approved (e.g. 'save_thesis', 'create_alert', 'publish_report'). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| result | Yes | |
| risk_tier | Yes | |
| staged_action | Yes | |
| staged_action_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses that stage_action does not execute anything; it only records a proposal and returns a staged_action_id. It explains the tier decision based on the target tool's registered hints (readOnlyHint/destructiveHint), which adds significant behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is verbose and contains some extraneous details (e.g., 'Tier: sp500+ ...') that may not be universally relevant. While clear, it could be more concise by removing the sample rejection note and tightening the flow.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists and the description explains the approval flow, tier behavior, and when to use, it is largely complete. However, the mention of a sample rejection ("sp500+ – sample rejected") is unclear and may introduce confusion, preventing a perfect score.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds useful context: origin is a free-form identifier for the proposing entity, tool_args are exact arguments to replay, and tool_name is the tool to call. It does not merely repeat schema descriptions but adds usage semantics.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: to propose an MCP tool call for human approval before execution. It distinguishes itself from direct tool calls and explains its role for autonomous callers performing risky writes. The tier system (GREEN/AMBER/RED) further clarifies scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly specifies when to use this tool: instead of direct invocation when an autonomous caller is about to perform a risky write. It explains that GREEN (read-only) tools are never staged and should be invoked directly, providing clear conditional logic.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
submit_artifact_feedbackSubmit Artifact FeedbackAIdempotentInspect
File EXPLICIT, structured feedback about a specific artifact you (or the model) produced — a chat message, a report, a thesis, a claim, a tool call, or the schema. Use this (not submit_feedback) when you can name WHAT was judged and HOW: pass target_type + target_id + a sentiment (positive/negative/correction), and optionally a structured reason (e.g. wrong_number, bad_citation, hallucinated_fact), the request_id of the turn, the disputed fact_id, and an expected_value (the value it SHOULD have been, in your words). Available on EVERY tier including guest/sample. This is a one-way intake channel — it records your assertion, it NEVER computes or validates a number, and expected_value is stored verbatim, never trusted as data. Retried submissions of the same judgement on the same request_id file exactly once. Returns the recorded feedback id.
| Name | Required | Description | Default |
|---|---|---|---|
| reason | No | Optional structured error-mode: 'wrong_number', 'bad_citation', 'missing_data', 'wrong_company', 'formatting', 'hallucinated_fact', 'tool_error', 'coverage_gap', or 'other'. | |
| fact_id | No | Optional disputed `fact_id` (most useful for wrong_number / bad_citation). | |
| message | No | Optional free-text detail (≤4000 chars). What you expected and what happened. | |
| sentiment | Yes | REQUIRED. How you judge the artifact: 'positive' (it was right/useful), 'negative' (it was wrong/unhelpful), or 'correction' (you are supplying the right value via `expected_value`). | |
| target_id | Yes | REQUIRED. The id of the artifact this feedback targets (a report id, thesis id, claim id, message id, tool-call id, or table/schema name). | |
| request_id | No | Optional `_meta` request id of the turn that produced the artifact. Folded into the idempotency key so a retried submission of the same judgement files once. | |
| target_type | Yes | REQUIRED. The kind of artifact this feedback is about: 'chat_message', 'report', 'thesis', 'claim', 'tool_call', 'schema', or 'other'. | |
| expected_value | No | Optional: what the value SHOULD have been, in your own words. Stored verbatim for triage — NEVER computed, restated, or trusted as data by Valuein. | |
| idempotency_key | No | Optional explicit dedupe key (1–64 chars). Used to dedupe when no `request_id` is supplied; safe to retry on a network error. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| status | Yes | |
| feedback_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (idempotentHint, readOnlyHint, destructiveHint), the description discloses key behaviors: one-way intake, no computation or validation, `expected_value` stored verbatim, idempotency via `request_id`, and returns feedback id. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is thorough but slightly lengthy; however, it is well-structured with front-loaded purpose, clear usage guidance, and parameter details. Every sentence adds value, maintaining efficiency.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (9 parameters, 3 required, 3 enums), the description comprehensively covers usage, idempotency, non-computation, return value, and distinguishes from siblings. Output schema exists but description still adequate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, but the description adds contextual meaning beyond schema, e.g., `expected_value` stored verbatim, idempotency key behavior, and when to use `reason`, `fact_id`. This enhances understanding of parameter roles.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: filing explicit, structured feedback about an artifact. It distinguishes from the sibling tool 'submit_feedback' by specifying to use this when you can name what was judged and how, making the purpose unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly guides when to use this tool vs. 'submit_feedback', stating 'Use this (not `submit_feedback`) when you can name WHAT was judged and HOW'. It also mentions availability on every tier, providing clear context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
submit_feedbackSubmit FeedbackAInspect
File product feedback to the Valuein team — a bug, feature request, experience note, or data-quality issue — directly from the agent surface. Available on EVERY tier including guest/sample (no token required), so an agent can report a rough edge in-band without the human leaving the conversation. Provide a category and a message (other fields optional — see params). Authenticated callers can pass an idempotency_key so a retried submission files exactly once (the same key from the same account); guest/sample callers are never deduplicated. Returns a friendly acknowledgment you can relay to the user. Do NOT use this to query data; it is a one-way report channel.
| Name | Required | Description | Default |
|---|---|---|---|
| reason | No | Optional structured error-mode reason: 'wrong_number', 'bad_citation', 'missing_data', 'wrong_company', 'formatting', 'hallucinated_fact', 'tool_error', 'coverage_gap', or 'other'. | |
| context | No | Optional free-form context object (stored as JSON), e.g. { tool: 'get_company_fundamentals', ticker: 'AAPL', request_id: 'abc123' }. Avoid secrets. | |
| fact_id | No | Optional disputed `fact_id` (for wrong_number / bad_citation feedback). | |
| message | Yes | The feedback body (1–4000 chars). Be specific: what you expected, what happened, and any reproduction steps. May contain the user's own words — it is stored for triage and never used for arithmetic. | |
| subject | No | Optional short title (≤140 chars) summarizing the feedback. | |
| surface | No | Optional product surface the feedback concerns: 'mcp', 'workspace', 'sdk', 'dashboard', or 'api'. | |
| category | Yes | What kind of feedback this is: 'bug' (something broke), 'feature_request' (something missing), 'experience' (UX / clarity / docs), 'data_quality' (a wrong/missing/stale figure), or 'other'. | |
| severity | No | Optional impact classification: 'low', 'medium', or 'high'. | |
| sentiment | No | Optional sentiment of this feedback: 'positive' (worked well), 'negative' (something was wrong), or 'correction' (you are supplying the right value). | |
| target_id | No | Optional id of the artifact this feedback targets (e.g. a report or thesis id). | |
| request_id | No | Optional `_meta` request id of the turn that produced the artifact, for correlation. | |
| target_type | No | Optional kind of artifact the feedback targets: 'chat_message', 'report', 'thesis', 'claim', 'tool_call', 'schema', or 'other'. | |
| expected_value | No | Optional caller-asserted correct value, in your own words. Stored verbatim for triage — NEVER computed or trusted as data. | |
| idempotency_key | No | Optional client-supplied key (1–64 chars). For authenticated callers, reusing the same key files the feedback exactly once — safe to retry on a network error. Ignored for guest/sample callers (no account to scope dedup to). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| feedback | Yes | |
| acknowledgment | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses behavioral traits beyond annotations: it explains that it is a one-way report channel (mutation), clarifies idempotency behavior for authenticated vs. guest/sample callers, and notes the return value ('friendly acknowledgment'). This adds value beyond the annotations, which only set boolean hints.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single paragraph but well-structured: purpose first, then availability, key fields, idempotency, return value, and a warning. It is front-loaded with the most important information. Slightly long but every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 14 parameters with full schema coverage and an output schema (implied by 'Returns a friendly acknowledgment'), the description covers all essential aspects: purpose, usage constraints, availability, return value, and behavior across authentication states. No gaps are evident.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, the baseline is 3. The description adds meaning by highlighting required fields ('Provide a `category` and a `message`'), explaining the idempotency_key behavior, and summarizing other fields as optional. This contextual guidance helps the agent understand which parameters are essential and their effect.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs and resources: 'File product feedback', listing types (bug, feature request, etc.). It clearly identifies the tool's purpose and distinguishes from siblings by stating it is a 'one-way report channel' and not for querying data.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description states when to use the tool (report rough edges in-band) and explicitly warns 'Do NOT use this to query data'. It mentions availability on all tiers including guest/sample. However, it does not explicitly name sibling tools as alternatives for different use cases.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
test_alertTest Alert (synthetic fire)AInspect
Fire a synthetic notification through the alert's configured channel. Use this immediately after create_alert to verify the channel (email address valid / webhook URL reachable + HMAC verification on the receiver). The synthetic fire is logged as attempt=1 channel='test' so it doesn't affect the real fire counter — the next genuine match still fires normally.
| Name | Required | Description | Default |
|---|---|---|---|
| alert_id | Yes | Identifier of the alert to fire a synthetic test notification through, as returned by create_alert or list_alerts. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| outcome | Yes | |
| alert_id | Yes | |
| status_code | Yes | |
| channel_type | Yes | |
| error_message | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations, the description discloses that the synthetic fire is logged as attempt=1 channel='test' so it doesn't affect the real fire counter, and the next genuine match still fires normally. This adds critical behavioral context for a test tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences: first states the action, second provides usage context and behavioral guarantee. No unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With one simple parameter, an output schema, and thorough annotations, the description fully covers purpose, usage, and behavioral traits, leaving no gaps for effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, but the tool description adds value by noting that alert_id comes from create_alert or list_alerts, reinforcing the parameter's origin and use.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Fire a synthetic notification through the alert's configured channel.' It specifies the verb 'Fire', the resource 'alert's configured channel', and distinguishes it from related tools like create_alert by noting it's used for verification.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says 'Use this immediately after create_alert to verify the channel.' It explains when to use it (after creating an alert) and what it checks (email validity, webhook reachability, HMAC verification), providing clear context and alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
test_ruleTest Rule (dry run)ARead-onlyIdempotentInspect
Dry-run a rule's condition_expr against a SYNTHETIC trigger payload — reports whether it WOULD have fired, but NEVER dispatches the action (no report generated, no team run, no message sent, no inbox write). Use this immediately after create_rule to sanity-check the condition before it starts evaluating against real events. Pass sample_payload_override to test against specific field values (e.g. {price_change_pct: 12}).
| Name | Required | Description | Default |
|---|---|---|---|
| rule_id | Yes | Identifier of the rule to dry-run, from create_rule or list_rules. | |
| sample_payload_override | No | Merged over the built-in synthetic payload for this rule's trigger_type — lets you test a specific value. |
Output Schema
| Name | Required | Description |
|---|---|---|
| note | Yes | |
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| reason | Yes | |
| rule_id | Yes | |
| would_fire | Yes | |
| action_type | Yes | |
| synthetic_payload | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and destructiveHint=false, so the description's emphasis on no dispatch reinforces this. It adds specific context (no report, no message, no inbox write), which is helpful beyond annotations. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, each serving a distinct purpose: (1) core behavior, (2) when to use, (3) optional parameter guidance. No redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, usage timing, safety, and parameter usage. An output schema exists (not shown) but is not required in description. For a tool with two parameters and clear annotations, this is fully adequate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% (both parameters have descriptions). The description adds value by explaining that sample_payload_override is merged over the built-in synthetic payload and gives an example. This goes beyond the schema's generic 'Merged over...' phrasing.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: dry-run a rule's condition against a synthetic payload to check if it would fire, without dispatching any action. It distinguishes itself from siblings like 'test_alert' and 'create_rule' by emphasizing the non-destructive nature and synthetic payload.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly recommends use 'immediately after create_rule' for sanity-checking, and specifies that it never dispatches actions, preventing misuse. No alternative tool is named, but the context makes the when-to-use very clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
unlink_claim_from_thesisUnlink Claim from ThesisADestructiveIdempotentInspect
Remove the link between a claim and a thesis. Idempotent — succeeds whether or not the link existed. The claim and thesis themselves are untouched. Tier: paid + free (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| claim_id | Yes | Identifier of the claim to unlink, as returned by save_claim or list_claims. | |
| thesis_id | Yes | Identifier of the thesis to unlink the claim from, as returned by save_thesis or list_theses. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| unlinked | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds idempotency details and entity preservation beyond annotations. No contradiction; annotations' destructiveHint aligns with link removal. Tier info is valuable context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences plus tier note. Action is front-loaded. Every sentence conveys essential info: purpose, idempotency, side effects, access.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple two-parameter tool with annotations covering idempotency and destructiveness, the description covers purpose, side effects, and tier. No missing critical info.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds context that identifiers come from save/list functions, which provides provenance beyond schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action (remove link), the objects involved (claim and thesis), and distinguishes from siblings like link_claim_to_thesis and delete_claim by specifying that the entities themselves are untouched.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides idempotency and tier restrictions, implying safe retry. It differentiates from delete operations by noting entities are untouched, but does not explicitly list alternatives or when to avoid use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
unpublish_claimUnpublish Claim (back to private)ADestructiveIdempotentInspect
Revert a published claim (public or unlisted) back to private — removes it from the author's /[handle] profile and excludes it from the public claim-accuracy aggregate. The inverse of publish_claim. Owner-only, idempotent. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| claim_id | Yes | Id returned by `save_claim` or `list_claims`. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| claim | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate idempotent and destructive hints. The description adds behavioral details: removal from profile and aggregate, owner-only restriction, and idempotency. No contradiction with annotations. Adequate disclosure beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is three sentences, front-loading the main action and effects. Every sentence provides essential information: action, effects, inverse relationship, and constraints. No fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the single parameter, clear schema, and output schema, the description covers all necessary aspects: action, effects, constraints, and inverse relationship. Fully adequate for agent invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, and the schema provides a clear description for claim_id ('Id returned by save_claim or list_claims'). The tool description does not add extra meaning to the parameter, so baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool reverts a published claim to private, removes it from the author's profile, and excludes it from the public accuracy aggregate. It explicitly identifies as the inverse of publish_claim, distinguishing it from siblings like delete_claim or publish_claim.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context on when to use (to revert a published claim) and mentions ownership constraint ('Owner-only') and idempotency. The 'inverse of publish_claim' implies when not to use (i.e., for publishing). No explicit alternatives listed, but context suffices.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
unpublish_reportUnpublish Report (back to private)ADestructiveIdempotentInspect
Revert a published report (listed or unlisted) back to private visibility, removing it from the public catalog. Author-only. Idempotent.
| Name | Required | Description | Default |
|---|---|---|---|
| report_id | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| report | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate destructive and idempotent nature. Description adds value by specifying 'Author-only' and reinforcing the effect of removal from public catalog, providing additional behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise: two sentences covering all essential points. No superfluous information, front-loaded with the primary action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, access restriction, idempotency, and effect. Output schema exists, so return values need not be described. Lacks mention of error conditions but is largely complete for a simple mutation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, so description must compensate. However, the description does not explain the report_id parameter at all, leaving its purpose inferred. While the parameter is simple, the description fails to add explicit meaning.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the verb 'revert' and resource 'published report', specifying the change to 'private' visibility and removal from public catalog. Distinguishes from siblings like publish_report and delete_report.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides context that it is author-only and idempotent, clarifying who can use it and that it's safe to retry. Does not explicitly mention when not to use or alternatives like delete_report, but the context is clear enough.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
unpublish_thesisUnpublish Thesis (back to private)ADestructiveIdempotentInspect
Revert a published thesis (public or unlisted) back to private — removes it from the author's /[handle] profile and excludes it from the public reputation aggregate. The inverse of publish_thesis. Owner-only, idempotent. Tier: sp500+ (sample rejected).
| Name | Required | Description | Default |
|---|---|---|---|
| thesis_id | Yes | Id returned by `save_thesis` or `list_theses`. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| thesis | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds behavioral context beyond annotations: reverting to private, removing from profile, excluding from reputation aggregate, and idempotency. Annotations already include destructiveHint and idempotentHint, and the description aligns and enriches.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise: two sentences and a short note. It is front-loaded with the main effect and covers essential information without superfluous content.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (one parameter, output schema present, good annotations), the description fully covers the behavior, including side effects on profile and reputation, making it complete for an agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The single parameter thesis_id is described in the schema as 'Id returned by save_thesis or list_theses', which is clear. The tool description does not add extra parameter details, so with 100% schema coverage, the baseline of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the action ('Revert a published thesis... back to private'), identifies the resource ('thesis'), and notes it is the inverse of publish_thesis, clearly distinguishing it from its sibling.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions 'Owner-only, idempotent' and implies usage via the inverse relationship with publish_thesis. Although it could be more explicit about when not to use, it provides sufficient guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
update_reportUpdate Report SectionsAIdempotentInspect
Replace one or more sections of an existing report owned by the caller. Useful for authoring workflows where the agent's first draft (create_report) is refined by additional analysis before publishing. Bumps version. Does NOT change price / tier / visibility — use publish_report for those.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Optional new title. | |
| abstract | No | Optional new abstract. | |
| sections | Yes | Sections to replace. Sections not listed are preserved. Section ids must match the existing payload. | |
| report_id | Yes | Identifier of the report to update, as returned by create_report or list_my_reports. | |
| expected_version | No | Optimistic concurrency check. If supplied and the current HEAD version is different, the call returns a `version_conflict` error WITHOUT writing. Pass the version you loaded so a concurrent agent edit produces a 'conflict — review' UX instead of silently overwriting (eng review A3A). |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| report | Yes | |
| version | Yes | |
| archived | Yes | |
| previous_version | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds that it bumps version, but annotations set idempotentHint=true, contradicting the version bump (which changes state). No further behavioral details like auth needs or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three lean sentences: purpose, usage context, and limitation. No waste.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, usage, concurrency, and version bump. Output schema exists for return values, so not needed. Nearly complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but description adds significant meaning: sections preservation and ID matching, report_id source, and expected_version concurrency check explanation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool replaces sections of a report owned by the caller, uses specific verbs ('replace'), and distinguishes from siblings like create_report and publish_report.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
It explicitly says when to use (authoring workflows after draft) and when not to use (change price/tier/visibility), naming publish_report as the alternative.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
verify_fact_lineageVerify Fact LineageARead-onlyIdempotentInspect
Use this tool when the user asks BOTH what a financial figure is AND which filing reported it — e.g. "What was Apple's most recently reported revenue, and which 10-Q filed it?" or "Show me the accession ID for Tesla's latest net income." Returns a single fact plus its complete filing provenance: entity, concept, period, value, accession ID, filing URL, and form type (10-K, 10-Q, etc.).
Use this INSTEAD OF search_companies when the user already names a company and wants a financial figure with its source filing — search_companies only resolves identifiers and returns no financial data. Use this INSTEAD OF get_company_fundamentals when the user explicitly wants the filing/form type or the accession ID — get_company_fundamentals returns metrics across periods but omits filing provenance.
Two lookup modes: (1) by fact_id (deterministic SHA-256 identity) or (2) by concept name plus a ticker (most recently reported fact). Optionally pin a point-in-time cutoff via as_of_date (YYYY-MM-DD) — returns the latest filing accepted by SEC on or before that date (no look-ahead); check _meta.pit_safe.
DURATION: a single 10-K tags BOTH a 12-month figure and a 3-month Q4 stub at the same period_end; on a tie this returns the longer (headline) window, and every result carries period_type and period_span_days so a 3-month stub is never mistaken for the annual figure.
Provide either fact_id or concept (required). Returns FACT_NOT_FOUND if no matching fact exists. Available on all plans.
| Name | Required | Description | Default |
|---|---|---|---|
| ticker | Yes | Stock ticker symbol, e.g. AAPL, MSFT, BRK.B | |
| concept | No | Standard concept to look up the most recently known fact for (see the enum for the full fundamentals + capital-allocation set). Use this when you don't have a fact_id. Provide either concept OR fact_id. | |
| fact_id | No | Deterministic fact identity hash: SHA-256(entity_id|accession_id|concept|period_end|unit). 64-char lowercase hex. Use this when you already have the hash from a previous query. Provide either fact_id OR concept (not both required, but at least one must be set). | |
| as_of_date | No | Point-in-time cutoff (YYYY-MM-DD) used with `concept` — returns the latest fact whose 10-K/10-Q was accepted by SEC on or before this date (true PIT, no lookahead; any calendar date works). Canonical name across the suite; supersedes the legacy `period_end`. | |
| period_end | No | [DEPRECATED — pass `as_of_date` instead.] Filing-acceptance cutoff (YYYY-MM-DD) used with `concept`; despite the name it filters on filing accepted_at, not the returned fact's period_end. Kept one release for back-compat. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| lineage | No | Full provenance: fact_id, concept, value, unit, period_end, source accession, SEC EDGAR URL, form_type, accepted_at, plus duration context — period_start, period_span_days, and period_type (instant | quarterly | half_year | nine_month | annual | duration) so a 3-month stub is never mistaken for the 12-month figure. |
| verified | Yes | True when the fact was located and its provenance resolved |
| lookup_by | Yes | How the fact was located: 'fact_id' or 'concept' |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint, idempotentHint, destructiveHint. Description adds important behavioral details: duration handling for 10-K (Q4 stub vs annual), PIT cutoff behavior with as_of_date, and FACT_NOT_FOUND error, all beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Well-structured and front-loaded with use case and sibling differentiation. However, the duration explanation is somewhat verbose and could be tightened without loss of clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers all key aspects: when to use, alternatives, two modes, point-in-time cutoff, duration disambiguation, error case, plan availability. With output schema existing, no need to detail return fields.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but description adds meaning: explains two modes (fact_id vs concept+ticker), fact_id as SHA-256 hash, as_of_date as true PIT cutoff, and deprecation of period_end. This enriches parameter understanding significantly.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns a single fact with complete filing provenance, and explicitly contrasts with siblings like search_companies (no financial data) and get_company_fundamentals (omits filing provenance).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly tells when to use: when user asks for both figure and source filing. Instructs to use instead of specific alternatives, and explains two lookup modes with guidance on when each applies.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
watchlist_diffWatchlist DiffARead-onlyIdempotentInspect
Return new SEC filings across the caller's watchlist tickers since a given date. Reads filing.parquet — does not call insider/ratio surfaces (use those tools separately if you need them). Concurrency-bounded; max 50 tickers per call.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Watchlist name. | |
| since | Yes | Cutoff date (YYYY-MM-DD); the diff returns SEC filings accepted on or after this date across the watchlist's tickers. | |
| form_types | No | Filing forms to include. Defaults to 10-K + 10-Q + 8-K. |
Output Schema
| Name | Required | Description |
|---|---|---|
| _meta | Yes | Provenance envelope — data lineage for every MCP response |
| since | Yes | |
| filings | Yes | |
| watchlist_name | Yes | |
| tickers_scanned | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint, openWorldHint, idempotentHint, destructiveHint=false. The description adds behavioral details: reads from filing.parquet (data source), does not call other surfaces, and is concurrency-bounded with a max of 50 tickers. These details complement the annotations without contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two efficient sentences: the first states the core purpose, the second adds data source, exclusions, and concurrency limit. Every sentence adds essential information with zero redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple read-only diff tool with an output schema, the description covers return value nature, data source, constraints, and exclusions. It omits output format details, but the output schema presumably covers that. Overall, it provides sufficient context for correct usage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so baseline is 3. The schema already thoroughly describes the three parameters (name, since, form_types) with ranges, patterns, defaults, and enumerations. The description does not add new parameter semantics beyond what the schema provides, maintaining the baseline.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns new SEC filings across the caller's watchlist tickers since a given date. It identifies the specific verb ('return'), resource ('new SEC filings'), and context ('watchlist tickers', 'since a given date'), and distinguishes it from sibling tools by noting it reads only from filing.parquet and does not call insider/ratio surfaces.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly tells users to use separate tools if they need insider/ratio data ('use those tools separately'). It also provides a usage constraint ('Concurrency-bounded; max 50 tickers per call'). While it does not exhaustively list when to use alternatives, this guidance is helpful.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.
Discussions
No comments yet. Be the first to start the discussion!