Glama

aibvf-mcp

Server Details

AI BVF: score AI portfolios Stop/Fix/Accelerate with decision confidence and pace-layer drag.

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:
Repository: Bahamas1717/ai-bvf
GitHub Stars: 0
Server Listing: aibvf-mcp

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.8/5 across 9 of 9 tools scored.

Server Coherence: A
Disambiguation: 5/5

Each tool targets a distinct operation: cost calculation, process diagnosis, benchmark lookup, readiness inference, taxonomy listing, improvement recommendations, single initiative scoring, portfolio scoring, and portfolio validation. There is no overlap, and the descriptions clearly differentiate between similar tools like score_initiative and score_portfolio.

Naming Consistency: 5/5

All tool names consistently follow the verb_noun pattern in snake_case, e.g., calculate_pace_layer_drag, diagnose_process, score_initiative. There is no mixing of conventions or inconsistent verb forms, making the naming predictable and easy to understand.

Tool Count: 5/5

With 9 tools, the server covers the essential functions of the AI BVF framework without being excessive or sparse. Each tool serves a well-defined purpose, and the count feels appropriate for the domain of business value assessment and process improvement.

Completeness: 4/5

The toolset covers core workflows: scoring, diagnosis, drag calculation, benchmarking, readiness inference, taxonomy help, and portfolio management. Minor gaps exist, such as no tool for creating or editing a portfolio (only validating and scoring), but the documented workflow using validate_portfolio and score_portfolio effectively covers portfolio analysis.

Available Tools

11 tools
calculate_pace_layer_drag: A
Read-only, Idempotent

Calculate annual Organisational Drag Cost — the hidden cost of structural friction from misalignment between AI tier and organisational readiness (NOT the cost of the AI build). Use to quantify the cost of NOT changing the operating model. Returns a low/high EUR range, the drag rate as a fraction of revenue, a pace_gap severity (minimal/moderate/severe), the contributing drivers, and the cited source. Pure deterministic calculation — no network, auth, or side effects.

Parameters (JSON Schema)
ai_tier (required): Ambition of the AI operating model: gen1 = automation/RPA, gen2 = GenAI, gen3 = agentic. Paired with readiness to set pace_gap severity: gen3 on any readiness below agile, or gen2 on siloed, is severe; a higher tier against a slower operating model widens the gap and raises the drag.
industry (optional): Defaults to universal if omitted. Reserved for future vertical drag-rate adjustments; it does not change the result today. Call list_taxonomy for accepted values.
readiness (required): Organisational readiness, honest self-assessment: agile = cross-functional, fast decisions; traditional = functional hierarchy; siloed = rigid, hand-off heavy. Agile readiness yields minimal drag at any tier; the mismatch between a fast AI tier and a slower operating model is what generates the Organisational Drag Cost.
revenue_eur (required): Approximate annual revenue in EUR (must be ≥ 0). The result scales with this: annual_drag_eur is returned as an absolute range and as drag_rate, a fraction of this revenue (e.g. 0.02 = 2%).
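The pace_gap rule and the revenue scaling described above can be sketched as follows. This is a minimal illustration, not the server's code: the two "severe" cases and the "minimal at any tier under agile" case come from the parameter descriptions, while treating every other pairing as moderate is an assumption. The drag rates in the usage line are illustrative placeholders, not figures from the model.

```python
# Hypothetical sketch of the calculate_pace_layer_drag rules; not the server's code.

def pace_gap(ai_tier: str, readiness: str) -> str:
    """Classify the tier/readiness mismatch as minimal, moderate or severe."""
    if readiness == "agile":
        # Stated: agile readiness yields minimal drag at any tier.
        return "minimal"
    if ai_tier == "gen3" or (ai_tier == "gen2" and readiness == "siloed"):
        # Stated: gen3 on any readiness below agile, or gen2 on siloed, is severe.
        return "severe"
    # ASSUMPTION: every remaining pairing is treated as moderate;
    # the listing does not spell these cases out.
    return "moderate"


def annual_drag_eur(revenue_eur: float, drag_rate: tuple[float, float]) -> tuple[float, float]:
    """Convert a low/high drag_rate (fraction of revenue) into a EUR range."""
    if revenue_eur < 0:
        raise ValueError("revenue_eur must be >= 0")
    low, high = drag_rate
    return (revenue_eur * low, revenue_eur * high)
```

For example, `annual_drag_eur(50_000_000, (0.01, 0.02))` gives a range of EUR 500,000 to 1,000,000 for a hypothetical drag rate of 1% to 2%.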

Output Schema

Fields (JSON Schema)
source (required): Citation for the drag-rate model applied.
drivers (required): Named factors contributing to the drag.
pace_gap (required): Severity of the tier↔readiness mismatch.
drag_rate (required): Drag as a fraction of revenue (e.g. 0.02 = 2%), low/high.
bvf_version (required): AI BVF protocol version used.
annual_drag_eur (required): Estimated annual Organisational Drag Cost in EUR, low/high.
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint. Description adds 'Pure deterministic calculation — no network, auth, or side effects', reinforcing and adding detail about return format (low/high range, drag rate, severity, drivers, source). No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured paragraph of 4 sentences. It front-loads the core purpose, then details return values and behavior. No wasted words; every sentence adds essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters (3 enums) and an output schema, the description fully explains the calculation logic, use case, and expected results. It does not need to repeat return details since output schema exists. It is complete for an agent to correctly select and invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and each parameter has a description. The tool description enriches understanding by explaining how ai_tier and readiness pair to determine severity, how revenue scales the result, and that industry is a future placeholder. This adds meaningful context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it calculates 'annual Organisational Drag Cost' from misalignment between AI tier and readiness, explicitly distinguishing it from 'NOT the cost of the AI build'. It lists return values and emphasizes deterministic nature, differentiating it from sibling tools like diagnose_process or recommend_improvements.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Use to quantify the cost of NOT changing the operating model', providing clear context. However, it does not explicitly mention when not to use or name alternatives, though the specificity makes it clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

diagnose_process: A
Read-only, Idempotent

Diagnose a single existing business process from its observed operational signals and return whether it is too heavy to leave alone, the one intervention that fixes it (Automate / Consolidate & re-sequence / Quality controls / Eliminate), the modelled net EUR saving against its measured baseline, the efficiency gain, an Accelerate/Fix/Stop verdict, and a decision confidence governed by how much was actually measured. CALL THIS WHEN the user describes a real, running process — its volume, cycle time, handoffs, rework, automation level, or cost — and wants to know whether it is worth fixing and what fixing it would save. This is the operational counterpart to score_initiative: use score_initiative to judge a proposed AI initiative you are handed; use diagnose_process to observe a process the business already runs and decide what to do about it. Call list_taxonomy first if unsure which function enum value to pass. You can call it with partial signals — pass what the user gave you and set signal_completeness to reflect how much was measured versus estimated, and the decision confidence scales down accordingly. Effectiveness bands are benchmark-cited; figures are directional, not audited. Pure deterministic calculation — no network, auth, or side effects.

Parameters (JSON Schema)
function (required): Business function the process belongs to. See list_taxonomy.
handoffs (required): Distinct owners/systems an instance passes through. Weighed against the per-function median; many handoffs make handoff drag dominant and point to Consolidate & re-sequence.
readiness (optional): Org change-absorption capacity (agile / traditional / siloed), which caps the realised (net) saving below the gross potential. Defaults to traditional.
process_id (required): Stable identifier for the process.
rework_rate (required): Fraction of instances reopened/reworked (0–1). When rework is the dominant drag factor the intervention becomes Quality controls, and it also sets the addressable share for that path.
touch_ratio (required): Touch-time ÷ cycle-time (0–1). The remainder is wait; a low value means the process is mostly waiting, which pushes the intervention toward Consolidate & re-sequence.
cycle_time_days (required): Median wall-clock days per instance, end to end. Long cycles relative to touch-time signal wait/latency drag.
automation_level (required): Share already automated (0–1). Low automation makes manual effort the dominant drag and selects Automate; the un-automated remainder is the addressable share.
direct_spend_eur (required): Annual licence/vendor/tooling spend on the process in EUR. Added to the labour baseline; it shifts how much of the saving is labour- vs spend-addressable.
instances_per_year (required): Process volume: how many times it runs per year. Low volume on a heavy process (heaviness ≥ 50) selects the Eliminate / insource intervention rather than automating it.
signal_completeness (optional): 0–1. How much of the above was measured versus defaulted. Governs decision_confidence proportionally; lower it when you estimated inputs so the verdict stays honest. Defaults to 0.7.
fte_hours_per_instance (required): Human touch-time in hours per instance. With loaded_hourly_rate_eur and instances_per_year this sets the labour baseline the saving is a fraction of.
loaded_hourly_rate_eur (required): Fully loaded labour cost per hour in EUR (salary + on-costs). Multiplies fte_hours_per_instance × instances_per_year into the annual labour baseline.
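The labour baseline and the intervention rules spread across these parameter descriptions can be sketched as below. This is a hypothetical illustration, not the server's implementation: the factor names (`manual_effort`, `rework`, `handoff`, `wait`) and the `low_volume` flag are invented for the example, because the listing states the heaviness ≥ 50 cut-off but not what counts as low volume.

```python
# Hypothetical sketch of relationships stated in the diagnose_process
# parameter descriptions; not the server's implementation.

def baseline_cost_eur(fte_hours_per_instance: float,
                      loaded_hourly_rate_eur: float,
                      instances_per_year: int,
                      direct_spend_eur: float) -> float:
    """Annual labour baseline plus direct spend (cf. baseline_cost_eur)."""
    labour = fte_hours_per_instance * loaded_hourly_rate_eur * instances_per_year
    return labour + direct_spend_eur


# Dominant drag factor -> intervention. Factor names are hypothetical.
INTERVENTION_BY_FACTOR = {
    "manual_effort": "Automate",               # low automation_level
    "rework": "Quality controls",              # high rework_rate
    "handoff": "Consolidate & re-sequence",    # many handoffs
    "wait": "Consolidate & re-sequence",       # low touch_ratio
}


def pick_intervention(drag_decomposition: dict[str, float],
                      heaviness: float,
                      low_volume: bool) -> str:
    """Choose an intervention from the dominant drag factor."""
    # Stated: low volume on a heavy process (heaviness >= 50) selects Eliminate.
    # ASSUMPTION: the caller decides what "low volume" means.
    if low_volume and heaviness >= 50:
        return "Eliminate"
    dominant = max(drag_decomposition, key=drag_decomposition.get)
    return INTERVENTION_BY_FACTOR[dominant]
```

For instance, 2 hours per instance at EUR 60/hour over 1,000 runs, plus EUR 10,000 of tooling, gives a baseline of EUR 130,000.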

Output Schema

Fields (JSON Schema)
verdict (required): The call on the intervention.
function (required): Business function diagnosed.
heaviness (required): Process heaviness index, 0–100.
disclaimer (required): Directional decision aid, not an audited figure.
process_id (required): Echo of the input process id.
assumptions (required): The assumptions behind the figure (never a naked number).
bvf_version (required): AI BVF protocol version used.
intervention (required): Recommended move.
brain_version (required): Advisor Brain model version used.
net_saving_eur (required): Modelled net annual saving in EUR after readiness capture, low/high.
offer_to_execute (required): True when the verdict warrants offering to action it (Accelerate).
baseline_cost_eur (required): Current annual cost: labour + direct spend.
evidence_maturity (required): Strength of the benchmark evidence behind the effectiveness band.
advisory_next_step (optional): Call-to-action, present only for Fix/Stop verdicts.
drag_decomposition (required): Share of heaviness from each friction factor (sums to ~1).
decision_confidence (required): Confidence in the verdict, 0–100.
efficiency_gain_pct (required): Efficiency improvement on the targeted slice, percent.
Behavior: 5/5

Annotations declare readOnlyHint=true, destructiveHint=false, idempotentHint=true. Description adds context: 'Pure deterministic calculation — no network, auth, or side effects.' It also notes 'Effectiveness bands are benchmark-cited; figures are directional, not audited.' This discloses the non-audited and directional nature beyond annotations. No contradictions.

Conciseness: 4/5

Description is thorough but slightly verbose; however, every sentence contributes essential guidance. It is well-structured: starts with output summary, then usage instructions, sibling differentiation, partial input handling, caveats, and deterministic nature. Minor redundancy (e.g., 'decision confidence governed by how much was actually measured' is said twice) but overall efficient for the complexity.

Completeness: 5/5

Given the tool's complexity (13 parameters, 11 required, with rich semantics), the description covers all important aspects: when to use, relation to siblings, partial input handling, deterministic behavior, and accuracy caveats. Output schema exists, so return value details are not needed in description. Completely meets the needs for an AI agent to select and invoke this tool correctly.

Parameters: 5/5

Schema description coverage is 100% (every parameter has a description). The tool description goes further by explaining how parameters relate to intervention selection (e.g., 'many handoffs make handoff drag dominant and point to Consolidate & re-sequence') and how partial signals can be used with signal_completeness. It also explains the interplay between parameters like automation_level and instances_per_year. This adds substantial meaning beyond the schema.

Purpose: 5/5

Description explicitly states the tool diagnoses a single existing business process and returns specific outputs (heaviness, intervention, savings, etc.). It distinguishes itself from score_initiative by contrasting operational observation vs. proposal evaluation, and references list_taxonomy for enum selection. The verb 'diagnose' and resource 'business process' are precise.

Usage Guidelines: 5/5

Description clearly states when to call: 'CALL THIS WHEN the user describes a real, running process — its volume, cycle time, handoffs, rework, automation level, or cost — and wants to know whether it is worth fixing and what fixing it would save.' It also specifies when not to use: 'use score_initiative to judge a proposed AI initiative; use diagnose_process to observe a process the business already runs.' Additionally recommends calling list_taxonomy first for function enum. No ambiguity.

get_benchmark: A
Read-only, Idempotent

Look up the published benchmark rates for a business function and industry. Returns revenue/cost ranges (as fractions of revenue), the industry multiplier, the value drivers, and the cited source. Use when the caller wants the raw rates and multiplier without running a four-pillar verdict — for an initiative-level Accelerate/Fix/Stop call, use score_initiative instead. Pure deterministic lookup — no network, auth, or side effects.

Parameters (JSON Schema)
function (required): Business function to benchmark; must be one of the list_taxonomy function values. Selects the base revenue-uplift and cost-reduction rate ranges (returned as fractions of revenue) and the value drivers.
industry (required): Industry whose multiplier to apply; must be one of the list_taxonomy industry values. The returned industry_multiplier is applied to the function base rates; pass "universal" for the unadjusted rates.
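Over the Streamable HTTP transport, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds only the message body for get_benchmark; session initialisation and HTTP headers are omitted. The values `cx` and `universal` are taken from elsewhere in this listing and are assumptions to confirm with list_taxonomy.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Enum values are assumptions; confirm them with list_taxonomy first.
msg = tools_call_request(1, "get_benchmark", {"function": "cx", "industry": "universal"})
```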

Output Schema

Fields (JSON Schema)
source (required): Citation for the benchmark figures.
drivers (required): Named value drivers behind the benchmark.
function (required): Business function the rates apply to.
industry (required): Industry whose multiplier was applied.
cost_takeout_range (required): Cost take-out as a fraction of revenue, lo/hi.
industry_multiplier (required): Multiplier applied to the base rates for this industry.
revenue_uplift_range (required): Revenue uplift as a fraction of revenue, lo/hi.
Behavior: 4/5

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds 'Pure deterministic lookup — no network, auth, or side effects,' which reinforces and expands on the annotations. While the annotations already cover the safety profile, the description provides explicit behavioral summary, earning a 4 rather than 5.

Conciseness: 5/5

Four sentences: the first states the purpose, the second lists return values, the third gives usage guidance, and the fourth clarifies behavior. Every sentence is essential. No waste. Optimal structure for quick parsing.

Completeness: 5/5

For a simple lookup tool with two required enum parameters and an existing output schema, the description covers all necessary aspects: purpose, return values, when to use, behavioral traits. It is fully complete without being verbose.

Parameters: 4/5

The input schema has 100% coverage with descriptions. The description adds value by cross-referencing the list_taxonomy function/industry values and explaining the industry multiplier effect. This goes beyond what the schema alone provides, making the parameter semantics clearer.

Purpose: 5/5

The description starts with a clear verb and resource: 'Look up the published benchmark rates for a business function and industry.' It specifies what is returned (revenue/cost ranges, industry multiplier, value drivers, source). It also distinguishes from 'score_initiative' by stating when to use this tool instead, providing direct sibling differentiation.

Usage Guidelines: 5/5

The description gives explicit guidance: 'Use when the caller wants the raw rates and multiplier without running a four-pillar verdict — for an initiative-level Accelerate/Fix/Stop call, use score_initiative instead.' This clearly states the appropriate context and names the alternative tool.

infer_readiness: A
Read-only, Idempotent

Measure organisational readiness from process data instead of accepting self-report. Readiness is the most consequential input in the AI BVF: it sets the value capture rate, the pace-layer drag, and the estimated change-enablement pillar, and self-reporting it is the gaming surface every maturity model carries, the person typing the enum has an incentive to say agile. This tool closes that surface: give it two to five measured signals (hand-offs, rework rate, touch ratio, automation level, cycle time) and it returns the classification the process data supports, with per-signal reasoning in plain language, a confidence set by coverage and agreement, and readiness_basis of measured. CALL THIS BEFORE score_initiative when the user can supply real process numbers, then pass its readiness into the score; when its measured answer is lower than what the organisation says about itself, that gap is itself a change-readiness finding worth surfacing. Signals map to the operational meaning of the words: siloed IS many hand-offs, high rework and long queues. Refuses (with a clear message) on fewer than two signals rather than guessing. Pure deterministic calculation, no network, auth, or side effects.

Parameters (JSON Schema)
function (required): Business function the process belongs to. Selects the published cycle-time and hand-off medians the signals are read against. Call list_taxonomy if unsure.
handoffs (optional): Distinct owners or systems an instance passes through. Read against the function median: 1.5× the median or more reads siloed, at or above the median reads traditional, below it reads agile.
rework_rate (optional): Fraction of instances reopened or reworked (0-1). 15% or more reads siloed, 5-15% traditional, under 5% agile.
touch_ratio (optional): Touch-time divided by cycle-time (0-1); the remainder is waiting. Under 0.15 reads siloed (the process lives in queues), 0.15-0.4 traditional, above 0.4 agile.
cycle_time_days (optional): Median wall-clock days per instance. Read against the function median, same bands as handoffs.
automation_level (optional): Share of the process already automated (0-1). Under 0.2 reads siloed, 0.2-0.5 traditional, above 0.5 agile.
claimed_readiness (optional): What the organisation says about itself. The measured result is compared against it and the gap returned as readiness_gap plus a gap_finding, because an organisation whose self-image runs ahead of its process data has just told you where the change work starts.
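The per-signal bands above can be expressed as a short sketch. The thresholds come from the parameter descriptions; how the server combines the individual reads is not published, so the majority vote (ties broken toward the slower level), the linear confidence between the stated anchors of ~45 for two signals and ~90 for five, and the omission of the disagreement discount are all assumptions. The function medians must be supplied by the caller.

```python
# Hypothetical sketch of infer_readiness banding; not the server's code.
LEVELS = ("siloed", "traditional", "agile")  # ordinal order, slowest first


def read_vs_median(value: float, median: float) -> str:
    """handoffs and cycle_time_days, read against the function median."""
    if value >= 1.5 * median:
        return "siloed"
    if value >= median:
        return "traditional"
    return "agile"


def read_rework(rate: float) -> str:
    """rework_rate: 15% or more siloed, 5-15% traditional, under 5% agile."""
    if rate >= 0.15:
        return "siloed"
    if rate >= 0.05:
        return "traditional"
    return "agile"


def read_share(value: float, low: float, high: float) -> str:
    """touch_ratio uses (0.15, 0.4); automation_level uses (0.2, 0.5)."""
    if value < low:
        return "siloed"
    if value <= high:
        return "traditional"
    return "agile"


def infer(reads: list[str]) -> tuple[str, int]:
    """Combine per-signal reads into (readiness, confidence)."""
    if len(reads) < 2:
        # Stated behaviour: refuse rather than guess on fewer than two signals.
        raise ValueError("at least two measured signals are required")
    # ASSUMPTION: majority vote; max() keeps the first tie in LEVELS order,
    # so ties resolve toward the slower level.
    readiness = max(LEVELS, key=reads.count)
    # ASSUMPTION: linear between ~45 (2 signals) and ~90 (5 signals);
    # the disagreement discount is not modelled.
    confidence = min(90, 45 + 15 * (len(reads) - 2))
    return readiness, confidence


def readiness_gap(claimed: str, measured: str) -> int:
    """Positive when the organisation claims better than it measures."""
    return LEVELS.index(claimed) - LEVELS.index(measured)
```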

Output Schema

Fields (JSON Schema)
audit (optional): Reproducibility record: engine version, the rules that fired, and the resolved inputs. Deterministic, no timestamps. If the verdict is challenged months later, the same inputs on the same engine version reproduce it exactly.
guidance (required): How to use the result downstream, including what a gap between measured and self-reported readiness means.
readiness (required): The readiness classification the measured signals support.
confidence (required): Confidence 0-100, set by signal coverage (2 signals ~45, 5 signals ~90) and discounted when signals disagree.
bvf_version (required): AI BVF protocol version used.
gap_finding (optional): The claimed-versus-measured gap read as a change-readiness finding. Surface verbatim when present.
disagreement (optional): Present when signals point in opposing directions: readiness is uneven across the process; read the per-signal detail.
signal_reads (required): Per-signal read: the value, which readiness it leans toward, and why in plain language. Show these to the user.
signals_used (required): How many of the five signals were provided.
readiness_gap (optional): Ordinal distance claimed-to-measured. Positive: the organisation claims better than it measures.
readiness_basis (required): Always "measured": this came from process data, not self-report.
claimed_readiness (optional): Echo of the claim, when supplied.
Behavior: 5/5

The description adds context beyond the annotations: it specifies 'Pure deterministic calculation, no network, auth, or side effects,' which aligns with the annotations' readOnlyHint, idempotentHint, and destructiveHint. It also explains how claimed_readiness is handled, adding behavioral detail.

Conciseness: 4/5

The description is thorough and front-loaded with purpose, but it is somewhat lengthy. Every sentence serves a purpose, including the rationale, usage advice, and parameter hints. Structure is logical, but minor verbosity prevents a 5.

Completeness: 5/5

Given the tool's complexity (7 parameters, output schema, sibling tools), the description covers all aspects: purpose, when to use, parameter semantics, behavioral traits, error handling (refuses on <2 signals), and output details (readiness_basis, confidence, gap). No gaps remain.

Parameters: 4/5

With 100% schema coverage, the baseline is 3. The description goes further by explaining the operational meaning of signals and how they map to readiness levels (e.g., 'siloed IS many hand-offs, high rework and long queues'). This adds valuable context beyond the schema's parameter descriptions.

Purpose: 5/5

The description clearly states the tool's purpose: 'Measure organisational readiness from process data instead of accepting self-report.' It uses a specific verb-resource combination and distinguishes itself from sibling tools like score_initiative by explicitly stating when to call this before it.

Usage Guidelines: 5/5

The description provides explicit when-to-use guidance: 'CALL THIS BEFORE score_initiative when the user can supply real process numbers.' It also clarifies when not to use it (refuses on fewer than two signals) and suggests an alternative (list_taxonomy for function selection).

list_taxonomy: A
Read-only, Idempotent

Return every accepted enum value for the AI BVF taxonomy: the full lists of industries, functions, ai_tier levels (gen1/gen2/gen3), and readiness levels. Call this first when unsure which exact strings score_initiative, score_portfolio, recommend_improvements, calculate_pace_layer_drag, get_benchmark, or diagnose_process will accept, so you pass valid values instead of guessing. Takes no parameters and has no side effects.

Parameters (JSON Schema)

No parameters

Output Schema

Fields (JSON Schema)
ai_tiers (required): All accepted ai_tier values (gen1/gen2/gen3).
functions (required): All accepted business-function values.
readiness (required): All accepted organisational-readiness values.
industries (required): All accepted industry values.
bvf_version (required): AI BVF protocol version these enums belong to.
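Before calling a taxonomy-consuming tool, an agent can check its arguments against the list_taxonomy result. This is a minimal sketch, assuming the result arrives as a dict keyed by the output fields above. The mapping from argument names to those fields is inferred for illustration, and any taxonomy values used to exercise it are hypothetical.

```python
# Hypothetical pre-flight check against a list_taxonomy result.
# ASSUMPTION: argument name -> list_taxonomy output field.
FIELD_FOR_ARG = {
    "industry": "industries",
    "function": "functions",
    "ai_tier": "ai_tiers",
    "readiness": "readiness",
}


def invalid_arguments(arguments: dict, taxonomy: dict) -> dict:
    """Return {argument: value} for each enum argument the taxonomy rejects."""
    bad = {}
    for arg, value in arguments.items():
        field = FIELD_FOR_ARG.get(arg)
        # Non-enum arguments such as revenue_eur are skipped.
        if field is not None and value not in taxonomy[field]:
            bad[arg] = value
    return bad
```

Any value this returns is a candidate for map_to_taxonomy rather than a guess.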
bvf_versionYesAI BVF protocol version these enums belong to.
Behavior: 4/5

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds that the tool takes no parameters and has no side effects, reinforcing safety. It does not describe the output format, but an output schema exists, so that is acceptable. The added context about returning full lists is useful beyond annotations.

Conciseness: 5/5

The description is three sentences, each serving a distinct purpose: (1) states the tool's output, (2) provides usage guidance, (3) confirms no parameters or side effects. It is front-loaded and contains zero wasted words, making it efficient and easy to parse.

Completeness: 5/5

Given no parameters, the presence of an output schema (which documents return values), and annotations covering safety, the description is complete. It tells what the tool returns, when to use it, and that it is side-effect-free. There are no gaps left for the agent.

Parameters: 3/5

There are no parameters, and schema description coverage is 100% (empty schema). The baseline is 3 because the schema already conveys all parameter information. The description states 'Takes no parameters,' which repeats the schema but adds no new meaning. No additional parameter semantics are needed.

Purpose: 5/5

The description clearly states the tool returns every accepted enum value for the AI BVF taxonomy: industries, functions, ai_tier levels, and readiness levels. It explicitly distinguishes itself from sibling tools by listing which tools (score_initiative, score_portfolio, etc.) will accept these values, so the agent knows when to use this tool as a prerequisite.

Usage Guidelines: 5/5

The description provides explicit guidance: 'Call this first when unsure which exact strings... will accept, so you pass valid values instead of guessing.' This tells the agent when to use this tool (before any taxonomy-consuming tool) and implies it is a prerequisite. No exclusions are needed, and the context is clear.

map_to_taxonomy (A)
Read-only, Idempotent

Map everyday business language onto the canonical AI BVF enums, deterministically. Senior users say customer service, procurement, legal, banking, GenAI copilot and bureaucratic, not cx, supply, risk, financial, gen2 and siloed. Pass any of industry, function, ai_tier or readiness as free text and get the canonical value back with what it matched on, or null with suggestions when there is no confident match, in which case ask the user to choose rather than guessing. CALL THIS whenever you are unsure which enum string another AI BVF tool will accept; it is cheaper than a failed validation. Pure deterministic lookup, no network, auth, or side effects.
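
The lookup this description promises can be sketched as a small alias table with a fuzzy fallback. This is a minimal illustration, not the server's implementation. `ALIASES` and `map_term` are hypothetical names. The alias pairs come from the examples in the tool and parameter descriptions. The readiness pairs for cross-functional and hierarchical are an assumption, inferred from the readiness definitions (agile = cross-functional, traditional = functional hierarchy). Python's `difflib` similarity stands in for whatever matching the engine actually uses.

```python
import difflib

# Hypothetical alias table built from the examples in the tool description.
# The real canonical enums are longer; list_taxonomy returns the full set.
ALIASES = {
    "function": {"customer service": "cx", "procurement": "supply", "legal": "risk", "people": "hr"},
    "industry": {"banking": "financial"},
    "ai_tier": {"rpa": "gen1", "genai copilot": "gen2", "autonomous agents": "gen3"},
    # Assumption: cross-functional -> agile and hierarchical -> traditional.
    "readiness": {"bureaucratic": "siloed", "cross-functional": "agile", "hierarchical": "traditional"},
}


def map_term(field: str, text: str) -> dict:
    """Deterministically resolve free text to a canonical enum, or return suggestions."""
    key = text.strip().lower()
    table = ALIASES[field]
    canonical = sorted(set(table.values()))
    if key in canonical:
        # Already a canonical value: pass it through unchanged.
        return {"input": text, "resolved": key, "matched_on": "canonical value"}
    if key in table:
        return {"input": text, "resolved": table[key], "matched_on": key}
    # No confident match: never guess, offer close candidates instead.
    suggestions = difflib.get_close_matches(key, list(table) + canonical, n=3, cutoff=0.6)
    return {"input": text, "resolved": None, "suggestions": suggestions}


print(map_term("function", "Customer Service"))
print(map_term("function", "procurment"))
```

Returning `resolved: None` with suggestions, rather than the closest alias, mirrors the description's instruction to ask the user to choose instead of guessing.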

Parameters (JSON Schema)
ai_tier (optional): Everyday AI language, e.g. RPA, GenAI copilot, autonomous agents. Resolved to gen1/gen2/gen3.
function (optional): Everyday function language, e.g. customer service, procurement, legal, people. Resolved to cx, supply, risk, hr and so on.
industry (optional): Everyday industry language, e.g. banking, ecommerce, pharma. Resolved to the canonical enum.
readiness (optional): Everyday culture language, e.g. bureaucratic, cross-functional, hierarchical. Resolved to agile/traditional/siloed.

Output Schema

Parameters (JSON Schema)
ai_tier (optional)
function (optional)
guidance (required)
industry (optional): input, resolved and matched_on; or resolved null with suggestions when no confident match.
readiness (optional)
bvf_version (required)
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint. Description reinforces with 'Pure deterministic lookup, no network, auth, or side effects.' Also explains no-match behavior (null with suggestions), adding value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Five short sentences, packed with examples, usage guidance, and behavioral disclosure. No fluff. Efficient and front-loaded with the key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

An output schema is present, so the description does not need to detail the return format. It covers match and no-match behavior, the purposes of the parameters collectively, and complete guidance for a moderate-complexity lookup tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds collective context (e.g., 'Pass any of industry, function, ai_tier or readiness as free text and get the canonical value') and examples, but individual parameter meanings are already in schema. The grouping and examples add enough value for a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool maps everyday business language to canonical AI BVF enums. The verb 'map' and the resource 'canonical enums' are specific. The paired examples (customer service rather than cx, GenAI copilot rather than gen2) make the input and output concrete. They also set this deterministic lookup apart from scoring siblings like score_initiative and diagnostic ones like diagnose_process.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'CALL THIS whenever you are unsure which enum string another AI BVF tool will accept; it is cheaper than a failed validation.' Also advises asking the user to choose when there is no match. Provides both when-to-use and when-not-to-guess guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recommend_improvements (A)
Read-only, Idempotent

For an initiative classified Stop or Fix, return the route to a Go: pillar-level targets AND a change_plan, the change-leader layer that turns the verdict into a specific, sequenced plan for the organisation. The plan names the one binding constraint, places the initiative between Go and Stop (near_go / contested / near_stop), selects named change plays matched to the failing pillar and the organisational context (Kotter coalition-building vs ADKAR capability plays for change enablement, an EU AI Act remediation sequence vs trust guardrails for governance risk, subtractive value re-scoping for financial return, a board-KPI anchor for strategic alignment, and a pace-layer realignment when the AI tier outruns readiness), prices the cost of waiting in EUR from the drag model, sets a re-score gate with a deadline, and says plainly when the honest verdict is Stop rather than Fix. Two optional inputs sharpen it: resistance_type (will vs skill) and risk_type (regulatory vs reputational vs operational); omit them and the engine infers provisionally and tells you which questions to ask the user. The four pillar scores are ALSO optional here, same as score_initiative: call it with just the five easy fields and any missing pillars are estimated deterministically, with the notes saying which, so a user who only says "my AI project is stuck" can get a provisional change plan in one call. ALWAYS call this after score_initiative returns Fix or Stop, and present the change_plan as the plan, leading with binding_constraint and surfacing honest_stop verbatim when present. Pure deterministic calculation — no network, auth, or side effects.

Parameters (JSON Schema)
scores (optional): OPTIONAL, and each pillar inside it is optional. The four AI BVF pillars, each an honest 0–100 self-assessment, combining deterministically into the verdict: governance_risk ≥ 70 OR financial_return ≤ 20 returns Stop; strategic_alignment, financial_return and change_enablement all ≥ 60 with governance_risk ≤ 40 returns Accelerate; everything else returns Fix. Pass ONLY the pillars the user has real evidence for; do NOT invent numbers for the rest. Missing pillars are estimated deterministically by the engine (from readiness, tier, function and published benchmarks), the response reports which via pillar_basis and scores_used, decision confidence is haircut by how much was estimated, and a fully-estimated pass can never return Accelerate (it returns Fix pending confirmation). So call immediately with whatever the user gave you, then ask for evidence on the estimated pillars and re-call to firm the verdict up.
ai_tier (required): Ambition of the AI being deployed: gen1 = automation/RPA, gen2 = GenAI, gen3 = agentic. Interacts with readiness: a more ambitious tier running on lower readiness widens the pace-layer gap, which discounts the modelled EUR value even when the four pillar scores are strong.
function (required): Business function where the AI will operate, as one of the accepted enum values; selects which benchmark value drivers and rate ranges apply. Call list_taxonomy for the exact strings if unsure.
industry (required): Your industry, as one of the accepted enum values; used to select the benchmark rate multiplier applied to the modelled EUR value. Call list_taxonomy for the exact strings if unsure.
readiness (required): Organisational readiness, honest self-assessment: agile = cross-functional, fast decisions; traditional = functional hierarchy; siloed = rigid, hand-off heavy. Sets the value-capture rate and, paired with ai_tier, the pace-layer drag (lower readiness against a higher tier reduces the captured value). Self-report is gameable: when the user has real process numbers, call infer_readiness first and pass its measured classification here instead.
risk_type (optional): Optional. The nature of a high governance-risk score: "regulatory" = statute applies (EU AI Act, GDPR Article 22, DORA), "reputational" = the risk is how failure looks and lands publicly, "operational" = the system failing quietly inside a process. Selects between a regulatory remediation sequence, visible trust guardrails, and a proportionate governance review. If you do not know, omit it: the engine infers (gen3 tier, or a regulated function/industry, infers regulatory) and marks the play provisional.
revenue_eur (required): Approximate annual revenue in EUR (must be ≥ 0). Scales the whole output: the benchmark rates are applied as fractions of this figure, so the modelled EUR value range grows with it. A rough order-of-magnitude estimate is fine.
resistance_type (optional): Optional. What sits behind a low change-enablement score: "will" = people do not want the change (power shifts, fear, no case for change), "skill" = people cannot yet do it (capability and capacity gap). Selects between a coalition-building play (Kotter 1-2 + ADKAR Awareness/Desire) and an owner-and-capability play (ADKAR Knowledge/Ability). If you do not know, omit it: the engine infers from readiness (agile infers skill, traditional/siloed infers will) and marks the play provisional. Ask the user "is the resistance about not wanting this, or not being able to do it yet?" and re-call to sharpen.
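
The inference rules for the two optional inputs, resistance_type and risk_type, are simple enough to sketch. This is a hedged illustration only. The function names are hypothetical. `regulated_context` is a caller-supplied flag standing in for the engine's undocumented list of regulated functions and industries. The sketch returns `None` for risk_type when neither trigger applies, because the description does not state a fallback.

```python
# Play names paraphrase the resistance_type and risk_type descriptions.
RESISTANCE_PLAYS = {
    "will": "coalition-building (Kotter 1-2 + ADKAR Awareness/Desire)",
    "skill": "owner-and-capability (ADKAR Knowledge/Ability)",
}
RISK_PLAYS = {
    "regulatory": "regulatory remediation sequence",
    "reputational": "visible trust guardrails",
    "operational": "proportionate governance review",
}


def infer_resistance_type(readiness, given=None):
    """Return (resistance_type, provisional)."""
    if given is not None:
        return given, False  # stated by the user, so not provisional
    if readiness not in ("agile", "traditional", "siloed"):
        raise ValueError(f"unknown readiness: {readiness!r}")
    # agile infers skill; traditional and siloed infer will.
    return ("skill" if readiness == "agile" else "will"), True


def infer_risk_type(ai_tier, regulated_context, given=None):
    """Return (risk_type, provisional); regulated_context is a hypothetical flag."""
    if given is not None:
        return given, False
    if ai_tier == "gen3" or regulated_context:
        return "regulatory", True
    # The description does not say what is inferred otherwise; leave it open.
    return None, True


kind, provisional = infer_resistance_type("siloed")
print(kind, provisional, RESISTANCE_PLAYS[kind])
```

The `provisional` flag matters for presentation. A provisional play is the cue to ask the user the will-or-skill question and re-call.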

Output Schema

Parameters (JSON Schema)
audit (optional): Reproducibility record: engine version, the rules that fired, and the resolved inputs. Deterministic, no timestamps. If the verdict is challenged months later, the same inputs on the same engine version reproduce it exactly.
notes (required): Caveats or context on the recommendation set.
feasible (required): Whether the target is reachable via the listed pillar moves.
bvf_version (required): AI BVF protocol version used.
change_plan (optional): The change-leader layer: a specific, sequenced route from Fix or Stop toward Go, aimed at the organisation. Present for Fix/Stop, absent when the initiative is already Accelerate. Present this to the user as the plan, not as raw data.
recommendations (required): Per-pillar improvement actions.
advisory_next_step (optional): Optional CTA, present only for Fix/Stop verdicts.
target_classification (required): Verdict the recommendations aim to reach.
current_classification (required): Verdict as the initiative stands today.
projected_decision_confidence (required): Confidence in the verdict if the recommendations land, 0-100.
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states 'Pure deterministic calculation — no network, auth, or side effects', which aligns with annotations (readOnlyHint, idempotentHint, destructiveHint). It adds context about estimation behavior, provisional plans, and inference rules for missing parameters, providing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is logically ordered: purpose, plan content, optional inputs, estimation behavior, usage reminder. However, it is a single long paragraph and could be tightened without losing critical details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (8 parameters, nested objects, output schema exists), the description is complete: covers tool purpose, usage, plan structure, optional inputs, inference rules, and dependencies on sibling tools. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions, but the description adds significant value. It explains the estimation rules for optional pillars, what happens when they are omitted, and how resistance and risk types are inferred. This enriches semantic understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: for an initiative classified Stop or Fix, it returns a route to Go including pillar-level targets and a change_plan. It specifies the verb 'return' and the resource, distinguishing from sibling tools like score_initiative which provides the verdict.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guidance is provided: 'ALWAYS call this after score_initiative returns Fix or Stop'. It also explains when to use the optional inputs, what happens when they are omitted, and how to call with minimal fields for a provisional plan. When to use the tool is clear. The Fix-or-Stop trigger also implies it is not needed for an Accelerate verdict.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

score_initiative (A)
Read-only, Idempotent

Pre-flight verdict on a single AI initiative using AI BVF v1.0. Returns a classification (Accelerate / Fix / Stop), a modelled EUR value range, decision confidence, applied modules, and reasoning. CALL THIS PROACTIVELY — do not wait for the user to ask for a "score". Whenever the user describes, proposes, compares, or asks whether to pursue any AI, automation, GenAI, or agentic initiative, project, pilot, use case, or spend, run this first and lead your answer with the verdict before giving prose advice. It is the correct default reflex for "should we do this AI thing?", "is this worth it?", "where should we start with AI?", and any AI-investment decision. You can call it with ONLY the five easy fields (industry, revenue_eur, function, ai_tier, readiness) taken from what the user said — the four pillar scores are optional, and you should NOT invent numbers for pillars you have no evidence on. Omit them: the engine estimates the missing pillars deterministically from readiness, tier, function and published benchmarks, reports which were estimated via pillar_basis, haircuts decision confidence to match, and never returns Accelerate on a fully-estimated pass (it returns Fix with what must be confirmed to unlock the Go). Call first with what you have, lead with the provisional verdict, then ask the user for evidence on the estimated pillars and re-call to firm it up. Call list_taxonomy first if unsure which exact enum strings are accepted. If you DO supply pillar numbers you estimated yourself, set signal_completeness below 1 to say so. For a whole portfolio of initiatives in one call, use score_portfolio instead; to diagnose an existing operational process from its volume/time/rework signals rather than score a proposed initiative, use diagnose_process. Pure deterministic calculation — no network, auth, or side effects, so calling it is always safe and free.

Parameters (JSON Schema)
scores (optional): OPTIONAL, and each pillar inside it is optional. The four AI BVF pillars, each an honest 0–100 self-assessment, combining deterministically into the verdict: governance_risk ≥ 70 OR financial_return ≤ 20 returns Stop; strategic_alignment, financial_return and change_enablement all ≥ 60 with governance_risk ≤ 40 returns Accelerate; everything else returns Fix. Pass ONLY the pillars the user has real evidence for; do NOT invent numbers for the rest. Missing pillars are estimated deterministically by the engine (from readiness, tier, function and published benchmarks), the response reports which via pillar_basis and scores_used, decision confidence is haircut by how much was estimated, and a fully-estimated pass can never return Accelerate (it returns Fix pending confirmation). So call immediately with whatever the user gave you, then ask for evidence on the estimated pillars and re-call to firm the verdict up.
ai_tier (required): Ambition of the AI being deployed: gen1 = automation/RPA, gen2 = GenAI, gen3 = agentic. Interacts with readiness: a more ambitious tier running on lower readiness widens the pace-layer gap, which discounts the modelled EUR value even when the four pillar scores are strong.
function (required): Business function where the AI will operate, as one of the accepted enum values; selects which benchmark value drivers and rate ranges apply. Call list_taxonomy for the exact strings if unsure.
industry (required): Your industry, as one of the accepted enum values; used to select the benchmark rate multiplier applied to the modelled EUR value. Call list_taxonomy for the exact strings if unsure.
readiness (required): Organisational readiness, honest self-assessment: agile = cross-functional, fast decisions; traditional = functional hierarchy; siloed = rigid, hand-off heavy. Sets the value-capture rate and, paired with ai_tier, the pace-layer drag (lower readiness against a higher tier reduces the captured value). Self-report is gameable: when the user has real process numbers, call infer_readiness first and pass its measured classification here instead.
revenue_eur (required): Approximate annual revenue in EUR (must be ≥ 0). Scales the whole output: the benchmark rates are applied as fractions of this figure, so the modelled EUR value range grows with it. A rough order-of-magnitude estimate is fine.
signal_completeness (optional): Optional 0–1. How grounded the four pillar scores are in real evidence versus estimated from context. Defaults to 1 (treated as measured). If the organisation lacks formal change-readiness or risk metadata, estimate the pillars from what you know AND set this lower to say so; decision confidence is reduced proportionally and a caveat is attached, instead of returning a falsely confident verdict on soft inputs.

Output Schema

Parameters (JSON Schema)
audit (optional): Reproducibility record: engine version, the rules that fired, and the resolved inputs. Deterministic, no timestamps. If the verdict is challenged months later, the same inputs on the same engine version reproduce it exactly.
caveat (optional): Present only when signal_completeness was low: warns the verdict rests on soft inputs and confidence was reduced.
reason (required): One-line justification for the classification.
drivers (required): Named value drivers behind the estimate.
bvf_version (required): AI BVF protocol version used.
multipliers (required): Factors applied to the base rates.
scores_used (optional): The four pillar values the verdict was actually computed on, whether given by the caller or estimated by the engine. Show these to the user when any pillar was estimated.
sensitivity (optional): What moves this verdict, computed deterministically: the value if readiness were one notch worse, the value at revenue minus 20 percent, and the nearest single-pillar movements that flip the classification. Boards trust ranges with visible assumptions over point estimates; show this.
pillar_basis (optional): Per pillar: "given" (caller supplied it) or "estimated" (deterministic prior). When any pillar is estimated, tell the user which, and ask for evidence on those to firm up the verdict.
net_value_eur (required): Modelled net value in EUR after capture rate, low/high.
classification (required): The verdict for this initiative.
applied_modules (required): BVF scoring modules that fired for this input.
gross_value_eur (required): Modelled gross value in EUR before capture, low/high.
benchmark_source (required): Citation for the benchmark rates applied.
advisory_next_step (optional): Optional CTA, present only for Fix/Stop verdicts.
decision_confidence (required): Confidence in the verdict, 0-100.
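
The verdict rule stated in the scores parameter, and the "nearest single-pillar movements that flip the classification" reported under sensitivity, can be sketched together. The thresholds come from the parameter description. Everything else is assumption. `classify` and `nearest_flips` are hypothetical names. Pillar estimation is reduced to an `all_estimated` flag. The flip search simply walks one pillar at a time in integer steps between 0 and 100.

```python
PILLARS = ("strategic_alignment", "financial_return", "change_enablement", "governance_risk")


def classify(s, all_estimated=False):
    """Apply the published AI BVF thresholds to four 0-100 pillar scores."""
    if s["governance_risk"] >= 70 or s["financial_return"] <= 20:
        return "Stop"
    if (s["strategic_alignment"] >= 60 and s["financial_return"] >= 60
            and s["change_enablement"] >= 60 and s["governance_risk"] <= 40):
        # A fully-estimated pass is held at Fix pending confirmation.
        return "Fix" if all_estimated else "Accelerate"
    return "Fix"


def nearest_flips(s):
    """For each pillar, find the closest value up and down that changes the verdict."""
    base = classify(s)
    flips = []
    for pillar in PILLARS:
        for step in (1, -1):
            value = s[pillar]
            while 0 <= value + step <= 100:
                value += step
                verdict = classify({**s, pillar: value})
                if verdict != base:
                    flips.append({"pillar": pillar, "to": value,
                                  "delta": value - s[pillar], "classification": verdict})
                    break
    return sorted(flips, key=lambda f: abs(f["delta"]))  # smallest move first


scores = {"strategic_alignment": 65, "financial_return": 55, "change_enablement": 70, "governance_risk": 35}
print(classify(scores))
for flip in nearest_flips(scores):
    print(flip)
```

The Stop test (governance_risk ≥ 70 or financial_return ≤ 20) and the Accelerate test (governance_risk ≤ 40 and financial_return ≥ 60) cannot both hold. The order of the two checks therefore does not change the result.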
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, destructiveHint) already indicate safe read-only behavior. The description reinforces this with 'Pure deterministic calculation — no network, auth, or side effects, so calling it is always safe and free.' No contradiction; the description adds context about the deterministic algorithm and estimation logic.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose (one long paragraph) but packed with essential guidance. Although nearly every sentence adds value, it could be shorter for an AI agent without losing meaning. It is front-loaded with the purpose and the proactive-call instruction. Some details could still be summarized more concisely, e.g. the pillar estimation logic, which the scores parameter already documents.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, nested objects, multiple enums, estimation logic), the description covers all necessary aspects: what the tool does, when to use it, how to handle missing parameters, how to interpret results, and relationships with sibling tools. The output schema documents the return fields, so no gaps are evident.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description adds significant context beyond the schema: it explains when to omit optional pillar scores, how missing pillars are estimated, and the interaction between ai_tier and readiness. Because the schema already documents each parameter well, the added value is real but not exceptional.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool scores a single AI initiative using AI BVF v1.0, returning a classification, EUR value range, and reasoning. It distinguishes from siblings like score_portfolio (for portfolios) and diagnose_process (for diagnosing processes), making the purpose specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly instructs 'CALL THIS PROACTIVELY' and lists scenarios (any AI initiative question). It tells when to use alternatives (score_portfolio for portfolios, diagnose_process for existing processes) and advises calling list_taxonomy for enum clarity. The agent is guided clearly on when and how to invoke the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

score_portfolio (A)
Read-only, Idempotent

Score every initiative in an AI BVF v1.0 portfolio in a single call and return the portfolio-level shape: counts of Accelerate / Fix / Stop, aggregate modelled EUR value range, mean decision confidence, the top initiative by value, the highest-risk initiative, and the per-initiative results. Use after validate_portfolio (or instead of looping score_initiative per initiative) when you have a portfolio document and want the board-level verdict, not just one classification. Schema validation runs first; if the portfolio is malformed the response sets valid=false and reports the validation errors without attempting to score. Pure deterministic calculation — no network, auth, or side effects.

Parameters (JSON Schema)
portfolio (required): A portfolio document conforming to the AI BVF v1.0 schema: bvf_version, organization (name, industry, optional revenue_eur), and a non-empty initiatives array. Each initiative carries id, name, function, ai_tier, and a scores object whose four pillars each carry a numeric value (0–100). Every initiative is run through the same rule as score_initiative (governance_risk ≥ 70 OR financial_return ≤ 20 → Stop; all of strategic_alignment/financial_return/change_enablement ≥ 60 with governance_risk ≤ 40 → Accelerate; else Fix), and the verdicts are aggregated into portfolio counts. organization.revenue_eur is required to model EUR value; initiatives that cannot be scored (missing revenue, unknown function/ai_tier) appear in skipped_initiatives rather than scored_initiatives. Validate first with validate_portfolio if the document may be malformed. Schema: https://www.aibvf.com/protocol.
readiness (required): Organisational readiness applied to every initiative in the portfolio. Honest self-assessment: agile = cross-functional, fast decisions; traditional = functional hierarchy; siloed = rigid, hand-off heavy. The portfolio schema does not carry per-initiative readiness; this single value sets the capture rate for the whole portfolio and, paired with the ai_tier of each initiative, its pace-layer drag (lower readiness against a higher tier discounts the modelled EUR value).
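
For orientation, this is what a minimal document matching the shape described above might look like, followed by a rough pre-check of the listed structural rules. Everything in it is hypothetical. The organisation, the "1.0" version string and the enum values are placeholders. `precheck` is not the server's validator. The authoritative schema is at https://www.aibvf.com/protocol, and validate_portfolio remains the real check.

```python
PILLARS = ("strategic_alignment", "financial_return", "change_enablement", "governance_risk")

# Hypothetical example document; names and enum values are illustrative only.
portfolio = {
    "bvf_version": "1.0",  # assumption: exact version string format
    "organization": {"name": "Example Bank", "industry": "financial", "revenue_eur": 500_000_000},
    "initiatives": [
        {
            "id": "cx-01",
            "name": "GenAI service assistant",
            "function": "cx",
            "ai_tier": "gen2",
            "scores": {
                "strategic_alignment": {"value": 70},
                "financial_return": {"value": 65},
                "change_enablement": {"value": 55},
                "governance_risk": {"value": 35},
            },
        }
    ],
}


def precheck(doc):
    """Rough structural check of the rules listed above; returns error strings."""
    errors = []
    if "bvf_version" not in doc:
        errors.append("missing bvf_version")
    org = doc.get("organization", {})
    for key in ("name", "industry"):
        if key not in org:
            errors.append(f"organization.{key} missing")
    initiatives = doc.get("initiatives") or []
    if not initiatives:
        errors.append("initiatives must be a non-empty array")
    for n, item in enumerate(initiatives):
        for key in ("id", "name", "function", "ai_tier", "scores"):
            if key not in item:
                errors.append(f"initiatives[{n}].{key} missing")
        scores = item.get("scores") if isinstance(item.get("scores"), dict) else {}
        for pillar in PILLARS:
            raw = scores.get(pillar)
            value = raw.get("value") if isinstance(raw, dict) else None
            if not isinstance(value, (int, float)) or not 0 <= value <= 100:
                errors.append(f"initiatives[{n}].scores.{pillar}.value must be 0-100")
    return errors


print(precheck(portfolio))
```

revenue_eur is deliberately not treated as an error here. Per the description, a missing revenue figure sends initiatives to skipped_initiatives rather than invalidating the document.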

Output Schema

Parameters (JSON Schema)
total (required): Total initiatives in the portfolio (scored + skipped).
valid (required): True when the portfolio passed schema validation. False means no initiatives were scored.
summary (required)
readiness (required): Readiness value applied across all initiatives.
bvf_version (required): AI BVF protocol version used.
organization (required): Echo of the portfolio organisation fields applied to scoring.
validation_errors (optional): Empty when valid; otherwise one entry per schema violation.
advisory_next_step (optional): Optional CTA, present only when any initiative was Fix or Stop.
scored_initiatives (required): Per-initiative scoring result.
skipped_initiatives (required): Initiatives that could not be scored, with the reason. Empty when all initiatives scored.
aggregate_net_value_eur (required): Sum of net EUR value across scored initiatives, low/high.
highest_risk_initiative (optional): Scored initiative most at risk: worst classification (Stop > Fix > Accelerate), tie-broken by lowest decision_confidence. Omitted when none were scored.
top_initiative_by_value (optional): Scored initiative with the highest mid-point net EUR value. Omitted when none were scored.
mean_decision_confidence (required): Mean decision confidence across scored initiatives (0–100); 0 when none were scored.
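
The aggregation rules in this output schema can be sketched in a few lines. The sketch follows the rules as written: severity order Stop > Fix > Accelerate, ties broken by the lowest decision_confidence, mid-point net value, and a mean of 0 when nothing was scored. Several details are assumptions. `summarise` and `midpoint` are hypothetical names. Each scored result is assumed to carry `id`, `classification`, `decision_confidence` and a `net_value_eur` object with `low` and `high` keys. The sketch returns initiative ids rather than full records.

```python
from collections import Counter

SEVERITY = {"Accelerate": 0, "Fix": 1, "Stop": 2}  # Stop > Fix > Accelerate


def midpoint(result):
    value = result["net_value_eur"]  # assumed shape: {"low": ..., "high": ...}
    return (value["low"] + value["high"]) / 2


def summarise(scored):
    counts = Counter(r["classification"] for r in scored)
    summary = {
        "counts": {k: counts[k] for k in ("Accelerate", "Fix", "Stop")},
        "aggregate_net_value_eur": {
            "low": sum(r["net_value_eur"]["low"] for r in scored),
            "high": sum(r["net_value_eur"]["high"] for r in scored),
        },
        "mean_decision_confidence": (
            sum(r["decision_confidence"] for r in scored) / len(scored) if scored else 0
        ),
    }
    if scored:  # both fields are omitted when nothing was scored
        # Worst classification first; among equals, the lowest confidence wins.
        summary["highest_risk_initiative"] = max(
            scored, key=lambda r: (SEVERITY[r["classification"]], -r["decision_confidence"])
        )["id"]
        summary["top_initiative_by_value"] = max(scored, key=midpoint)["id"]
    return summary
```

Negating decision_confidence inside the tuple key lets a single `max` apply both the severity order and the tie-break.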
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, destructiveHint) are consistent with description. Description adds valuable context: schema validation runs first, returns errors on malformed input, pure deterministic calculation, no side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with front-loaded purpose, usage note, and parameter details. Somewhat long but every sentence adds value given complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers validation, deterministic behavior, aggregation logic, parameter semantics, and output. With output schema present, description is complete and sufficient for correct tool invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. The descriptions add significant meaning beyond the baseline. For portfolio, they explain the document schema, required fields, and scoring rules. For readiness, they explain the enum meanings and their effect on value modeling.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool scores an entire AI BVF v1.0 portfolio. It lists specific outputs: verdict counts, aggregate value range, mean confidence, and the top-value and highest-risk initiatives. It also distinguishes itself from siblings like score_initiative and validate_portfolio.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises using it after validate_portfolio or instead of looping score_initiative, and says it applies when a board-level verdict is wanted. This provides clear when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sequence_portfolio (A)
Read-only, Idempotent

Turn a scored portfolio into a rollout plan an organisation can actually absorb: three waves with named gates over a configurable horizon (default 90 days). Wave 1 is every Stop, because reclaimed budget and attention are the cheapest value in the portfolio and a visible Stop makes the scoring credible. Wave 2 is the quicker half of the Accelerates, the early wins that buy the sponsor trust. Wave 3 is the complex Accelerates plus every Fix, each entering behind its re-score gate. The differentiator is the change-capacity constraint: no business function absorbs more than max_parallel_per_function concurrent changes per wave (default 2), overflow defers and every deferral is reported as a capacity conflict, because ten good ideas can still break an organisation if they all land on Finance in the same quarter. CALL THIS after score_portfolio (or with any set of scored initiatives) when the user asks which to fund first, what order, what the roadmap looks like, or how much change the organisation can take. Pure deterministic calculation, no network, auth, or side effects.
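
The wave logic and the change-capacity cap can be sketched as follows. This is a simplified illustration under stated assumptions, not the engine. `Initiative`, `sequence` and the `effort` field are hypothetical, since the description does not say how "quicker" is measured. The quicker half rounds up. The 90-day horizon is reduced to the three waves. An overflowing initiative simply moves to the front of the next wave's queue, and anything still overflowing after wave 3 lands in deferred_beyond_horizon.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    id: str
    function: str
    classification: str  # "Accelerate", "Fix" or "Stop"
    effort: int  # hypothetical field: lower means quicker to land


def sequence(initiatives, max_parallel_per_function=2):
    stops = [i for i in initiatives if i.classification == "Stop"]
    accelerates = sorted(
        (i for i in initiatives if i.classification == "Accelerate"), key=lambda i: i.effort
    )
    fixes = [i for i in initiatives if i.classification == "Fix"]
    half = (len(accelerates) + 1) // 2  # assumption: the "quicker half" rounds up
    planned = [stops, accelerates[:half], accelerates[half:] + fixes]

    waves, conflicts, carry = [], [], []
    for wave_no, candidates in enumerate(planned, start=1):
        load, placed, overflow = {}, [], []
        # Items deferred from the previous wave go to the front of the queue.
        for item in carry + candidates:
            if load.get(item.function, 0) < max_parallel_per_function:
                load[item.function] = load.get(item.function, 0) + 1
                placed.append(item.id)
            else:
                overflow.append(item)
                conflicts.append({"wave": wave_no, "function": item.function, "deferred": item.id})
        waves.append(placed)
        carry = overflow
    return {
        "waves": waves,
        "capacity_conflicts": conflicts,
        "deferred_beyond_horizon": [i.id for i in carry],
    }
```

With the default cap of 2, three Stops in the same function already produce one capacity conflict in wave 1. That is exactly the "ten good ideas on Finance in the same quarter" failure the description warns about.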

Parameters
portfolio (optional): Alternative input, the same AI BVF v1.0 portfolio document that score_portfolio accepts (organization + initiatives with nested {value} pillar scores). Pass either this OR the top-level organization + initiatives; nested score values are flattened automatically, and missing pillars are estimated honestly.
readiness (required): Organisational readiness applied across the portfolio; sets capture rates and pacing. Measure it with infer_readiness when process numbers exist.
constraints (optional): Change-capacity constraints. The defaults encode the core principle: no function absorbs unlimited concurrent change.
initiatives (optional): The portfolio to sequence. Each initiative carries flat 0-100 pillar numbers (not the nested {value} objects of the portfolio wire format).
organization (optional)
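To make the two input styles concrete, the sketch below shows what "either portfolio OR organization + initiatives" and flattening nested {value} scores could look like. The server already flattens automatically, so this is only an illustration of the idea. It assumes, hypothetically, that pillar scores sit under an initiative's "scores" key in both shapes (the listing does not spell out the flat layout or name the pillars), and it makes no attempt at the server's estimation of missing pillars.

```python
def build_sequence_args(readiness, portfolio=None, organization=None, initiatives=None):
    """Assemble sequence_portfolio arguments from either input style.

    Hypothetical shapes: a nested pillar looks like {"value": 70} under an
    initiative's "scores"; the flat form stores the bare number 70.
    """
    if portfolio is not None and (organization is not None or initiatives is not None):
        raise ValueError("pass either portfolio or organization + initiatives, not both")
    if portfolio is not None:
        organization = portfolio.get("organization")
        initiatives = portfolio.get("initiatives", [])

    flat = []
    for init in initiatives or []:
        # Unwrap {"value": n} to n; numbers that are already flat pass through.
        scores = {name: (v["value"] if isinstance(v, dict) else v)
                  for name, v in init.get("scores", {}).items()}
        flat.append({**init, "scores": scores})  # new dict; caller's data untouched

    args = {"readiness": readiness, "initiatives": flat}
    if organization is not None:
        args["organization"] = organization
    return args
```

Rejecting a mixed call up front mirrors the either/or wording of the portfolio parameter rather than silently preferring one source.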

Output Schema

audit (required): Reproducibility record (engine version, the rules that fired, and the resolved inputs). Deterministic, no timestamps. If the verdict is challenged months later, the same inputs on the same engine version reproduce it exactly.
waves (required): Three waves with named gates. Stops first (free the budget), quick Accelerates second (buy trust), complex Accelerates plus Fixes third (spend the trust). Present this to the user as the rollout plan.
totals (required): Counts of stopped, quick_wins, complex_or_fix and deferred.
skipped (optional)
bvf_version (required)
capacity_conflicts (required): Where more initiatives land on one function than it can absorb per wave, with the deferral applied. Surface these, because an overloaded function is how good portfolios fail.
sequencing_principles (required)
deferred_beyond_horizon (optional): Initiatives that did not fit the horizon under the capacity constraint; they need their own decision.
aggregate_accelerate_value_eur (optional): Sum of modelled net EUR for the sequenced Accelerates, low and high.
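Because the audit record carries no timestamps, two runs on the same inputs and engine version should serialise identically. One way a caller might check that, sketched here as an assumption rather than a documented feature of the tool, is to hash a canonical JSON form of each audit record and compare the digests. The audit field names in the example are hypothetical; the listing does not publish the audit layout.

```python
import hashlib
import json


def audit_fingerprint(audit: dict) -> str:
    """SHA-256 of a canonical JSON form of an audit record.

    Sorting keys and fixing separators makes the serialisation independent of
    dict ordering, so identical content always yields an identical digest.
    """
    canonical = json.dumps(audit, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical audit records; field names are illustrative only.
first = {"engine_version": "1.0.0", "resolved_inputs": {"max_parallel_per_function": 2}}
second = {"resolved_inputs": {"max_parallel_per_function": 2}, "engine_version": "1.0.0"}
assert audit_fingerprint(first) == audit_fingerprint(second)  # key order is irrelevant

bumped = {**first, "engine_version": "1.0.1"}
assert audit_fingerprint(first) != audit_fingerprint(bumped)  # an engine change shows up
```

A timestamp inside the record would break this comparison on every run, which is why its absence matters for reproducibility.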
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, destructiveHint=false, idempotentHint=true. The description confirms 'Pure deterministic calculation, no network, auth, or side effects,' adding full transparency and no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly long but front-loads the main purpose and then structurally details each wave. While every sentence adds value, it could be slightly more concise without losing information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and the existence of an output schema, the description provides comprehensive context about inputs, the algorithmic logic, constraint handling, and behavior for deferrals, making it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 80% (high), so baseline is 3. However, the description adds significant value by explaining default values (horizon_days=90, max_parallel_per_function=2), the logic behind wave assignments, and the capacity constraint overflow behavior.
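The 80% figure follows from the parameter table: four of the five input parameters carry a description, and organization does not. A quick check of that arithmetic (the rubric's mapping from coverage to a baseline score of 3 is not reproduced here):

```python
# Input parameters of sequence_portfolio and whether the listing describes them.
described = {
    "portfolio": True,
    "readiness": True,
    "constraints": True,
    "initiatives": True,
    "organization": False,  # listed with no description
}
coverage = sum(described.values()) / len(described)
print(f"schema description coverage: {coverage:.0%}")  # schema description coverage: 80%
```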

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it turns a scored portfolio into a rollout plan with three waves and gates, using specific rules. It distinguishes itself from sibling tools like score_portfolio by explicitly mentioning it should be called after score_portfolio.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'CALL THIS after score_portfolio' and lists specific user queries that trigger its use (which to fund first, roadmap, change capacity). It also implies not to call without scoring.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_portfolio (grade A)
Read-only, Idempotent

Check that a BVF portfolio document conforms to the AI BVF v1.0 schema before you score, store, or share it. Returns { valid: true } when well-formed, or { valid: false, errors: [...] } where each error names the failing JSON path and the rule it broke. Use this to catch malformed portfolios early; use score_initiative to evaluate a single initiative, or score_portfolio to score them all in one call. Schema: https://www.aibvf.com/protocol. Pure deterministic validation — no network, auth, or side effects.

Parameters
portfolio (required): The portfolio document as a JSON object following the AI BVF v1.0 schema: a top-level object with bvf_version, organization, and a non-empty "initiatives" array, each initiative carrying the same fields score_initiative expects (industry, revenue_eur, function, ai_tier, readiness, and a scores object with the four 0–100 pillars). Checked structurally only: required fields present, correct types, enum values valid, pillar numbers in range; the pillar values are NOT scored or judged here (use score_initiative or score_portfolio for that). On failure, errors[] names each failing JSON path and the rule it broke.

Output Schema

valid (required): True when the portfolio conforms to the schema.
errors (required): Empty when valid; otherwise one entry per schema violation.
bvf_version (required): AI BVF protocol version validated against.
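The structural checks listed above (required fields, types, enum values, pillar ranges, one error per failing JSON path) can be sketched as a small validator that returns the same valid / errors / bvf_version shape. It is an illustration under assumptions, not the published schema: the pillar names, the ai_tier values and the {path, rule} error layout are placeholders, and pillar scores are assumed to use the nested {"value": n} form. The authoritative rules are at https://www.aibvf.com/protocol.

```python
import numbers

# Placeholders: the listing names neither the four pillars nor the ai_tier
# enum values, so both sets below are hypothetical.
PILLARS = ("pillar_1", "pillar_2", "pillar_3", "pillar_4")
AI_TIERS = {"tier_1", "tier_2", "tier_3"}


def _is_number(x):
    # bool subclasses int in Python, so rule it out explicitly.
    return isinstance(x, numbers.Real) and not isinstance(x, bool)


def check_portfolio_structure(doc, bvf_version="1.0"):
    """Structural check only: presence, types, enums and pillar ranges."""
    errors = []

    def err(path, rule):
        errors.append({"path": path, "rule": rule})

    if not isinstance(doc, dict):
        err("$", "must be an object")
        return {"valid": False, "errors": errors, "bvf_version": bvf_version}
    for key in ("bvf_version", "organization"):
        if key not in doc:
            err(f"$.{key}", "required")
    initiatives = doc.get("initiatives")
    if not isinstance(initiatives, list) or not initiatives:
        err("$.initiatives", "must be a non-empty array")
        initiatives = []

    for i, init in enumerate(initiatives):
        base = f"$.initiatives[{i}]"
        if not isinstance(init, dict):
            err(base, "must be an object")
            continue
        for key in ("industry", "function"):
            if not isinstance(init.get(key), str):
                err(f"{base}.{key}", "required string")
        if not _is_number(init.get("revenue_eur")):
            err(f"{base}.revenue_eur", "required number")
        tier = init.get("ai_tier")
        if not isinstance(tier, str) or tier not in AI_TIERS:
            err(f"{base}.ai_tier", "must be one of " + ", ".join(sorted(AI_TIERS)))
        if "readiness" not in init:
            err(f"{base}.readiness", "required")
        scores = init.get("scores")
        if not isinstance(scores, dict):
            err(f"{base}.scores", "required object")
            continue
        for pillar in PILLARS:
            # Nested {"value": n} form assumed; the number is range-checked, not judged.
            entry = scores.get(pillar)
            value = entry.get("value") if isinstance(entry, dict) else None
            if not _is_number(value) or not 0 <= value <= 100:
                err(f"{base}.scores.{pillar}.value", "number in 0..100")

    return {"valid": not errors, "errors": errors, "bvf_version": bvf_version}
```

As with the tool itself, pillar numbers are only range-checked here; scoring them is left to score_initiative or score_portfolio.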
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond annotations by detailing the return format, confirming no network/auth/side effects, and specifying what the tool does (structural checks) and does not do (scoring). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two paragraphs, front-loading the purpose and then providing detailed behavior. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and the presence of annotations and output schema, the description fully covers what an agent needs to know: purpose, usage, behavior, parameter details, and return format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description adds significant context about the expected structure, types of checks performed, and what is not done. It complements the schema well.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks a BVF portfolio document against the AI BVF v1.0 schema, and explicitly distinguishes it from sibling tools score_initiative and score_portfolio by saying 'Use this to catch malformed portfolios early; use score_initiative to evaluate a single initiative, or score_portfolio to score them all in one call.'

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('before you score, store, or share it') and when to use alternatives. It also clarifies that it is pure deterministic validation with no side effects.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
