aibvf-mcp

by io.github.Bahamas1717

Server Details

AI BVF: score AI portfolios Stop/Fix/Accelerate with decision confidence and pace-layer drag.

Status: Healthy
Last Tested: 2026-07-26 02:31
Transport: Streamable HTTP
Repository: Craig-Horton/ai-bvf
GitHub Stars: 0
Server Listing: aibvf-mcp

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A (4.7/5.0)

Tool Descriptions: A

Average 4.8/5 across 9 of 9 tools scored.

Server Coherence: A

Disambiguation: 5/5

Each tool targets a distinct operation: cost calculation, process diagnosis, benchmark lookup, readiness inference, taxonomy listing, improvement recommendations, single initiative scoring, portfolio scoring, and portfolio validation. There is no overlap, and the descriptions clearly differentiate between similar tools like score_initiative and score_portfolio.

Naming Consistency: 5/5

All tool names consistently follow the verb_noun pattern in snake_case, e.g., calculate_pace_layer_drag, diagnose_process, score_initiative. There is no mixing of conventions or inconsistent verb forms, making the naming predictable and easy to understand.

Tool Count: 5/5

With 9 tools, the server covers the essential functions of the AI BVF framework without being excessive or sparse. Each tool serves a well-defined purpose, and the count feels appropriate for the domain of business value assessment and process improvement.

Completeness: 4/5

The toolset covers core workflows: scoring, diagnosis, drag calculation, benchmarking, readiness inference, taxonomy help, and portfolio management. Minor gaps exist, such as no tool for creating or editing a portfolio (only validating and scoring), but the documented workflow using validate_portfolio and score_portfolio effectively covers portfolio analysis.

Available Tools

13 tools

assemble_portfolio (grade A)

Read-only, Idempotent

Assemble a valid AI BVF v1.0 portfolio document from loose inputs, deterministically. Agents arrive with initiative names, plain-language functions and half the pillar scores, then hand-build the portfolio JSON and get the shape wrong; this tool builds it right. Give it the organisation (name plus industry in canonical or everyday language) and one entry per initiative (name, function, ai_tier, plus whatever pillar scores you actually have as bare numbers) and it returns the finished document: aliases resolved through the same mapping as map_to_taxonomy, ids generated from names and deduplicated, missing pillars estimated from readiness, tier, function and the published benchmarks with the estimation reported per initiative in estimated_pillars, and the whole document validated before it is returned. CALL THIS when the user lists several AI initiatives in conversation and you need a portfolio document for validate_portfolio, score_portfolio or sequence_portfolio, instead of composing the JSON by hand. Do NOT invent pillar scores to fill it: pass only the numbers the user gave you and let the estimation carry the rest honestly, the estimated pillars carry low confidence and scoring haircuts accordingly. Unresolvable inputs come back as issues with suggestions; ask the user to choose rather than guessing. Every default the assembler applies is named in plain language in assumptions: surface them to the user, the assembler structures inputs and never makes hidden business judgements. This tool creates a document in the response only: nothing is stored, nothing is edited, no state exists between calls. Pure deterministic calculation, no network, auth, or side effects.

Parameters

Name	Required	Description
`readiness`	No	Organisational readiness, canonical or plain language (bureaucratic resolves to siloed). Drives estimation of missing pillars. Defaults to traditional.
`initiatives`	Yes	One entry per initiative, from whatever the user gave you. Only name, function and ai_tier are required.
`organization`	Yes

Output Schema

Name	Required	Description
`audit`	Yes	Reproducibility record: engine version, the rules that fired, and the resolved inputs. Deterministic, no timestamps. If the verdict is challenged months later, the same inputs on the same engine version reproduce it exactly.
`issues`	Yes	Unresolved inputs, each with path, message and suggestions where the taxonomy has them.
`guidance`	Yes
`portfolio`	No	The assembled BVF v1.0 document, ready for validate_portfolio, score_portfolio and sequence_portfolio. Null when assembly is blocked on issues.
`validation`	No	validate() run on the assembled document.
`assumptions`	Yes	Every default the assembler applied, in plain language. Surface these to the user: what was not given is named here.
`bvf_version`	Yes
`resolutions`	Yes	Every alias resolution performed, in plain language.
`readiness_used`	Yes
`estimated_pillars`	Yes	Initiative id to the pillars the assembler estimated. Gather evidence for these, or expect scoring to haircut confidence.
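
The response fields above suggest a simple triage step on the client side. The following is a minimal sketch, not the server's own logic: `summarise_assembly` is a hypothetical helper, and the shapes it assumes (each `issues` entry as a dict with `path`, `message` and optional `suggestions`; `estimated_pillars` as a mapping from initiative id to a list of pillar names) are inferred from the schema descriptions rather than confirmed.

```python
from typing import Any


def summarise_assembly(response: dict[str, Any]) -> dict[str, Any]:
    """Sort an assemble_portfolio response into next actions.

    Hypothetical helper. Field names come from the output schema;
    the inner shapes of issues and estimated_pillars are assumptions.
    """
    questions = []
    for issue in response.get("issues", []):
        text = f"{issue['path']}: {issue['message']}"
        suggestions = issue.get("suggestions") or []
        if suggestions:
            # Offer the taxonomy suggestions instead of guessing a value.
            text += " (options: " + ", ".join(suggestions) + ")"
        questions.append(text)

    return {
        # portfolio is null when assembly is blocked on issues.
        "ready_to_score": response.get("portfolio") is not None,
        # Unresolved inputs: ask the user to choose.
        "ask_user": questions,
        # Every default the assembler applied should be shown to the user.
        "surface_to_user": list(response.get("assumptions", [])),
        # Pillars that were estimated and still need real evidence.
        "needs_evidence": dict(response.get("estimated_pillars", {})),
    }
```

When `ready_to_score` is true, the `portfolio` value can be passed on to validate_portfolio, score_portfolio or sequence_portfolio as the listing describes.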

Tool Definition Quality

A (4.6/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description supplements the annotations (readOnlyHint, idempotentHint, destructiveHint) by explicitly stating the tool is a 'pure deterministic calculation, no network, auth, or side effects' and that 'nothing is stored, nothing is edited, no state exists between calls.' It also discloses that missing pillar scores are estimated with low confidence and reported separately. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively long but front-loaded with the core purpose. It contains useful details without being overly verbose for a complex tool. However, it could be slightly more structured (e.g., bullet points) to improve readability while maintaining completeness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown but indicated), the description adequately covers the return values by summarizing what the document contains (aliases resolved, IDs generated, missing pillars estimated, validated). It also explains handling of unresolved inputs. The description is complete for a tool with three parameters, nested objects, and clear behavioral context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the schema: it explains that the readiness parameter drives estimation of missing pillars and defaults to 'traditional', and it clarifies that only name, function, and ai_tier are required for initiatives. It also warns against inventing pillar scores and explains alias resolution. The schema already has descriptions for some parameters, so the overall coverage is good.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool assembles a valid AI BVF v1.0 portfolio document from loose inputs, deterministically. It distinguishes itself from sibling tools by noting it should be used instead of composing JSON by hand, and references downstream tools like validate_portfolio, score_portfolio, and sequence_portfolio.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'CALL THIS when the user lists several AI initiatives in conversation and you need a portfolio document for validate_portfolio, score_portfolio or sequence_portfolio.' It provides clear usage guidance, including what not to do (don't invent pillar scores) and how to handle unresolved inputs. It lacks explicit alternatives for when not to use the tool, but the context is sufficiently clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

assess_ai_initiative (grade A)

Read-only, Idempotent

The front door for one AI investment decision. CALL THIS FIRST when the user describes an AI idea in ordinary language or asks whether it should proceed. Pass the proposal as written; the tool resolves industry, revenue, business function, AI tier and organisational readiness deterministically. If one or more inputs remain unknown, it returns the single next question to ask, along with every value already resolved, so call it again with that answer. When all five inputs are present it runs the same engine as score_initiative and returns Accelerate, Fix or Stop, the modelled EUR range, confidence, assumptions and audit trail. Explicit fields override proposal inference, pillar scores remain optional, and unresolved values are never guessed. Use score_initiative only when the canonical fields are already known, score_portfolio for several initiatives, and diagnose_process for measured waste in an existing process. Pure deterministic calculation, no network, auth or side effects.

Parameters

Name	Required	Description
`scores`	No	OPTIONAL, and each pillar inside it is optional. The four AI BVF pillars, each an honest 0–100 self-assessment, combining deterministically into the verdict: governance_risk ≥ 70 OR financial_return ≤ 20 returns Stop; strategic_alignment, financial_return and change_enablement all ≥ 60 with governance_risk ≤ 40 returns Accelerate; everything else returns Fix. Pass ONLY the pillars the user has real evidence for — do NOT invent numbers for the rest. Missing pillars are estimated deterministically by the engine (from readiness, tier, function and published benchmarks), the response reports which via pillar_basis and scores_used, decision confidence is haircut by how much was estimated, and a fully-estimated pass can never return Accelerate (it returns Fix pending confirmation). So call immediately with whatever the user gave you, then ask for evidence on the estimated pillars and re-call to firm the verdict up.
`ai_tier`	No	Optional correction or answer: automation/RPA, GenAI/copilot, or agentic/autonomous. Overrides anything inferred from proposal.
`function`	No	Optional correction or answer in canonical or everyday language, for example customer service, procurement, finance or risk. Overrides anything inferred from proposal.
`industry`	No	Optional correction or answer in canonical or everyday language, for example retail, hospital, bank or public sector. Overrides anything inferred from proposal.
`proposal`	Yes	The AI initiative in ordinary business language. Include the organisation, industry, approximate annual revenue, business function, AI ambition and how the organisation works today when known. The resolver extracts what it can and asks one question for the first missing input; it never guesses an unresolved taxonomy value.
`readiness`	No	Optional correction or answer: agile, traditional, or siloed, including everyday descriptions such as cross-functional, hierarchical or bureaucratic. Overrides anything inferred from proposal.
`revenue_eur`	No	Optional approximate annual revenue in EUR. Overrides any EUR amount extracted from proposal. No currency conversion is performed.
`signal_completeness`	No	Optional 0–1. How grounded the four pillar scores are in real evidence versus estimated from context. Defaults to 1 (treated as measured). If the organisation lacks formal change-readiness or risk metadata, estimate the pillars from what you know AND set this lower to say so — decision confidence is reduced proportionally and a caveat is attached, instead of returning a falsely confident verdict on soft inputs.
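
The verdict rule stated in the `scores` description can be written out directly. This is a minimal sketch under stated assumptions: the thresholds are the ones quoted in the listing, but `bvf_verdict` and its `fully_estimated` flag are hypothetical names, and the sketch does not model how the engine estimates missing pillars or haircuts decision confidence.

```python
def bvf_verdict(
    strategic_alignment: float,
    financial_return: float,
    change_enablement: float,
    governance_risk: float,
    fully_estimated: bool = False,
) -> str:
    """Combine the four 0-100 pillar scores into Stop, Fix or Accelerate.

    Hypothetical re-statement of the rule in the listing, not the
    server's implementation.
    """
    # High governance risk or a very weak return stops the initiative.
    if governance_risk >= 70 or financial_return <= 20:
        return "Stop"

    strong = min(strategic_alignment, financial_return, change_enablement) >= 60
    if strong and governance_risk <= 40:
        # The listing says a fully-estimated pass never returns Accelerate;
        # it returns Fix pending confirmation.
        return "Fix" if fully_estimated else "Accelerate"

    # Everything else is Fix.
    return "Fix"
```

The Stop and Accelerate conditions cannot both hold (Accelerate needs governance_risk ≤ 40 and financial_return ≥ 60), so the order of the two checks does not affect the result.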

Output Schema

Name	Required	Description
`status`	Yes	needs_input when one or more required decision inputs remain unresolved; verdict when scoring completed.
`verdict`	No	The AI BVF score. Present only when status is verdict.
`proposal`	Yes	The supplied proposal, returned so the next call can preserve it verbatim.
`bvf_version`	Yes
`resolutions`	Yes	Every deterministic resolution, naming the field, canonical value, source and matched phrase.
`suggestions`	No	Accepted values for an explicitly supplied field that could not be resolved.
`next_question`	No	The single next question to ask. Present only when status is needs_input.
`missing_fields`	Yes
`resolved_inputs`	Yes	Canonical fields resolved so far. Explicit corrections override proposal inference.
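
The call-again pattern implied by `status`, `next_question` and `missing_fields` might look like the loop below. It is a sketch only: `call_tool` and `ask_user` are hypothetical callables supplied by the client, and it assumes `missing_fields` is a list of input parameter names whose first entry is the field that `next_question` asks about, which the listing does not state explicitly.

```python
from typing import Any, Callable


def assess_until_verdict(
    call_tool: Callable[[str, dict[str, Any]], dict[str, Any]],
    proposal: str,
    ask_user: Callable[[str], Any],
    max_rounds: int = 5,
) -> dict[str, Any]:
    """Call assess_ai_initiative until it returns a verdict.

    call_tool(name, arguments) is a stand-in for whatever MCP client
    is in use; ask_user(question) returns the user's answer.
    """
    answers: dict[str, Any] = {}
    for _ in range(max_rounds):
        result = call_tool(
            "assess_ai_initiative", {"proposal": proposal, **answers}
        )
        if result["status"] == "verdict":
            return result

        # status == "needs_input": ask the single next question and pass
        # the answer back as an explicit field, never a guessed value.
        field = result["missing_fields"][0]  # assumed ordering
        answers[field] = ask_user(result["next_question"])
        # Reuse the proposal exactly as the server echoed it.
        proposal = result["proposal"]

    raise RuntimeError("no verdict after max_rounds calls")
```

Explicit fields override anything inferred from the proposal, so accumulating the answers as explicit arguments keeps each later call consistent with the earlier ones.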

Tool Definition Quality

A (5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnly, idempotent, non-destructive. Description reinforces with 'Pure deterministic calculation, no network, auth or side effects.' Discloses iterative behavior: returns next question if inputs incomplete, never guesses unresolved values. Adds context beyond annotations about the deterministic resolution process.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Despite length, every sentence is necessary given the tool's complexity. Front-loaded with the critical instruction 'CALL THIS FIRST'. Clear sections: when to call, what it does, parameter behavior, sibling differentiation, and safety guarantees. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Completely covers all aspects: purpose, usage, iterative process, parameter behavior (overrides, optionality, estimation), verdict types (Accelerate, Fix, Stop), return values (EUR range, confidence, assumptions, audit trail), and sibling distinctions. Output schema exists but description sufficiently explains outcomes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions, but the description adds substantial meaning: explains the optional nature of scores, provides estimation logic, and instructs to pass only real evidence. Adds guidance on overrides and the signal_completeness parameter. This goes well beyond the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it is the 'front door for one AI investment decision' and specifies when to call it first. Distinguishes from siblings like score_initiative, score_portfolio, diagnose_process. The verb+resource 'assess_ai_initiative' is specific and informative.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'CALL THIS FIRST when the user describes an AI idea' and provides detailed when-to-use guidance, including alternatives: 'Use score_initiative only when the canonical fields are already known, score_portfolio for several initiatives, and diagnose_process for measured waste.' Also explains the iterative calling pattern.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calculate_pace_layer_drag (grade A)

Read-only, Idempotent

Quantify the annual EUR cost of an AI ambition outrunning the operating model: queues, hand-offs and slow decisions that prevent the organisation capturing the value already assumed in the case. CALL THIS when the user needs the cost of waiting for the organisation to change, or when a Fix plan needs a cost-of-waiting figure. Do not use it to score an AI initiative, estimate the implementation cost, or calculate a process saving: use score_initiative for the investment verdict, diagnose_process for a running process, and recommend_improvements for the change plan. revenue_eur sets the absolute EUR range; ai_tier and readiness together set the drag rate and pace_gap, so gen3 in a siloed organisation costs more than gen1 in an agile one. industry is accepted for a consistent interface and defaults to universal, but does not change this calculation yet. Returns a low/high EUR range, drag rate, pace-gap severity, drivers and source. Pure deterministic calculation — no network, auth, or side effects.

Parameters

Name	Required	Description
`ai_tier`	Yes	Ambition of the AI operating model: gen1 = automation/RPA, gen2 = GenAI, gen3 = agentic. Paired with readiness to set pace_gap severity — gen3 on any readiness below agile, or gen2 on siloed, is severe; a higher tier against a slower operating model widens the gap and raises the drag.
`industry`	No	Optional; defaults to universal if omitted. Reserved for future vertical drag-rate adjustments — does not change the result today. Call list_taxonomy for accepted values.
`readiness`	Yes	Organisational readiness, honest self-assessment: agile = cross-functional, fast decisions; traditional = functional hierarchy; siloed = rigid, hand-off heavy. Agile readiness yields minimal drag at any tier; the mismatch between a fast AI tier and a slower operating model is what generates the Organisational Drag Cost.
`revenue_eur`	Yes	Approximate annual revenue in EUR (must be ≥ 0). The result scales with this: annual_drag_eur is returned as an absolute range and as drag_rate, a fraction of this revenue (e.g. 0.02 = 2%).

Output Schema

Name	Required	Description
`source`	Yes	Citation for the drag-rate model applied.
`drivers`	Yes	Named factors contributing to the drag.
`pace_gap`	Yes	Severity of the tier↔readiness mismatch.
`drag_rate`	Yes	Drag as a fraction of revenue (e.g. 0.02 = 2%), low/high.
`bvf_version`	Yes	AI BVF protocol version used.
`annual_drag_eur`	Yes	Estimated annual Organisational Drag Cost in EUR, low/high.
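
Two relationships in the parameter descriptions can be illustrated in a few lines: which tier and readiness pairs count as a severe pace gap, and how a drag rate turns into an EUR range. This is a sketch, not the server's model: both function names are hypothetical, only the "severe" rule is quoted from the listing (other severity levels are not documented here), and any drag rates passed in are placeholders because the actual rates are not published on this page.

```python
def pace_gap_is_severe(ai_tier: str, readiness: str) -> bool:
    """True for the pairs the listing calls severe.

    gen3 on any readiness below agile, or gen2 on siloed.
    """
    if ai_tier == "gen3":
        return readiness in ("traditional", "siloed")
    if ai_tier == "gen2":
        return readiness == "siloed"
    return False


def drag_range_eur(
    revenue_eur: float, drag_rate_low: float, drag_rate_high: float
) -> tuple[float, float]:
    """Scale a low/high drag rate (fractions of revenue) to EUR."""
    if revenue_eur < 0:
        raise ValueError("revenue_eur must be >= 0")
    return (revenue_eur * drag_rate_low, revenue_eur * drag_rate_high)
```

For example, a drag rate of 0.02 on EUR 50 million of revenue corresponds to roughly EUR 1 million a year, matching the "0.02 = 2%" convention in the schema.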

Tool Definition Quality

A (4.7/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds context beyond annotations: 'Pure deterministic calculation — no network, auth, or side effects.' It also clarifies that industry does not affect the result yet. No contradiction exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is approximately 150 words, well-structured with the main purpose first, followed by usage guidelines, parameter relationships, and return types. Every sentence adds value, no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, deterministic calculation with multiple output fields), the description is thorough. It explains return values: 'Returns a low/high EUR range, drag rate, pace-gap severity, drivers and source.' The presence of an output schema (as indicated by context) further reduces the burden on the description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds explanatory value by describing interactions: 'revenue_eur sets the absolute EUR range; ai_tier and readiness together set the drag rate and pace_gap' and clarifies that industry is accepted but does not change the calculation. This raises the score to 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Quantify the annual EUR cost of an AI ambition outrunning the operating model'. It uses a specific verb ('Quantify') and resource ('annual EUR cost'), and explicitly distinguishes from sibling tools by stating when to call and when not to, with alternative tool names provided.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'CALL THIS when the user needs the cost of waiting for the organisation to change, or when a Fix plan needs a cost-of-waiting figure.' It also states when not to use it ('Do not use it to score an AI initiative, estimate the implementation cost, or calculate a process saving') and lists alternative sibling tools ('score_initiative, diagnose_process, recommend_improvements').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

diagnose_process (grade A)

Read-only, Idempotent

Diagnose a single existing business process from operational evidence and return the intervention, modelled net EUR saving, efficiency gain, verdict and confidence. CALL THIS when the user can describe a process already running, including volume, touch time, waiting, hand-offs, rework, automation and cost. instances_per_year × fte_hours_per_instance × loaded_hourly_rate_eur builds the labour baseline, direct_spend_eur adds the non-labour baseline, and readiness caps the saving that the organisation can realise. The friction signals select the intervention: low automation points to Automate, many hand-offs or wait to Consolidate & re-sequence, rework to Quality controls, low-volume heavy work to Eliminate / insource. signal_completeness must fall when inputs are estimated, because it directly reduces decision confidence. Use score_initiative for a proposed AI investment and infer_readiness when the question is the organisation’s change capacity. Effectiveness bands are benchmark-cited and figures are directional, not audited. Pure deterministic calculation — no network, auth, or side effects.

Parameters

Name	Required	Description
`function`	Yes	Business function the process belongs to. See list_taxonomy.
`handoffs`	Yes	Distinct owners/systems an instance passes through. Weighed against the per-function median; many handoffs make handoff drag dominant and point to Consolidate & re-sequence.
`readiness`	No	Optional. Org change-absorption capacity — agile / traditional / siloed — which caps the realised (net) saving below the gross potential. Defaults to traditional.
`process_id`	Yes	Stable identifier for the process.
`rework_rate`	Yes	Fraction of instances reopened/reworked (0–1). When rework is the dominant drag factor the intervention becomes Quality controls, and it also sets the addressable share for that path.
`touch_ratio`	Yes	Touch-time ÷ cycle-time (0–1). The remainder is wait; a low value means the process is mostly waiting, which pushes the intervention toward Consolidate & re-sequence.
`cycle_time_days`	Yes	Median wall-clock days per instance, end to end. Long cycles relative to touch-time signal wait/latency drag.
`automation_level`	Yes	Share already automated (0–1). Low automation makes manual effort the dominant drag and selects Automate; the un-automated remainder is the addressable share.
`direct_spend_eur`	Yes	Annual licence/vendor/tooling spend on the process in EUR. Added to the labour baseline and shifts how much of the saving is labour- vs spend-addressable.
`instances_per_year`	Yes	Process volume: how many times it runs per year. Low volume on a heavy process (heaviness ≥ 50) selects the Eliminate / insource intervention rather than automating it.
`signal_completeness`	No	Optional 0–1. How much of the above was measured versus defaulted. Governs decision_confidence proportionally — lower it when you estimated inputs so the verdict stays honest. Defaults to 0.7.
`fte_hours_per_instance`	Yes	Human touch-time in hours per instance. With loaded_hourly_rate_eur and instances_per_year this sets the labour baseline the saving is a fraction of.
`loaded_hourly_rate_eur`	Yes	Fully-loaded labour cost per hour in EUR (salary + on-costs). Multiplies fte_hours_per_instance × instances_per_year into the annual labour baseline.
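
The baseline arithmetic described above is simple enough to check by hand before calling the tool. A minimal sketch, with hypothetical function names, that reproduces only the documented baseline (labour plus direct spend) and the wait share implied by `touch_ratio`; it does not attempt the intervention selection or the saving model, whose thresholds are not fully documented here.

```python
def baseline_cost_eur(
    instances_per_year: float,
    fte_hours_per_instance: float,
    loaded_hourly_rate_eur: float,
    direct_spend_eur: float,
) -> float:
    """Annual baseline: labour cost plus direct (non-labour) spend."""
    labour = instances_per_year * fte_hours_per_instance * loaded_hourly_rate_eur
    return labour + direct_spend_eur


def wait_share(touch_ratio: float) -> float:
    """Share of cycle time spent waiting: the remainder after touch time."""
    if not 0.0 <= touch_ratio <= 1.0:
        raise ValueError("touch_ratio must be between 0 and 1")
    return 1.0 - touch_ratio
```

With 12,000 instances a year, half an hour of touch time each, a loaded rate of EUR 60 per hour and EUR 40,000 of direct spend, the labour baseline is EUR 360,000 and the total baseline EUR 400,000.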

Output Schema

Name	Required	Description
`verdict`	Yes	The call on the intervention.
`function`	Yes	Business function diagnosed.
`heaviness`	Yes	Process heaviness index, 0–100.
`disclaimer`	Yes	Directional decision aid, not an audited figure.
`process_id`	Yes	Echo of the input process id.
`assumptions`	Yes	The assumptions behind the figure — never a naked number.
`bvf_version`	Yes	AI BVF protocol version used.
`intervention`	Yes	Recommended move.
`brain_version`	Yes	Advisor Brain model version used.
`net_saving_eur`	Yes	Modelled net annual saving in EUR after readiness capture, low/high.
`offer_to_execute`	Yes	True when the verdict warrants offering to action it (Accelerate).
`baseline_cost_eur`	Yes	Current annual cost: labour + direct spend.
`evidence_maturity`	Yes	Strength of the benchmark evidence behind the effectiveness band.
`advisory_next_step`	No	Optional CTA, present only for Fix/Stop verdicts.
`drag_decomposition`	Yes	Share of heaviness from each friction factor (sums to ~1).
`decision_confidence`	Yes	Confidence in the verdict, 0–100.
`efficiency_gain_pct`	Yes	Efficiency improvement on the targeted slice, percent.

Tool Definition Quality

A (4.6/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and idempotentHint=true. The description adds 'Pure deterministic calculation — no network, auth, or side effects' and explains how signal_completeness affects confidence, which adds context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense but front-loaded with core purpose. It uses multiple sentences effectively, though a more structured format could improve readability. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's 13 parameters and existing output schema, the description covers input-output logic, default behaviors, and boundary conditions. It explains how decisions are made, providing sufficient context for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 100% schema coverage, the description adds significant meaning: explains how inputs build the labour baseline, how friction signals select intervention, defaults for readiness and signal_completeness, and relationships between parameters (e.g., low volume + heavy process → Eliminate/insource).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Diagnose a single existing business process' and lists specific outputs (intervention, saving, efficiency, verdict, confidence). It distinguishes from siblings by referencing score_initiative and infer_readiness for alternative questions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage cue: 'CALL THIS when the user can describe a process already running...' and provides when-not-to-use by directing to score_initiative and infer_readiness for specific sub-questions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_benchmark (grade A)

Read-only, Idempotent

Look up the published raw benchmark rates behind the value model for one business function and industry. CALL THIS when the user wants to inspect the revenue-uplift and cost-takeout assumptions before scoring, or to compare the value drivers across functions. function selects the base rate range and named drivers; industry applies the multiplier, while universal returns the unadjusted base rate. The output is a rate, expressed as a fraction of revenue, not an initiative verdict or EUR business case. Use score_initiative for an Accelerate/Fix/Stop decision, score_portfolio for several initiatives and diagnose_process for measured operational waste. Pure deterministic lookup — no network, auth, or side effects.

Parameters

Name	Required	Description	Default
`function`	Yes	Business function to benchmark — must be one of the list_taxonomy function values. Selects the base revenue-uplift and cost-reduction rate ranges (returned as fractions of revenue) and the value drivers.
`industry`	Yes	Industry whose multiplier to apply — must be one of the list_taxonomy industry values. The returned industry_multiplier is applied to the function base rates; pass "universal" for the un-adjusted rates.

Output Schema

Name	Required	Description
`source`	Yes	Citation for the benchmark figures.
`drivers`	Yes	Named value drivers behind the benchmark.
`function`	Yes	Business function the rates apply to.
`industry`	Yes	Industry whose multiplier was applied.
`cost_takeout_range`	Yes	Cost take-out as a fraction of revenue, lo/hi.
`industry_multiplier`	Yes	Multiplier applied to the base rates for this industry.
`revenue_uplift_range`	Yes	Revenue uplift as a fraction of revenue, lo/hi.

Tool Definition Quality

A5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds 'Pure deterministic lookup — no network, auth, or side effects,' which aligns with and reinforces the annotations, providing additional confidence about the tool's behavior with no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the primary purpose, then provides usage guidance, parameter semantics, and behavioral notes—all in a compact paragraph. Every sentence adds value; no redundancy or irrelevant details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple lookup tool with two enum parameters and an output schema, the description is complete. It specifies the return type ('fraction of revenue'), clarifies what is not returned (initiative verdict or EUR business case), and mentions the existence of industry_multiplier. The output schema handles the detailed structure, so the description provides sufficient context without over-explaining.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (both parameters fully described in schema). The description goes beyond schema by explaining the roles: 'function selects the base rate range and named drivers; industry applies the multiplier, while universal returns the unadjusted base rate.' This adds meaningful context for effective parameter selection.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it looks up published raw benchmark rates for a specific business function and industry. It uses a specific verb ('look up') and resource ('published raw benchmark rates'), and distinguishes itself from sibling tools like score_initiative, score_portfolio, and diagnose_process by explaining what it does not do.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs when to call this tool: 'when the user wants to inspect the revenue-uplift and cost-takeout assumptions before scoring, or to compare the value drivers across functions.' It also tells when not to use it (e.g., for initiative verdicts) and provides specific sibling alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

infer_readiness: A

Read-only, Idempotent

Measure organisational readiness from process data, so the investment case does not depend on an untested maturity claim. CALL THIS before score_initiative, score_portfolio or calculate_pace_layer_drag when the user can provide at least two of five signals: hand-offs, rework, touch ratio, automation level and cycle time. function selects the comparison medians for hand-offs and cycle time; more signals increase confidence and disagreement between them reduces it. claimed_readiness is optional, but pass it when the organisation has declared itself agile, traditional or siloed, because the returned gap exposes where its self-image runs ahead of the process data. Fewer than two signals produces a refusal, not a guess. Pass the measured readiness into the downstream tool, then use diagnose_process when the next question is what to change in that process. Pure deterministic calculation, no network, auth, or side effects.

Parameters

Name	Required	Description
`function`	Yes	Business function the process belongs to. Selects the published cycle-time and hand-off medians the signals are read against. Call list_taxonomy if unsure.
`handoffs`	No	Distinct owners or systems an instance passes through. Read against the function median: 1.5x or more the median reads siloed, at or above the median reads traditional, below it reads agile.
`rework_rate`	No	Fraction of instances reopened or reworked (0-1). 15% or more reads siloed, 5-15% traditional, under 5% agile.
`touch_ratio`	No	Touch-time divided by cycle-time (0-1); the remainder is waiting. Under 0.15 reads siloed (the process lives in queues), 0.15-0.4 traditional, above 0.4 agile.
`cycle_time_days`	No	Median wall-clock days per instance. Read against the function median, same bands as handoffs.
`automation_level`	No	Share of the process already automated (0-1). Under 0.2 reads siloed, 0.2-0.5 traditional, above 0.5 agile.
`claimed_readiness`	No	Optional. What the organisation says about itself. The measured result is compared against it and the gap returned as readiness_gap plus a gap_finding, because an organisation whose self-image runs ahead of its process data has just told you where the change work starts.
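
The per-signal bands in the table above can be sketched as plain threshold checks. This is a reading of the published bands, not the engine's code: the function names are hypothetical, the handling of values that sit exactly on 0.4 or 0.5 is an assumption, and the medians passed in are placeholders because the real function medians are not shown in this listing.

```python
def read_rework_rate(rate: float) -> str:
    """15% or more reads siloed, 5-15% traditional, under 5% agile."""
    if rate >= 0.15:
        return "siloed"
    if rate >= 0.05:
        return "traditional"
    return "agile"


def read_touch_ratio(ratio: float) -> str:
    """Under 0.15 reads siloed, 0.15-0.4 traditional, above 0.4 agile."""
    if ratio < 0.15:
        return "siloed"
    if ratio <= 0.4:  # assumption: exactly 0.4 still reads traditional
        return "traditional"
    return "agile"


def read_automation_level(level: float) -> str:
    """Under 0.2 reads siloed, 0.2-0.5 traditional, above 0.5 agile."""
    if level < 0.2:
        return "siloed"
    if level <= 0.5:  # assumption: exactly 0.5 still reads traditional
        return "traditional"
    return "agile"


def read_against_median(value: float, median: float) -> str:
    """Bands shared by handoffs and cycle_time_days, relative to the function median."""
    if value >= 1.5 * median:
        return "siloed"
    if value >= median:
        return "traditional"
    return "agile"


# Placeholder median of 4 hand-offs; the published medians are not listed here.
reads = {
    "handoffs": read_against_median(6, median=4),
    "rework_rate": read_rework_rate(0.08),
    "touch_ratio": read_touch_ratio(0.12),
}
print(reads)
```

How the engine combines the individual reads into one classification is not described here, so the sketch stops at the per-signal level.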

Output Schema

Name	Required	Description
`audit`	No	Reproducibility record: engine version, the rules that fired, and the resolved inputs. Deterministic, no timestamps. If the verdict is challenged months later, the same inputs on the same engine version reproduce it exactly.
`guidance`	Yes	How to use the result downstream, including what a gap between measured and self-reported readiness means.
`readiness`	Yes	The readiness classification the measured signals support.
`confidence`	Yes	Confidence 0-100, set by signal coverage (2 signals ~45, 5 signals ~90) and discounted when signals disagree.
`bvf_version`	Yes	AI BVF protocol version used.
`gap_finding`	No	The claimed-versus-measured gap read as a change-readiness finding. Surface verbatim when present.
`disagreement`	No	Present when signals point in opposing directions: readiness is uneven across the process, read the per-signal detail.
`signal_reads`	Yes	Per-signal read: the value, which readiness it leans toward, and why in plain language. Show these to the user.
`signals_used`	Yes	How many of the five signals were provided.
`readiness_gap`	No	Ordinal distance claimed-to-measured. Positive: the organisation claims better than it measures.
`readiness_basis`	Yes	Always measured: this came from process data, not self-report.
`claimed_readiness`	No	Echo of the claim, when supplied.
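
A minimal sketch of how an ordinal readiness_gap could be computed, assuming the three readiness values are ranked siloed < traditional < agile on a 0/1/2 scale. The scale and the function name are assumptions; the listing only states that the gap is ordinal and positive when the organisation claims better than it measures.

```python
# Assumed ranking; the listing does not publish the numeric scale.
READINESS_RANK = {"siloed": 0, "traditional": 1, "agile": 2}


def readiness_gap(claimed: str, measured: str) -> int:
    """Positive when the claimed readiness is better than the measured one."""
    return READINESS_RANK[claimed] - READINESS_RANK[measured]


print(readiness_gap("agile", "siloed"))
print(readiness_gap("traditional", "traditional"))
```

Under this assumed scale, a claim of agile against a measured siloed result gives the largest positive gap, and a matching claim gives zero.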

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and idempotentHint. The description adds that it is a pure deterministic calculation with no network, auth, or side effects, aligning with annotations. It also describes algorithmic behavior (selection of medians, confidence vs disagreement) without contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is comprehensive (seven sentences), and each sentence adds distinct value. It is front-loaded with purpose and usage. It is slightly longer than ideal, but contains no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, 5 signals, output schema exists), the description covers all necessary aspects: when to use, algorithmic behavior, parameter semantics, output usage, and next steps. Output schema existence reduces need for return value details, making the description complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with individual parameter descriptions, but the description adds significant contextual meaning: for each signal parameter, it explains how values map to readiness levels (siloed, traditional, agile) based on thresholds relative to function medians. It also clarifies the purpose of the optional claimed_readiness parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool measures organisational readiness from process data, with a specific verb and resource. It distinguishes itself from sibling tools by explicitly calling out when to call it before score_initiative, score_portfolio, or calculate_pace_layer_drag.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use conditions ('CALL THIS before... when the user can provide at least two of five signals'), what to do with the output (pass into downstream tool), and what to use next (diagnose_process). It also states a clear exclusion: fewer than two signals produces a refusal.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_taxonomy: A

Read-only, Idempotent

Return the exact industry, function, AI-tier and readiness values every AI BVF calculation accepts. CALL THIS when the caller needs the complete allowed list or when a free-text value is not obvious. It returns taxonomy only, no score, verdict or language mapping. Use map_to_taxonomy when the user has said customer service, banking, RPA or bureaucratic and you need the one canonical value; use this tool when they need the whole menu of values to choose from. Takes no parameters. Pure deterministic lookup — no network, auth, or side effects.

Parameters

Name	Required	Description	Default
No parameters

Output Schema

Name	Required	Description
`ai_tiers`	Yes	All accepted ai_tier values (gen1/gen2/gen3).
`functions`	Yes	All accepted business-function values.
`readiness`	Yes	All accepted organisational-readiness values.
`industries`	Yes	All accepted industry values.
`bvf_version`	Yes	AI BVF protocol version these enums belong to.
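
One way a client might use this output is to check candidate arguments against the returned lists before calling a scoring tool. The sketch below assumes the four output fields are flat lists of strings; `invalid_fields` is a hypothetical helper, and the sample taxonomy is deliberately incomplete and illustrative, built only from values mentioned elsewhere in this listing.

```python
# Maps a scoring-tool argument name to the list_taxonomy field that holds its values.
ARGUMENT_TO_TAXONOMY_FIELD = {
    "ai_tier": "ai_tiers",
    "function": "functions",
    "readiness": "readiness",
    "industry": "industries",
}


def invalid_fields(arguments: dict, taxonomy: dict) -> dict:
    """Return the arguments whose values are not in the matching taxonomy list."""
    return {
        name: value
        for name, value in arguments.items()
        if name in ARGUMENT_TO_TAXONOMY_FIELD
        and value not in taxonomy[ARGUMENT_TO_TAXONOMY_FIELD[name]]
    }


# Illustrative, incomplete sample; the real lists come from list_taxonomy.
sample_taxonomy = {
    "ai_tiers": ["gen1", "gen2", "gen3"],
    "functions": ["cx", "supply", "risk", "hr"],
    "readiness": ["agile", "traditional", "siloed"],
    "industries": ["universal"],
}

problems = invalid_fields(
    {"function": "customer service", "ai_tier": "gen2", "revenue_eur": 5_000_000},
    sample_taxonomy,
)
print(problems)
```

A non-empty result is the cue to call map_to_taxonomy for the offending fields, or to show the user the full list.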

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, non-destructive. Description adds 'Pure deterministic lookup — no network, auth, or side effects', which expands on the behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Six sentences with no wasted words. Front-loaded with action and purpose. Every sentence adds necessary information: return values, usage context, the distinction from map_to_taxonomy, and the tool's deterministic nature.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a parameterless tool with output schema, description covers what it returns, when to use, what it doesn't do, and compares with sibling. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, and schema coverage is 100%. The description states 'Takes no parameters', which is redundant but confirms the empty schema. The baseline for zero parameters is 4; no additional parameter detail is needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns the exact industry, function, AI-tier, and readiness values. It uses the specific verb 'return' and contrasts the tool with its sibling map_to_taxonomy, making its purpose distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'CALL THIS when the caller needs the complete allowed list' and contrasts with map_to_taxonomy for single-value mapping. Provides clear when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

map_to_taxonomy: A

Read-only, Idempotent

Map everyday business language to the canonical AI BVF values required by the scoring tools. CALL THIS when the user says customer service, procurement, banking, GenAI copilot or bureaucratic and the matching enum is not certain. Pass only the fields written in free text; each returns the canonical value, what it matched on, or null with suggestions. A null result requires the user to choose from the suggestions, because a plausible guess would change the score. Use list_taxonomy when the user needs every permitted value, then pass the mapped values into score_initiative, diagnose_process, get_benchmark or the portfolio tools. Pure deterministic lookup, no network, auth, or side effects.

Parameters

Name	Required	Description
`ai_tier`	No	Everyday AI language, e.g. RPA, GenAI copilot, autonomous agents. Resolved to gen1/gen2/gen3.
`function`	No	Everyday function language, e.g. customer service, procurement, legal, people. Resolved to cx, supply, risk, hr and so on.
`industry`	No	Everyday industry language, e.g. banking, ecommerce, pharma. Resolved to the canonical enum.
`readiness`	No	Everyday culture language, e.g. bureaucratic, cross-functional, hierarchical. Resolved to agile/traditional/siloed.

Output Schema

Name	Required	Description
`ai_tier`	No
`function`	No
`guidance`	Yes
`industry`	No	input, resolved and matched_on; or resolved null with suggestions when no confident match.
`readiness`	No
`bvf_version`	Yes
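
The null-with-suggestions behaviour implies that a client should separate confident matches from fields that need a user decision. The sketch below assumes each mapped field comes back as an object with `input`, `resolved`, `matched_on` and, when unresolved, `suggestions`, as the industry row describes; `split_mapping` and the sample response are hypothetical.

```python
FIELDS = ("ai_tier", "function", "industry", "readiness")


def split_mapping(response: dict) -> tuple:
    """Split a map_to_taxonomy response into resolved values and open questions."""
    resolved, unresolved = {}, {}
    for field in FIELDS:
        entry = response.get(field)
        if entry is None:
            continue  # the field was not passed in the request
        if entry.get("resolved") is None:
            # No confident match: keep the suggestions for the user to choose from.
            unresolved[field] = entry.get("suggestions", [])
        else:
            resolved[field] = entry["resolved"]
    return resolved, unresolved


# Hypothetical response; guidance and bvf_version are omitted for brevity.
sample_response = {
    "function": {"input": "customer service", "resolved": "cx", "matched_on": "customer service"},
    "readiness": {"input": "chaotic", "resolved": None, "suggestions": ["traditional", "siloed"]},
}

resolved, unresolved = split_mapping(sample_response)
print(resolved)
print(unresolved)
```

Only the resolved values should be passed on to a scoring tool; anything in the second dictionary goes back to the user rather than being guessed.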

Tool Definition Quality

A4.8/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Even though annotations already declare readOnlyHint and idempotentHint, the description adds deterministic, no-side-effects context. It explains null results require user choice, which is crucial behavioral information beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with a one-sentence summary, followed by usage and behavioral details. It is dense but not verbose; every sentence serves a purpose. Slight room for improvement in sentence flow.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 free-text parameters and an output schema (not shown), the description adequately covers purpose, usage, parameter semantics, and null behavior. It references sibling tools and downstream workflows, making it complete for an agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds meaning by explaining each parameter accepts everyday language and returns canonical values. It also describes the return structure (canonical value, matched phrase, null with suggestions), which enriches the static schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool maps everyday business language to canonical taxonomy values. It provides specific examples (customer service, procurement) and distinguishes its role from list_taxonomy. The verb 'map' and resource 'taxonomy' are precise, with scope indicated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to call this tool (when the matching enum is uncertain) and when not to (use list_taxonomy for all values). It also lists the downstream tools that accept the mapped values, providing clear context for alternative choices.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recommend_improvements: A

Read-only, Idempotent

Turn a Fix or Stop verdict into the change plan that could earn a re-score: pillar targets plus named plays, owners, stop conditions, cost of waiting and a deadline. CALL THIS after score_initiative returns Fix or Stop. Do not call it for an Accelerate verdict unless the user has identified a real delivery risk and needs to test the plan before committing. resistance_type changes the change-enablement play: will selects Kotter urgency, coalition and ADKAR Desire, skill selects ADKAR Knowledge and Ability. risk_type changes the governance route: regulatory selects pre-deployment remediation, reputational selects visible trust guardrails, operational selects proportionate controls. Omit either only when unknown, then the response marks the inferred play provisional and gives the question needed to test it. The four pillars can be partial, but estimated scores make the plan provisional and must be re-measured before spend is committed. Lead with binding_constraint, surface honest_stop verbatim when present, then use the rescore_gate to decide whether this remains a Fix or becomes Stop. Pure deterministic calculation — no network, auth, or side effects.

Parameters

Name	Required	Description
`scores`	No	OPTIONAL, and each pillar inside it is optional. The four AI BVF pillars, each an honest 0–100 self-assessment, combining deterministically into the verdict: governance_risk ≥ 70 OR financial_return ≤ 20 returns Stop; strategic_alignment, financial_return and change_enablement all ≥ 60 with governance_risk ≤ 40 returns Accelerate; everything else returns Fix. Pass ONLY the pillars the user has real evidence for — do NOT invent numbers for the rest. Missing pillars are estimated deterministically by the engine (from readiness, tier, function and published benchmarks), the response reports which via pillar_basis and scores_used, decision confidence is haircut by how much was estimated, and a fully-estimated pass can never return Accelerate (it returns Fix pending confirmation). So call immediately with whatever the user gave you, then ask for evidence on the estimated pillars and re-call to firm the verdict up.
`ai_tier`	Yes	Ambition of the AI being deployed: gen1 = automation/RPA, gen2 = GenAI, gen3 = agentic. Interacts with readiness — a more ambitious tier running on lower readiness widens the pace-layer gap, which discounts the modelled EUR value even when the four pillar scores are strong.
`function`	Yes	Business function where the AI will operate, as one of the accepted enum values — selects which benchmark value drivers and rate ranges apply. Call list_taxonomy for the exact strings if unsure.
`industry`	Yes	Your industry, as one of the accepted enum values — used to select the benchmark rate multiplier applied to the modelled EUR value. Call list_taxonomy for the exact strings if unsure.
`readiness`	Yes	Organisational readiness, honest self-assessment: agile = cross-functional, fast decisions; traditional = functional hierarchy; siloed = rigid, hand-off heavy. Sets the value-capture rate and, paired with ai_tier, the pace-layer drag — lower readiness against a higher tier reduces the captured value. Self-report is gameable: when the user has real process numbers, call infer_readiness first and pass its measured classification here instead.
`risk_type`	No	Optional. The nature of a high governance-risk score: "regulatory" = statute applies (EU AI Act, GDPR Article 22, DORA), "reputational" = the risk is how failure looks and lands publicly, "operational" = the system failing quietly inside a process. Selects between a regulatory remediation sequence, visible trust guardrails, and a proportionate governance review. If you do not know, omit it: the engine infers (gen3 tier, or a regulated function/industry, infers regulatory) and marks the play provisional.
`revenue_eur`	Yes	Approximate annual revenue in EUR (must be ≥ 0). Scales the whole output: the benchmark rates are applied as fractions of this figure, so the modelled EUR value range grows with it. A rough order-of-magnitude estimate is fine.
`resistance_type`	No	Optional. What sits behind a low change-enablement score: "will" = people do not want the change (power shifts, fear, no case for change), "skill" = people cannot yet do it (capability and capacity gap). Selects between a coalition-building play (Kotter 1-2 + ADKAR Awareness/Desire) and an owner-and-capability play (ADKAR Knowledge/Ability). If you do not know, omit it: the engine infers from readiness (agile infers skill, traditional/siloed infers will) and marks the play provisional. Ask the user "is the resistance about not wanting this, or not being able to do it yet?" and re-call to sharpen.
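
The stated fallback for an omitted resistance_type (agile infers skill, traditional or siloed infers will, and the play is marked provisional) can be sketched as follows. The function name and the returned tuple are hypothetical; the real response carries the provisional marker in its own structure.

```python
from typing import Optional, Tuple


def resolve_resistance_type(
    readiness: str, resistance_type: Optional[str] = None
) -> Tuple[str, bool]:
    """Return (resistance_type, provisional) following the documented inference."""
    if resistance_type is not None:
        return resistance_type, False  # supplied by the caller, not provisional
    inferred = "skill" if readiness == "agile" else "will"
    return inferred, True  # inferred from readiness, so marked provisional


print(resolve_resistance_type("siloed"))
print(resolve_resistance_type("agile", "will"))
```

A provisional result is the prompt to ask the user whether the resistance is about not wanting the change or not yet being able to do it, then call the tool again.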

Output Schema

Name	Required	Description
`audit`	No	Reproducibility record: engine version, the rules that fired, and the resolved inputs. Deterministic, no timestamps. If the verdict is challenged months later, the same inputs on the same engine version reproduce it exactly.
`notes`	Yes	Caveats or context on the recommendation set.
`feasible`	Yes	Whether the target is reachable via the listed pillar moves.
`feedback`	No	Optional one-question feedback route, present only for Fix/Stop verdicts. The link opens a prefilled email; no response is recorded unless the user chooses to send it.
`bvf_version`	Yes	AI BVF protocol version used.
`change_plan`	No	The change-leader layer: a specific, sequenced route from Fix or Stop toward Go, aimed at the organisation. Present for Fix/Stop, absent when the initiative is already Accelerate. Present this to the user as the plan, not as raw data.
`recommendations`	Yes	Per-pillar improvement actions.
`advisory_next_step`	No	Optional CTA, present only for Fix/Stop verdicts.
`target_classification`	Yes	Verdict the recommendations aim to reach.
`current_classification`	Yes	Verdict as the initiative stands today.
`projected_decision_confidence`	Yes	Confidence in the verdict if the recommendations land, 0-100.

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnly, idempotent, non-destructive; description confirms 'Pure deterministic calculation — no network, auth, or side effects', adding context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is dense but well-structured with front-loaded purpose and usage; could be slightly more concise but every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complexity (8 params, nested objects, output schema), description covers all aspects: inputs, behavior, edge cases, decision logic; fully sufficient for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions; description adds extra guidance on when to omit parameters and how estimates work, significantly enhancing understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Turn a Fix or Stop verdict into the change plan...' and distinguishes from sibling tools like score_initiative by specifying when to call.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'CALL THIS after score_initiative returns Fix or Stop' and provides exceptions for Accelerate; gives clear when-to-use and when-not-to-use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

score_initiative: A

Read-only, Idempotent

Canonical-field scorer for one AI initiative. CALL THIS when industry, revenue_eur, function, ai_tier and readiness are already known, or when re-scoring with measured pillar evidence. For a proposal written in ordinary business language, call assess_ai_initiative first; it resolves these fields and asks for anything missing. Pillar scores remain optional: missing pillars are estimated deterministically, reported through pillar_basis, and reduce decision confidence, while a fully estimated pass can never return Accelerate. Returns Accelerate, Fix or Stop, modelled gross and net EUR ranges, decision confidence, sensitivity, assumptions and an audit trail. Use score_portfolio for several initiatives and diagnose_process for measured waste in an existing process. Pure deterministic calculation, no network, auth or side effects.

Parameters

Name	Required	Description
`scores`	No	OPTIONAL, and each pillar inside it is optional. The four AI BVF pillars, each an honest 0–100 self-assessment, combining deterministically into the verdict: governance_risk ≥ 70 OR financial_return ≤ 20 returns Stop; strategic_alignment, financial_return and change_enablement all ≥ 60 with governance_risk ≤ 40 returns Accelerate; everything else returns Fix. Pass ONLY the pillars the user has real evidence for — do NOT invent numbers for the rest. Missing pillars are estimated deterministically by the engine (from readiness, tier, function and published benchmarks), the response reports which via pillar_basis and scores_used, decision confidence is haircut by how much was estimated, and a fully-estimated pass can never return Accelerate (it returns Fix pending confirmation). So call immediately with whatever the user gave you, then ask for evidence on the estimated pillars and re-call to firm the verdict up.
`ai_tier`	Yes	Ambition of the AI being deployed: gen1 = automation/RPA, gen2 = GenAI, gen3 = agentic. Interacts with readiness — a more ambitious tier running on lower readiness widens the pace-layer gap, which discounts the modelled EUR value even when the four pillar scores are strong.
`function`	Yes	Business function where the AI will operate, as one of the accepted enum values — selects which benchmark value drivers and rate ranges apply. Call list_taxonomy for the exact strings if unsure.
`industry`	Yes	Your industry, as one of the accepted enum values — used to select the benchmark rate multiplier applied to the modelled EUR value. Call list_taxonomy for the exact strings if unsure.
`readiness`	Yes	Organisational readiness, honest self-assessment: agile = cross-functional, fast decisions; traditional = functional hierarchy; siloed = rigid, hand-off heavy. Sets the value-capture rate and, paired with ai_tier, the pace-layer drag — lower readiness against a higher tier reduces the captured value. Self-report is gameable: when the user has real process numbers, call infer_readiness first and pass its measured classification here instead.
`revenue_eur`	Yes	Approximate annual revenue in EUR (must be ≥ 0). Scales the whole output: the benchmark rates are applied as fractions of this figure, so the modelled EUR value range grows with it. A rough order-of-magnitude estimate is fine.
`signal_completeness`	No	Optional 0–1. How grounded the four pillar scores are in real evidence versus estimated from context. Defaults to 1 (treated as measured). If the organisation lacks formal change-readiness or risk metadata, estimate the pillars from what you know AND set this lower to say so — decision confidence is reduced proportionally and a caveat is attached, instead of returning a falsely confident verdict on soft inputs.
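
The verdict rule given in the `scores` row can be written out as a short function. This is a sketch of the published thresholds only: `classify` is a hypothetical name, it expects all four pillars to be present, and it does not reproduce how the engine estimates missing pillars or computes decision confidence.

```python
def classify(scores: dict, all_estimated: bool = False) -> str:
    """Apply the documented Stop / Accelerate / Fix thresholds to four pillar scores."""
    # Stop: governance risk is too high or financial return is too low.
    if scores["governance_risk"] >= 70 or scores["financial_return"] <= 20:
        return "Stop"

    strong = all(
        scores[pillar] >= 60
        for pillar in ("strategic_alignment", "financial_return", "change_enablement")
    )
    if strong and scores["governance_risk"] <= 40:
        # A fully estimated pass can never return Accelerate.
        return "Fix" if all_estimated else "Accelerate"

    return "Fix"


example = {
    "strategic_alignment": 75,
    "financial_return": 65,
    "change_enablement": 62,
    "governance_risk": 35,
}
print(classify(example))
print(classify(example, all_estimated=True))
print(classify({**example, "governance_risk": 70}))
```

The Stop check runs first, so a governance_risk of 70 or more returns Stop even when the other three pillars are strong.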

Output Schema

Name	Required	Description
`audit`	No	Reproducibility record: engine version, the rules that fired, and the resolved inputs. Deterministic, no timestamps. If the verdict is challenged months later, the same inputs on the same engine version reproduce it exactly.
`caveat`	No	Present only when signal_completeness was low: warns the verdict rests on soft inputs and confidence was reduced.
`reason`	Yes	One-line justification for the classification.
`drivers`	Yes	Named value drivers behind the estimate.
`feedback`	No	Optional one-question feedback route, present only for Fix/Stop verdicts. The link opens a prefilled email; no response is recorded unless the user chooses to send it.
`bvf_version`	Yes	AI BVF protocol version used.
`multipliers`	Yes	Factors applied to the base rates.
`scores_used`	No	The four pillar values the verdict was actually computed on, whether given by the caller or estimated by the engine. Show these to the user when any pillar was estimated.
`sensitivity`	No	What moves this verdict, computed deterministically: the value if readiness were one notch worse, the value at revenue minus 20 percent, and the nearest single-pillar movements that flip the classification. Boards trust ranges with visible assumptions over point estimates; show this.
`pillar_basis`	No	Per pillar: "given" (caller supplied it) or "estimated" (deterministic prior). When any pillar is estimated, tell the user which, and ask for evidence on those to firm up the verdict.
`net_value_eur`	Yes	Modelled net value in EUR after capture rate, low/high.
`classification`	Yes	The verdict for this initiative.
`applied_modules`	Yes	BVF scoring modules that fired for this input.
`gross_value_eur`	Yes	Modelled gross value in EUR before capture, low/high.
`benchmark_source`	Yes	Citation for the benchmark rates applied.
`advisory_next_step`	No	Optional CTA, present only for Fix/Stop verdicts.
`decision_confidence`	Yes	Confidence in the verdict, 0-100.

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, destructiveHint) already indicate safe read-only behavior. The description reinforces this with 'Pure deterministic calculation — no network, auth, or side effects, so calling it is always safe and free.' No contradiction; the description adds context about the deterministic algorithm and estimation logic.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose (multiple paragraphs) but packed with essential guidance. While every sentence adds value, the length could be reduced for an AI agent without losing meaning. It is front-loaded with the purpose and proactive call instruction, but some details (e.g., pillar estimation logic) could be summarized more concisely.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, nested objects, multiple enums, estimation logic), the description covers all necessary aspects: what the tool does, when to use it, how to handle missing parameters, how to interpret results, and relationships with sibling tools. The output schema likely handles return details, so no gaps are evident.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description adds significant context beyond the schema: it explains when to omit optional pillar scores, how missing pillars are estimated, and the interaction between ai_tier and readiness. However, the schema already provides good parameter docs, so the added value is substantial but not exceptional.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool scores a single AI initiative using AI BVF v1.0, returning a classification, EUR value range, and reasoning. It distinguishes from siblings like score_portfolio (for portfolios) and diagnose_process (for diagnosing processes), making the purpose specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly instructs 'CALL THIS PROACTIVELY' and lists scenarios (any AI initiative question). It tells when to use alternatives (score_portfolio for portfolios, diagnose_process for existing processes) and advises calling list_taxonomy for enum clarity. The agent is guided clearly on when and how to invoke the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

score_portfolioA

Read-only · Idempotent

Score several AI initiatives as one AI BVF v1.0 portfolio and return the board-level position: counts of Accelerate / Fix / Stop, aggregate modelled EUR value range, mean decision confidence, the highest-value initiative, the highest-risk initiative, and every individual result. CALL THIS when the user has a portfolio document and needs to know what it contains before deciding funding or order, instead of looping score_initiative one initiative at a time. The single readiness value applies across every initiative: it changes capture rates and the pace-layer drag, so measure it with infer_readiness first when process data exists. The portfolio must carry organization.revenue_eur for EUR values; initiatives with missing revenue or invalid taxonomy are reported as skipped, never silently counted. Run validate_portfolio first only when the document shape is uncertain, then call sequence_portfolio when the verdicts need turning into a 90-day order. Pure deterministic calculation — no network, auth, or side effects.

Name	Required	Description
`portfolio`	Yes	A portfolio document conforming to the AI BVF v1.0 schema: bvf_version, organization (name, industry, optional revenue_eur), and a non-empty initiatives array. Each initiative carries id, name, function, ai_tier, and a scores object whose four pillars are each either a bare number (0–100) or an object { value: 0–100 }; both shapes are accepted everywhere. Every initiative is run through the same rule as score_initiative — governance_risk ≥ 70 OR financial_return ≤ 20 → Stop; all of strategic_alignment/financial_return/change_enablement ≥ 60 with governance_risk ≤ 40 → Accelerate; else Fix — and the verdicts are aggregated into portfolio counts. organization.revenue_eur is required to model EUR value; initiatives that cannot be scored (missing revenue, unknown function/ai_tier) appear in skipped_initiatives rather than scored_initiatives. Validate first with validate_portfolio if the document may be malformed. Schema: https://www.aibvf.com/protocol.
`readiness`	Yes	Organisational readiness applied to every initiative in the portfolio. Honest self-assessment: agile = cross-functional, fast decisions; traditional = functional hierarchy; siloed = rigid, hand-off heavy. The portfolio schema does not carry per-initiative readiness; this single value sets the capture rate for the whole portfolio and, paired with the ai_tier of each initiative, its pace-layer drag — lower readiness against a higher tier discounts the modelled EUR value.
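
The Stop / Accelerate / Fix rule quoted in the portfolio parameter can be written out as a short sketch. The thresholds and pillar names come from the description above; the function names are hypothetical, and the sketch expects all four pillars to be present, whereas the real engine is described as estimating missing ones.

```python
PILLARS = ("strategic_alignment", "financial_return", "change_enablement", "governance_risk")


def pillar_value(score):
    """Accept either a bare number or an object such as {"value": 72}."""
    return score["value"] if isinstance(score, dict) else score


def classify(scores: dict) -> str:
    """Apply the rule as stated: Stop is checked first, then Accelerate, else Fix."""
    s = {name: pillar_value(scores[name]) for name in PILLARS}
    if s["governance_risk"] >= 70 or s["financial_return"] <= 20:
        return "Stop"
    positives = (s["strategic_alignment"], s["financial_return"], s["change_enablement"])
    if min(positives) >= 60 and s["governance_risk"] <= 40:
        return "Accelerate"
    return "Fix"


verdict = classify({
    "strategic_alignment": 80,
    "financial_return": 75,
    "change_enablement": {"value": 65},  # nested shape is accepted too
    "governance_risk": 30,
})
print(verdict)  # Accelerate
```

Checking Stop first matters: an initiative with strong returns but governance_risk of 70 or more is still stopped.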

Output Schema

Name	Required	Description
`total`	Yes	Total initiatives in the portfolio (scored + skipped).
`valid`	Yes	True when the portfolio passed schema validation. False means no initiatives were scored.
`summary`	Yes
`feedback`	No	Optional one-question feedback route, present only when any initiative was Fix or Stop. The link opens a prefilled email; no response is recorded unless the user chooses to send it.
`readiness`	Yes	Readiness value applied across all initiatives.
`bvf_version`	Yes	AI BVF protocol version used.
`organization`	Yes	Echo of the portfolio organisation fields applied to scoring.
`validation_errors`	No	Empty when valid; otherwise one entry per schema violation.
`advisory_next_step`	No	Optional CTA, present only when any initiative was Fix or Stop.
`scored_initiatives`	Yes	Per-initiative scoring result.
`skipped_initiatives`	Yes	Initiatives that could not be scored, with the reason. Empty when all initiatives scored.
`aggregate_net_value_eur`	Yes	Sum of net EUR value across scored initiatives, low/high.
`highest_risk_initiative`	No	Scored initiative most at risk: worst classification (Stop > Fix > Accelerate), tie-broken by lowest decision_confidence. Omitted when none were scored.
`top_initiative_by_value`	No	Scored initiative with the highest mid-point net EUR value. Omitted when none were scored.
`mean_decision_confidence`	Yes	Mean decision confidence across scored initiatives (0–100); 0 when none were scored.
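
The aggregate fields above follow simple selection rules, sketched below. This is an illustration only: it assumes each scored result exposes id, classification, decision_confidence and a net_value_eur object with low and high keys, and it returns initiative ids where the real output may carry fuller objects.

```python
SEVERITY = {"Accelerate": 0, "Fix": 1, "Stop": 2}  # Stop > Fix > Accelerate


def midpoint(value_range: dict) -> float:
    """Mid-point of a low/high EUR range."""
    return (value_range["low"] + value_range["high"]) / 2


def summarise(scored: list) -> dict:
    """Aggregate per-initiative results into portfolio-level fields."""
    summary = {
        "mean_decision_confidence": 0,  # stays 0 when nothing was scored
        "aggregate_net_value_eur": {
            "low": sum(r["net_value_eur"]["low"] for r in scored),
            "high": sum(r["net_value_eur"]["high"] for r in scored),
        },
    }
    if not scored:
        return summary  # highest-risk and top-value fields are omitted
    summary["mean_decision_confidence"] = (
        sum(r["decision_confidence"] for r in scored) / len(scored)
    )
    # Worst classification wins; ties go to the lowest decision confidence.
    summary["highest_risk_initiative"] = max(
        scored, key=lambda r: (SEVERITY[r["classification"]], -r["decision_confidence"])
    )["id"]
    summary["top_initiative_by_value"] = max(
        scored, key=lambda r: midpoint(r["net_value_eur"])
    )["id"]
    return summary
```

Two Fix verdicts with confidences 60 and 40 would, under this rule, report the 40-confidence initiative as the highest risk.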

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses 'Pure deterministic calculation — no network, auth, or side effects' which aligns with annotations (readOnlyHint, idempotentHint). It adds details about skipped initiatives, the effect of readiness on capture rates and pace-layer drag, and that revenue is required for EUR values—beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is comprehensive and front-loaded with key outputs, but somewhat lengthy with nested information. Every sentence adds value, but could be slightly more concise without losing clarity. Well-structured overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-initiative scoring, readiness model, skip rules, aggregated outputs), the description covers all necessary context: prerequisites (infer_readiness, validate_portfolio), dependencies (revenue_eur), output content, deterministic nature, and relationship to sibling tools (score_initiative, sequence_portfolio). No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 100% schema coverage, the description provides detailed semantics beyond the schema: for portfolio, it explains the required shape, scoring logic (Accelerate/Fix/Stop rules), and handling of missing data. For readiness, it explains the practical meaning of each enum value and its effect on modelled EUR values via capture rate and pace-layer drag.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it scores AI initiatives as a portfolio and returns board-level position with specific counts and values. It distinguishes from sibling tools by explicitly mentioning 'instead of looping score_initiative one initiative at a time' and referencing validate_portfolio and sequence_portfolio for related tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit when-to-use guidance: 'CALL THIS when the user has a portfolio document and needs to know what it contains before deciding funding or order'. It also tells when not to loop score_initiative, and suggests infer_readiness first, validate_portfolio if document shape is uncertain, and sequence_portfolio after for ordering.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sequence_portfolioA

Read-only · Idempotent

Turn a scored AI portfolio into three waves with gates over a configurable horizon, so the roadmap respects the change capacity of each business function. CALL THIS after score_portfolio when the user asks what to stop, fund first, defer or fit into the next 90 days. It does not change any verdict or re-score the business case. Stops enter wave 1 to reclaim budget and attention, quicker Accelerates enter wave 2, complex Accelerates and Fixes enter wave 3 behind their re-score gates. Pass the portfolio returned by score_portfolio directly through portfolio, or pass organization plus initiatives; both score shapes are accepted and nested values are flattened. readiness sets capture rates and pacing, max_parallel_per_function caps simultaneous change in one function per wave, and horizon_days divides the plan into three equal windows. Capacity overflow is reported as a conflict or a deferral beyond the horizon, never hidden. Run recommend_improvements for a Fix before treating its wave placement as permission to proceed. Pure deterministic calculation, no network, auth, or side effects.

Name	Required	Description
`portfolio`	No	Alternative input: the same AI BVF v1.0 portfolio document score_portfolio accepts (organization + initiatives with nested {value} pillar scores). Pass either this OR the top-level organization + initiatives; nested score values are flattened automatically, and missing pillars are estimated honestly.
`readiness`	Yes	Organisational readiness applied across the portfolio; sets capture rates and pacing. Measure it with infer_readiness when process numbers exist.
`constraints`	No	Change-capacity constraints. The defaults encode the core principle: no function absorbs unlimited concurrent change.
`initiatives`	No	The portfolio to sequence. Each initiative carries flat 0-100 pillar numbers (not the nested value objects of the portfolio wire format).
`organization`	No
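
The wave placement and capacity cap described for sequence_portfolio can be sketched roughly as follows. Several details are assumptions and not taken from the listing: the `complex` flag that separates quick from complex Accelerates, the choice to push an overflowing initiative to the next wave, and counting Stops against a function's capacity. The function names are hypothetical.

```python
from collections import defaultdict


def wave_windows(horizon_days: int) -> list:
    """Split the horizon into three equal (start_day, end_day) windows."""
    step = horizon_days / 3
    return [(round(i * step) + 1, round((i + 1) * step)) for i in range(3)]


def target_wave(item: dict) -> int:
    """Stops first, quick Accelerates second, complex Accelerates and Fixes third."""
    if item["classification"] == "Stop":
        return 1
    if item["classification"] == "Accelerate" and not item.get("complex", False):
        return 2  # "complex" is a hypothetical flag for this sketch
    return 3


def sequence(initiatives: list, max_parallel_per_function: int) -> dict:
    load = defaultdict(int)  # (wave, function) -> initiatives already placed
    waves = {1: [], 2: [], 3: []}
    conflicts, deferred = [], []
    for item in initiatives:
        start = wave = target_wave(item)
        # Assumption: overflow moves the initiative to the next wave with room.
        while wave <= 3 and load[(wave, item["function"])] >= max_parallel_per_function:
            wave += 1
        if wave > 3:
            deferred.append(item["id"])  # did not fit inside the horizon
            continue
        if wave != start:
            conflicts.append({"id": item["id"], "function": item["function"],
                              "from_wave": start, "to_wave": wave})
        load[(wave, item["function"])] += 1
        waves[wave].append(item["id"])
    return {"waves": waves, "capacity_conflicts": conflicts,
            "deferred_beyond_horizon": deferred}


print(wave_windows(90))  # [(1, 30), (31, 60), (61, 90)]
```

The point of the sketch is that overflow is always recorded, either as a conflict or as a deferral, which mirrors the "never hidden" wording of the description.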

Output Schema

Name	Required	Description
`audit`	Yes	Reproducibility record: engine version, the rules that fired, and the resolved inputs. Deterministic, no timestamps. If the verdict is challenged months later, the same inputs on the same engine version reproduce it exactly.
`waves`	Yes	Three waves with named gates: Stops first (free the budget), quick Accelerates second (buy trust), complex Accelerates plus Fixes third (spend the trust). Present this to the user as the rollout plan.
`totals`	Yes	Counts: stopped, quick_wins, complex_or_fix, deferred.
`skipped`	No
`bvf_version`	Yes
`capacity_conflicts`	Yes	Where more initiatives land on one function than it can absorb per wave, with the deferral applied. Surface these: an overloaded function is how good portfolios fail.
`sequencing_principles`	Yes
`deferred_beyond_horizon`	No	Initiatives that did not fit the horizon under the capacity constraint; they need their own decision.
`aggregate_accelerate_value_eur`	No	Sum of modelled net EUR for the sequenced Accelerates, low and high.

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, but description adds significant context: 'Pure deterministic calculation, no network, auth, or side effects.' Also details wave assignment logic and capacity overflow handling, which enriches understanding beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is comprehensive but not overly verbose; each sentence adds useful information. It is front-loaded with main purpose and usage, and structured logically. Slightly longer than minimal but justified by complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complexity (5 params, nested objects, output schema exists), the description covers parameters, behavior, and usage thoroughly. Output schema exists so return values need not be detailed. Minor gap: does not explicitly mention that output is deterministic and pure, but that's covered in transparency. Overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 80%, high, so baseline is 3. The description adds value by explaining alternative input (portfolio vs. organization+initiatives), flattening of nested values, and the effect of readiness and constraints on behavior. It also clarifies defaults and overflow reporting, providing rationale beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool sequences a scored AI portfolio into three waves with gates, using specific verbs and resources. It distinguishes from sibling tools like score_portfolio and recommend_improvements, and gives exact usage context ('CALL THIS after score_portfolio when the user asks what to stop...').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (after score_portfolio, when user asks about stop/fund/defer) and when not to use ('It does not change any verdict or re-score the business case'). Provides alternatives like recommend_improvements for fixes, and explains prerequisites (pass portfolio or organization+initiatives).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_portfolioA

Read-only · Idempotent

Check whether a supplied AI BVF v1.0 portfolio document has the shape the portfolio tools require, before scoring, sequencing, storing or sharing it. CALL THIS when the document came from a file, another system or hand-built JSON and its structure is uncertain. It checks required fields, taxonomy values and 0–100 pillar ranges only; it does not judge the evidence or calculate a verdict. Pillars may be bare numbers or { value, confidence } objects, both are valid. Use assemble_portfolio when the user has a list of initiatives in conversation and needs the document built for them, score_portfolio when the document is already ready for verdicts, and sequence_portfolio only after its initiatives are scoreable. Returns valid=true or one error per failing JSON path. Pure deterministic validation — no network, auth, or side effects.

Name	Required	Description
`portfolio`	Yes	The portfolio document as a JSON object following the AI BVF v1.0 schema: a top-level object with bvf_version, organization, and a non-empty "initiatives" array, each initiative carrying the same fields score_initiative expects (industry, revenue_eur, function, ai_tier, readiness, and a scores object with the four 0–100 pillars, each either a bare number or an object { value, confidence? }; both shapes pass). Checked structurally only — required fields present, correct types, enum values valid, pillar numbers in range; the pillar values are NOT scored or judged here (use score_initiative or score_portfolio for that). On failure, errors[] names each failing JSON path and the rule it broke.

Output Schema

Name	Required	Description
`valid`	Yes	True when the portfolio conforms to the schema.
`errors`	Yes	Empty when valid; otherwise one entry per schema violation.
`bvf_version`	Yes	AI BVF protocol version validated against.
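
A structural check of this kind, returning one error per failing JSON path, might look like the sketch below. It covers only the top-level required fields and the 0–100 pillar range for both accepted shapes; taxonomy checks are left out because the enum strings come from list_taxonomy. The function name and the path/rule keys of each error entry are assumptions, not the server's actual format.

```python
PILLARS = ("strategic_alignment", "financial_return", "change_enablement", "governance_risk")


def check_portfolio_shape(doc: dict) -> dict:
    """Shape check only: no pillar is scored or judged here."""
    errors = []
    for field in ("bvf_version", "organization", "initiatives"):
        if field not in doc:
            errors.append({"path": f"$.{field}", "rule": "required"})

    initiatives = doc.get("initiatives", [])
    if "initiatives" in doc and (not isinstance(initiatives, list) or not initiatives):
        errors.append({"path": "$.initiatives", "rule": "non-empty array"})
        initiatives = []

    for i, item in enumerate(initiatives):
        scores = item.get("scores") if isinstance(item, dict) else None
        if not isinstance(scores, dict):
            continue
        for name in PILLARS:
            if name not in scores:
                continue  # only pillars that are present are range-checked
            raw = scores[name]
            # Both shapes pass: a bare number or an object carrying "value".
            value = raw.get("value") if isinstance(raw, dict) else raw
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not 0 <= value <= 100:
                errors.append({"path": f"$.initiatives[{i}].scores.{name}",
                               "rule": "number in 0-100"})
    return {"valid": not errors, "errors": errors}
```

A document with a pillar of 120 and no organization block would come back with valid set to False and two entries in errors, one per failing path.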

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds that it is 'pure deterministic validation — no network, auth, or side effects.' No contradiction; provides context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with purpose and usage. Contains multiple sentences, but each adds value. Slightly verbose on details that could be inferred, but still efficient for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete coverage: explains validation scope, what is not validated, error format, and relationship to siblings. Output schema exists but description still adds return value details. Handles complexity well.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% but description adds meaning: explains acceptable pillar shapes (bare numbers or {value, confidence} objects), describes validation scope, and clarifies error format. Adds significant value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks whether a portfolio document has the required shape. It uses specific verb 'Check' and resource 'portfolio document', and distinguishes from siblings like assemble_portfolio and score_portfolio by explaining when to use each.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to call: 'when the document came from a file, another system or hand-built JSON and its structure is uncertain.' Also clarifies what it does not do (judge evidence, calculate verdict) and names sibling tools with their purposes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
