Server Details

SIP/VoIP/telecom grounding for AI agents: vendor docs, RFCs, STIR/SHAKEN, traces, configs.

Status: Healthy
Transport: Streamable HTTP
Repository: cmendes0101/sipflow-cursor-plugin
GitHub Stars: 0

Tool Descriptions (grade: A)

Average 4.7/5 across 22 of 22 tools scored. Lowest: 4.1/5.

Server Coherence (grade: A)

Disambiguation: 5/5

Every tool has a distinct purpose, from SDP comparison to STIR/SHAKEN validation. Even closely related tools like lookup_response_code and troubleshoot_response_code are clearly differentiated by static vs. vendor-specific RAG lookup, minimizing misselection.

Naming Consistency: 5/5

All tool names follow a consistent verb_noun snake_case pattern (e.g., compare_sdp_offer_answer, detect_sip_stack, validate_stir_shaken_identity). No mixing of conventions or vague verbs, making the set predictable for agents.

Tool Count: 5/5

With 22 tools, the server covers a broad range of SIP debugging tasks without excess. The count feels well-scoped for its purpose—neither sparse nor overwhelming.

Completeness: 5/5

The tool surface covers the full lifecycle of SIP troubleshooting: parsing, diffing, linting, DNS, config review, documentation search, STIR/SHAKEN, and feedback. No obvious gaps for its analytical domain.

Available Tools

22 tools
compare_sdp_offer_answer: Compare an SDP offer/answer pair (RFC 3264) (grade: A)

Annotations: Read-only

[cost: free (pure CPU, no network) | read-only]

Diff a SIP/SDP offer and answer and surface the issues that actually break calls in practice: codec intersection per m-line, direction compatibility (sendrecv ↔ recvonly), DTLS setup-role conflicts (active+active / passive+passive), rtcp-mux / BUNDLE asymmetry, missing DTLS fingerprints when DTLS-SRTP is negotiated, ICE asymmetry, and fax reinvite mismatches (e.g. offer m=image udptl t38 answered with audio-only, or T38FaxVersion / T38FaxMaxBuffer / T38FaxRateManagement drift).

Use when the user has both halves of a negotiation and is debugging 488 Not Acceptable Here, no-audio, one-way-audio, or a failed T.38 reinvite (488 / 415 / 606 on an m=image offer).

Pair with: parse_sdp to inspect either side in isolation; search_sip_docs(vendor=...) to ground vendor-specific fixes (FreeSWITCH mod_spandsp, Cisco CUBE fax protocol t38); lookup_response_code(488) for the static SIP-side context.

Parameters (JSON Schema)

  • offer (required): SDP offer body.
  • answer (required): SDP answer body.
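The codec-intersection check described above can be sketched in a few lines. This is an illustrative simplification, not the server's implementation: it intersects the raw payload-type numbers of the first m=audio line on each side, whereas a full check must also match dynamic payload types via a=rtpmap and repeat per m-line.

```python
def audio_payload_intersection(offer: str, answer: str) -> list[str]:
    """Intersect payload types of the first m=audio line on each side."""
    def first_audio_payloads(sdp: str) -> list[str]:
        for line in sdp.splitlines():
            if line.startswith("m=audio"):
                # m=audio <port> <proto> <fmt> <fmt> ...
                return line.split()[3:]
        return []

    offered = first_audio_payloads(offer)
    answered = set(first_audio_payloads(answer))
    # Preserve the offerer's preference order in the result
    return [pt for pt in offered if pt in answered]
```

An empty intersection for an offered audio m-line is exactly the kind of finding that grounds a 488 Not Acceptable Here.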
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses read-only, pure CPU operation, aligning with annotations. It details the specific checks performed, providing rich behavioral context beyond the annotations alone.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, well-structured description with key information front-loaded. Every sentence adds value—first paragraph details functionality, second paragraph provides usage context and sibling tool suggestions.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description covers what the tool returns (list of issues) and the types of checks performed. Given the tool's complexity and the richness of annotations/schema, it is complete enough for the agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with simple descriptions ('SDP offer body', 'SDP answer body'). The description adds no additional parameter-specific meaning beyond what the schema provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly articulates what the tool does: compare SDP offer/answer and surface call-breaking issues. Includes specific checks (codec intersection, direction compatibility, DTLS conflicts) and distinguishes itself from sibling tools like parse_sdp and search_sip_docs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (debugging 488 Not Acceptable Here, no-audio, one-way-audio) and provides pairings with other tools, offering clear guidance on when to use versus alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

detect_sip_stack: Detect SIP stack / vendor from a trace OR a config (grade: A)

Annotations: Read-only

[cost: free (pure CPU, no network) | read-only]

Identify the SIP product behind a piece of input. Works on both:

  • a SIP trace (User-Agent / Server headers from PCAP/sngrep/syslog), and

  • a vendor config blob (kamailio.cfg, sip.conf, pjsip.conf, FreeSWITCH XML, opensips.cfg) detected via structural signatures (loadmodule, route blocks, [transport-*] sections, <profile name=>, etc.).

Returns a vendor slug (e.g. "kamailio", "freeswitch", "asterisk", "twilio", "cisco-cube") aligned with the vendor filter on search_sip_docs, so you can pipe the output of this tool directly into a follow-up doc search.

Pair with: search_sip_docs(vendor=<slug>, ...) for grounded vendor docs; review_sip_config when the input is a config and you also want extracted modules + risk flags; troubleshoot_response_code(vendorHint=<slug>, ...) when chasing a status code.

Parameters (JSON Schema)

  • kind (optional, default "auto"): What the input is. "trace" looks at SIP headers only, "config" runs vendor-config heuristics, "auto" tries trace first then falls back to config detection.
  • text (required): Raw SIP trace text OR a vendor config blob.
  • filenameHint (optional): Filename (e.g. "kamailio.cfg", "pjsip.conf"). Strongly improves config-mode detection when supplied.
Behavior: 5/5

The description goes beyond annotations by stating it is 'cost: free (pure CPU, no network) | read-only', explaining the read-only nature and cost. It also details the two detection modes and the auto fallback behavior, providing full behavioral transparency.

Conciseness: 5/5

The description is well-structured and front-loaded: a cost line, a clear purpose sentence, bullet points for input types, return value explanation, and pairing suggestions. Every sentence is informative and non-redundant.

Completeness: 4/5

The description explains what the tool returns (a vendor slug) and how it can be used with other tools. It covers the key aspects, though it does not explicitly mention behavior for unknown inputs or edge cases, but is complete for typical use.

Parameters: 4/5

While all parameters have schema descriptions (100% coverage), the description adds meaning by explaining how vendor slugs align with search_sip_docs and that filenameHint improves config-mode detection. It adds context beyond the schema.

Purpose: 5/5

The description clearly states the tool's purpose: 'Identify the SIP product behind a piece of input.' It specifies two distinct input types (SIP trace and vendor config blob) and lists example vendor slugs, making the tool's function unambiguous and differentiating it from siblings.

Usage Guidelines: 5/5

The description explicitly provides pairing suggestions: search_sip_docs, review_sip_config, and troubleshoot_response_code, clearly indicating when to use this tool versus alternatives. This gives strong guidance on tool selection.

detect_sip_vendor_from_config: Detect SIP vendor from a config file (grade: A)

Annotations: Read-only

[cost: free (pure CPU, no network) | read-only]

Heuristic-only sibling of detect_sip_stack, scoped to vendor configs. Returns the matched vendor slug, a confidence level, and the structural signals that fired (loadmodule syntax, route blocks, profile elements, etc.).

Use this when the user asks 'what is this config?' or attaches a SIP config file. Detect-only - does not extract directives or flag risks.

Pair with: review_sip_config for the structured outline + risk flags; search_sip_docs(vendor=<slug>, ...) to ground each directive.

Parameters (JSON Schema)

  • text (required): Config blob (UTF-8 plaintext).
  • filenameHint (optional): Filename ("kamailio.cfg", "pjsip.conf", "sofia/external.xml") to bias detection.
Behavior: 4/5

Annotations already declare readOnlyHint=true, and the description adds value by noting 'pure CPU, no network' and confirming 'read-only' cost. It also explains the heuristic nature and outputs, surpassing what annotations provide.

Conciseness: 5/5

The description is compact yet packed with essential information: cost, sibling relationship, use cases, outputs, and tool pairings. Every sentence serves a purpose, and key details are front-loaded.

Completeness: 5/5

Despite no output schema, the description fully explains what the tool returns (vendor slug, confidence, signals) and what it does not do. It also provides context for further actions, making it self-contained for a simple detection tool.

Parameters: 3/5

Schema coverage is 100% with clear descriptions for both parameters. The description hints at the filenameHint's purpose (bias detection) but adds no new semantic detail beyond the schema, meeting the baseline for high coverage.

Purpose: 5/5

The description clearly states the tool's specific verb ('detect') and resource ('SIP vendor from a config file'). It explicitly differentiates itself from its sibling 'detect_sip_stack' by being 'heuristic-only' and 'scoped to vendor configs', providing a distinct purpose.

Usage Guidelines: 5/5

The description gives explicit use cases: 'when the user asks what is this config? or attaches a SIP config file.' It also clarifies when not to use it (detect-only, does not extract directives or flag risks) and suggests pairing with 'review_sip_config' and 'search_sip_docs' for deeper analysis.

diff_sip_messages: Structurally diff two SIP messages (grade: A)

Annotations: Read-only

[cost: free (pure CPU, no network) | read-only, no persistence]

Take two SIP messages (typically the same request observed at two adjacent hops - e.g. the INVITE leaving FreeSWITCH and the INVITE arriving at Kamailio) and surface a structured per-header diff: added, removed, mutated (with old/new value), duplicated (single header → many), de-duplicated, whitespace-only-change, parameter-reorder (Via params, From tag), and body-changed. SDP bodies on both sides are delegated to compareSdp for codec / DTLS / ICE diffs.

Use FIRST when the user has two captures or two log lines that should be carrying the same message and wants to know what an intermediate proxy / SBC / B2BUA changed. Far more reliable than visual inspection.

Pair with: parse_sip_message to inspect either side in isolation; lint_sip_request if the diff reveals the downstream side became malformed; search_sip_docs(vendor=<intermediate>) once you know which hop's behavior is the source of the change.

Parameters (JSON Schema)

  • after (required): SIP message as observed at the *downstream* hop (e.g. what Kamailio believes it received).
  • before (required): SIP message as observed at the *upstream* hop (e.g. what FreeSWITCH believes it sent).
  • labelAfter (optional, default "after"): Display label for the downstream side.
  • labelBefore (optional, default "before"): Display label for the upstream side.
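A toy version of the per-header classification, assuming headers are compared case-insensitively by name. This is a sketch of the idea only: real SIP comparison also needs compact-form aliasing, parameter-reorder detection, and the SDP delegation described above.

```python
def diff_headers(before: str, after: str) -> dict[str, list[str]]:
    """Classify header names as added / removed / mutated / duplicated."""
    def headers(msg: str) -> dict[str, list[str]]:
        h: dict[str, list[str]] = {}
        for line in msg.splitlines()[1:]:   # skip the request/status line
            if not line.strip():
                break                        # blank line ends the headers
            name, _, value = line.partition(":")
            h.setdefault(name.strip().lower(), []).append(value.strip())
        return h

    b, a = headers(before), headers(after)
    out: dict[str, list[str]] = {
        "added": [], "removed": [], "mutated": [], "duplicated": []
    }
    for name in sorted(b.keys() | a.keys()):
        if name not in b:
            out["added"].append(name)
        elif name not in a:
            out["removed"].append(name)
        elif len(a[name]) > len(b[name]):
            out["duplicated"].append(name)
        elif b[name] != a[name]:
            out["mutated"].append(name)
    return out
```

Even this toy version surfaces the classic SBC fingerprints: a decremented Max-Forwards and an inserted Record-Route.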
Behavior: 5/5

Beyond readOnlyHint annotation, the description adds 'cost: free (pure CPU, no network) | read-only, no persistence' and enumerates all diff categories (added, removed, mutated, etc.), including delegated SDP comparison via compareSdp. This fully discloses behavior.

Conciseness: 5/5

The description is compact (5 lines) with front-loaded cost/behavior note, then a concise explanation of what the tool does and when to use it. Every sentence adds value, no repetition or fluff.

Completeness: 5/5

The description covers the tool's purpose, input (two SIP messages), output (structured per-header diff), edge cases (SDP delegation), and integration with sibling tools. Despite no output schema, the explanation of diff types is sufficient for understanding return value.

Parameters: 3/5

Schema coverage is 100% with descriptive parameter definitions (e.g., 'SIP message as observed at the *downstream* hop'). The description adds usage context (upstream/downstream) but no additional parameter semantics beyond the schema, meeting the baseline for high coverage.

Purpose: 5/5

The description clearly states the tool diffs two SIP messages, listing specific diff types (added, removed, mutated, etc.) and the use case (comparing same request at adjacent hops). It distinguishes itself from sibling tools like parse_sip_message and lint_sip_request by mentioning when to use them in conjunction.

Usage Guidelines: 5/5

Explicit guidance: 'Use FIRST when the user has two captures...' and suggests pairing with other tools (parse_sip_message, lint_sip_request, search_sip_docs) for specific follow-ups. This provides clear when-to-use and when-not-to-use context.

dns_diagnose_sip_target: RFC 3263 NAPTR/SRV/A walk + sips TLS cert diagnostic for a SIP target (grade: A)

Annotations: Read-only

[cost: external_io (DNS via Cloudflare + Google; TLS handshake to public sips/_sips._tcp targets when applicable) | read-only | rate-limited per IP: 10/min, 200/day]

Walk DNS the same way a SIP UA does (RFC 3263 §4.1): NAPTR → SRV → A/AAAA. Given a SIP URI ("sip:example.com"), bare hostname ("example.com"), or "host:port" string, return the records that exist and the resolution ladder a UA would try.

When the queried target uses TLS (sips: URI, transport=tls/wss, or any _sips._tcp SRV record), the tool also performs a TLS handshake against each resolved sips target and reports the negotiated TLS version + cipher, the leaf certificate's subject / issuer / SANs / validity, the chain length and whether it validates against Node's default trust store, plus two cert-domain checks: RFC 5922 §7.2 strict (cert must cover the original SIP domain) and a lenient SAN match against the SRV target hostname.

Egress safety:

  • Per-IP rate limited.

  • Hostnames that resolve only to RFC 1918 / loopback / link-local / documentation / multicast space are refused (SSRF guard).

  • Walk depth capped to prevent runaway NAPTR / CNAME chains.

  • TLS probes capped at 4 (host, port) tuples per call, 5 s handshake timeout each, public-IP only (we connect to the resolved IP, not the hostname, so the system resolver cannot redirect us into private space).
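The private-space refusal in that SSRF guard can be approximated with the stdlib ipaddress module: `is_global` already excludes RFC 1918, loopback, link-local, and documentation ranges, and multicast is rejected explicitly. A sketch under those assumptions, not the server's actual guard:

```python
import ipaddress

def is_safe_egress_ip(ip: str) -> bool:
    """Refuse addresses a DNS answer could use to pivot into private space."""
    addr = ipaddress.ip_address(ip)
    # is_global is False for RFC 1918, loopback, link-local, and
    # documentation space; multicast is filtered out separately.
    return addr.is_global and not addr.is_multicast
```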

Use to diagnose:

  • "carrier doesn't answer" / "wrong port" / "TLS instead of UDP" routing puzzles

  • "carrier rejects our target because no SRV is published" - when A/AAAA resolves but SRV is missing the tool synthesises a copy-pasteable suggested zone-record block pointing at the resolved canonical hostname

  • "TLS handshake works but cert isn't valid for the SIP domain" - RFC 5922 §7.2 compliance is checked separately from generic chain validation, since the SAN must cover the original SIP domain (not the SRV-redirected target)

ACL caveat: this tool checks DNS + TLS only. Most carriers (Twilio, Telnyx, Bandwidth, …) authorize inbound SIP by source IP whitelist on the trunk (see https://www.twilio.com/docs/sip-trunking/api/ipaccesscontrollist-resource). Even if DNS resolves cleanly and the TLS cert is valid, INVITEs from any IP not on your trunk's IP ACL will be silently dropped or rejected. Verify reachability from the SBC itself.

Pair with: troubleshoot_response_code when 503 / 408 / 480 are involved; search_sip_docs(vendor=...) for carrier-specific routing docs.

Parameters (JSON Schema)

  • target (required): SIP URI ("sip:example.com"), "example.com:5060", or bare hostname ("example.com"). Userinfo is stripped before lookup.
  • transport (optional, default "any"): Transport hint. "any" surfaces all NAPTR services; specific transports filter the SRV walk to that service.
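For reference, the SRV step of the walk queries fixed service labels per RFC 3263 §4.1. The helper below only constructs query names (no DNS I/O); WebSocket transports are omitted since their SRV usage varies across deployments.

```python
# SRV labels from RFC 3263: UDP and TCP use _sip, TLS uses _sips over TCP.
SRV_LABELS = {
    "udp": "_sip._udp",
    "tcp": "_sip._tcp",
    "tls": "_sips._tcp",
}

def srv_query_names(domain: str, transport: str = "any") -> list[str]:
    """Build the SRV owner names a UA would query for this domain."""
    if transport == "any":
        labels = list(SRV_LABELS.values())
    else:
        labels = [SRV_LABELS[transport]]
    return [f"{label}.{domain}" for label in labels]
```

This mirrors the transport parameter's semantics: "any" fans out across all services, a specific transport narrows the walk to one label.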
Behavior: 5/5

Discloses cost (external_io), read-only nature, rate limits (10/min, 200/day), egress safety (SSRF guard, depth cap, TLS probe caps). No contradiction with annotations.

Conciseness: 4/5

Well-structured with front-loaded purpose and cost, but lengthy. Uses formatting effectively. Could be more concise but appropriate for complexity.

Completeness: 5/5

Covers resolution process, return values, TLS details, edge cases (SSRF, rate limits), IP ACL caveat, and sibling tools. Complete for a complex diagnostic tool without output schema.

Parameters: 4/5

Schema coverage is 100%, baseline 3. Description adds meaning: explains target interpretation (SIP URI, host:port, bare host) and transport hint behavior. Adds value beyond schema.

Purpose: 5/5

The description clearly states the tool walks DNS (NAPTR → SRV → A/AAAA) like a SIP UA and performs TLS handshakes for sips targets. It distinguishes from siblings like troubleshoot_response_code and search_sip_docs.

Usage Guidelines: 5/5

Explicitly describes when to use (diagnose carrier routing, TLS cert issues) and provides caveats (IP ACLs). Suggests pairing with other tools and provides alternative scenarios.

fetch_sipflow_share: Hydrate a Sipflow share link into the conversation (grade: A)

Annotations: Read-only

[cost: external_io (Mongo + S3 fetch on the Sipflow backend) | read-only, no persistence | rate limit: shared with the public share endpoint]

Given a Sipflow share URL (https://sipflow.dev/share/, or any sipflow.dev subdomain that serves /share/), load the shared SIP trace AND any prior AI analysis attached to it in a single round trip. Use this whenever a user pastes a /share/<token> URL: the tool fetches the redacted trace text, the AI executive summary / root-cause / remediation steps (if present), and metadata (vendor, filename, source format, pseudonymized flag), so the agent can review the trace alongside the user's own configs without manual download + paste.

In addition to the AI output, the response includes rule-based diagnostics: detected issues (severity-tagged SIP/SDP/media problems with RFC references), WebRTC signal checklist scores, multi-leg call correlation (Session-ID grouping), and detected SIP stacks (User-Agent/Server header values). These diagnostics are computed at share-creation time; for older shares without persisted diagnostics, the tool parses the trace on the fly.

Privacy: the share endpoint deliberately strips the original problem and architecture fields the sharer typed in (those may contain customer-internal context). This tool returns the same public projection - only the trace, the AI output, diagnostics, and basic metadata. Traces are pseudonymized by default (phone numbers / IPs / Call-IDs replaced with consistent fakes); the pseudonymized field tells you whether the sharer opted to keep raw values.

Trace bytes are capped at 200kB (matching the budget the Sipflow AI worker uses). For very large captures the response sets trace.truncated=true - pair with minimize_sip_trace to compact further before passing to your own LLM, or with render_sip_ladder to visualize the call flow.

Pair with: review_sip_config to compare the shared trace against the user's own kamailio.cfg / pjsip.conf / FreeSWITCH XML; render_sip_ladder to draw the shared call flow inline; minimize_sip_trace if trace.truncated is true; troubleshoot_response_code for any failing transactions surfaced in the AI analysis.

Parameters (JSON Schema)

  • url (required): Full Sipflow share URL. Example: "https://sipflow.dev/share/eyJqb2JJZCI6Ii4uLiJ9.abc123". The path must be /share/<token>; /api/share/... endpoints are not accepted.
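The URL constraint above is easy to pre-validate before calling the tool. A sketch with urllib, assuming the same rules the description states (sipflow.dev or any subdomain of it, path exactly /share/<token>):

```python
from urllib.parse import urlparse

def is_valid_share_url(url: str) -> bool:
    """Accept only https://<sipflow.dev host>/share/<token> URLs."""
    u = urlparse(url)
    host = u.hostname or ""
    host_ok = host == "sipflow.dev" or host.endswith(".sipflow.dev")
    parts = [p for p in u.path.split("/") if p]
    # Exactly /share/<token>; rejects /api/share/... and bare /share/
    return u.scheme == "https" and host_ok and len(parts) == 2 and parts[0] == "share"
```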
Behavior: 4/5

Annotations indicate readOnlyHint=true and openWorldHint=true. The description adds cost details, rate limiting, privacy behavior (pseudonymization, field stripping), truncation handling, and that diagnostics are computed at share-creation time or on the fly. This goes well beyond the annotations, though the openWorldHint is not fully explained.

Conciseness: 4/5

The description is well-organized, front-loading cost and read-only nature, then purpose, then detailed behavior. Every sentence is informative, though it could be slightly shortened without losing clarity. It remains efficient for an agent to parse.

Completeness: 4/5

For a complex tool without an output schema, the description covers return contents (trace, AI output, diagnostics, metadata, privacy info) and pairing suggestions. Missing explicit error handling (e.g., for invalid URLs), but otherwise comprehensive enough for most use cases.

Parameters: 5/5

The sole parameter 'url' has schema description but the tool description adds valuable constraints: example format, required path structure (/share/<token>), and explicit rejection of /api/share/ endpoints. This significantly helps the agent provide a correct input.

Purpose: 5/5

The description clearly states the tool's purpose: to load a Sipflow share URL and retrieve both the SIP trace and any AI analysis. It distinguishes itself from sibling tools by mentioning specific pairing with tools like minimize_sip_trace and render_sip_ladder, and by detailing what it returns that others don't.

Usage Guidelines: 4/5

The description explicitly says 'Use this whenever a user pastes a /share/<token> URL', providing clear when-to-use. It also explains what is not included (privacy-stripped fields). However, it does not explicitly state when not to use or provide direct alternatives, leaving minor ambiguity.

lint_sip_request: Lint a raw SIP request for RFC compliance (grade: A)

Annotations: Read-only

[cost: free (pure CPU, no network) | read-only, no persistence]

Run RFC 3261 / RFC 3325 / RFC 8224 / RFC 8225 / CTIA BCID compliance checks on a single raw SIP request (typically an INVITE) and return a list of findings.

Catches the failure modes that silently break carrier interop:

  • Two From: headers in one request (RFC 3261 §7.3 / §20.20).

  • Missing CRLF between consecutive header lines (RFC 3261 §7.3).

  • ;tag= (or any other) parameter on P-Asserted-Identity / P-Preferred-Identity (RFC 3325 §9.1).

  • PASSporT orig.tn not matching the From caller TN (RFC 8224 §5).

  • PASSporT dest.tn not matching the To callee TN (RFC 8224 §5).

  • Non-canonical TN inside a PASSporT claim (RFC 8225 §5.2.1).

  • Branded display name in From with no ppt=rcd Identity header (CTIA BCID §5).

Use FIRST when chasing 422 / 400 Bad Request / 484 Invalid FROM on a single INVITE - these usually have a structural cause this tool catches mechanically.

Pair with: parse_sip_message for purely structural checks on any SIP message (responses included); validate_stir_shaken_identity for the cryptographic verdict on Identity headers; search_sip_docs({ sourceType: 'stir-shaken', ... }) to ground the explanation in RFC text.

Parameters (JSON Schema)

  • text (required): Raw SIP request text. Should start with the request line (e.g. `INVITE sip:...@... SIP/2.0`). Headers must be CRLF or LF separated.
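One of the checks listed above, duplicate From headers, in miniature. This toy linter handles only that single check, counting both the full name and the compact form f:; the real tool runs the full RFC 3261/3325/8224/8225/BCID battery.

```python
def lint_duplicate_from(request: str) -> list[str]:
    """Flag a request carrying more than one From header (RFC 3261 Section 20.20)."""
    count = 0
    for line in request.splitlines()[1:]:    # skip the request line
        if not line.strip():
            break                             # headers end at the blank line
        name = line.partition(":")[0].strip().lower()
        if name in ("from", "f"):             # "f" is the compact form of From
            count += 1
    return ["duplicate From header (RFC 3261 Section 20.20)"] if count > 1 else []
```

A second From header is exactly the kind of structural fault behind 400 Bad Request / 484 Invalid FROM rejections that this tool is meant to catch first.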
Behavior: 5/5

Annotations already provide readOnlyHint=true and openWorldHint=false; the description adds operational context: 'cost: free (pure CPU, no network) | read-only, no persistence'. It also enumerates specific behavioral traits like the types of checks performed (e.g., two From headers, missing CRLF, tag parameters on PAIPPI, PASSporT mismatches). No contradictions.

Conciseness: 5/5

The description is well-structured: a cost/safety header, then purpose, then enumerated failure modes, then usage guidance, then pairing suggestions. Every sentence is informative, with no filler. Concise yet comprehensive.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (SIP RFC compliance linting) and the absence of an output schema, the description thoroughly covers purpose, input requirements, behavioral constraints, and integration with sibling tools. An agent can confidently understand when to invoke this tool and what results to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one parameter 'text' with 100% coverage, including a description of format (starts with request line, CRLF/LF separated). The description reinforces that it expects a single raw SIP request (typically INVITE) but does not add significant new detail beyond the schema. Slight enhancement by specifying typical use case.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Run RFC 3261 / RFC 3325 / RFC 8224 / RFC 8225 / CTIA BCID compliance checks on a single raw SIP request (typically an INVITE) and return a list of findings.' It specifies the exact RFCs and compliance domains, lists specific failure modes caught, and distinguishes from sibling tools like parse_sip_message (structural only) and validate_stir_shaken_identity (cryptographic).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises 'Use FIRST when chasing 422 / 400 Bad Request / 484 Invalid FROM on a single INVITE' and provides a 'Pair with:' section listing three alternative tools with clear differentiation (e.g., parse_sip_message for structural checks on any SIP message, validate_stir_shaken_identity for cryptographic verdict). This gives both when-to-use and when-not.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lookup_response_code - Look up a SIP response code (instant, RFC-cited) [Grade: A]
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only]

Instant static lookup of a SIP response code (100-699). Returns name, RFC anchor, category, description, common operator-flavored causes, and known vendor-specific reason-phrase variants (e.g. OpenSIPS emits 484 'Invalid FROM' on From-header parse failure).

USE FIRST when the user pastes or asks about any 3-digit SIP code - sub-millisecond, no API cost.

Pair with: troubleshoot_response_code for vendor-specific RAG hits beyond the static entry; lint_sip_request when the code is 4xx and the user has the offending request; stir_attestation_explainer for STIR-shaped codes (428/436/437/438/608); validate_stir_shaken_identity when the code is 438 and they have the JWS.
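A static lookup like this amounts to a bundled table keyed by code. The sketch below is hypothetical: the entries, field names, and the fallback behavior are illustrative, not the tool's actual registry:

```python
# Hypothetical miniature of the bundled registry; field names are illustrative.
SIP_CODES = {
    484: {
        "name": "Address Incomplete",
        "rfc": "RFC 3261 §21.4.22",
        "category": "4xx client failure",
        "vendor_variants": {"OpenSIPS": "Invalid FROM"},  # reason phrase differs by stack
    },
    488: {
        "name": "Not Acceptable Here",
        "rfc": "RFC 3261 §21.4.26",
        "category": "4xx client failure",
        "vendor_variants": {},
    },
}

def lookup_response_code(code: int) -> dict:
    if not 100 <= code <= 699:
        raise ValueError("SIP response codes span 100-699")
    # Unknown codes still get a category bucket from the first digit.
    return SIP_CODES.get(code, {"name": "Unknown", "category": f"{code // 100}xx"})
```

A real registry would cover all assigned codes; the point is that the lookup is pure CPU with no network round-trip.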

Parameters (JSON Schema)

  • code (required): SIP response code (e.g. 488 for Not Acceptable Here).
Behavior: 5/5

Beyond the readOnlyHint=true annotation, the description reveals the tool is free, pure CPU, no network, and sub-millisecond. No contradictions.

Conciseness: 5/5

Three concise, front-loaded sections: cost tag, return details, usage guidance. No fluff; every sentence earns its place.

Completeness: 5/5

Despite the lack of an output schema, the description lists the full return content. Combined with the usage guidance and context signals, the tool is fully specified for its simple lookup purpose.

Parameters: 3/5

The schema already fully describes the single parameter (an integer in 100-699, with an example). The description adds no new semantic meaning beyond restating the range.

Purpose: 5/5

Clearly specifies a static lookup of SIP response codes (100-699) and lists the return fields (name, RFC anchor, category, description, causes, vendor variants). Distinguishes itself from siblings by positioning as the first, instant lookup.

Usage Guidelines: 5/5

Explicitly states 'USE FIRST' for any 3-digit SIP code and provides specific pairing guidance with other tools (e.g., troubleshoot_response_code, lint_sip_request) based on context.

lookup_sip_header - Look up a SIP header (RFC-cited) [Grade: A]
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only]

Instant lookup of a SIP header by canonical or compact form (e.g. "Via" / "v", "Diversion", "P-Asserted-Identity", "Identity", "Session-Expires"). Returns canonical form, compact alias, RFC anchor, where it appears (request / response / both), cardinality (exactly-one / at-most-one / one-or-more / any), allowed/forbidden URI parameters with RFC citations, short description, and related headers.

USE FIRST when the user asks about a specific header they saw in a trace - sub-millisecond, no API cost. The cardinality + paramRules fields surface failure modes (e.g. two From: headers, ;tag= on P-Asserted-Identity) without needing a RAG round-trip.

Pair with: lint_sip_request to mechanically check a real request against these rules; search_sip_docs for vendor-specific or 3GPP P-headers not in the bundled registry.
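Compact-form resolution is a small aliasing layer over the registry. A two-entry sketch, assuming a hypothetical registry shape (the real tool bundles the full header set):

```python
# Illustrative two-entry registry; the real tool bundles the full set.
HEADERS = {
    "via":  {"compact": "v", "rfc": "RFC 3261 §20.42", "cardinality": "one-or-more"},
    "from": {"compact": "f", "rfc": "RFC 3261 §20.20", "cardinality": "exactly-one"},
}
# Reverse index: compact form -> canonical name.
COMPACT = {info["compact"]: name for name, info in HEADERS.items()}

def lookup_header(name: str) -> dict:
    key = name.strip().lower()          # case-insensitive, per the schema
    key = COMPACT.get(key, key)         # compact form resolves to canonical
    entry = HEADERS.get(key)
    if entry is None:
        raise KeyError(f"unknown header: {name}")
    return {"canonical": key.title(), **entry}
```

The cardinality field is what lets a caller spot failure modes like two From: headers without a RAG round-trip.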

Parameters (JSON Schema)

  • name (required): Header name (canonical, e.g. "Via", or compact, e.g. "v"). Case-insensitive.
Behavior: 4/5

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds [cost: free (pure CPU, no network) | read-only] and details the return fields (canonical form, compact alias, RFC anchor, etc.), providing additional value beyond the annotations. No contradictions.

Conciseness: 5/5

The description is concise and well-structured: it opens with cost/read-only info, then explains what the tool does, provides usage guidance, and lists paired tools. Every sentence adds value, and information is front-loaded.

Completeness: 5/5

Given the tool's simplicity (single parameter, no output schema), the description fully covers behavior: what it returns, when to use it, and how it relates to siblings. No gaps remain for an AI agent to understand selection and invocation.

Parameters: 4/5

Schema coverage is 100% with a description for the `name` parameter. The description adds context by noting the parameter is case-insensitive and giving examples ("Via" / "v"), extending the schema's meaning.

Purpose: 5/5

The description clearly states the tool looks up a SIP header by canonical or compact form, listing examples like "Via" / "v". It distinguishes itself from siblings by advising to use it first for specific headers and by pairing with other tools for further analysis.

Usage Guidelines: 4/5

The description explicitly says 'USE FIRST when the user asks about a specific header... sub-millisecond, no API cost.' It also suggests pairing with `lint_sip_request` and `search_sip_docs`, providing clear context. It doesn't explicitly state when not to use it, but the pairings imply exclusions.

minimize_sip_trace - Minimize a SIP trace [Grade: A]
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only, no persistence]

Reduce a raw SIP trace to a compact form suitable for sending to an LLM. Preserves SDP bodies and routing/auth/dialog headers; prunes well-known noise (User-Agent, Server, Allow, Accept-, Date, P- informational, etc.).

Expected input format: raw SIP messages separated by blank lines, each starting with a request line (INVITE sip:...@... SIP/2.0) or status line (SIP/2.0 200 OK). PCAP-decoded text from sngrep / ngrep / tcpdump / tshark, syslog with SIP body, sipflow's own export format, or a hand-pasted INVITE/200 dialog all work. Annotation lines like # [timestamp] sender -> receiver or ngrep-style U <ip>:<port> -> <ip>:<port> between blocks are tolerated.

Safe to run on production traces - the input is processed in-memory and is not persisted or sent off-server.

Pair with: detect_sip_stack to identify the vendor, then search_sip_docs(vendor=...) for vendor-grounded analysis; render_sip_ladder to visualize the trace as a Mermaid call-flow ladder; lint_sip_request / parse_sip_message to mechanically validate any single message in the trace.
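The prune-but-preserve behavior can be sketched per message: drop noise headers up to the blank line, then keep the body verbatim. The NOISE list below is an illustrative subset of the real prune list, not the tool's actual table:

```python
# Illustrative subset of the prune list; the real tool's list is longer.
NOISE = ("user-agent", "server", "allow", "date")

def minimize_message(msg: str) -> str:
    """Sketch: drop noise headers; keep everything from the blank line on (the body)."""
    out, in_body = [], False
    for line in msg.split("\n"):
        if in_body or line == "":
            in_body = True          # SDP bodies are preserved verbatim
            out.append(line)
            continue
        name = line.partition(":")[0].strip().lower()
        if name in NOISE or name.startswith("accept-"):
            continue                # well-known noise: pruned
        out.append(line)
    return "\n".join(out)
```

Because the transform is pure string filtering, it runs in-memory with nothing persisted, which is what makes it safe on production traces.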

Parameters (JSON Schema)

  • text (required): Raw SIP trace text. Multiple messages may be concatenated.
  • maxBytes (optional): Truncate the minimized output if it exceeds this many bytes (default 200000, matching Sipflow's analyze pipeline).
Behavior: 5/5

Explicitly declares read-only, no persistence, cost-free, and in-memory processing, going beyond annotations to add behavioral context.

Conciseness: 4/5

The description is thorough but somewhat lengthy; it is front-loaded with key properties. Each sentence adds value, but it could be more concise.

Completeness: 5/5

Covers input format, preservation/pruning behavior, safety, and pairing with related tools. There is no output schema, but the explanation suffices for a minimization tool.

Parameters: 4/5

Schema coverage is 100%, and the description adds value by elaborating on acceptable input formats for 'text' and truncation behavior for 'maxBytes', exceeding the baseline.

Purpose: 5/5

The description clearly states the tool reduces a raw SIP trace to a compact form for an LLM, with the specific verb 'minimize' and resource 'SIP trace'. It distinguishes itself from siblings by describing pairing with other tools like detect_sip_stack.

Usage Guidelines: 4/5

Provides detailed input-format guidance and pairing suggestions, but lacks explicit when-not-to-use instructions relative to alternatives. It states the tool is safe for production, which implies usage context.

parse_sdp - Parse an SDP body (RFC 8866) [Grade: A]
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only]

Parse a Session Description Protocol body and return a structured view: origin, session, timing, per-media codecs (rtpmap + fmtp), direction, DTLS setup + fingerprint, ICE credentials + candidates, rtcp-mux, BUNDLE groups, fax-relay (m=image udptl t38 plus the a=T38Fax* attribute family), and crypto attributes.

Useful for debugging WebRTC ↔ SIP interop (codec negotiation, DTLS-SRTP fingerprints, ICE candidate gathering, bundle alignment), and for inspecting fax negotiation (T.38 reinvite SDP, T38FaxMaxBuffer/T38FaxUdpEC/T38FaxRateManagement) without an LLM having to re-derive the SDP grammar each call.

Pair with: compare_sdp_offer_answer when the user has both halves of the negotiation (including T.30→T.38 reinvites); webrtc_sip_checklist for the bridge-config angle.
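The per-media codec extraction is the core of such a parser. A minimal sketch that pulls rtpmap lines and the direction attribute (defaulting to sendrecv when absent, per RFC 8866/3264); the output shape is illustrative, not the tool's schema:

```python
def parse_sdp(sdp: str) -> dict:
    """Sketch: per-media codecs (rtpmap) and direction from an SDP body."""
    media, current = [], None
    direction = "sendrecv"                      # default when no direction attribute appears
    for line in sdp.strip().splitlines():
        kind, _, value = line.partition("=")
        if kind == "m":
            current = {"media": value.split()[0], "codecs": []}
            media.append(current)
        elif kind == "a" and current is not None:
            if value.startswith("rtpmap:"):
                pt, name = value[len("rtpmap:"):].split(" ", 1)
                current["codecs"].append({"pt": int(pt), "name": name})
            elif value in ("sendrecv", "sendonly", "recvonly", "inactive"):
                direction = value
    return {"media": media, "direction": direction}
```

A full parser would also carry origin, timing, fmtp, DTLS fingerprints, ICE candidates, and the T.38 attribute family, as the description lists.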

Parameters (JSON Schema)

  • sdp (required): SDP body - the section after the empty line in an INVITE/200/UPDATE.
Behavior: 4/5

Annotations already declare readOnlyHint=true, and the description reinforces this with 'read-only' and 'pure CPU, no network'. It adds detail on the structured view returned, but does not mention any edge cases or additional behaviors.

Conciseness: 5/5

Concise and well-structured: cost hint, then main functionality, followed by use cases and pairings. Every sentence adds value without redundancy.

Completeness: 5/5

The description fully covers the tool's purpose, input semantics, and relationship to siblings. Even without an output schema, it lists the major fields returned, making it complete for the intended use.

Parameters: 4/5

Schema description coverage is 100% for the single parameter. The description adds clarity by specifying that the SDP body is the part after the empty line in an INVITE/200/UPDATE, which aids correct usage.

Purpose: 5/5

Clearly states it parses an SDP body and returns a structured view, listing key fields such as origin, session, timing, and per-media codecs. Explicitly distinguishes itself from sibling tools like compare_sdp_offer_answer and webrtc_sip_checklist.

Usage Guidelines: 5/5

Provides explicit guidance on when to use it (debugging WebRTC ↔ SIP interop) and suggests pairing with compare_sdp_offer_answer for full negotiation context and webrtc_sip_checklist for the bridge-config angle.

parse_sip_message - Structurally parse a single SIP message [Grade: A]
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only, no persistence]

Parse a single raw SIP message (request OR response) and return a structured view: start line (method/status), every header in order with line numbers, body, duplicate-header counts, and a list of structural flags the parser noticed (missing-crlf, tag-on-pai, tag-on-ppi, invalid-folding, duplicate-single-instance, content-length-mismatch).

Use FIRST when the user pastes a single INVITE / 200 / NOTIFY and asks 'what does this look like to a parser?' or 'is this even valid?'. The output makes header-level bugs (two From: headers, ;tag= on PAI/PPI, missing CRLF between headers, broken Identity folding) obvious without an LLM having to scan visually.

Pair with: lint_sip_request for the full RFC compliance suite (request only); diff_sip_messages to compare two parsed messages structurally; validate_stir_shaken_identity if the message carries an Identity header.
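The structured view described above (ordered headers with line numbers plus duplicate counts) is straightforward to sketch; the output keys below are illustrative, not the tool's schema:

```python
from collections import Counter

def parse_sip_message(text: str) -> dict:
    """Sketch: start line, ordered headers with line numbers, duplicate counts."""
    lines = text.replace("\r\n", "\n").split("\n")
    headers, body_at = [], None
    for i, line in enumerate(lines[1:], start=2):   # 1-based; line 1 is the start line
        if line == "":
            body_at = i + 1                         # body starts after the blank line
            break
        name, _, value = line.partition(":")
        headers.append({"line": i, "name": name.strip(), "value": value.strip()})
    counts = Counter(h["name"].lower() for h in headers)
    return {
        "start_line": lines[0],
        "headers": headers,
        "duplicates": {n: c for n, c in counts.items() if c > 1},
        "body": "\n".join(lines[body_at - 1:]) if body_at else "",
    }
```

Surfacing duplicate counts is what makes a bug like two From: headers visible at a glance instead of requiring a visual scan.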

Parameters (JSON Schema)

  • text (required): Raw SIP message text. Should start with a request line (`INVITE sip:...@... SIP/2.0`) or status line (`SIP/2.0 200 OK`). Headers must be CRLF or LF separated.
Behavior: 5/5

The description discloses behavioral traits beyond annotations: it states the cost ('free, pure CPU, no network'), confirms read-only and no persistence, and details the exact structural flags the parser can detect. This aligns with the annotations (readOnlyHint: true) and adds valuable context.

Conciseness: 4/5

The description front-loads essential info (cost, read-only) and then details outputs and usage. It is relatively concise, though a bit dense; every sentence adds value, but it could be slightly more streamlined.

Completeness: 5/5

Given the tool's simplicity (one parameter, no output schema, clear annotations), the description is complete: it explains the output format, flags, use cases, and relationships with sibling tools. No gaps for an agent to interpret.

Parameters: 3/5

The input schema has one parameter 'text' with a full description covering format requirements (start line, headers). Schema coverage is 100%, so the description does not add new semantic meaning beyond reinforcing the expected input.

Purpose: 5/5

The description clearly states the tool's purpose: parsing a single raw SIP message into a structured view. It explicitly lists outputs (start line, headers, body, duplicate-header counts, flags) and distinguishes itself from sibling tools like lint_sip_request and diff_sip_messages.

Usage Guidelines: 5/5

The description provides explicit when-to-use guidance: 'Use FIRST when the user pastes a single INVITE / 200 / NOTIFY and asks...' and suggests pairing with specific sibling tools for complementary tasks. It also notes the cost and read-only nature, aiding agent decision-making.

render_sip_ladder - Render a SIP trace as a Mermaid call-flow ladder [Grade: A]
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only]

Parse a raw SIP trace (PCAP-decoded text, sngrep export, syslog, or pasted INVITE/200 dialog) and emit a Mermaid sequenceDiagram block visualizing the call flow. Most chat hosts (Claude, ChatGPT, Cursor, GitHub) render Mermaid inline.

Lane labeling: aliases are matched against (in order) ${ip}:${port} from message source/dest, then bare ${ip}, then top-Via host, then Contact host. The most-specific match wins. When no alias matches the renderer falls back to the peer's address rather than emitting unknown:5060.

Pair with: minimize_sip_trace first to compact a noisy trace; diff_sip_messages when two adjacent INVITEs in the ladder differ unexpectedly; lint_sip_request to validate a single message you pulled from the ladder.
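The alias fallback and Mermaid emission can be sketched with a deliberately simplified match rule (exact "ip:port" only; the real tool also matches bare IP, top-Via host, and Contact host). Function name and input shape are illustrative:

```python
def render_ladder(messages, aliases=None):
    """Sketch: emit a Mermaid sequenceDiagram from (src, dst, label) tuples.

    aliases maps "ip:port" to a friendly lane name; unmatched lanes fall back
    to the raw peer address, never to "unknown".
    """
    aliases = aliases or {}

    def lane(addr):
        return aliases.get(addr, addr)

    out = ["sequenceDiagram"]
    for src, dst, label in messages:
        out.append(f"    {lane(src)}->>{lane(dst)}: {label}")
    return "\n".join(out)
```

In real Mermaid output, participant names containing colons or dots may need quoting or explicit `participant` declarations; this sketch only illustrates the lane-labeling fallback.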

Parameters (JSON Schema)

  • text (required): Raw SIP trace text. Multiple messages may be concatenated.
  • callId (optional): Render only this Call-ID. Required when the trace contains multiple calls; otherwise the only call is used.
  • aliases (optional): Friendly lane labels. Match order: exact "ip:port" → bare "ip" → top-Via host (with or without port) → Contact host (with or without port). The most-specific match wins; otherwise the lane is labeled with its raw "ip:port" (never "unknown"). Example: `{"192.0.2.10:5060":"Alice","203.0.113.50":"Carrier"}`.
  • compact (optional): Drop OPTIONS keepalives and retransmissions. Hidden counts are summarized in a `Note over` line.
  • maxMessages (optional): Hard cap on rendered arrows. Extra messages produce a truncation note. Hard ceiling is 200.
  • includeTiming (optional): Append `+Nms` (delta from previous arrow) to each arrow label.
  • groupRetransmits (optional): Collapse adjacent identical retransmissions in the same direction into a single arrow + `Note over: xN over Tms`. Independent of `compact` (which drops them entirely).
  • splitOnNewBranch (optional): Emit a `--- failover to <ip:port> ---` separator before any request sent to a previously-unseen destination. Useful when the trace fails over between gateway IPs.
  • highlightFailures (optional): Bold the first non-1xx final response per request leg (CSeq) so the failure jumps out in the ladder.
  • correlationHeaders (optional): Header names to use for cross-leg call correlation (value-equality). When provided, calls sharing the same value for any listed header are merged into one ladder. Example: `["X-ACME-Session-ID","X-ACME-Call-ID"]`.
Behavior: 5/5

Beyond the readOnlyHint annotation, the description details lane-labeling logic, fallback behavior, and cost (free, pure CPU). No contradiction with annotations.

Conciseness: 5/5

The description is structured efficiently with a clear opening sentence, structured details (lane labeling), and a pairing section. Every sentence adds value with no redundancy.

Completeness: 4/5

Given the ten parameters and no output schema, the description covers core functionality, key algorithms, and complements from sibling tools. It could be more explicit about error handling or output-format details, but overall it is quite complete.

Parameters: 4/5

Schema coverage is 100%, so the baseline is 3. The description adds valuable context for parameters like aliases (match order) and compact (drops retransmissions), raising the score above baseline.

Purpose: 5/5

The description clearly states the tool's purpose: 'Parse a raw SIP trace ... and emit a Mermaid sequenceDiagram block'. It distinguishes itself from sibling tools by explicitly pairing with minimize_sip_trace, diff_sip_messages, and lint_sip_request.

Usage Guidelines: 5/5

The 'Pair with' section advises when to use sibling tools instead, and the note about chat hosts rendering Mermaid inline provides usage context. This gives clear guidance on when, and with what alternatives, to use the tool.

review_sip_config - Ground-truth review of a SIP/VoIP config or repo file [Grade: A]
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only]

Use this when the user asks 'review my config' or attaches a kamailio.cfg, sip.conf, pjsip.conf, FreeSWITCH XML profile, opensips.cfg, res_fax.conf / udptl.conf / spandsp.conf (fax-relay tuning), or a SIP-shaped source file from a repo. This tool:

  1. Detects the vendor from filename + structural signatures (loadmodule, route blocks, [transport-*] sections, <profile name=>, KEMI calls).

  2. Extracts a structured outline: loaded modules, modparams, listen lines, route blocks, profiles, gateways, dialplan extensions.

  3. Surfaces risk flags - e.g. websocket loaded without TLS, nathelper without rtpengine, chan_sip used in modern Asterisk, AND the Kamailio/OpenSIPS lump-vs-subst race (subst('/^From:.../...') colliding with KSR.hdr.append/remove or uac_replace_* or append_hf/remove_hf on the same header - corrupts the buffer at serialization).

  4. Returns a list of suggestedQueries for search_sip_docs so you can ground the actual review in vendor docs.

Pair with: one or more search_sip_docs calls (cite returned source_url values verbatim instead of recalling vendor behavior from memory); webrtc_sip_checklist when the config is a WebRTC ↔ SIP bridge.
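Step 1, vendor detection from structural signatures, can be sketched as a scoring table plus a filename boost. The signature strings and weights below are illustrative, not the plugin's actual heuristics:

```python
# Illustrative signature table; the real detector weighs more signals.
SIGNATURES = {
    "kamailio":   ("loadmodule", "request_route", "KSR."),
    "asterisk":   ("[transport-", "type=endpoint", "exten =>"),
    "freeswitch": ("<profile name=", "<param name=", "mod_sofia"),
}

def detect_vendor(text: str, filename_hint: str = "") -> str:
    """Sketch: score each vendor by how many of its signatures appear."""
    scores = {v: sum(sig in text for sig in sigs) for v, sigs in SIGNATURES.items()}
    if "kamailio" in filename_hint:
        scores["kamailio"] += 2        # a filename hint strongly improves detection
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

This is also why the tool exposes vendorHint: when heuristics like these score low or tie, the caller can force the vendor outright.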

Parameters (JSON Schema)

  • text (required): The full config blob (or a representative excerpt). UTF-8 plaintext.
  • vendorHint (optional): Skip auto-detection and force a specific vendor. Use when the heuristics return low confidence or the wrong vendor.
  • filenameHint (optional): Filename or path (e.g. "kamailio.cfg", "etc/asterisk/pjsip.conf", "sofia/external.xml"). Strongly improves vendor detection.
Behavior: 5/5

Annotations declare readOnlyHint=true, and the description adds that it's 'pure CPU, no network' and 'read-only'. It details the processing steps (vendor detection, outline extraction, risk flagging, suggested queries) and mentions a specific risk (the subst race), giving full behavioral transparency.

Conciseness: 4/5

The description is well-structured with numbered steps and a pairing section. It front-loads the cost and read-only nature. While slightly lengthy, every sentence adds value and the structure aids readability.

Completeness: 4/5

Despite no output schema, the description explains the return value as a structured outline, risk flags, and suggestedQueries. It covers inputs, behavior, and pairing, making it complete enough for this complex tool.

Parameters: 4/5

Schema coverage is 100% and the description adds context: filenameHint improves vendor detection, and vendorHint skips auto-detection. This goes beyond the schema descriptions, earning a score above baseline.

Purpose: 5/5

The description explicitly states the tool reviews SIP/VoIP configs, detects the vendor, extracts an outline, and surfaces risk flags. It clearly distinguishes itself from siblings like parse_sip_message by specifying the input type (config files vs. messages).

Usage Guidelines: 4/5

The description clearly indicates when to use it: when the user asks to 'review my config' or attaches relevant files. It also suggests pairing with search_sip_docs and webrtc_sip_checklist. While it doesn't explicitly list alternatives, the context of sibling tools provides differentiation.

search_sip_docs - Search SIP / VoIP documentation (RAG) - A
Read-only
Inspect

[cost: rag (one embed + one vector search) | read-only, network: outbound to embed model only]

Vector search over Sipflow's curated VoIP knowledge base: vendor docs (Asterisk, FreeSWITCH, Kamailio, OpenSIPS, Twilio, Cisco, etc.), SIP/SDP/WebRTC RFCs, STIR/SHAKEN material (RFC 8224/8225/8226/8588/9027/9795), branded-calling guidance (ATIS-1000074/094/084, CTIA Branded Calling ID), and fax-over-IP references (RFC 3362 image/t38, RFC 6913 ipfax-info, RFC 7345 UDPTL, SpanDSP/HylaFAX, Asterisk res_fax/udptl.conf, FreeSWITCH mod_spandsp/t38_gateway, Cisco CUBE T.38).

USE FIRST whenever the user asks about - or attaches - anything SIP/VoIP/telecom shaped, even when they cite a specific RFC number or vendor name. The corpus has the current text and your training data may not. Trigger conditions: vendor configs (kamailio.cfg, sip.conf, pjsip.conf, FreeSWITCH XML profile, opensips.cfg, res_fax.conf / udptl.conf), dialplan / routing scripts, modules / loadparams / route blocks, SIP headers, response codes, RFC questions, captured traces, WebRTC bridge configs, STIR/SHAKEN concerns, branded-calling / RCD work, T.38 / T.30 fax decoding or reinvite failures.

Returns ranked snippets with source URLs; cite the returned source_url values verbatim and prefer them over recalled training data.

Examples of when to use:

  • "does this kamailio.cfg look standard for WebRTC + SIP users?"

  • "why would Asterisk PJSIP reject this re-INVITE?"

  • "what does Kamailio's loose_route() do? show me docs"

  • "explain FreeSWITCH session-timer behavior"

  • "how do I set up STIR/SHAKEN signing on OpenSIPS?"

  • "what does ATIS-1000074 say about A-level attestation?"

  • "RFC 9795 rcdi JSON pointer canonical form"

  • "CTIA Branded Calling ID requirements for originating SP"

  • "RFC 8225 PASSporT canonical JSON / lexicographic key ordering"

  • "why is my T.38 reinvite getting 488 from a Cisco CUBE?"

  • "Asterisk res_fax_spandsp ECM and rate-management knobs"

  • "what are the required SDP attributes for m=image udptl t38?"

Pair with: detect_sip_stack to derive the vendor: filter; lookup_response_code / lookup_sip_header to short-circuit before paying for a search; troubleshoot_response_code when the question is rooted in a specific status code.

Parameters (JSON Schema)

  • limit (optional): Maximum number of snippets to return (1-15).

  • query (required): Natural-language question or keywords. Be specific - include vendor name, header, error code, module, or RFC if known. Multi-sentence queries are fine.

  • vendor (optional): Restrict results to a single vendor/stack (e.g. "asterisk", "kamailio"). Omit to search all vendors.

  • sourceType (optional): Restrict by document type. Available: rfc | pbx (Asterisk/FreeSWITCH/Kamailio/OpenSIPS) | sbc (SBCs) | cpaas (Twilio/Telnyx/...) | uc-cloud (Teams/Zoom/...) | endpoint (phones/softphones) | stir-shaken | transport (RTP/SRTP/DTLS/ICE/STUN/TURN RFCs) | regulatory | observability.
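The parameter constraints above can be sketched as client-side validation. This is a hypothetical helper based only on the documented schema (limit 1-15, sourceType enum), not any real Sipflow SDK:

```python
# Hypothetical validation of search_sip_docs arguments, mirroring the
# parameter table above. Names and constraints come from the docs; the
# helper itself is illustrative.
ALLOWED_SOURCE_TYPES = {
    "rfc", "pbx", "sbc", "cpaas", "uc-cloud", "endpoint",
    "stir-shaken", "transport", "regulatory", "observability",
}

def build_search_args(query, limit=None, vendor=None, source_type=None):
    """Return an arguments dict for search_sip_docs, raising on bad input."""
    if not query or not query.strip():
        raise ValueError("query is required")
    args = {"query": query}
    if limit is not None:
        if not 1 <= limit <= 15:
            raise ValueError("limit must be 1-15")
        args["limit"] = limit
    if vendor is not None:
        args["vendor"] = vendor.lower()   # vendor slugs are lowercase, e.g. "asterisk"
    if source_type is not None:
        if source_type not in ALLOWED_SOURCE_TYPES:
            raise ValueError(f"unknown sourceType: {source_type}")
        args["sourceType"] = source_type
    return args
```

A caller would then pass the returned dict as the tool-call arguments, e.g. `build_search_args("T.38 reinvite 488", vendor="Cisco", source_type="rfc")`.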
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses cost structure ('[cost: rag (one embed + one vector search) | read-only, network: outbound to embed model only]'), which goes beyond the annotations that already declare readOnlyHint=true. It also explains that results include source URLs and instructs to cite them verbatim, providing clear behavioral expectations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Despite being long, the description is well-structured and every sentence adds unique value. It is front-loaded with the most critical use case, then covers corpus, trigger conditions, examples, and pairing advice. No filler or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 parameters, 28 sibling tools, rich annotations), the description is remarkably complete. It covers purpose, usage, behavior, parameters, and inter-tool relations. It also addresses return format and citation guidance, compensating for the lack of an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all 4 parameters, so the description does not need to explain each parameter in detail. However, it adds value by providing usage tips for the query parameter ('Be specific - include vendor name...') and suggesting how to derive the vendor filter using detect_sip_stack. This enhances the schema's guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a vector search over a curated VoIP knowledge base, lists specific vendors, RFCs, and topics covered, and provides numerous example queries. It distinguishes itself from sibling tools by saying 'USE FIRST' for anything SIP/VoIP/telecom shaped and explicitly pairs with other tools like detect_sip_stack.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use guidance ('USE FIRST whenever the user asks about - or attaches - anything SIP/VoIP/telecom shaped'), lists trigger conditions, and suggests using sibling tools like lookup_response_code or detect_sip_stack to short-circuit when appropriate. It also includes example queries that cover a wide range of scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sip_ladder_example - Canonical SIP scenario as a Mermaid ladder - A
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only]

Return a hand-curated SIP scenario as a Mermaid sequenceDiagram plus a bullet list of step-by-step explanations with RFC references. Use this when the user asks 'show me what X looks like' and you don't have a real trace handy.

Available scenarios: basic-call, auth-challenge, cancel-before-answer, early-media, hold-resume, refer-blind, proxy-with-record-route, shaken-attested-invite, bye-glare, redirect-302.

Pair with: search_sip_docs for vendor-specific quirks of the scenario; render_sip_ladder if the user does have a real trace.

Parameters (JSON Schema)

  • verbose (optional): Also include the scenario's long Markdown explanation. Useful when the LLM is going to teach the user; off by default to keep responses small.

  • scenario (required): Which scenario to render. Valid ids: basic-call, auth-challenge, cancel-before-answer, early-media, hold-resume, refer-blind, proxy-with-record-route, shaken-attested-invite, bye-glare, redirect-302.

  • actorNames (optional): Override the default actor display names (Alice/Bob/Proxy). Maps to the first/second/third lane in left-to-right order.

  • includeExplanation (optional): Append a bulleted **Notes** section with RFC references for each step.
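To make the output format concrete, here is a sketch of what a Mermaid ladder for the basic-call scenario could look like, generated programmatically. The message sequence follows the canonical RFC 3665 basic session (INVITE/180/200/ACK then BYE/200); the overridable actor names echo the actorNames parameter above. This is an illustration, not the tool's actual implementation:

```python
# Sketch: emit a Mermaid sequenceDiagram for a basic SIP call.
# Sequence per the well-known RFC 3665 basic-session example.
BASIC_CALL = [
    ("A", "B", "INVITE"),
    ("B", "A", "180 Ringing"),
    ("B", "A", "200 OK"),
    ("A", "B", "ACK"),
    ("A", "B", "BYE"),
    ("B", "A", "200 OK"),
]

def render_basic_call(actor_names=("Alice", "Bob")):
    """Return Mermaid sequenceDiagram text for the basic-call ladder."""
    alias = {"A": actor_names[0], "B": actor_names[1]}
    lines = ["sequenceDiagram"]
    for src, dst, msg in BASIC_CALL:
        lines.append(f"    {alias[src]}->>{alias[dst]}: {msg}")
    return "\n".join(lines)
```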
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds '[cost: free (pure CPU, no network) | read-only]' beyond annotations, disclosing safety and cost. It also specifies output format and that scenarios are hand-curated. No contradiction with annotations; readOnlyHint=true is consistent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three compact sentences plus list of scenarios and pairing guidance. Every sentence serves a purpose: purpose, usage, available scenarios, complementary tools. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description fully explains return format (Mermaid diagram + bullet list with RFCs). Verbose and includeExplanation details clarify optional parts. All 4 parameters are covered. For a diagram generation tool, context is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 4 parameters have schema descriptions (100% coverage), and the description adds value: actorNames mapping 'to the first/second/third lane in left-to-right order,' verbose explains 'off by default to keep responses small,' includeExplanation specifies 'Append a bulleted Notes section.'

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Return a hand-curated SIP scenario as a Mermaid sequenceDiagram plus a bullet list of step-by-step explanations with RFC references.' It identifies the verb (Return), resource (SIP scenario), and output format, while distinguishing from siblings like render_sip_ladder for real traces.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance: 'Use this when the user asks "show me what X looks like" and you don't have a real trace handy.' Also pairs with search_sip_docs and render_sip_ladder, clarifying when to use alternatives. The scenario enum list further guides selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

stir_attestation_explainer - Explain STIR/SHAKEN attestation levels and codes - A
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only]

Static explainer for STIR/SHAKEN: maps attestation levels (A / B / C per RFC 8588) to plain-English requirements + common scenarios, and SIP codes commonly emitted by signing/verification (428 / 436 / 437 / 438 / 608) to their RFC anchors and operator causes.

Provide either attestation (A/B/C) or code (e.g. 438).

Pair with: validate_stir_shaken_identity when the user has the JWS segments and wants the cryptographic verdict; search_sip_docs({ sourceType: 'stir-shaken', ... }) for ATIS / CTIA / RFC depth.

Parameters (JSON Schema)

  • code (optional): SIP response code commonly seen in STIR/SHAKEN flows (428, 436, 437, 438, 608).

  • attestation (optional): Attestation level: "A" Full / "B" Partial / "C" Gateway.
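The static tables such an explainer maintains can be sketched as plain dictionaries. Level meanings follow RFC 8588 / ATIS-1000074; the reason phrases for 428-438 come from RFC 8224 and 608 from RFC 8688. The structure is illustrative, not the tool's real data:

```python
# Sketch of a static STIR/SHAKEN explainer table.
ATTESTATION = {
    "A": "Full: the SP authenticated the caller and their right to use the number.",
    "B": "Partial: the SP knows the customer but not their right to the number.",
    "C": "Gateway: the SP is only the network entry point (e.g. an international gateway).",
}

STIR_CODES = {
    428: "Use Identity Header (RFC 8224)",
    436: "Bad Identity Info (RFC 8224)",
    437: "Unsupported Credential (RFC 8224)",
    438: "Invalid Identity Header (RFC 8224)",
    608: "Rejected (RFC 8688)",
}

def explain(attestation=None, code=None):
    """Return the plain-English entry for an attestation level or SIP code."""
    if attestation:
        return ATTESTATION[attestation.upper()]
    if code:
        return STIR_CODES[code]
    raise ValueError("provide either attestation or code")
```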
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Disclosures include 'read-only', 'pure CPU, no network', and 'static explainer' which aligns with annotations (readOnlyHint: true) and adds context beyond annotations about cost and behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two compact paragraphs that front-load purpose and usage. Every sentence adds value: first paragraph explains function, second gives pairing guidance. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a static explainer without output schema, the description adequately covers what it returns (plain-English explanations, RFC anchors, operator causes) and how it fits among many sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description restates the two parameters with examples but adds little additional semantic information beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a 'static explainer for STIR/SHAKEN' that maps attestation levels and SIP codes to plain-English requirements and RFC anchors. It explicitly distinguishes from siblings like validate_stir_shaken_identity and search_sip_docs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance: 'Provide either attestation (A/B/C) or code' and 'Pair with: validate_stir_shaken_identity... search_sip_docs...' to indicate when to use this tool versus alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

submit_sipflow_feedback - Submit feedback about Sipflow tools, docs, or coverage - A
Inspect

[cost: write (single MongoDB row) | rate-limited per IP: 3/min, 20/day]

Send the Sipflow team feedback when something doesn't work, a vendor or RFC isn't covered, or a tool produced a wrong/incomplete answer. Categories:

  • docs_gap: search_sip_docs returned nothing useful, vendor missing, coverage incomplete

  • tool_bug: a tool errored, returned garbage, or behaved unexpectedly on a real input

  • wrong_answer: the answer it produced was incorrect for the SIP/VoIP question asked

  • feature_request: a new tool, dataset, or behavior the user wants

  • general: anything else

PRIVACY CONTRACT (MUST FOLLOW):

  1. Use this tool only when the user explicitly asks to send feedback, OR when you have completed the user's primary task and there is a clear, actionable gap worth reporting.

  2. ALWAYS show the user the exact summary + details + other fields you plan to send and wait for an explicit yes before calling this tool. Set userConsent: true only after that confirmation.

  3. NEVER include raw SIP traces, INVITE/REGISTER bodies, SDP, phone numbers, IP addresses, Call-IDs, or any other PII. Summarize in your own words instead. The server runs a sanitizer as a backstop, but you are the first line of defense.

  4. The contact field is optional and may only be filled when the user explicitly provides an email and asks you to include it.

  5. The traceExcerpt field is optional and accepts a sanitized SIP message text block (Via/From/To/Call-ID, optional minimal SDP) the user explicitly approved attaching. Pipe minimize_sip_trace output here, NEVER raw INVITE / REGISTER bodies or full pcap text. Phone numbers, IPs, and emails are scrubbed server-side as a backstop; the agent must still summarize / minimize first. The same userConsent: true covers both the text fields and the excerpt - if the user wants the excerpt included you must show it to them before sending.

The tool returns a ticket id (fb_xxxxxxxx) and stores one anonymous row keyed by your daily-rotating IP hash (no raw IP, no account). Rate-limited at 3/min and 20/day per IP hash.

Parameters (JSON Schema)

  • contact (optional): Optional email the user can be reached at. Only include when the user explicitly provides one and asks you to attach it.

  • details (optional): Longer description: what the user was trying to do, what happened, what they expected. PII-free.

  • summary (required): Short one-line description of the feedback. Will be shown to humans triaging. PII-free.

  • category (required): Bucket. Use docs_gap for missing RAG coverage, tool_bug for broken behavior, wrong_answer for incorrect output, feature_request for new asks, general for anything else.

  • relatedTool (optional): Name of the Sipflow MCP tool the feedback relates to, if any (e.g. `search_sip_docs`, `troubleshoot_response_code`).

  • userConsent (required): MUST be true. Set this only after you have shown the user the exact payload above (including any `traceExcerpt`) and they have confirmed they want it sent.

  • relatedQuery (optional): The search query / question that failed or returned poor results. PII-free.

  • traceExcerpt (optional): Optional sanitized SIP message text the user explicitly approved attaching. Use the output of `minimize_sip_trace` (or a hand-scrubbed Via/From/To/Call-ID/CSeq block, optionally with minimal SDP). NEVER paste raw INVITE / REGISTER bodies, full pcap text, or anything containing phone numbers / IPs / Call-IDs you have not already redacted. The server runs a backstop sanitizer that redacts phones, IPs, and emails. Hard cap 32 kB.

  • vendorOrTopic (optional): Vendor slug, RFC number, or topic the feedback relates to (e.g. 'freeswitch', 'RFC 3261', 'fax-over-IP').
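The checks the privacy contract implies can be sketched client-side, using only the fields and limits documented above (category list, required summary/category/userConsent, 32 kB traceExcerpt cap). This is a hypothetical helper, not the server's actual validator:

```python
# Sketch of client-side validation before calling submit_sipflow_feedback.
CATEGORIES = {"docs_gap", "tool_bug", "wrong_answer", "feature_request", "general"}

def check_feedback(payload):
    """Raise ValueError if the payload violates the documented contract."""
    if payload.get("category") not in CATEGORIES:
        raise ValueError("unknown category")
    if not payload.get("summary"):
        raise ValueError("summary is required")
    # userConsent must be literally True, set only after the user confirms
    # the exact payload (including any traceExcerpt).
    if payload.get("userConsent") is not True:
        raise ValueError("userConsent must be true after explicit confirmation")
    excerpt = payload.get("traceExcerpt", "")
    if len(excerpt.encode()) > 32 * 1024:
        raise ValueError("traceExcerpt exceeds the 32 kB cap")
    return payload
```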
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint=false, destructiveHint=false, openWorldHint=true), the description adds critical behavioral context: a) it is a write operation costing a MongoDB row, b) rate-limited per IP (3/min, 20/day), c) returns a ticket id, d) includes a privacy contract with PII-handling rules, and e) discloses a server-side sanitizer backstop. All disclosures align with annotations and there is no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: cost/rate-limits first, then purpose, categories, and a numbered privacy contract. It is front-loaded and organized. However, it is somewhat verbose (especially the privacy contract could be condensed) and the format uses markdown-style elements that may not render in all contexts. Still, it efficiently conveys necessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, privacy contract, rate limits, categories), the description is fully complete. It covers all aspects an agent needs to invoke correctly: categories, required fields, consent mechanics, PII restrictions, and return value. No output schema exists, but the return value (ticket id) is described. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description still adds significant value: it explains the categories in depth, reiterates field constraints (e.g., traceExcerpt requires minimization via minimize_sip_trace), and clarifies the consent flow. It also elaborates on the optional fields (contact, relatedTool, etc.) beyond the schema's descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Send the Sipflow team feedback when something doesn't work, a vendor or RFC isn't covered, or a tool produced a wrong/incomplete answer.' It uses specific verbs and resources ('submit feedback about Sipflow tools, docs, or coverage') and is easily distinguishable from all sibling tools, which are SIP analysis and diagnostics tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use guidance (e.g., 'when something doesn't work, a vendor or RFC isn't covered, or a tool produced a wrong/incomplete answer') and a detailed privacy contract with numbered rules for when and how to use the tool. It also lists categories (docs_gap, tool_bug, wrong_answer, etc.) that help the agent decide appropriateness. While no alternative feedback tool exists, the guidance is thorough and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

troubleshoot_response_code - Troubleshoot a SIP response code (RAG, vendor-aware) - A
Read-only
Inspect

[cost: rag (one embed + one vector search) | read-only, network: outbound to embed model only | rate-limited per IP]

Like lookup_response_code but augmented: returns the static RFC entry PLUS the top vendor-specific RAG hits for the exact code (and any free-text context the user pasted). When the static entry carries known vendor-specific reason-phrase variants (e.g. 484 + opensips → 'Invalid FROM' from parse_from.c), those phrases are folded into the embed query so the right vendor docs surface.

Use when the user asks 'why did <vendor> reject this with <code>?' and you want vendor-grounded common causes, not just the RFC text. Especially helpful for fax-rejection paths — 488 / 415 / 606 on a T.38 reinvite (m=image udptl t38) is one of the most common 488 variants and the tool surfaces FreeSWITCH mod_spandsp / Cisco CUBE / AudioCodes T.38 docs alongside the RFC text.

Pair with: lookup_response_code first (cheaper); lint_sip_request when the code is 4xx and they have the offending request; compare_sdp_offer_answer for 488/415 caused by a T.38 reinvite SDP mismatch; validate_stir_shaken_identity when the code is 438; stir_attestation_explainer for STIR-shaped codes (428/436/437/438/608); dns_diagnose_sip_target when the code is 503 / 408 and routing is suspect.

Parameters (JSON Schema)

  • code (required): SIP response code (e.g. 488, 503, 438).

  • context (optional): Optional free-text context: a snippet of the trace, the Reason header, the Warning header, or a one-line description of what the user was trying to do.

  • vendorHint (optional): Vendor slug to filter the RAG search (e.g. "kamailio", "freeswitch", "twilio"). Strongly recommended.
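The pairing advice in the description amounts to a code-to-next-tool router, which can be sketched directly. The mapping restates the description's own suggestions; the tool names are the sibling tools listed on this page:

```python
# Sketch: route a SIP response code to the sibling tool the description
# recommends pairing with.
def next_tool(code):
    if code == 438:
        return "validate_stir_shaken_identity"   # cryptographic Identity verdict
    if code in (428, 436, 437, 608):
        return "stir_attestation_explainer"      # STIR-shaped codes
    if code in (488, 415):
        return "compare_sdp_offer_answer"        # SDP mismatch, e.g. T.38 reinvite
    if code in (503, 408):
        return "dns_diagnose_sip_target"         # routing suspect
    return "lookup_response_code"                # cheaper static lookup first
```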
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, openWorldHint), the description adds cost details ('rag: one embed + one vector search'), network outbound, rate limiting, and the specific behavioral trait of folding vendor-specific reason-phrase variants into the embed query. This provides substantial transparency beyond structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with cost and read-only info in brackets. It is structured with clear sections (cost, comparison, usage, pairing). While slightly long, each part serves a purpose, and the information density is high.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a RAG-based tool without an output schema, the description adequately explains inputs and intended behavior. It does not specify exact return format, but the context provided is sufficient for an AI agent to understand when and how to use it. Minor gap but acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description adds qualitative value: it notes that 'vendorHint' is 'Strongly recommended' and explains how 'context' can be used. This goes beyond the schema descriptions, but does not drastically change understanding, so a 4 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool 'troubleshoots a SIP response code' and specifies it returns 'static RFC entry PLUS the top vendor-specific RAG hits'. It distinguishes itself from sibling 'lookup_response_code' by being augmented with vendor-aware search, providing a specific and distinct purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'Use when the user asks "why did <vendor> reject this with <code>?" and you want vendor-grounded common causes'. It also provides pairing suggestions with specific sibling tools for different scenarios, offering clear guidance on alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_e164_number - Validate / classify a phone number (E.164 + NANP) - A
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only]

Parse a phone number, normalize to E.164, and classify it. International coverage is via libphonenumber-js (every country, line type when known). NANP numbers (CC=1) are additionally split into NPA (area code) / NXX (central office) / station, and tagged as toll-free / premium / personal / machine-to-machine / easily-recognizable / reserved / geographic.

Use when validating From / P-Asserted-Identity / SHAKEN orig.tn, deciding whether an outbound call needs full attestation, or sanity-checking caller ID format.

Pair with: lint_sip_request to validate that PASSporT orig.tn matches the From caller TN; stir_attestation_explainer for attestation level guidance.

Parameters (JSON Schema)

  • number (required): Phone number in any common form. E.164 (+CCNNN…) is preferred; 10-digit US numbers are accepted as a convenience.
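The NANP split described above can be sketched in a few lines: normalize a US number to E.164, break it into NPA / NXX / station, and tag toll-free NPAs. The toll-free prefixes are the well-known 8YY set; everything else here is illustrative, not the tool's real classifier:

```python
# Sketch of NANP normalization and splitting.
TOLL_FREE_NPAS = {"800", "833", "844", "855", "866", "877", "888"}

def classify_nanp(number):
    """Normalize a NANP number to E.164 and split NPA/NXX/station."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                       # strip country code 1
    if len(digits) != 10:
        raise ValueError("not a 10-digit NANP number")
    npa, nxx, station = digits[:3], digits[3:6], digits[6:]
    return {
        "e164": f"+1{digits}",
        "npa": npa,          # area code
        "nxx": nxx,          # central office code
        "station": station,
        "tollFree": npa in TOLL_FREE_NPAS,
    }
```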
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds value beyond annotations (readOnlyHint and cost=free). It explains the underlying library (libphonenumber-js), international coverage, NANP splitting details, and classification tags (toll-free, premium, etc.), providing deep behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Highly concise: cost/read-only info front-loaded, then compact sentences covering features and use cases. Every sentence earns its place with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks explicit description of the return value or output format. Since there is no output schema, the description should at least mention what the tool returns (e.g., a classification result object). This gap reduces completeness despite other strengths.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a good description. The description adds nuance ('in any common form', 'E.164 is preferred; 10-digit US numbers accepted') that aids correct usage, earning a score above baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states a specific verb-resource combination ('Parse, normalize, classify') and clearly differentiates from siblings by listing paired tools (lint_sip_request, stir_attestation_explainer) and specific use cases (validating From, deciding attestation).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance with phrases like 'Use when validating...', 'deciding whether...', and 'sanity-checking...'. Also recommends pairing with sibling tools, providing clear context for when to use vs. alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_stir_shaken_identity - Validate STIR/SHAKEN Identity JWT - A
Read-only
Inspect

[cost: external_io (HTTPS fetch of the x5u cert) | read-only]

Verify a SIP Identity header JWS (RFC 8224 / SHAKEN). Fetches the x5u certificate, parses it, verifies the ES256 signature against the cert's public key, and optionally validates the RCD icon hash (RFC 9795). The icon-hash check accepts both payload.rcdi["/icn"] (RFC 9795 §6.1 spec form) and the legacy payload.rcdi["icn"] form deployed in the wild - the legacy form raises a warning unless strictRfc9795: true (then it fails). Returns per-check pass/fail/warning with details - useful for diagnosing 438 Invalid Identity Header rejections, expired certs, and tampered PASSporTs.

Pair with: stir_attestation_explainer for the human-readable A/B/C interpretation; lookup_response_code(438) for the SIP-side context; lint_sip_request for non-cryptographic structural checks on the host INVITE.

Parameters (JSON Schema)

- `rcdi` (optional): Full `payload.rcdi` claim. The validator looks up `/icn` first (RFC 9795 §6.1 spec form), then falls back to the legacy `icn` key (still seen in the wild). A legacy hit produces a `rcdi-pointer-form` warning unless `strictRfc9795: true` (then it fails).
- `iconUrl` (optional): RCD icon URL (`payload.rcd.icn`). Omit if there is no Rich Call Data icon.
- `infoUrl` (optional): Cert URL from the SIP Identity header `info=` param (or the JWT `x5u`). If omitted, signature verification is skipped.
- `headerB64` (required): Base64url-encoded JWS protected header (the first dot-separated segment).
- `payloadB64` (required): Base64url-encoded JWS payload (the second segment).
- `signatureB64` (required): Base64url-encoded ES256 signature (the third segment, raw R||S, 64 bytes).
- `strictRfc9795` (optional, default false): When true, reject the legacy `rcdi['icn']` key as a hard failure rather than a warning.
- `expectedIconHash` (optional): Pre-extracted icon hash, e.g. `sha256-XYZ` (RFC 9795 §6.1 form `<algorithm>-<base64>`). Pass this OR `rcdi`.
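The three `*B64` segments and the `/icn`-then-`icn` lookup order described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the result-dict shape are hypothetical, and real ES256 verification against the fetched x5u cert is deliberately omitted.

```python
import base64
import json

def b64url_decode(seg: str) -> bytes:
    """Decode a base64url segment, restoring any stripped '=' padding."""
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

def split_passport(jws: str):
    """Split a compact JWS into decoded header, decoded payload, and the
    raw signature bytes (for ES256 this is raw R||S, 64 bytes)."""
    header_b64, payload_b64, sig_b64 = jws.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload, b64url_decode(sig_b64)

def check_rcdi_icon(rcdi: dict, strict_rfc9795: bool = False) -> dict:
    """Look up the icon hash: spec-form '/icn' first, then legacy 'icn'.
    The legacy key warns unless strict mode makes it a hard failure."""
    if "/icn" in rcdi:
        return {"status": "pass", "hash": rcdi["/icn"]}
    if "icn" in rcdi:
        if strict_rfc9795:
            return {"status": "fail", "reason": "legacy rcdi['icn'] key"}
        return {"status": "warning", "hash": rcdi["icn"],
                "reason": "rcdi-pointer-form"}
    return {"status": "fail", "reason": "no icon hash claim"}
```

A caller would pass the tool's `headerB64`, `payloadB64`, and `signatureB64` values joined with dots to `split_passport`, then hand `payload.get("rcdi", {})` to `check_rcdi_icon`.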
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true; description adds cost hint, external IO, and edge case handling (legacy icn with warning/failure). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with cost hint upfront, core action, edge case, return type, and sibling pairing. Slightly dense but efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 8 params, nested objects, and external IO, description covers validation flow, edge cases, and return type. Output schema absent but return details mentioned.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers 100% of parameters with descriptions; the tool description adds minimal extra parameter detail beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it verifies a SIP Identity JWS, explains the process (fetch x5u, verify signature, validate icon hash), and distinguishes from siblings by naming tools to pair with.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides pairing suggestions and diagnostic use cases (438 errors, expired certs, tampered PASSporTs). Lacks explicit when-not-to-use but context is clear.

webrtc_sip_checklist: WebRTC ↔ SIP interop checklist (config-aware) [grade: A]
Read-only
Inspect

[cost: free (pure CPU, no network) | read-only]

Return a curated checklist of WebRTC ↔ SIP requirements (WSS transport, ICE gathering, DTLS-SRTP fingerprint, rtcp-mux + BUNDLE, media relay / rtpengine, STUN/TURN, secure-context Origin allowlist, Opus codec, session-timer behavior across the bridge, STIR/SHAKEN signing). When configText is supplied, each item is marked as 'looks present' or 'check needed' based on simple regex signals.

Use when the user is building a WebRTC ↔ SIP bridge or troubleshooting one (no media, one-way audio, ICE failures).

Pair with: review_sip_config for the full structured outline; search_sip_docs(vendor=...) to ground each unchecked item in vendor docs; parse_sdp / compare_sdp_offer_answer when the bug is in SDP negotiation.

Parameters (JSON Schema)

- `vendor` (optional): Vendor slug. Omit and supply `configText` to auto-detect.
- `configText` (optional): Config blob. When supplied, items with matching signals are marked as present; the vendor is auto-detected if not supplied.
- `filenameHint` (optional): Filename (e.g. "kamailio.cfg") to bias auto-detection.
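The regex-signal marking that `configText` triggers can be sketched like this. The item names, patterns, and result strings below are illustrative assumptions, not the plugin's actual signal list.

```python
import re
from typing import Optional

# Hypothetical signal patterns: each checklist item maps to a simple
# regex whose presence in the config suggests the item is handled.
CHECKLIST_SIGNALS = {
    "WSS transport": r"\bwss\b",
    "DTLS-SRTP fingerprint": r"dtls|fingerprint",
    "rtcp-mux + BUNDLE": r"rtcp-mux|bundle",
    "STUN/TURN": r"\bstun\b|\bturn\b",
    "Opus codec": r"\bopus\b",
}

def mark_checklist(config_text: Optional[str]) -> dict:
    """Mark each item 'looks present' when its regex signal matches the
    config; otherwise 'check needed'. With no config supplied, every
    item needs checking."""
    results = {}
    for item, pattern in CHECKLIST_SIGNALS.items():
        if config_text and re.search(pattern, config_text, re.IGNORECASE):
            results[item] = "looks present"
        else:
            results[item] = "check needed"
    return results
```

Regex signals like these are heuristics: a match means "looks present", never "verified", which matches the tool's cautious two-state marking.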
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true; description reinforces with 'read-only' and 'no network'. Adds useful detail about regex-based marking when configText is supplied, but doesn't go beyond that.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is efficient, front-loaded with cost/read-only, lists checklist items, usage guidance, and pairing. No fluff, but slightly long. Still well-structured.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and simple behavior, the description is fairly complete. It could mention the output format explicitly, but the config signals and sibling list provide enough context overall.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter described. Description repeats schema information (e.g., vendor auto-detection) without adding new meaning, so baseline 3 is appropriate.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a curated checklist of WebRTC↔SIP requirements, listing specific items. It differentiates from siblings by explicitly mentioning pairing with review_sip_config and search_sip_docs.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: building or troubleshooting a WebRTC↔SIP bridge, with concrete failure scenarios. Also provides pairing guidance with related tools.
