graph-tool-call
LLM agents can't fit thousands of tool definitions into context. Vector search finds similar tools, but misses the workflow they belong to. graph-tool-call builds a tool graph and retrieves the right chain — not just one match.
| | Baseline (all tools) | graph-tool-call |
|---|---|---|
| 248 tools (K8s API) | 12% accuracy | 82% accuracy |
| Token usage (K8s) | 8,192 tokens | 1,699 tokens (79% reduction) |
| 50 tools (GitHub API) | 100% accuracy | 90% accuracy, 88% fewer tokens |
Measured with qwen3:4b (4-bit) — full benchmark below
The Problem
LLM agents need tools. But as tool count grows, two things break:
Context overflow — 248 Kubernetes API endpoints = 8,192 tokens of tool definitions. The LLM chokes and accuracy drops to 12%.
Vector search misses workflows — Searching "cancel my order" finds `cancelOrder`, but the actual flow is `listOrders → getOrder → cancelOrder → processRefund`. Vector search returns one tool; you need the chain.
graph-tool-call solves both. It models tool relationships as a graph, retrieves multi-step workflows via hybrid search (BM25 + graph traversal + embedding + MCP annotations), and cuts token usage by 64–91% while maintaining or improving accuracy.
At a Glance
| What you get | How |
|---|---|
| Workflow-aware retrieval | Graph edges encode PRECEDES, REQUIRES, COMPLEMENTARY relations |
| Hybrid search | BM25 + graph traversal + embedding + MCP annotations, fused via wRRF |
| Zero dependencies | Core runs on Python stdlib only — add extras as needed |
| Any tool source | Auto-ingest from OpenAPI / Swagger / MCP / Python functions |
| History-aware | Previously called tools are demoted; next-step tools are boosted |
| MCP Proxy | 172 tools across servers → 3 meta-tools, saving ~1,200 tokens/turn |
Why Not Just Vector Search?
| Scenario | Vector-only | graph-tool-call |
|---|---|---|
| "cancel my order" | Returns `cancelOrder` only | Returns the chain `listOrders → getOrder → cancelOrder → processRefund` |
| "read and save file" | Returns `read_file` only | Returns `read_file` + `write_file` via the COMPLEMENTARY relation |
| "delete old records" | Returns any tool matching "delete" | Destructive tools ranked first via MCP annotations |
| "now cancel it" (after listing orders) | No context from history | Demotes used tools, boosts next-step tools |
| Multiple Swagger specs with overlapping tools | Duplicate tools in results | Cross-source auto-deduplication |
| 1,200 API endpoints | Slow, noisy results | Categorized + graph traversal for precise retrieval |
How It Works
OpenAPI / MCP / Python functions → Ingest → Build tool graph → Hybrid retrieve → Agent

Example: User says "cancel my order and process a refund"
Vector search finds cancelOrder. But the actual workflow is:
```
          ┌──────────┐
 PRECEDES │listOrders│ PRECEDES
┌─────────┤          ├─────────┐
▼         └──────────┘         ▼
┌──────────┐            ┌───────────┐
│ getOrder │            │cancelOrder│
└──────────┘            └─────┬─────┘
                              │ COMPLEMENTARY
                              ▼
                       ┌──────────────┐
                       │processRefund │
                       └──────────────┘
```

graph-tool-call returns the entire chain, not just one tool. Retrieval combines four signals via weighted Reciprocal Rank Fusion (wRRF):
BM25 — keyword matching
Graph traversal — relation-based expansion (PRECEDES, REQUIRES, COMPLEMENTARY)
Embedding similarity — semantic search (optional, any provider)
MCP annotations — read-only / destructive / idempotent hints
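To make the fusion step concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion. This is illustrative only: the signal names, weights, and the `k = 60` smoothing constant are assumptions for the example, not the library's internals.

```python
# Sketch of weighted Reciprocal Rank Fusion (wRRF): each signal
# contributes weight / (k + rank) for every tool it ranks.
# Rankings and weights below are invented for illustration.

def wrrf(rankings: dict[str, list[str]], weights: dict[str, float], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for signal, ranked_tools in rankings.items():
        w = weights.get(signal, 1.0)
        for rank, tool in enumerate(ranked_tools, start=1):
            scores[tool] = scores.get(tool, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

rankings = {
    "bm25":  ["cancelOrder", "deleteOrder"],
    "graph": ["listOrders", "getOrder", "cancelOrder"],
}
weights = {"bm25": 0.3, "graph": 0.5}
print(wrrf(rankings, weights))
```

Each signal contributes `weight / (k + rank)`, so a tool ranked well by several signals outranks a tool ranked first by only one.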
Installation
The core package has zero dependencies — just Python standard library. Install only what you need:
```bash
pip install graph-tool-call                 # core (BM25 + graph) — no dependencies
pip install graph-tool-call[embedding]      # + embedding, cross-encoder reranker
pip install graph-tool-call[openapi]        # + YAML support for OpenAPI specs
pip install graph-tool-call[mcp]            # + MCP server mode
pip install graph-tool-call[all]            # everything
pip install graph-tool-call[lint]
pip install graph-tool-call[similarity]
pip install graph-tool-call[visualization]
pip install graph-tool-call[dashboard]
pip install graph-tool-call[langchain]
```

| Extra | Installs | When to use |
|---|---|---|
| `openapi` | pyyaml | YAML OpenAPI specs |
| `embedding` | numpy | Semantic search (connect to Ollama/OpenAI/vLLM) |
| `embedding-local` | numpy, sentence-transformers | Local sentence-transformers models |
| `similarity` | rapidfuzz | Duplicate detection |
| `langchain` | langchain-core | LangChain integration |
| `visualization` | pyvis, networkx | HTML graph export, GraphML |
| `dashboard` | dash, dash-cytoscape | Interactive dashboard |
| `lint` | ai-api-lint | Auto-fix bad API specs |
| `mcp` | mcp | MCP server mode |

Quick Start
Try it in 30 seconds (no install needed)
```bash
uvx graph-tool-call search "user authentication" \
  --source https://petstore.swagger.io/v2/swagger.json
```

```
Query:  "user authentication"
Source: https://petstore.swagger.io/v2/swagger.json (19 tools)

Results (5):
  1. getUserByName
     Get user by user name
  2. deleteUser
     Delete user
  3. createUser
     Create user
  4. loginUser
     Logs user into the system
  5. updateUser
     Updated user
```

Python API
```python
from graph_tool_call import ToolGraph

# Build a tool graph from the official Petstore API
tg = ToolGraph.from_url(
    "https://petstore3.swagger.io/api/v3/openapi.json",
    cache="petstore.json",
)
print(tg)
# → ToolGraph(tools=19, nodes=22, edges=100)

# Search for tools
tools = tg.retrieve("create a new pet", top_k=5)
for t in tools:
    print(f"{t.name}: {t.description}")

# Search with workflow guidance
results = tg.retrieve_with_scores("process an order", top_k=5)
for r in results:
    print(f"{r.tool.name} [{r.confidence}]")
    for rel in r.relations:
        print(f"  → {rel.hint}")

# Execute an API directly (OpenAPI tools)
result = tg.execute(
    "addPet", {"name": "Buddy", "status": "available"},
    base_url="https://petstore3.swagger.io/api/v3",
)
```

MCP Server (Claude Code, Cursor, Windsurf, etc.)
Run as an MCP server — any MCP-compatible agent can use tool search with just a config entry:
```json
// .mcp.json
{
  "mcpServers": {
    "tool-search": {
      "command": "uvx",
      "args": ["graph-tool-call[mcp]", "serve",
               "--source", "https://api.example.com/openapi.json"]
    }
  }
}
```

The server exposes 6 tools: search_tools, get_tool_schema, execute_tool, list_categories, graph_info, load_source.
Search results include workflow guidance — relations between tools and suggested execution order:
```json
{
  "tools": [
    {"name": "createOrder", "relations": [
      {"target": "getOrder", "type": "precedes", "hint": "Call this tool before getOrder"}
    ]},
    {"name": "getOrder", "prerequisites": ["createOrder"]}
  ],
  "workflow": {"suggested_order": ["createOrder", "getOrder", "updateOrderStatus"]}
}
```

MCP Proxy (aggregate multiple MCP servers)
When you have many MCP servers, their tool names pile up in every LLM turn. MCP Proxy bundles them behind a single server — 172 tools → 3 meta-tools, saving ~1,200 tokens per turn.
Step 1. Create backends.json with your existing MCP servers:
```json
// ~/backends.json
{
  "backends": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp", "--headless"]
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@anthropic/mcp-filesystem", "/home"]
    },
    "my-api": {
      "command": "uvx",
      "args": ["some-mcp-server"],
      "env": { "API_KEY": "sk-..." }
    }
  },
  "top_k": 10,
  "cache_path": "~/.cache/mcp-proxy-cache.json"
}
```

Embedding is optional. Add `"embedding": "ollama/qwen3-embedding:0.6b"` for cross-language search (requires Ollama running). Without it, BM25 keyword search still works.
Step 2. Register the proxy with Claude Code:
```bash
claude mcp add -s user tool-proxy -- \
  uvx "graph-tool-call[mcp]" proxy --config ~/backends.json
```

Step 3. Remove the original individual servers (so they don't duplicate):

```bash
claude mcp remove playwright -s user
claude mcp remove filesystem -s user
claude mcp remove my-api -s user
```

Step 4. Restart Claude Code and verify:

```bash
claude mcp list
# tool-proxy: ... - ✓ Connected
# (individual servers should be gone)
```

That's it. The proxy exposes search_tools, get_tool_schema, and call_backend_tool. After searching, matched tools are dynamically injected for 1-hop direct calling.
```json
// .mcp.json (project-level or global)
{
  "mcpServers": {
    "tool-proxy": {
      "command": "uvx",
      "args": ["graph-tool-call[mcp]", "proxy",
               "--config", "/path/to/backends.json"]
    }
  }
}
```

Direct Integration (OpenAI, Ollama, vLLM, Azure, etc.)
Use retrieve() to search, then convert to OpenAI function-calling format. Works with any OpenAI-compatible API:
```python
from openai import OpenAI
from graph_tool_call import ToolGraph
from graph_tool_call.langchain.tools import tool_schema_to_openai_function

# Build graph from any source
tg = ToolGraph.from_url(
    "https://petstore3.swagger.io/api/v3/openapi.json",
    cache="petstore.json",
)

# Retrieve only the relevant tools for a query
tools = tg.retrieve("create a new pet", top_k=5)

# Convert to OpenAI function-calling format
openai_tools = [
    {"type": "function", "function": tool_schema_to_openai_function(t)}
    for t in tools
]

# Use with any provider — OpenAI, Azure, Ollama, vLLM, llama.cpp, etc.
client = OpenAI()  # or OpenAI(base_url="http://localhost:11434/v1") for Ollama
response = client.chat.completions.create(
    model="gpt-4o",
    tools=openai_tools,  # only 5 relevant tools instead of all 248
    messages=[{"role": "user", "content": "create a new pet"}],
)
```

```python
from anthropic import Anthropic
from graph_tool_call import ToolGraph

tg = ToolGraph.from_url("https://api.example.com/openapi.json")
tools = tg.retrieve("cancel an order", top_k=5)

# Convert to Anthropic tool format
anthropic_tools = [
    {
        "name": t.name,
        "description": t.description,
        "input_schema": {
            "type": "object",
            "properties": {
                p.name: {"type": p.type, "description": p.description}
                for p in t.parameters
            },
            "required": [p.name for p in t.parameters if p.required],
        },
    }
    for t in tools
]

client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    tools=anthropic_tools,
    messages=[{"role": "user", "content": "cancel my order"}],
    max_tokens=1024,
)
```

SDK Middleware (zero code changes)
Already have tool-calling code? Add one line to automatically filter tools:
```python
from graph_tool_call import ToolGraph
from graph_tool_call.middleware import patch_openai

tg = ToolGraph.from_url("https://api.example.com/openapi.json")
patch_openai(client, graph=tg, top_k=5)  # ← add this line

# Existing code unchanged — 248 tools go in, only 5 relevant ones are sent
response = client.chat.completions.create(
    model="gpt-4o",
    tools=all_248_tools,
    messages=messages,
)

# Also works with Anthropic
from graph_tool_call.middleware import patch_anthropic
patch_anthropic(client, graph=tg, top_k=5)
```

LangChain Integration
```bash
pip install graph-tool-call[langchain]
```

```python
from graph_tool_call import ToolGraph
from graph_tool_call.langchain import GraphToolRetriever

tg = ToolGraph.from_url("https://api.example.com/openapi.json")

# Use as a LangChain retriever — compatible with any chain/agent
retriever = GraphToolRetriever(tool_graph=tg, top_k=5)
docs = retriever.invoke("cancel an order")
for doc in docs:
    print(doc.page_content)      # "cancelOrder: Cancel an existing order"
    print(doc.metadata["tags"])  # ["order"]
```

Benchmark
The benchmark verifies two things:
1. Can performance be maintained or improved by giving the LLM only a subset of retrieved tools?
2. Does the retriever itself rank the correct tools within the top K?
The evaluation compared the following configurations on the same set of user requests.
baseline: pass all tool definitions to the LLM as-is
retrieve-k3 / k5 / k10: pass only the top K retrieved tools
+ embedding / + ontology: add semantic search and LLM-based ontology enrichment on top of retrieve-k5
The model used was qwen3:4b (4-bit, Ollama).
Evaluation Metrics
Accuracy: Did the LLM ultimately select the correct tool?
Recall@K: Was the correct tool included in the top K results at the retrieval stage?
Avg tokens: Average tokens passed to the LLM
Token reduction: Token savings compared to baseline
Results at a glance
Small-scale APIs (19–50 tools): baseline is already strong. In this range, graph-tool-call's main value is 64–91% token savings while maintaining near-baseline accuracy.
Large-scale APIs (248 tools): baseline collapses to 12%. In contrast, graph-tool-call maintains 78–82% accuracy. At this scale, it's not an optimization — it's closer to a required retrieval layer.
How to read the metrics
End-to-end Accuracy: Did the LLM ultimately succeed in selecting the correct tool or performing the correct workflow?
Gold Tool Recall@K: Was the canonical gold tool designated as the correct answer included in the top K at the retrieval stage?
These two metrics measure different things, so they don't always match.
In particular, evaluations that accept alternative tools or equivalent workflows as correct answers may show `End-to-end Accuracy` that doesn't exactly match `Gold Tool Recall@K`.
baseline has no retrieval stage, so `Gold Tool Recall@K` does not apply.
Dataset | Tools | Pipeline | End-to-end Accuracy | Gold Tool Recall@K | Avg tokens | Token reduction |
---|---|---|---|---|---|---|
Petstore | 19 | baseline | 100.0% | — | 1,239 | — |
Petstore | 19 | retrieve-k3 | 90.0% | 93.3% | 305 | 75.4% |
Petstore | 19 | retrieve-k5 | 95.0% | 98.3% | 440 | 64.4% |
Petstore | 19 | retrieve-k10 | 100.0% | 98.3% | 720 | 41.9% |
GitHub | 50 | baseline | 100.0% | — | 3,302 | — |
GitHub | 50 | retrieve-k3 | 85.0% | 87.5% | 289 | 91.3% |
GitHub | 50 | retrieve-k5 | 87.5% | 87.5% | 398 | 87.9% |
GitHub | 50 | retrieve-k10 | 90.0% | 92.5% | 662 | 79.9% |
Mixed MCP | 38 | baseline | 96.7% | — | 2,741 | — |
Mixed MCP | 38 | retrieve-k3 | 86.7% | 93.3% | 328 | 88.0% |
Mixed MCP | 38 | retrieve-k5 | 90.0% | 96.7% | 461 | 83.2% |
Mixed MCP | 38 | retrieve-k10 | 96.7% | 100.0% | 826 | 69.9% |
Kubernetes core/v1 | 248 | baseline | 12.0% | — | 8,192 | — |
Kubernetes core/v1 | 248 | retrieve-k5 | 78.0% | 91.0% | 1,613 | 80.3% |
Kubernetes core/v1 | 248 | retrieve-k5 + embedding | 80.0% | 94.0% | 1,728 | 78.9% |
Kubernetes core/v1 | 248 | retrieve-k5 + ontology | 82.0% | 96.0% | 1,699 | 79.3% |
Kubernetes core/v1 | 248 | retrieve-k5 + embedding + ontology | 82.0% | 98.0% | 1,924 | 76.5% |
How to read this table
baseline is the result of passing all tool definitions to the LLM without any retrieval.
retrieve-k variants pass only a subset of retrieved tools to the LLM, so both retrieval quality and LLM selection ability affect performance.
Therefore, a baseline accuracy of 100% does not mean retrieve-k accuracy must also be 100%.
`Gold Tool Recall@K` measures whether retrieval placed the canonical gold tool in the top-k, while `End-to-end Accuracy` measures whether the final task execution succeeded. Because of this, evaluations that accept alternative tools or equivalent workflows may show the two values not exactly matching.
Key insights
Petstore / GitHub / Mixed MCP: When tool count is small or medium, baseline is already strong. In this range, graph-tool-call's main value is significantly reducing tokens without much accuracy loss.
Kubernetes core/v1 (248 tools): When tool count is large, baseline collapses due to context overload. graph-tool-call recovers performance from 12.0% to 78.0–82.0% by narrowing candidates through retrieval.
In practice, retrieve-k5 is the best default. It offers a good balance of token efficiency and performance. On large datasets, adding embedding / ontology yields further improvement.
Retrieval performance: Does the retriever find the correct tools in the top K?
The table below measures the quality of retrieval itself, before the LLM stage. Only BM25 + graph traversal were used here — no embedding or ontology.
How to read the metrics
Gold Tool Recall@K: Was the canonical gold tool designated as the correct answer included in the top K at the retrieval stage?
This table shows how well the retriever constructs the candidate set, not the final LLM selection accuracy.
Therefore, this table should be read together with the End-to-end Accuracy table above.
Even if retrieval places the gold tool in the top-k, the final LLM doesn't always select the correct answer.
Conversely, in end-to-end evaluations that accept alternative tools or equivalent workflows as correct, the final accuracy and gold recall may not exactly match.
Dataset | Tools | Gold Tool Recall@3 | Gold Tool Recall@5 | Gold Tool Recall@10 |
---|---|---|---|---|
Petstore | 19 | 93.3% | 98.3% | 98.3% |
GitHub | 50 | 87.5% | 87.5% | 92.5% |
Mixed MCP | 38 | 93.3% | 96.7% | 100.0% |
Kubernetes core/v1 | 248 | 82.0% | 91.0% | 92.0% |
How to read this table
Gold Tool Recall@K shows the retriever's ability to include the correct tool in the candidate set.
On small datasets, `k=5` alone achieves high recall.
On large datasets, increasing `k` raises recall, but also increases the tokens passed to the LLM.
In practice, you should consider not just recall but also token cost and final end-to-end accuracy together.
Key insights
Petstore / Mixed MCP: `k=5` alone includes nearly all correct tools in the candidate set.
GitHub: There is a recall gap between `k=5` and `k=10`, so `k=10` may be better if higher recall is needed.
Kubernetes core/v1: Even with a large number of tools, `k=5` already achieves 91.0% gold recall. The retrieval stage alone can significantly compress the candidate set while retaining most correct tools.
Overall, `retrieve-k5` is the recommended default. `k=3` is lighter but may miss some correct tools, while `k=10` may increase token costs relative to recall gains.
When do embedding and ontology help?
On the largest dataset, Kubernetes core/v1 (248 tools), we compared adding extra signals on top of retrieve-k5.
Pipeline | End-to-end Accuracy | Gold Tool Recall@5 | Interpretation |
---|---|---|---|
retrieve-k5 | 78.0% | 91.0% | BM25 + graph alone is a strong baseline |
+ embedding | 80.0% | 94.0% | Recovers queries that are semantically similar but differently worded |
+ ontology | 82.0% | 96.0% | LLM-generated keywords/example queries significantly improve retrieval quality |
+ embedding + ontology | 82.0% | 98.0% | Accuracy maintained, gold recall at its highest |
Summary
Embedding compensates for semantic similarity that BM25 misses.
Ontology expands the searchable representation itself when tool descriptions are short or non-standard.
Using both together may show limited additional gains in end-to-end accuracy, but the ability to include correct tools in the candidate set becomes strongest.
Reproduce it
```bash
# Retrieval quality (fast, no LLM needed)
python -m benchmarks.run_benchmark
python -m benchmarks.run_benchmark -d k8s -v

# Pipeline benchmark (LLM comparison)
python -m benchmarks.run_benchmark --mode pipeline -m qwen3:4b
python -m benchmarks.run_benchmark --mode pipeline --pipelines baseline retrieve-k3 retrieve-k5 retrieve-k10

# Save baseline and compare
python -m benchmarks.run_benchmark --mode pipeline --save-baseline
python -m benchmarks.run_benchmark --mode pipeline --diff
```

Basic Usage
From OpenAPI / Swagger
```python
from graph_tool_call import ToolGraph

# From file (JSON / YAML)
tg = ToolGraph()
tg.ingest_openapi("path/to/openapi.json")

# From URL — auto-discovers all spec groups from Swagger UI
tg = ToolGraph.from_url("https://api.example.com/swagger-ui/index.html")

# With caching — build once, reload instantly
tg = ToolGraph.from_url(
    "https://api.example.com/swagger-ui/index.html",
    cache="my_api.json",
)

# Supports: Swagger 2.0, OpenAPI 3.0, OpenAPI 3.1
```

From MCP Server Tools
```python
from graph_tool_call import ToolGraph

mcp_tools = [
    {
        "name": "read_file",
        "description": "Read a file",
        "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}},
        "annotations": {"readOnlyHint": True, "destructiveHint": False},
    },
    {
        "name": "delete_file",
        "description": "Delete a file permanently",
        "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}},
        "annotations": {"readOnlyHint": False, "destructiveHint": True},
    },
]

tg = ToolGraph()
tg.ingest_mcp_tools(mcp_tools, server_name="filesystem")
tools = tg.retrieve("delete temporary files", top_k=5)
```

MCP annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) are used as retrieval signals.
Query intent is automatically classified — read queries prioritize read-only tools, delete queries prioritize destructive tools.
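Conceptually, that intent matching could look like the following sketch. It is a toy illustration with invented keyword lists and boost values, not graph-tool-call's actual classifier:

```python
# Illustrative sketch: classify query intent from keywords, then
# boost tools whose MCP annotations match that intent.
# READ_WORDS, DELETE_WORDS, and the 0.2 boost are invented here.
READ_WORDS = {"read", "get", "list", "show", "find"}
DELETE_WORDS = {"delete", "remove", "purge", "clean"}

def annotation_boost(query: str, annotations: dict) -> float:
    words = set(query.lower().split())
    if words & DELETE_WORDS and annotations.get("destructiveHint"):
        return 0.2  # hypothetical boost value
    if words & READ_WORDS and annotations.get("readOnlyHint"):
        return 0.2
    return 0.0

print(annotation_boost("delete temporary files", {"destructiveHint": True}))  # → 0.2
```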
Directly From an MCP Server
```python
from graph_tool_call import ToolGraph

tg = ToolGraph()

# Public MCP endpoint
tg.ingest_mcp_server("https://mcp.example.com/mcp")

# Local/private MCP endpoint (explicit opt-in)
tg.ingest_mcp_server(
    "http://127.0.0.1:3000/mcp",
    allow_private_hosts=True,
)
```

ingest_mcp_server() calls HTTP JSON-RPC tools/list, fetches the tool list,
then ingests it with MCP annotations preserved.
Remote ingest safety defaults:
private / localhost hosts are blocked by default
remote response size is capped
redirects are limited
unexpected content types are rejected
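The private-host check can be pictured with a small stdlib sketch. This illustrates the idea only; `is_private_host` is not the package's actual guard:

```python
# Sketch of a private-host guard against SSRF-style ingestion:
# resolve the URL's host and reject private/loopback addresses.
import ipaddress
import socket
from urllib.parse import urlparse

def is_private_host(url: str) -> bool:
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False  # unresolvable; a real guard might reject instead
    return ip.is_private or ip.is_loopback

print(is_private_host("http://127.0.0.1:3000/mcp"))  # → True
```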
From Python Functions
```python
from graph_tool_call import ToolGraph

def read_file(path: str) -> str:
    """Read contents of a file."""

def write_file(path: str, content: str) -> None:
    """Write contents to a file."""

tg = ToolGraph()
tg.ingest_functions([read_file, write_file])
```

Parameters are extracted from type hints, descriptions from docstrings.
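That extraction can be approximated with the standard library alone. This is a simplified sketch, not the package's internals; `tool_schema` and its type map are invented for the example:

```python
import inspect

def tool_schema(fn) -> dict:
    """Derive a minimal tool schema from type hints and the docstring."""
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    sig = inspect.signature(fn)
    props = {
        name: {"type": type_map.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": props},
    }

def read_file(path: str) -> str:
    """Read contents of a file."""

print(tool_schema(read_file))
```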
Manual Tool Registration
```python
from graph_tool_call import ToolGraph

tg = ToolGraph()
tg.add_tools([
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    },
])
tg.add_relation("get_weather", "get_forecast", "complementary")
```

Embedding-based Hybrid Search
Add embedding-based semantic search on top of BM25 + graph. No heavy dependencies needed — use any external embedding server (Ollama, OpenAI, vLLM, etc.) or local sentence-transformers.
```bash
pip install graph-tool-call[embedding]        # numpy only (~20MB)
pip install graph-tool-call[embedding-local]  # + sentence-transformers (~2GB, local models)
```

```python
# Ollama (recommended — lightweight, cross-language)
tg.enable_embedding("ollama/qwen3-embedding:0.6b")

# OpenAI
tg.enable_embedding("openai/text-embedding-3-large")

# vLLM / llama.cpp / any OpenAI-compatible server
tg.enable_embedding("vllm/Qwen/Qwen3-Embedding-0.6B")
tg.enable_embedding("vllm/model@http://gpu-box:8000/v1")
tg.enable_embedding("llamacpp/model@http://192.168.1.10:8080/v1")
tg.enable_embedding("http://localhost:8000/v1@my-model")

# Sentence-transformers (requires embedding-local extra)
tg.enable_embedding("sentence-transformers/all-MiniLM-L6-v2")

# Custom callable
tg.enable_embedding(lambda texts: my_embed_fn(texts))
```

Weights are automatically rebalanced when embedding is enabled. You can fine-tune them:
```python
tg.set_weights(keyword=0.1, graph=0.4, embedding=0.5)
```

Save and Load
Build once, reuse everywhere. The full graph structure (nodes, edges, relation types, weights) is preserved.
```python
# Save
tg.save("my_graph.json")

# Load
tg = ToolGraph.load("my_graph.json")

# Or use cache= in from_url() for automatic save/load
tg = ToolGraph.from_url(url, cache="my_graph.json")
```

When embedding search is enabled, saved graphs also preserve:
embedding vectors
restorable embedding provider config when available
retrieval weights
diversity settings
This lets ToolGraph.load() restore hybrid retrieval state without rebuilding embeddings from scratch.
Analysis and Dashboard
```python
report = tg.analyze()
print(report.orphan_tools)

app = tg.dashboard_app()
# or: tg.dashboard(port=8050)
```

analyze() builds an operational summary with duplicates, conflicts, orphan tools,
category coverage, and relation counts. dashboard() launches the interactive
Dash Cytoscape UI for graph inspection and retrieval testing.
Advanced Features
Cross-Encoder Reranking
Second-stage reranking using a cross-encoder model.
```python
tg.enable_reranker()  # default: cross-encoder/ms-marco-MiniLM-L-6-v2
tools = tg.retrieve("cancel order", top_k=5)
```

After narrowing candidates with wRRF, (query, tool_description) pairs are jointly encoded for more precise ranking.
MMR Diversity
Reduces redundant results to secure more diverse candidates.
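The underlying Maximal Marginal Relevance selection can be sketched like this (illustrative only; the similarity function and scores are toy stand-ins, not the library's implementation):

```python
# Sketch of MMR: greedily pick the candidate with the best balance of
# relevance (lambda_ * score) and novelty versus already-selected items.

def mmr(candidates: dict[str, float], sim, lambda_: float = 0.7, top_k: int = 3) -> list[str]:
    selected: list[str] = []
    pool = dict(candidates)
    while pool and len(selected) < top_k:
        best = max(
            pool,
            key=lambda c: lambda_ * pool[c]
            - (1 - lambda_) * max((sim(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        del pool[best]
    return selected

# Toy similarity: 1.0 if tools share a name prefix, else 0.0
sim = lambda a, b: 1.0 if a.split("_")[0] == b.split("_")[0] else 0.0
print(mmr({"get_user": 0.9, "get_account": 0.85, "delete_user": 0.5}, sim))
```

Note how `get_account`, nearly as relevant as `get_user`, is deferred in favor of the more novel `delete_user`.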
```python
tg.enable_diversity(lambda_=0.7)
```

History-Aware Retrieval
Pass previously called tool names to improve next-step retrieval.
```python
# First call
tools = tg.retrieve("find my order")
# → [listOrders, getOrder, ...]

# Second call
tools = tg.retrieve("now cancel it", history=["listOrders", "getOrder"])
# → [cancelOrder, processRefund, ...]
```

Already-used tools are demoted, and tools closer to the next step in the graph are boosted.
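The idea can be sketched as a re-scoring pass (illustrative only; `adjust_for_history`, the 0.5 penalty, and the 0.3 boost are invented for this example):

```python
# Sketch of history-aware re-scoring: demote already-called tools,
# boost graph successors of the most recent call.

def adjust_for_history(scores: dict[str, float], history: list[str],
                       successors: dict[str, list[str]]) -> dict[str, float]:
    adjusted = dict(scores)
    for tool in history:
        if tool in adjusted:
            adjusted[tool] *= 0.5       # demote tools already used
    if history:
        for nxt in successors.get(history[-1], []):
            if nxt in adjusted:
                adjusted[nxt] += 0.3    # boost likely next steps
    return adjusted

scores = {"listOrders": 0.8, "getOrder": 0.7, "cancelOrder": 0.6}
successors = {"getOrder": ["cancelOrder"]}
print(adjust_for_history(scores, ["listOrders", "getOrder"], successors))
```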
wRRF Weight Tuning
Adjust the contribution of each signal.
```python
tg.set_weights(
    keyword=0.2,     # BM25 text matching
    graph=0.5,       # graph traversal
    embedding=0.3,   # semantic similarity
    annotation=0.2,  # MCP annotation matching
)
```

LLM-Enhanced Ontology
Build richer tool ontologies using any LLM. Useful for category generation, relation inference, and search keyword expansion.
```python
tg.auto_organize(llm="ollama/qwen2.5:7b")
tg.auto_organize(llm=lambda p: my_llm(p))
tg.auto_organize(llm=openai.OpenAI())
tg.auto_organize(llm="litellm/claude-sonnet-4-20250514")
```

| Input | Wrapped as |
|---|---|
| Python callable | Pass-through |
| OpenAI client (has `chat.completions`) | |
| `"litellm/<model>"` string | litellm.completion wrapper |
Duplicate Detection
Find and merge duplicate tools across multiple API specs.

```python
duplicates = tg.find_duplicates(threshold=0.85)
merged = tg.merge_duplicates(duplicates)
# merged = {"getUser_1": "getUser", ...}
```

Export and Visualization
```python
# Interactive HTML (vis.js)
tg.export_html("graph.html", progressive=True)

# GraphML (Gephi, yEd)
tg.export_graphml("graph.graphml")

# Neo4j Cypher
tg.export_cypher("graph.cypher")
```

API Spec Lint Integration
Auto-fix poor OpenAPI specs before ingestion using ai-api-lint.
```bash
pip install graph-tool-call[lint]
```

```python
tg = ToolGraph.from_url(url, lint=True)
```

CLI Reference
```bash
# One-liner search (ingest + retrieve in one step)
graph-tool-call search "cancel order" --source https://api.example.com/openapi.json
graph-tool-call search "delete user" --source ./openapi.json --scores --json

# MCP server
graph-tool-call serve --source https://api.example.com/openapi.json
graph-tool-call serve --graph prebuilt.json
graph-tool-call serve -s https://api1.com/spec.json -s https://api2.com/spec.json

# Build and save graph
graph-tool-call ingest https://api.example.com/openapi.json -o graph.json
graph-tool-call ingest ./spec.yaml --embedding --organize

# Search from pre-built graph
graph-tool-call retrieve "query" -g graph.json -k 10

# Analyze, visualize, dashboard
graph-tool-call analyze graph.json --duplicates --conflicts
graph-tool-call visualize graph.json -f html
graph-tool-call info graph.json
graph-tool-call dashboard graph.json --port 8050
```

Full API Reference
| Method | Description |
|---|---|
| | Add a single tool (auto-detects format) |
| `add_tools()` | Add multiple tools |
| `ingest_openapi()` | Ingest from OpenAPI / Swagger spec |
| `ingest_mcp_tools()` | Ingest from MCP tool list |
| `ingest_mcp_server()` | Fetch and ingest from MCP HTTP server |
| `ingest_functions()` | Ingest from Python callables |
| | Ingest Arazzo 1.0.0 workflow spec |
| `from_url()` | Build from Swagger UI or spec URL |
| `add_relation()` | Add a manual relation |
| | Auto-categorize tools |
| `auto_organize()` | Build complete ontology |
| `retrieve()` | Search for tools |
| | Validate and auto-correct a tool call |
| `retrieve_with_scores()` | Return results with scores and workflow relations |
| `enable_embedding()` | Enable hybrid embedding search |
| `enable_reranker()` | Enable cross-encoder reranking |
| `enable_diversity()` | Enable MMR diversity |
| `set_weights()` | Tune wRRF fusion weights |
| `find_duplicates()` | Find duplicate tools |
| `merge_duplicates()` | Merge detected duplicates |
| | Detect and add CONFLICTS_WITH edges |
| `analyze()` | Build operational analysis summary |
| `save()` / `load()` | Serialize / deserialize |
| `export_html()` | Export interactive HTML visualization |
| `export_graphml()` | Export to GraphML format |
| `export_cypher()` | Export as Neo4j Cypher statements |
| `dashboard()` / `dashboard_app()` | Build or launch interactive dashboard |
| | Suggest next tools based on graph |
Feature Comparison
Feature | Vector-only solutions | graph-tool-call |
---|---|---|
Dependencies | Embedding model required | Zero (core runs on stdlib) |
Tool source | Manual registration | Auto-ingest from Swagger / OpenAPI / MCP |
Search method | Flat vector similarity | Multi-stage hybrid (wRRF + rerank + MMR) |
Behavioral semantics | None | MCP annotation-aware retrieval |
Tool relations | None | 6 relation types, auto-detected |
Call ordering | None | State machine + CRUD + response→request data flow |
Deduplication | None | Cross-source duplicate detection |
Ontology | None | Auto / LLM-Auto modes |
History awareness | None | Demotes used tools, boosts next-step |
Spec quality | Assumes good specs | ai-api-lint auto-fix integration |
LLM dependency | Required | Optional (better with, works without) |
Documentation
| Doc | Description |
|---|---|
| | System overview, pipeline layers, data model |
| | Work Breakdown Structure — Phase 0~4 progress |
| | Algorithm design — spec normalization, dependency detection, search modes, call ordering, ontology modes |
| | Competitive analysis, API scale data, commerce patterns |
| | Release process, changelog flow, pre-release checks |
| | How to write API specs that produce better tool graphs |
Contributing
Contributions are welcome.
```bash
# Development setup
git clone https://github.com/SonAIengine/graph-tool-call.git
cd graph-tool-call
pip install poetry
poetry install --with dev --all-extras  # install all optional deps for full test coverage

# Run tests
poetry run pytest -v

# Lint
poetry run ruff check .
poetry run ruff format --check .

# Run benchmarks
python -m benchmarks.run_benchmark -v
```