agentkit-mcp-server
Allows deploying agents to Cloudflare Workers runtime.
Allows downloading models from Hugging Face for local execution.
Allows using Ollama as a local model endpoint for agent execution.
Allows using OpenAI's models for AI agent interactions.
Allows exporting telemetry data via OpenTelemetry for observability.
Provides Redis-based backend for checkpointing and state persistence.
Provides Upstash-based backend for checkpointing and state persistence.
agentkit-js
TypeScript agent runtime with WASM sandboxing, prompt-cache optimization, and parallel quality runners.
Build production-grade AI agents in TypeScript (code-execution agents, tool-calling agents, or multi-path reasoning pipelines) with built-in cost controls and Cloudflare Workers deployment.
# For Anthropic (Claude)
npm add @agentkit-js/core @anthropic-ai/sdk
# For OpenAI / compatible endpoints (Ollama, vLLM, etc.)
npm add @agentkit-js/core openai
Docs site: https://telleroutlook.github.io/agentkit-js/ · Getting started in 5 min: docs/guides/getting-started.md · Benchmarks: docs/benchmarks.md · Changelog: CHANGELOG.md · API stability: docs/strategy/api-stability.md · Strategy memo: docs/strategy/2026-06-competitiveness.md · Trust Page (D4): docs/strategy/trust.md · Enterprise security face: docs/strategy/security-face.md
Looking for a co-maintainer.
@agentkit-js/core@1.0.0 is on the calendar for 2026-12-15. If you ship to the Vercel AI SDK / Mastra / Claude Agent SDK / OpenAI Agents JS / Cloudflare Agents / LangGraph.js communities and want npm-publish + merge rights, see CONTRIBUTING.md and GOVERNANCE.md. Release cadence ledger: docs/strategy/release-cadence-log.md. Sandbox-escape SLA drill log: docs/strategy/security-drill-log.md.
Comparison
There are several mature TypeScript agent frameworks. Here is an honest assessment of where agentkit-js fits.
Last verified: 2026-06-13. Each ⚠️/❌ cell links to its source on the column header's project. The "sandbox" rows have been re-framed (D2, 2026-06-13) so that having some sandbox is no longer the differentiator: three competitors now ship one. The remaining axes (isolation tier composability, cross-runtime neutrality, offline closure) are what no other framework offers in one package.
Feature | Vercel AI SDK | LangChain / LangGraph.js | OpenAI Agents JS | Mastra | Cloudflare Agents SDK | agentkit-js |
npm downloads/month | ~57M | ~10M | ~3.8M | ~4M | ~3.2M | early-stage |
ToolCallingAgent | โ | โ | โ | โ | โ | โ |
Sandboxed code execution | โ (none in core) | โ (none in core) | โ SandboxAgent โ Unix-local / Docker / hosted | โ Workspace โ E2B / Daytona / Modal / Blaxel / Railway | โ @cloudflare/sandbox container | โ kernels |
Isolation tiers โ composable in one process | n/a | n/a | โ ๏ธ 1 tier (process / container, picked at run time per client) | โ ๏ธ 1 tier per provider (you swap providers, not tiers) | โ ๏ธ 1 tier (container-per-DO, vendor-bound) | โ
3 tiers, swap with one line โ |
Cross-runtime neutrality | โ ๏ธ Node + edge runtime patches | โ Node + edge | โ ๏ธ Node + Docker hosts (sandbox path needs a host process) | โ ๏ธ provider-specific (each provider has its own runtime constraint) | โ Cloudflare-only (sandbox is a CF Container) | โ same kernel API on Node, any edge runtime, browser, and offline laptop |
Offline / air-gapped closure | โ requires provider HTTP | โ requires provider HTTP | โ Sandbox + model both need network | โ all sandbox providers are cloud SaaS | โ vendor-bound | โ
|
Python execution (edge-safe, no container) | โ | โ | โ (containers required) | โ (containers required) | โ | โ Pyodide-in-WASM, runs inside a Worker isolate |
Anthropic prompt-cache management | โ ๏ธ pass-through | โ ๏ธ pass-through | โ ๏ธ via adapter | โ ๏ธ pass-through | โ | โ auto breakpoints + 1h TTL |
Self-consistency / reflect-refine runners | โ | โ manual | โ | โ | โ | โ built-in |
Budget forcing | โ | โ | โ | โ | โ | โ |
DAG tool scheduler + speculative exec | โ | โ ๏ธ graph-level | โ | โ ๏ธ workflow graph | โ | โ |
Long-history compaction | โ ๏ธ syntactic prune | โ manual | โ | โ ๏ธ observational memory | โ | โ model-summarised |
MCP support | โ | โ | โ | โ | โ | โ |
Cloudflare Workers | โ ๏ธ partial | โ | โ ๏ธ experimental | โ ๏ธ alpha | โ native | โ |
UI hooks (React/Next.js) | โ best-in-class | โ | โ | โ ๏ธ via AI SDK | โ ๏ธ | โ useAgentRun |
Provider integrations | 40+ | 300+ | OpenAI-primary | 40+ | CF Workers AI | Anthropic ยท OpenAI ยท Doubao ยท DeepSeek ยท Kimi ยท Qwen ยท GLM ยท MiniMax |
Evals framework | โ | โ ๏ธ LangSmith | โ | โ 12+ scorers | โ | โ 16 scorers + 2 multi-criterion judges |
Observability (OTel) | โ ๏ธ LangSmith | โ ๏ธ LangSmith | โ | โ | โ | โ OtelBridge + GenAI semconv |
Retry / resilience | โ | โ | โ | โ | โ | โ RetryPolicy |
Durable workflows / checkpointing | โ DurableAgent (AI SDK 6) | โ LangGraph | โ (Assistants API retiring 2026-08-26) | โ ๏ธ partial | โ Durable Objects | โ Checkpointer + 4 backends (CF KV / DO / Redis / Upstash) |
SSE Last-Event-ID resume | โ ๏ธ via DurableAgent | โ runtime | โ | โ | โ | โ EventLog primitive + worker-native |
HITL persisted suspend/resume | โ | โ | โ | โ ๏ธ partial | โ ๏ธ via DO | โ
stateless |
Embedded local LLM (in-process, offline) | โ ๏ธ via Ollama HTTP | โ ๏ธ via Ollama HTTP | โ | โ ๏ธ via Ollama HTTP | โ | โ
|
Where competitors are stronger
Vercel AI SDK: If you're building a chat UI with Next.js, use this. The React hooks (useChat, useAgent), DurableAgent for stateful/resumable workflows (AI SDK 6), native MCP support, and DevTools panel are all best-in-class. 57M monthly downloads.
LangChain/LangGraph.js: If you need 300+ integrations (vector stores, document loaders, obscure providers) or graph-based durable workflows with checkpointing and human-in-the-loop, LangGraph is battle-tested at LinkedIn, Uber, and GitLab scale.
Mastra: Best eval framework (12+ built-in scorers including trajectory and tool accuracy). Strong developer onboarding. Their "Observational memory" pattern was first-mover; agentkit-js now ships an equivalent (ObservationalMemory) plus extra prompt-cache-aware compression. See docs/guides/observational-memory.md.
Cloudflare Agents SDK: If you're building on Cloudflare specifically, Durable Objects give you stateful agents with persistent scheduling that nothing else matches natively.
OpenAI Agents JS: If your stack is OpenAI-only and you want first-party support, this is the cleanest path. The 2026-04 release added SandboxAgent with Unix-local, Docker, and hosted clients; for OS-level isolation backed by OpenAI itself, this is the path of least resistance.
Where agentkit-js is differentiated
Three isolation tiers under one swappable interface (D2, 2026-06-13). OpenAI Agents JS now ships SandboxAgent (Unix-local / Docker / hosted) and Mastra ships Workspace providers (E2B / Daytona / Modal / Blaxel / Railway). The differentiator is no longer "has a sandbox": agentkit-js exposes three tiers (VmKernel in-process; QuickJSKernel / PyodideKernel / WasmtimeKernel true WASM; RemoteSandboxKernel microVM) under one Kernel interface, lets you swap them with one line at call time, and applies one CapabilityManifest (network/fs/env/cpu/memory) across every tier. Competitors give you one tier wired to one provider. See docs/kernels/comparison.md for the decision tree.
Cross-runtime neutrality. Cloudflare's sandbox is fast on Cloudflare. Mastra's providers are SaaS. agentkit-js kernels run on Node, on any edge runtime, in a browser tab, and on a laptop with the network unplugged, with the same Kernel API and the same security manifest. This is the structural advantage no platform-bound competitor can match.
Offline / air-gapped closure. @agentkit-js/model-local (node-llama-cpp + grammar-constrained tool calls + multi-mirror downloads HF / hf-mirror / ModelScope) plus a WASM kernel = full agent loop with zero outbound traffic. For compliance-bound and air-gapped deployments, no other framework gives you this without writing the integration yourself.
Durable runtime: The same KvBackend powers checkpoints, the SSE event log, and structured memory. Four production backends ship out of the box (Cloudflare KV / Durable Objects / Redis / Upstash REST). A paused await_human_input survives worker recycle for hours/days; POST /resume is stateless. See docs/guides/durable-runtime.md.
Quality runners: Self-consistency with answer extraction (boxed / last-line / custom), reflect-refine, budget forcing ("Wait" prefill), and parallel fork-join are not shipped as first-class APIs by any competitor.
Anthropic prompt-cache optimization: The framework actively manages cache_control breakpoint placement across multi-turn history, supports the 1-hour extended TTL (ttl:"1h"), and reports per-TTL cache usage. Competitors pass through or validate limits but do not optimise placement.
Speculative tool execution: Read-only, idempotent tools are pre-executed ahead of write barriers within a DAG step. The scheduler is awakened by $<callId> dependency references in the system prompt, enabling true parallel + ordered hybrid scheduling. No competitor implements this.
GenAI semantic conventions: OtelBridge emits standard gen_ai.* attributes (Datadog / Honeycomb / Grafana GenAI view compatible) alongside legacy names, switchable via semconvMode.
Observational Memory + cache-stable prefix (A1): A background "observer" model continuously compresses history into ranked observation paragraphs. The compressed prefix is byte-stable, so Anthropic prompt cache hits stay hot across observations; Mastra's reference work has no equivalent. ~22% of baseline tokens on a 50-turn synthetic trace; see examples/benchmarks/observational-memory.mjs.
Time-travel debugger (A2): @agentkit-js/devtools exposes the existing EventLog + Checkpointer data through a navigable step timeline + "fork from any step" UI. LangGraph Studio's headline feature, shipped as a tiny opt-in package (logic core ~250 LOC, React UI optional).
Skills + lifecycle hooks (A3): SkillRegistry for progressive instruction/tool disclosure (Claude Agent SDK / CrewAI v1.12 convention). The ToolPostHook chain (redact, truncate, audit) sits beside the existing ToolGuardrail, giving pre/post symmetry without confusing block vs transform semantics.
Multi-criterion LLM judges (A4): judgeScorer extends llmJudge with weighted criterion-level scoring + configurable scale. Two built-in judges (trajectoryQualityJudge, answerCompletenessJudge) work with any cheap Model adapter, so judges run on Haiku/Doubao while the agent stays on Sonnet/Opus.
Reproducible benchmarks: Every percentage in this README (−37%, 72→90%, −85%, −84%) is verified by an offline benchmark in examples/benchmarks/. Run pnpm bench to reproduce. CI fails the PR if any number drifts outside its tolerance.
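The dependency-ordering half of speculative tool execution can be sketched in a few lines. This is a minimal illustration, not agentkit-js's actual Scheduler: the PlannedCall shape, findDependencies, and planWaves are hypothetical names, and it assumes a $<callId> reference appears as a literal string inside a call's arguments. It groups calls into waves whose members can run concurrently; pre-executing read-only tools ahead of write barriers is not modelled here.

```typescript
// Hypothetical shape for illustration only; not agentkit-js's real types.
interface PlannedCall {
  id: string;
  tool: string;
  args: Record<string, unknown>;
}

// Collect every "$<callId>" reference found anywhere in a call's arguments.
function findDependencies(call: PlannedCall): string[] {
  const refs = new Set<string>();
  const pattern = /\$<([A-Za-z0-9_-]+)>/g;
  const text = JSON.stringify(call.args);
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const id = match[1];
    if (id !== undefined) refs.add(id);
  }
  return Array.from(refs);
}

// Group calls into waves: a call joins a wave once all of its
// dependencies have completed in an earlier wave.
function planWaves(calls: PlannedCall[]): string[][] {
  const pending = new Map<string, string[]>();
  for (const call of calls) pending.set(call.id, findDependencies(call));

  const done = new Set<string>();
  const waves: string[][] = [];
  while (pending.size > 0) {
    const ready: string[] = [];
    pending.forEach((deps, id) => {
      if (deps.every((dep) => done.has(dep))) ready.push(id);
    });
    if (ready.length === 0) {
      throw new Error("Cyclic or unresolved $<callId> reference");
    }
    for (const id of ready) {
      pending.delete(id);
      done.add(id);
    }
    waves.push(ready);
  }
  return waves;
}

// Two independent searches, then a call that consumes both results.
const waves = planWaves([
  { id: "a", tool: "search", args: { query: "edge runtimes" } },
  { id: "b", tool: "search", args: { query: "wasm sandboxes" } },
  { id: "c", tool: "summarise", args: { left: "$<a>", right: "$<b>" } },
]);
console.log(waves);
```

Here the two searches land in the first wave and the summarise call waits for both. A real scheduler would also need to treat non-idempotent tools as barriers, which this sketch leaves out.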
Honest caveats
agentkit-js is early-stage. The differentiating features (code execution kernels, durable runtime, quality runners, speculative scheduling) are technically novel but also niche: most teams pick a framework based on ecosystem breadth and documentation volume, where the mature options above win. Choose agentkit-js when sandboxed code execution, durable agent runs, prompt-cache cost control, or output quality runners are first-order concerns.
Verified status
Check | Number | Verified by |
Tests passing (all packages) | 1341 | |
README percentages reproducible | 8 / 8 | |
Cross-process kill-and-resume (A1 DoD ①) | ✓ Redis + ✓ Cloudflare KV + ✓ Durable Object | |
SSE Last-Event-ID gap-free replay (A2 DoD ①) | ✓ | |
Stateless HITL resume (A3 DoD ①) | ✓ | |
Observational memory ≥4× compression (A1) | ✓ 22% of baseline | |
Code-mode bootstrap O(1) vs direct-MCP O(N) (S1/A1) | ✓ 13.6% of direct at N=30 tools | |
MCP Portal: O(1) bootstrap across M upstreams (D1) | ✓ 3.1% of direct multi-MCP at M=5×N=30 | |
Step-fork bundle (A2 DevTools) | ✓ 9 unit + 8 jsdom render tests | |
Skill lazy-load + post-hook chain (A3) | ✓ | |
Judge scorer weighted breakdown (A4) | ✓ | |
Paired-statistics parity vs scipy (evals-runner) | ✓ 31 reference values to ±1e-7 | |
Local Studio HTTP overview (A4 of 2026-06-12 plan) | ✓ | |
Framework-agnostic GenAI semconv ingest (D5) | ✓ 9 adapter tests | |
Multi-model evaluation across 17× size range | ✓ 5 models, 2026-06-12 | |
Features
Two agent modes: CodeAgent (writes + executes code) and ToolCallingAgent (native tool_use)
Code execution, three isolation tiers: VmKernel (node:vm, in-process dev/test), QuickJSKernel / PyodideKernel / WasmtimeKernel (true WASM, language-level isolation, edge-safe), RemoteSandboxKernel (E2B / Cloudflare Sandbox microVM, full process isolation). Mix tiers via factory.createKernel().
Programmatic Tool Calling (PTC): ProgrammaticOrchestrator executes model-generated scripts inside any kernel; callTool() calls registered tools without surfacing intermediate results to the context (−37% tokens). Self-hosted alternative to Anthropic's managed PTC container.
Prompt-cache optimization: MessageAssembler builds cache-stable prefixes; Anthropic cache_control breakpoints respect the 4-breakpoint limit, per-chunk token thresholds, and the 1-hour extended TTL (ttl:"1h"); per-TTL usage metering (5m vs 1h); OpenAI automatic prefix cache hit tracking
Tool deferred loading: deferLoading: true on any tool (or McpToolCollection.deferAll()) excludes its schema from the system prefix and loads on-demand via Anthropic Tool Search (−85% tokens for large MCP server collections)
Tool Use Examples: inputExamples on any tool maps to Anthropic's input_examples wire field (72%→90% parameter accuracy)
Context editing: assembler.editToolResults({ maxTokens, keepRecent }) truncates old tool outputs reversibly without breaking conversation structure (+29% task performance, −84% tokens on web search)
Cross-session Memory Tool: createMemoryTool({ backend }) gives agents persistent read/write/list/delete memory backed by any KvBackend (Cloudflare KV, Redis, in-memory Map)
Quality runners: majority-vote self-consistency with answer extraction (boxed / last-line / custom hook), critique-refine cycles, "Wait" prefill budget forcing, parallel fork-join with synthesis
DAG scheduling: independent tool calls execute concurrently via Scheduler; read-only tools speculatively pre-execute ahead of write barriers; $<callId> dependency syntax in system prompt enables true data-dependency ordering; wired into ToolCallingAgent by default
Long-history compaction: agent.assembler.compact(model, keepRecentSteps) summarises old steps; inject a custom MessageAssembler via the assembler option
Production resilience: automatic exponential backoff + jitter retry for 429 / 5xx / network errors on all model adapters; configurable via RetryPolicy
Evals framework: runEval() with 16 built-in scorers covering correctness (exactMatch, toolCallAccuracy, trajectoryValidity, finalAnswerLength, guardrailCompliance), faithfulness, relevance, recovery, efficiency, constraints, plus two multi-criterion JudgeScorer judges (trajectoryQualityJudge, answerCompletenessJudge)
Evaluation harness (@agentkit-js/evals-runner): runEvaluation() plus the agentkit evals run CLI: multi-model × multi-suite × multi-seed Pareto reports over (accuracy, cost, p95 wall). Six reference suites cover the gaps single-task benchmarks miss (long-context recall, multi-turn memory, agent trajectory, latency-under-budget, cost-per-correct, tool-sequence). Built-in paired statistics (McNemar exact / Wilson CI / paired bootstrap / G1 gate) match scipy reference values to ±1e-7. All fixtures are synthetic, with no overlap with public training corpora.
Code-mode MCP server (@agentkit-js/mcp-server): createCodeModeServer() collapses N downstream tools into a docs_search + execute_code two-tool MCP surface. At 30 tools the bootstrap-token cost drops to 13.6% of direct MCP (codemode-lite reported 53%); pairs with any agentkit kernel for unified security policy.
MCP Portal, federating N upstream servers behind one neutral two-tool surface (D1, 2026-06-13): createPortalServer() wraps multiple ToolRegistry / MCP upstreams (filesystem + GitHub + memory + …) into one code-mode face. Bootstrap stays O(1) regardless of how many upstreams are federated; at 5 servers × 30 tools = 150 tools, the Portal is 3.1% of direct multi-MCP and 19.8% of code-mode-per-server (examples/benchmarks/portal-tokens.mjs). One CapabilityManifest spans every upstream, the audit boundary that platform-bound Portals (Cloudflare's announced version) cannot give you across heterogeneous providers. See examples/mcp-portal/.
AI SDK + Mastra + Claude Agent SDK + OpenAI Agents JS plugin packages (@agentkit-js/aisdk, @agentkit-js/mastra-sandbox, @agentkit-js/claude-agent-sdk, @agentkit-js/openai-agents): drop agentkit's WASM kernels into Vercel AI SDK 4–6 (sandboxedJsTool, codeModeTool), Mastra (agentkitMastraSandbox), Anthropic Claude Agent SDK (sandboxedJsClaudeTool, codeModeClaudeTool), or OpenAI Agents JS (sandboxedJsAgentTool, codeModeAgentTool) without an external sandbox provider.
Observability: OtelBridge maps AgentEvent streams to OTel-compatible spans; emits gen_ai.* semantic convention attributes (Datadog/Honeycomb/Grafana GenAI view compatible) with semconvMode: "both" | "stable" | "legacy"
Durable runtime: KvCheckpointer with four production backends: CloudflareKvBackend, DurableObjectKvBackend, RedisKvBackend (ioredis-style), RedisRestKvBackend (Upstash REST, edge-safe). CheckpointableRun saves state after every step; await_human_input persists pendingHumanInput and exits the iterator so the worker can recycle while a human reviews.
SSE Last-Event-ID resume: EventLog tags every event with a monotonic id, persists to the same KvBackend, and replays only the missing tail when a client reconnects. The reference Cloudflare Worker honors Last-Event-ID natively; useAgentRun({ resume: { maxAttempts } }) retries automatically.
Stateless human-in-the-loop: resumeFromHuman(checkpointer, traceId, promptId, response) writes the human's reply into a paused snapshot. Because there is no in-memory state, the worker that pauses and the worker that resumes can be different processes (and different days). See examples/durable-runtime/.
React hooks: @agentkit-js/react provides useAgentRun() for streaming SSE agent events in Next.js / React apps
Multi-model: Anthropic (Claude) and OpenAI-compatible endpoints (Ollama, vLLM, llama.cpp)
MCP support: McpToolCollection wraps any MCP server's tools as first-class agentkit tools
Cloudflare Workers: HTTP API entry point with KV session caching, ready to deploy with Wrangler
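To make the "majority-vote self-consistency with answer extraction" item concrete, here is a minimal sketch of the voting step on its own. It does not use agentkit-js's SelfConsistencyRunner; Extractor, extractBoxed, extractLastLine, and majorityVote are hypothetical names, and the completions are hard-coded stand-ins for N independent model runs.

```typescript
// An extractor turns one raw completion into a comparable answer,
// or undefined when no answer can be found.
type Extractor = (completion: string) => string | undefined;

// "boxed" strategy: take the contents of the last \boxed{...}.
const extractBoxed: Extractor = (completion) => {
  const matches = completion.match(/\\boxed\{([^{}]*)\}/g);
  if (!matches || matches.length === 0) return undefined;
  const last = matches[matches.length - 1];
  if (last === undefined) return undefined;
  return last.slice("\\boxed{".length, -1).trim();
};

// "last-line" strategy: take the last non-empty line.
const extractLastLine: Extractor = (completion) => {
  const lines = completion
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.length > 0 ? lines[lines.length - 1] : undefined;
};

// Count extracted answers and return the most frequent one.
// Ties go to the answer that was seen first.
function majorityVote(
  completions: string[],
  extract: Extractor,
): { answer: string; votes: number } | undefined {
  const counts = new Map<string, number>();
  for (const completion of completions) {
    const answer = extract(completion);
    if (answer === undefined) continue;
    counts.set(answer, (counts.get(answer) ?? 0) + 1);
  }
  let best: { answer: string; votes: number } | undefined;
  for (const [answer, votes] of Array.from(counts)) {
    if (best === undefined || votes > best.votes) best = { answer, votes };
  }
  return best;
}

// Stand-ins for three independent runs of the same task.
const samples = [
  "6 * 7 = 42, so the answer is \\boxed{42}.",
  "Adding up gives \\boxed{41}.",
  "Final: \\boxed{42}",
];
const winner = majorityVote(samples, extractBoxed);
console.log(winner);
```

Swapping extractBoxed for extractLastLine changes only how answers are normalised before counting, which is why the extraction hook is worth keeping separate from the vote itself.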
Quick Start
Code Agent
import { CodeAgent, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
const agent = new CodeAgent({
tools: [],
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
maxSteps: 10,
});
for await (const event of agent.run("What is 42 * 1337?")) {
if (event.event === "final_answer") console.log(event.data.answer);
}
Tool-Calling Agent
import { ToolCallingAgent, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
import { z } from "zod";
const searchTool = {
name: "search",
description: "Search the web",
inputSchema: z.object({ query: z.string() }),
outputSchema: z.string(),
readOnly: true,
idempotent: true,
forward: async ({ query }) => `Results for: ${query}`,
};
const agent = new ToolCallingAgent({
tools: [searchTool],
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
maxSteps: 5,
});
for await (const event of agent.run("Search for recent AI news")) {
if (event.event === "final_answer") console.log(event.data.answer);
}
CLI
# Install globally
npm install -g @agentkit-js/cli
# Run a task
agentkit run "What is the square root of 144?"
# Stream all events as NDJSON
agentkit run "Summarise recent AI news" --stream | jq .
# Use a specific model
agentkit run "Write a haiku" --model claude-opus-4-8 --max-steps 5
Quality Runners
Self-Consistency: majority vote across N independent runs
import { SelfConsistencyRunner, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
const runner = new SelfConsistencyRunner({
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
tools: [],
n: 5,
concurrency: 3,
earlyStop: true,
});
const answer = await runner.run("What is the capital of France?");
Reflect-Refine: critique loop until quality signal passes
import { ReflectRefineRunner, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
const runner = new ReflectRefineRunner({
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
tools: [],
maxCycles: 3,
qualitySignal: (answer) => answer.length > 100,
});
const answer = await runner.run("Write a detailed analysis of...");
Parallel Fork-Join: diverse reasoning paths, synthesised answer
import { ParallelForkJoinRunner, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
const runner = new ParallelForkJoinRunner({
branches: 3,
concurrency: 3,
aggregation: "summary",
branchPrompt: (i, msgs) => [
...msgs,
{ role: "user", content: `Analyse from perspective ${i + 1} of 3.` },
],
});
const result = await runner.run(
new AnthropicModel(AnthropicModels.SONNET_LATEST),
[{ role: "user", content: "What are the trade-offs of microservices?" }]
);
console.log(result.answer); // synthesised
console.log(result.branches); // individual paths
Long-history compaction
import { CodeAgent, AnthropicModel, AnthropicModels, MessageAssembler } from "@agentkit-js/core";
const model = new AnthropicModel(AnthropicModels.SONNET_LATEST);
const assembler = new MessageAssembler({ chunkSizeSteps: 8 });
const agent = new CodeAgent({
tools: [],
model,
maxSteps: 50,
assembler,
});
// Summarise old steps, keep context window in check
await agent.assembler.compact(model, 5);
Custom Endpoints & Local Models
Both adapters accept an optional baseURL to point at any compatible endpoint: local models, third-party proxies, or private deployments.
OpenAI-compatible (Ollama / vLLM / llama.cpp / any proxy)
import { OpenAIModel, OpenAIModels } from "@agentkit-js/core";
// Hosted OpenAI
const gpt4o = new OpenAIModel(OpenAIModels.GPT_4O);
// Local Ollama
const local = new OpenAIModel("mistral-7b", {
baseURL: "http://localhost:11434/v1",
apiKey: "ollama",
samplingParams: { temperature: 0.7, seed: 42 },
});
Anthropic-compatible proxy or private deployment
import { AnthropicModel, AnthropicModels } from "@agentkit-js/core";
// Standard usage: reads ANTHROPIC_API_KEY from environment
const model = new AnthropicModel(AnthropicModels.SONNET_LATEST);
// Third-party proxy or private endpoint
const proxied = new AnthropicModel(AnthropicModels.SONNET_LATEST, {
apiKey: "your-proxy-key",
baseURL: "https://your-proxy.example.com",
});
Chinese model providers (first-class adapters)
Seven providers ship as dedicated packages with full thinking-mode, reasoning-field, and cache-strategy support:
// Doubao / Volcengine Ark (first-class thinking + effort tiers)
import { DoubaoModel, DoubaoModels } from "@agentkit-js/model-doubao";
const doubao = new DoubaoModel(DoubaoModels.LATEST, process.env.ARK_API_KEY);
for await (const e of doubao.generate(msgs, { thinking: { mode: "enabled", effort: "high" } })) { /* handle each streamed event */ }
// DeepSeek V4 (thinking:{type} + effort, V4_FLASH available)
import { DeepSeekModel, DeepSeekModels } from "@agentkit-js/model-deepseek";
const ds = new DeepSeekModel(DeepSeekModels.V4_PRO, process.env.DEEPSEEK_API_KEY);
// Kimi K2.6 (reasoning field: delta.reasoning, thinking:{type} via extra_body)
import { MoonshotModel, KimiModels } from "@agentkit-js/model-moonshot";
const kimi = new MoonshotModel(KimiModels.LATEST, process.env.MOONSHOT_API_KEY);
// Qwen3 (enable_thinking + thinking_budget, intl region option)
import { QwenModel, QwenModels } from "@agentkit-js/model-qwen";
const qwen = new QwenModel(QwenModels.QWEN3_MAX, { region: "cn" });
// GLM-5 (Zhipu self-hosted, thinking:{type} via extra_body)
import { ZhipuModel, GLMModels } from "@agentkit-js/model-zhipu";
const glm = new ZhipuModel(GLMModels.GLM_5, process.env.ZHIPU_API_KEY);
// MiniMax M3 (reasoning_split=true → reasoning_details; or <think> tag parsing)
import { MiniMaxModel, MiniMaxModels } from "@agentkit-js/model-minimax";
const mm = new MiniMaxModel(MiniMaxModels.M3, process.env.MINIMAX_API_KEY);
Provider capability reference:
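Provider | Package | Thinking switch | Reasoning field | Cache strategy | Multi-turn round-trip |
Doubao/Ark | @agentkit-js/model-doubao | | | | tool-turns-only |
DeepSeek V4 | @agentkit-js/model-deepseek | | | | tool-turns-only |
Kimi K2.6 | @agentkit-js/model-moonshot | | | | tool-turns-only |
Qwen3 | @agentkit-js/model-qwen | | | | never |
GLM-5 | @agentkit-js/model-zhipu | | | | never |
MiniMax M3 | @agentkit-js/model-minimax | | | | never |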
Note on multi-turn round-trip: DeepSeek/Doubao/Kimi require reasoning_content echoed back in assistant messages containing tool_use (not in text-only turns, where it causes a 400 error). The adapters implement this automatically via reasoningRoundTripPolicy: "tool-turns-only".
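The rule in the note can be sketched as a small filter applied to each assistant turn before it is sent back to the provider. This is an illustration only, not the adapters' actual code: the AssistantTurn shape (an OpenAI-style tool_calls array) and the applyRoundTripPolicy helper are assumptions; only the policy names "tool-turns-only" and "never" come from the table above.

```typescript
type RoundTripPolicy = "tool-turns-only" | "never";

// Hypothetical message shape for illustration; real wire formats differ per provider.
interface AssistantTurn {
  role: "assistant";
  content: string;
  reasoning_content?: string;
  tool_calls?: { id: string; name: string; arguments: string }[];
}

// Keep reasoning_content only when the policy allows it AND the turn
// actually contains tool calls; otherwise return a copy without it.
function applyRoundTripPolicy(
  turn: AssistantTurn,
  policy: RoundTripPolicy,
): AssistantTurn {
  const hasToolCalls = (turn.tool_calls?.length ?? 0) > 0;
  if (policy === "tool-turns-only" && hasToolCalls) return turn;
  const copy: AssistantTurn = { ...turn };
  delete copy.reasoning_content;
  return copy;
}

const toolTurn: AssistantTurn = {
  role: "assistant",
  content: "",
  reasoning_content: "I should look this up.",
  tool_calls: [{ id: "call_1", name: "search", arguments: '{"query":"AI news"}' }],
};
const textTurn: AssistantTurn = {
  role: "assistant",
  content: "Here is a summary.",
  reasoning_content: "Summarising the results.",
};

const echoedToolTurn = applyRoundTripPolicy(toolTurn, "tool-turns-only");
const echoedTextTurn = applyRoundTripPolicy(textTurn, "tool-turns-only");
console.log(echoedToolTurn.reasoning_content, echoedTextTurn.reasoning_content);
```

Under "tool-turns-only" the tool turn keeps its reasoning and the text-only turn loses it; under "never" both would be stripped.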
Deploy to Cloudflare Workers
cd packages/cloudflare-worker
cp wrangler.toml.example wrangler.toml # edit account_id and kv_namespaces
wrangler secret put ANTHROPIC_API_KEY
wrangler deploy
The Worker exposes a POST /run endpoint. Session state is stored in KV for cost-efficient prompt caching across requests.
Packages
Package | Description |
| Agent runtime, kernels, models, tools, quality runners, evals, observability, checkpointing, observational memory (A1), skills + lifecycle hooks (A3), judge scorers (A4) |
| Time-travel debugger (A2) โ |
|
|
| Parser for |
| React components: MarkdownCard, D2Card, CardRenderer, ChatMessage |
| Composable prompt fragments + |
| Web search adapters: Tavily, Brave, Perplexity (LRU-cached, readOnly+idempotent) |
|
|
| Browser automation: Playwright session + CDP-bridge session, 5 tools (navigate/click/fill/screenshot/extract) |
|
|
| CPython-in-WASM (Pyodide) |
| QuickJS WASM kernel |
| True WASM sandbox via Javy + WASI (requires |
| Cloudflare Workers HTTP entry point |
| Doubao / Volcengine Ark adapter (thinking tiers, ark-context cache) |
| DeepSeek V4 adapter (thinking:{type}, V4_FLASH) |
| Moonshot / Kimi K2.6 adapter (per-version reasoning field) |
| Qwen3 adapter (enable_thinking, thinking_budget, intl region) |
| Zhipu GLM-5 adapter (thinking:{type} via extra_body) |
| MiniMax M2/M3 adapter (reasoning_split, <think> tag parsing) |
Production APIs
Retry / Resilience (C1)
All model adapters automatically retry 429 / 5xx / network errors with exponential backoff + jitter:
import { AnthropicModel } from "@agentkit-js/core";
const model = new AnthropicModel("claude-sonnet-4-6", {
apiKey: process.env.ANTHROPIC_API_KEY,
retry: { maxRetries: 3, baseDelayMs: 500, maxDelayMs: 30_000 },
});
Evals (B1)
import { runEval, exactMatch, toolCallAccuracy } from "@agentkit-js/core";
const results = await runEval(dataset, async function* (task) {
yield* agent.run(task);
}, [exactMatch, toolCallAccuracy]);
OpenTelemetry Bridge (C2)
import { OtelBridge, InMemorySpanExporter, withOtel } from "@agentkit-js/core";
const exporter = new InMemorySpanExporter(); // swap for OTLP in production
const bridge = new OtelBridge({ exporter });
for await (const ev of withOtel(agent.run(task), bridge)) {
console.log(ev);
}
bridge.flush();
Durable runtime: Checkpoints, SSE resume, HITL
Pick one KvBackend and use it for checkpoints, the SSE event log, and structured memory; there is one canonical contract.
import {
CheckpointableRun,
EventLog,
KvCheckpointer,
resumeFromHuman,
applyHumanResponse,
restoreFromSnapshot,
} from "@agentkit-js/core";
// Pick a backend that matches your runtime.
import { CloudflareKvBackend } from "@agentkit-js/cloudflare-worker";
// Other options: DurableObjectKvBackend (CF), RedisKvBackend (Node/Bun),
// RedisRestKvBackend (Upstash, edge-safe), MapKvBackend (tests).
const kv = new CloudflareKvBackend(env.MY_KV);
const checkpointer = new KvCheckpointer(kv);
const log = new EventLog(kv); // SSE Last-Event-ID resume
const wrapper = new CheckpointableRun({ checkpointer }, agent.assembler);
// Stream + persist + tag every event with a monotonic id.
for await (const { eventId, event } of log.tap(
wrapper.run(agent.run(task), task, traceId),
traceId,
)) {
// emit `id: ${eventId}\nevent: ${event.event}\ndata: ${...}\n\n` over SSE
if (event.event === "await_human_input") {
// Snapshot is already persisted; the worker is free to exit.
return;
}
}
Resume after a worker recycle (different process, possibly different machine):
const lastId = req.headers.get("Last-Event-ID");
for await (const { eventId, event } of log.replay(traceId, lastId)) { /* re-emit */ }
const startSeq = await log.nextSeq(traceId);
for await (const { eventId, event } of log.tap(agent.run(task, traceId), traceId, { startSeq })) { /* live tail */ }
Resume after human approval (could be hours/days later):
// In the /resume HTTP handler: stateless, returns immediately.
await resumeFromHuman(checkpointer, traceId, promptId, response);
// Later, when a worker picks up the trace:
const snap = await checkpointer.load(traceId);
restoreFromSnapshot(snap, agent.assembler);
applyHumanResponse(snap, agent.assembler); // injects user_message into history
// Then continue with `wrapper.run(agent.run(snap.task, traceId), ...)`.
The reference Cloudflare Worker (@agentkit-js/cloudflare-worker) wires all of this for you: bind AGENTKIT_EVENT_LOG and AGENTKIT_CHECKPOINTS in wrangler.toml and you get Last-Event-ID resume + a POST /resume endpoint out of the box. Full guide: docs/guides/durable-runtime.md.
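For intuition about Last-Event-ID resume, the replay step can be sketched with an in-memory log. This is a simplified stand-in, not the real EventLog: InMemoryEventLog, its append and replay methods, and toSseFrame are hypothetical, and a Map replaces the KvBackend. The idea it shows is that every event gets a monotonic id and a reconnecting client receives only the events after the id it last saw.

```typescript
interface LoggedEvent<T> {
  eventId: number;
  event: T;
}

// Hypothetical in-memory log; a durable version would persist to a KV store.
class InMemoryEventLog<T> {
  private readonly traces = new Map<string, LoggedEvent<T>[]>();

  // Append an event and tag it with the next monotonic id for its trace.
  append(traceId: string, event: T): LoggedEvent<T> {
    const events = this.traces.get(traceId) ?? [];
    const entry: LoggedEvent<T> = { eventId: events.length + 1, event };
    events.push(entry);
    this.traces.set(traceId, events);
    return entry;
  }

  // Return only the tail the client has not seen yet.
  // A missing or malformed Last-Event-ID replays from the start.
  replay(traceId: string, lastEventId: string | null): LoggedEvent<T>[] {
    const events = this.traces.get(traceId) ?? [];
    const parsed = lastEventId === null ? 0 : Number.parseInt(lastEventId, 10);
    const cursor = Number.isNaN(parsed) ? 0 : parsed;
    return events.filter((entry) => entry.eventId > cursor);
  }
}

interface AgentEventLike {
  event: string;
  data: unknown;
}

// Serialise one logged event as a Server-Sent Events frame.
function toSseFrame(entry: LoggedEvent<AgentEventLike>): string {
  return (
    `id: ${entry.eventId}\n` +
    `event: ${entry.event.event}\n` +
    `data: ${JSON.stringify(entry.event.data)}\n\n`
  );
}

const eventLog = new InMemoryEventLog<AgentEventLike>();
eventLog.append("trace-1", { event: "step_start", data: { step: 1 } });
eventLog.append("trace-1", { event: "tool_result", data: { ok: true } });
eventLog.append("trace-1", { event: "final_answer", data: { answer: "4" } });

// The client reconnects having seen event 1, so it gets events 2 and 3.
const missed = eventLog.replay("trace-1", "1");
console.log(missed.map(toSseFrame).join(""));
```

Because the cursor is just the last id the browser reports in the Last-Event-ID header, the server needs no per-connection state to resume a stream.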
React Hook (B2)
import { useAgentRun } from "@agentkit-js/react";
function ChatUI() {
const { messages, isRunning, run } = useAgentRun("/api/run");
return (
<>
{messages.map((m) => <div key={m.id}>{m.content}</div>)}
<button onClick={() => run({ task: "What is 2 + 2?" })} disabled={isRunning}>
Ask
</button>
</>
);
}
Tool Deferred Loading (L1-1)
Exclude large MCP server tool schemas from the context prefix; load on-demand via Anthropic Tool Search. Reduces token usage by up to 85% on servers with many tools.
import { McpToolCollection, ToolCallingAgent, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
// Option A: defer all tools from an MCP server with many tools.
const tools = await McpToolCollection.fromHttp("https://big-mcp-server.example.com");
tools.deferAll(); // marks all tools as deferLoading: true
// Option B: defer individual tools via the ToolDefinition field.
const myTool = {
name: "my_tool",
deferLoading: true, // excluded from system prefix
// ... other fields
};
const agent = new ToolCallingAgent({
tools: tools.list(),
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
});
Tool Use Examples (L1-2)
Provide few-shot examples to improve parameter accuracy from ~72% to ~90%.
const searchTool = {
name: "search",
description: "Search the web for information",
inputSchema: z.object({ query: z.string(), maxResults: z.number().optional() }),
inputExamples: [
{ query: "latest AI research 2026", maxResults: 5 },
{ query: "TypeScript best practices" },
],
// ...
};
Context Editing (L2-1)
Truncate old tool outputs reversibly to reduce context size without breaking conversation structure.
import { MessageAssembler, ToolCallingAgent, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
const model = new AnthropicModel(AnthropicModels.SONNET_LATEST);
const assembler = new MessageAssembler({ chunkSizeSteps: 8 });
const agent = new ToolCallingAgent({ tools, model, assembler, maxSteps: 50 });
// After many steps, truncate old tool outputs that are taking too many tokens.
// Keeps the 3 most recent tool steps verbatim; truncates older ones.
const truncated = agent.assembler.editToolResults({ maxTokens: 4096, keepRecent: 3 });
console.log(`Truncated ${truncated} tool outputs`);
Cross-Session Memory Tool (L2-2)
Give agents persistent memory that survives across separate run() calls.
import { createMemoryTool, MapKvBackend, ToolCallingAgent, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
// Use MapKvBackend for in-process use, or KvCheckpointer's backend for persistence.
const memory = createMemoryTool({ backend: new MapKvBackend() });
const agent = new ToolCallingAgent({
tools: [memory, ...otherTools],
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
});
// Session 1: agent learns something
for await (const ev of agent.run("What's the capital of France? Remember it for later.")) { }
// Session 2: agent recalls it
for await (const ev of agent.run("What did you remember about France's capital?")) {
if (ev.event === "final_answer") console.log(ev.data.answer); // "Paris"
}
Programmatic Tool Calling / Self-Hosted PTC (L3-1)
Execute model-generated orchestration scripts inside a kernel; only the final result enters the context window.
import { ProgrammaticOrchestrator, JsKernel, ToolRegistry } from "@agentkit-js/core";
const kernel = new JsKernel();
const registry = new ToolRegistry();
registry.register(searchTool);
registry.register(calcTool);
const orchestrator = new ProgrammaticOrchestrator(kernel, registry, {
extraCapabilities: ["tool:search", "tool:calc"],
});
// Model-generated script: intermediate results never enter the LLM context.
const script = `
const results = callTool('search', { query: 'AI news 2026' });
const count = callTool('calc', { expr: results.length + ' items' });
count + ' found';
`;
const { finalOutput, toolCallCount } = await orchestrator.run(script);
console.log(finalOutput); // Only this enters the context window.
console.log(toolCallCount); // e.g. 2; intermediate results stayed in the kernel.
Development
pnpm install
pnpm build
pnpm test
pnpm typecheck
# Reproduce every percentage in the "Differentiated" section above.
pnpm bench
# Cloudflare Worker local dev
cd packages/cloudflare-worker && wrangler dev
Examples
Example | What it shows |
| Minimal |
|
|
| RAG-style retrieval tool |
| Checkpoint + SSE resume + HITL across three simulated processes (no model needed) |
| Composite scorer over a small dataset |
| Reproducible verification of every README percentage |
| Production-style Worker deployment |
| OTel bridge with Jaeger backend |
| A1 โ measure compression ratio of |
| A2 โ synthetic event trace + |
| A3 โ three lazily-loaded skills + post-hook chain (redact + truncate) |
| A4 โ code-based vs LLM-judge scorer divergence on a synthetic trace |
Documentation
docs/guides/durable-runtime.md: checkpoints, SSE Last-Event-ID resume, HITL
docs/kernels/comparison.md: kernel selection decision tree
docs/guides/evals-cookbook.md: eval design patterns (incl. A4 multi-criterion judges)
docs/guides/memory-patterns.md: memory namespace + decay patterns
docs/guides/observational-memory.md: A1 background-observer compression
docs/guides/devtools.md: A2 time-travel debugger + fork-from-step
docs/guides/skills-and-hooks.md: A3 progressive disclosure + post-tool hooks
Environment Variables
Variable | Purpose |
| Anthropic model access |
| OpenAI / compatible endpoint |
| CI/CD Worker deployment |
| CI/CD Worker deployment |
Acknowledgements
Inspired by Hugging Face's smolagents. agentkit-js is a ground-up TypeScript reimplementation (not a port) targeting async-first execution, WASM sandboxing, and edge deployment.
License
Apache 2.0