agentkit-js
TypeScript agent runtime with WASM sandboxing, prompt-cache optimization, and parallel quality runners.
Build production-grade AI agents in TypeScript (code-execution agents, tool-calling agents, or multi-path reasoning pipelines) with built-in cost controls and Cloudflare Workers deployment.
# For Anthropic (Claude)
npm add @agentkit-js/core @anthropic-ai/sdk
# For OpenAI / compatible endpoints (Ollama, vLLM, etc.)
npm add @agentkit-js/core openai
Docs site: https://telleroutlook.github.io/agentkit-js/ · Getting started in 5 min: docs/guides/getting-started.md · Benchmarks: docs/benchmarks.md · Changelog: CHANGELOG.md · API stability: docs/strategy/api-stability.md · Strategy memo: docs/strategy/2026-06-competitiveness.md · Trust Page (D4): docs/strategy/trust.md · Enterprise security face: docs/strategy/security-face.md
Looking for a co-maintainer.
@agentkit-js/core@1.0.0 is on the calendar for 2026-12-15. If you ship to the Vercel AI SDK / Mastra / Claude Agent SDK / OpenAI Agents JS / Cloudflare Agents / LangGraph.js communities and want npm-publish + merge rights, see CONTRIBUTING.md and GOVERNANCE.md. Release cadence ledger: docs/strategy/release-cadence-log.md. Sandbox-escape SLA drill log: docs/strategy/security-drill-log.md.
Comparison
There are several mature TypeScript agent frameworks. Here is an honest assessment of where agentkit-js fits.
Last verified: 2026-06-13. Each ⚠️/❌ cell links to its source on the column header's project. The "sandbox" rows have been re-framed (D2, 2026-06-13) so that having some sandbox is no longer the differentiator: three competitors now ship one. The remaining axes (isolation tier composability, cross-runtime neutrality, offline closure) are what no other framework offers in one package.
Capability | Vercel AI SDK | LangChain / LangGraph.js | OpenAI Agents JS | Mastra | Cloudflare Agents SDK | agentkit-js |
npm downloads/month | ~57M | ~10M | ~3.8M | ~4M | ~3.2M | early-stage |
ToolCallingAgent | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Sandboxed code execution | ❌ (none in core) | ❌ (none in core) | ✅ SandboxAgent: Unix-local / Docker / hosted | ✅ Workspace: E2B / Daytona / Modal / Blaxel / Railway | ✅ @cloudflare/sandbox container | ✅ kernels |
Isolation tiers, composable in one process | n/a | n/a | ⚠️ 1 tier (process / container, picked at run time per client) | ⚠️ 1 tier per provider (you swap providers, not tiers) | ⚠️ 1 tier (container-per-DO, vendor-bound) | ✅ 3 tiers, swap with one line |
Cross-runtime neutrality | ⚠️ Node + edge runtime patches | ✅ Node + edge | ⚠️ Node + Docker hosts (sandbox path needs a host process) | ⚠️ provider-specific (each provider has its own runtime constraint) | ❌ Cloudflare-only (sandbox is a CF Container) | ✅ same kernel API on Node, any edge runtime, browser, and offline laptop |
Offline / air-gapped closure | ❌ requires provider HTTP | ❌ requires provider HTTP | ❌ Sandbox + model both need network | ❌ all sandbox providers are cloud SaaS | ❌ vendor-bound | ✅ |
Python execution (edge-safe, no container) | ❌ | ❌ | ❌ (containers required) | ❌ (containers required) | ❌ | ✅ Pyodide-in-WASM, runs inside a Worker isolate |
Anthropic prompt-cache management | ⚠️ pass-through | ⚠️ pass-through | ⚠️ via adapter | ⚠️ pass-through | โ | ✅ auto breakpoints + 1h TTL |
Self-consistency / reflect-refine runners | ❌ | ❌ manual | ❌ | ❌ | ❌ | ✅ built-in |
Budget forcing | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
DAG tool scheduler + speculative exec | ❌ | ⚠️ graph-level | ❌ | ⚠️ workflow graph | ❌ | ✅ |
Long-history compaction | ⚠️ syntactic prune | ❌ manual | โ | ⚠️ observational memory | โ | ✅ model-summarised |
MCP support | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Cloudflare Workers | ⚠️ partial | โ | ⚠️ experimental | ⚠️ alpha | ✅ native | ✅ |
UI hooks (React/Next.js) | ✅ best-in-class | โ | โ | ⚠️ via AI SDK | ⚠️ | ✅ useAgentRun |
Provider integrations | 40+ | 300+ | OpenAI-primary | 40+ | CF Workers AI | Anthropic · OpenAI · Doubao · DeepSeek · Kimi · Qwen · GLM · MiniMax |
Evals framework | โ | ⚠️ LangSmith | โ | ✅ 12+ scorers | โ | ✅ 16 scorers + 2 multi-criterion judges |
Observability (OTel) | ⚠️ LangSmith | ⚠️ LangSmith | โ | โ | โ | ✅ OtelBridge + GenAI semconv |
Retry / resilience | โ | โ | โ | โ | โ | ✅ RetryPolicy |
Durable workflows / checkpointing | ✅ DurableAgent (AI SDK 6) | ✅ LangGraph | ❌ (Assistants API retiring 2026-08-26) | ⚠️ partial | ✅ Durable Objects | ✅ Checkpointer + 4 backends (CF KV / DO / Redis / Upstash) |
SSE Last-Event-ID resume | ⚠️ via DurableAgent | โ runtime | โ | โ | โ | ✅ EventLog primitive + worker-native |
HITL persisted suspend/resume | โ | ✅ | โ | ⚠️ partial | ⚠️ via DO | ✅ stateless |
Embedded local LLM (in-process, offline) | ⚠️ via Ollama HTTP | ⚠️ via Ollama HTTP | โ | ⚠️ via Ollama HTTP | โ | ✅ |
Where competitors are stronger
Vercel AI SDK: If you're building a chat UI with Next.js, use this. The React hooks (useChat, useAgent), DurableAgent for stateful/resumable workflows (AI SDK 6), native MCP support, and DevTools panel are all best-in-class. 57M monthly downloads.
LangChain/LangGraph.js: If you need 300+ integrations (vector stores, document loaders, obscure providers) or graph-based durable workflows with checkpointing and human-in-the-loop, LangGraph is battle-tested at LinkedIn, Uber, and GitLab scale.
Mastra: Best eval framework (12+ built-in scorers including trajectory and tool accuracy). Strong developer onboarding. Their "Observational memory" pattern was first-mover; agentkit-js now ships an equivalent (ObservationalMemory) plus extra prompt-cache-aware compression; see docs/guides/observational-memory.md.
Cloudflare Agents SDK: If you're building on Cloudflare specifically, Durable Objects give you stateful agents with persistent scheduling that nothing else matches natively.
OpenAI Agents JS: If your stack is OpenAI-only and you want first-party support, this is the cleanest path. The 2026-04 release added SandboxAgent with Unix-local, Docker, and hosted clients; for OS-level isolation backed by OpenAI itself, this is the path of least resistance.
Where agentkit-js is differentiated
Three isolation tiers under one swappable interface (D2, 2026-06-13). OpenAI Agents JS now ships SandboxAgent (Unix-local / Docker / hosted) and Mastra ships Workspace providers (E2B / Daytona / Modal / Blaxel / Railway). The differentiator is no longer "has a sandbox": agentkit-js exposes three tiers (VmKernel in-process; QuickJSKernel / PyodideKernel / WasmtimeKernel true WASM; RemoteSandboxKernel microVM) under one Kernel interface, lets you swap them with one line at call time, and applies one CapabilityManifest (network/fs/env/cpu/memory) across every tier. Competitors give you one tier wired to one provider. See docs/kernels/comparison.md for the decision tree.
Cross-runtime neutrality. Cloudflare's sandbox is fast on Cloudflare. Mastra's providers are SaaS. agentkit-js kernels run on Node, on any edge runtime, in a browser tab, and on a laptop with the network unplugged: same Kernel API, same security manifest. This is the structural advantage no platform-bound competitor can match.
Offline / air-gapped closure. @agentkit-js/model-local (node-llama-cpp + grammar-constrained tool calls + multi-mirror downloads HF / hf-mirror / ModelScope) plus a WASM kernel = full agent loop with zero outbound traffic. For compliance-bound and air-gapped deployments, no other framework gives you this without writing the integration yourself.
Durable runtime: The same KvBackend powers checkpoints, the SSE event log, and structured memory. Four production backends ship out of the box (Cloudflare KV / Durable Objects / Redis / Upstash REST). A paused await_human_input survives worker recycle for hours/days; POST /resume is stateless. See docs/guides/durable-runtime.md.
Quality runners: Self-consistency with answer extraction (boxed / last-line / custom), reflect-refine, budget forcing ("Wait" prefill), and parallel fork-join are not shipped as first-class APIs by any competitor.
Anthropic prompt-cache optimization: The framework actively manages cache_control breakpoint placement across multi-turn history, supports the 1-hour extended TTL (ttl: "1h"), and reports per-TTL cache usage. Competitors pass through or validate limits but do not optimise placement.
Speculative tool execution: Read-only, idempotent tools are pre-executed ahead of write barriers within a DAG step. The scheduler is awakened by $<callId> dependency references in the system prompt, enabling true parallel + ordered hybrid scheduling. No competitor implements this.
GenAI semantic conventions: OtelBridge emits standard gen_ai.* attributes (Datadog / Honeycomb / Grafana GenAI view compatible) alongside legacy names, switchable via semconvMode.
Observational Memory + cache-stable prefix (A1): A background "observer" model continuously compresses history into ranked observation paragraphs. The compressed prefix is byte-stable so Anthropic prompt cache hits stay hot across observations; Mastra's reference work has no equivalent. ~22% of baseline tokens on a 50-turn synthetic trace; see examples/benchmarks/observational-memory.mjs.
Time-travel debugger (A2): @agentkit-js/devtools exposes the existing EventLog + Checkpointer data through a navigable step timeline + "fork from any step" UI. LangGraph Studio's headline feature, shipped as a tiny opt-in package (logic core ~250 LOC, React UI optional).
Skills + lifecycle hooks (A3): SkillRegistry for progressive instruction/tool disclosure (Claude Agent SDK / CrewAI v1.12 convention). The ToolPostHook chain (redact, truncate, audit) sits beside the existing ToolGuardrail, giving pre/post symmetry without confusing block vs transform semantics.
Multi-criterion LLM judges (A4): judgeScorer extends llmJudge with weighted criterion-level scoring + configurable scale. Two built-in judges (trajectoryQualityJudge, answerCompletenessJudge) work with any cheap Model adapter so judges run on Haiku/Doubao while the agent stays on Sonnet/Opus.
Reproducible benchmarks: Every percentage in this README (-37%, 72→90%, -85%, -84%) is verified by an offline benchmark in examples/benchmarks/. Run pnpm bench to reproduce. CI fails the PR if any number drifts outside its tolerance.
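The dependency-ordered part of the scheduling idea above can be sketched in a few lines. This is a minimal illustration, not the agentkit-js Scheduler: the ToolCall shape, the runDag function, and the rule that arguments are plain strings are all assumptions made for the example, and speculative pre-execution of read-only tools is left out. Only the $<callId> reference syntax comes from the text above.

```typescript
// Hypothetical shapes for illustration; these are NOT the agentkit-js Scheduler types.
interface ToolCall {
  id: string;
  tool: string;
  args: Record<string, string>;
}
type ToolFn = (args: Record<string, string>) => Promise<string>;

// Matches $<callId> references inside string arguments.
const REF_PATTERN = "\\$<([A-Za-z0-9_-]+)>";

// Collect the call ids referenced as $<callId> inside a call's arguments.
function dependenciesOf(call: ToolCall): string[] {
  const deps: string[] = [];
  for (const value of Object.values(call.args)) {
    const re = new RegExp(REF_PATTERN, "g");
    let match = re.exec(value);
    while (match !== null) {
      if (deps.indexOf(match[1]) === -1) deps.push(match[1]);
      match = re.exec(value);
    }
  }
  return deps;
}

// Run calls in waves: every call whose dependencies are resolved runs concurrently.
async function runDag(
  calls: ToolCall[],
  tools: Record<string, ToolFn>,
): Promise<{ results: Map<string, string>; waves: string[][] }> {
  const results = new Map<string, string>();
  const waves: string[][] = [];
  let pending = calls.slice();
  while (pending.length > 0) {
    const ready = pending.filter((c) => dependenciesOf(c).every((d) => results.has(d)));
    if (ready.length === 0) throw new Error("cycle or unknown $<callId> reference");
    const outputs = await Promise.all(
      ready.map((c) => {
        const fn = tools[c.tool];
        if (!fn) throw new Error(`unknown tool: ${c.tool}`);
        // Substitute each $<id> with the output of the call it refers to.
        const args: Record<string, string> = {};
        for (const [key, value] of Object.entries(c.args)) {
          args[key] = value.replace(
            new RegExp(REF_PATTERN, "g"),
            (_m: string, id: string) => results.get(id) ?? "",
          );
        }
        return fn(args);
      }),
    );
    ready.forEach((c, i) => results.set(c.id, outputs[i]));
    waves.push(ready.map((c) => c.id));
    pending = pending.filter((c) => !results.has(c.id));
  }
  return { results, waves };
}
```

Two calls with no references land in the same wave and run through Promise.all together; a third call whose argument mentions both ids waits for the next wave.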
Honest caveats
agentkit-js is early-stage. The differentiating features (code execution kernels, durable runtime, quality runners, speculative scheduling) are technically novel but also niche: most teams pick a framework based on ecosystem breadth and documentation volume, where the mature options above win. Choose agentkit-js when sandboxed code execution, durable agent runs, prompt-cache cost control, or output quality runners are first-order concerns.
Verified status
Check | Number | Verified by |
Tests passing (all packages) | 1341 | |
README percentages reproducible | 8 / 8 | |
Cross-process kill-and-resume (A1 DoD) | ✅ Redis + ✅ Cloudflare KV + ✅ Durable Object | |
SSE Last-Event-ID gap-free replay (A2 DoD) | ✅ | |
Stateless HITL resume (A3 DoD) | ✅ | |
Observational memory ≥4× compression (A1) | ✅ 22% of baseline | |
Code-mode bootstrap O(1) vs direct-MCP O(N) (S1/A1) | ✅ 13.6% of direct at N=30 tools | |
MCP Portal: O(1) bootstrap across M upstreams (D1) | ✅ 3.1% of direct multi-MCP at M=5×N=30 | |
Step-fork bundle (A2 DevTools) | ✅ 9 unit + 8 jsdom render tests | |
Skill lazy-load + post-hook chain (A3) | ✅ | |
Judge scorer weighted breakdown (A4) | ✅ | |
Paired-statistics parity vs scipy (evals-runner) | ✅ 31 reference values to ±1e-7 | |
Local Studio HTTP overview (A4 of 2026-06-12 plan) | ✅ | |
Framework-agnostic GenAI semconv ingest (D5) | ✅ 9 adapter tests | |
Multi-model evaluation across 17× size range | ✅ 5 models, 2026-06-12 | |
Features
Two agent modes: CodeAgent (writes + executes code) and ToolCallingAgent (native tool_use)
Code execution, three isolation tiers: VmKernel (node:vm, in-process dev/test), QuickJSKernel / PyodideKernel / WasmtimeKernel (true WASM, language-level isolation, edge-safe), RemoteSandboxKernel (E2B / Cloudflare Sandbox microVM, full process isolation). Mix tiers via factory.createKernel().
Programmatic Tool Calling (PTC): ProgrammaticOrchestrator executes model-generated scripts inside any kernel; callTool() calls registered tools without surfacing intermediate results to the context (-37% tokens). Self-hosted alternative to Anthropic's managed PTC container.
Prompt-cache optimization: MessageAssembler builds cache-stable prefixes; Anthropic cache_control breakpoints respect the 4-breakpoint limit, per-chunk token thresholds, and the 1-hour extended TTL (ttl: "1h"); per-TTL usage metering (5m vs 1h); OpenAI automatic prefix cache hit tracking
Tool deferred loading: deferLoading: true on any tool (or McpToolCollection.deferAll()) excludes its schema from the system prefix and loads on-demand via Anthropic Tool Search (-85% tokens for large MCP server collections)
Tool Use Examples: inputExamples on any tool maps to Anthropic's input_examples wire field (72%→90% parameter accuracy)
Context editing: assembler.editToolResults({ maxTokens, keepRecent }) truncates old tool outputs reversibly without breaking conversation structure (+29% task performance, -84% tokens on web search)
Cross-session Memory Tool: createMemoryTool({ backend }) gives agents persistent read/write/list/delete memory backed by any KvBackend (Cloudflare KV, Redis, in-memory Map)
Quality runners: majority-vote self-consistency with answer extraction (boxed / last-line / custom hook), critique-refine cycles, "Wait" prefill budget forcing, parallel fork-join with synthesis
DAG scheduling: independent tool calls execute concurrently via Scheduler; read-only tools speculatively pre-execute ahead of write barriers; $<callId> dependency syntax in system prompt enables true data-dependency ordering; wired into ToolCallingAgent by default
Long-history compaction: agent.assembler.compact(model, keepRecentSteps) summarises old steps; inject a custom MessageAssembler via the assembler option
Production resilience: automatic exponential backoff + jitter retry for 429 / 5xx / network errors on all model adapters; configurable via RetryPolicy
Evals framework: runEval() with 16 built-in scorers covering correctness (exactMatch, toolCallAccuracy, trajectoryValidity, finalAnswerLength, guardrailCompliance), faithfulness, relevance, recovery, efficiency, constraints, plus two multi-criterion JudgeScorer judges (trajectoryQualityJudge, answerCompletenessJudge)
Evaluation harness (@agentkit-js/evals-runner): runEvaluation() plus the agentkit evals run CLI: multi-model × multi-suite × multi-seed Pareto reports over (accuracy, cost, p95 wall). Six reference suites cover the gaps single-task benchmarks miss (long-context recall, multi-turn memory, agent trajectory, latency-under-budget, cost-per-correct, tool-sequence). Built-in paired statistics (McNemar exact / Wilson CI / paired bootstrap / G1 gate) match scipy reference values to ±1e-7. All fixtures are synthetic, with no overlap with public training corpora.
Code-mode MCP server (@agentkit-js/mcp-server): createCodeModeServer() collapses N downstream tools into a docs_search + execute_code two-tool MCP surface. At 30 tools the bootstrap-token cost drops to 13.6% of direct MCP (codemode-lite reported 53%); pairs with any agentkit kernel for unified security policy.
MCP Portal, federating N upstream servers behind one neutral two-tool surface (D1, 2026-06-13): createPortalServer() wraps multiple ToolRegistry / MCP upstreams (filesystem + GitHub + memory + …) into one code-mode face. Bootstrap stays O(1) regardless of how many upstreams are federated; at 5 servers × 30 tools = 150 tools, the Portal is 3.1% of direct multi-MCP and 19.8% of code-mode-per-server (examples/benchmarks/portal-tokens.mjs). One CapabilityManifest spans every upstream: the audit boundary platform-bound Portals (Cloudflare's announced version) cannot give you across heterogeneous providers. See examples/mcp-portal/.
AI SDK + Mastra + Claude Agent SDK + OpenAI Agents JS plugin packages (@agentkit-js/aisdk, @agentkit-js/mastra-sandbox, @agentkit-js/claude-agent-sdk, @agentkit-js/openai-agents): drop agentkit's WASM kernels into Vercel AI SDK 4-6 (sandboxedJsTool, codeModeTool), Mastra (agentkitMastraSandbox), Anthropic Claude Agent SDK (sandboxedJsClaudeTool, codeModeClaudeTool), or OpenAI Agents JS (sandboxedJsAgentTool, codeModeAgentTool) without an external sandbox provider.
Observability: OtelBridge maps AgentEvent streams to OTel-compatible spans; emits gen_ai.* semantic convention attributes (Datadog/Honeycomb/Grafana GenAI view compatible) with semconvMode: "both" | "stable" | "legacy"
Durable runtime: KvCheckpointer with four production backends: CloudflareKvBackend, DurableObjectKvBackend, RedisKvBackend (ioredis-style), RedisRestKvBackend (Upstash REST, edge-safe). CheckpointableRun saves state after every step; await_human_input persists pendingHumanInput and exits the iterator so the worker can recycle while a human reviews.
SSE Last-Event-ID resume: EventLog tags every event with a monotonic id, persists to the same KvBackend, and replays only the missing tail when a client reconnects. The reference Cloudflare Worker honors Last-Event-ID natively; useAgentRun({ resume: { maxAttempts } }) retries automatically.
Stateless human-in-the-loop: resumeFromHuman(checkpointer, traceId, promptId, response) writes the human's reply into a paused snapshot. Because there is no in-memory state, the worker that pauses and the worker that resumes can be different processes (and different days). See examples/durable-runtime/.
React hooks: @agentkit-js/react provides useAgentRun() for streaming SSE agent events in Next.js / React apps
Multi-model: Anthropic (Claude) and OpenAI-compatible endpoints (Ollama, vLLM, llama.cpp)
MCP support: McpToolCollection wraps any MCP server's tools as first-class agentkit tools
Cloudflare Workers: HTTP API entry point with KV session caching, ready to deploy with Wrangler
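To make the "majority-vote self-consistency with answer extraction" item concrete, here is a small standalone sketch. It is not the SelfConsistencyRunner implementation; the Extractor type and the boxed, lastLine, and majorityVote helpers are names invented for this example, and it works on already-collected response strings rather than calling a model.

```typescript
// Hypothetical helper types; the library's extraction hooks may look different.
type Extractor = (text: string) => string | undefined;

// Return the content of the last \boxed{...} occurrence, if there is one.
const boxed: Extractor = (text) => {
  const matches = text.match(/\\boxed\{([^{}]*)\}/g);
  if (!matches) return undefined;
  const last = matches[matches.length - 1];
  return last.slice("\\boxed{".length, -1).trim();
};

// Fall back to the last non-empty line of the response.
const lastLine: Extractor = (text) => {
  const lines = text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.length > 0 ? lines[lines.length - 1] : undefined;
};

// Count extracted answers across samples; ties go to the answer seen first.
function majorityVote(
  samples: string[],
  extract: Extractor,
): { answer: string; votes: number } | undefined {
  const counts = new Map<string, number>();
  for (const sample of samples) {
    const answer = extract(sample);
    if (answer === undefined) continue; // samples without an answer do not vote
    counts.set(answer, (counts.get(answer) ?? 0) + 1);
  }
  let best: { answer: string; votes: number } | undefined;
  counts.forEach((votes, answer) => {
    if (best === undefined || votes > best.votes) best = { answer, votes };
  });
  return best;
}
```

Normalising the extracted answer (trimming here, but case folding or numeric parsing would also be reasonable) matters more than the voting itself, because two samples that agree but format the answer differently would otherwise split the vote.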
Quick Start
Code Agent
import { CodeAgent, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
const agent = new CodeAgent({
tools: [],
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
maxSteps: 10,
});
for await (const event of agent.run("What is 42 * 1337?")) {
if (event.event === "final_answer") console.log(event.data.answer);
}
Tool-Calling Agent
import { ToolCallingAgent, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
import { z } from "zod";
const searchTool = {
name: "search",
description: "Search the web",
inputSchema: z.object({ query: z.string() }),
outputSchema: z.string(),
readOnly: true,
idempotent: true,
forward: async ({ query }) => `Results for: ${query}`,
};
const agent = new ToolCallingAgent({
tools: [searchTool],
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
maxSteps: 5,
});
for await (const event of agent.run("Search for recent AI news")) {
if (event.event === "final_answer") console.log(event.data.answer);
}
CLI
# Install globally
npm install -g @agentkit-js/cli
# Run a task
agentkit run "What is the square root of 144?"
# Stream all events as NDJSON
agentkit run "Summarise recent AI news" --stream | jq .
# Use a specific model
agentkit run "Write a haiku" --model claude-opus-4-8 --max-steps 5
Quality Runners
Self-Consistency: majority vote across N independent runs
import { SelfConsistencyRunner, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
const runner = new SelfConsistencyRunner({
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
tools: [],
n: 5,
concurrency: 3,
earlyStop: true,
});
const answer = await runner.run("What is the capital of France?");
Reflect-Refine: critique loop until quality signal passes
import { ReflectRefineRunner, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
const runner = new ReflectRefineRunner({
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
tools: [],
maxCycles: 3,
qualitySignal: (answer) => answer.length > 100,
});
const answer = await runner.run("Write a detailed analysis of...");
Parallel Fork-Join: diverse reasoning paths, synthesised answer
import { ParallelForkJoinRunner, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
const runner = new ParallelForkJoinRunner({
branches: 3,
concurrency: 3,
aggregation: "summary",
branchPrompt: (i, msgs) => [
...msgs,
{ role: "user", content: `Analyse from perspective ${i + 1} of 3.` },
],
});
const result = await runner.run(
new AnthropicModel(AnthropicModels.SONNET_LATEST),
[{ role: "user", content: "What are the trade-offs of microservices?" }]
);
console.log(result.answer); // synthesised
console.log(result.branches); // individual paths
Long-history compaction
import { CodeAgent, AnthropicModel, AnthropicModels, MessageAssembler } from "@agentkit-js/core";
const model = new AnthropicModel(AnthropicModels.SONNET_LATEST);
const assembler = new MessageAssembler({ chunkSizeSteps: 8 });
const agent = new CodeAgent({
tools: [],
model,
maxSteps: 50,
assembler,
});
// Summarise old steps, keep context window in check
await agent.assembler.compact(model, 5);
Custom Endpoints & Local Models
Both adapters accept an optional baseURL to point at any compatible endpoint: local models, third-party proxies, or private deployments.
OpenAI-compatible (Ollama / vLLM / llama.cpp / any proxy)
import { OpenAIModel, OpenAIModels } from "@agentkit-js/core";
// Hosted OpenAI
const gpt4o = new OpenAIModel(OpenAIModels.GPT_4O);
// Local Ollama
const local = new OpenAIModel("mistral-7b", {
baseURL: "http://localhost:11434/v1",
apiKey: "ollama",
samplingParams: { temperature: 0.7, seed: 42 },
});
Anthropic-compatible proxy or private deployment
import { AnthropicModel, AnthropicModels } from "@agentkit-js/core";
// Standard usage: reads ANTHROPIC_API_KEY from environment
const model = new AnthropicModel(AnthropicModels.SONNET_LATEST);
// Third-party proxy or private endpoint
const proxied = new AnthropicModel(AnthropicModels.SONNET_LATEST, {
apiKey: "your-proxy-key",
baseURL: "https://your-proxy.example.com",
});
Chinese model providers (first-class adapters)
Seven providers ship as dedicated packages with full thinking-mode, reasoning-field, and cache-strategy support:
// Doubao / Volcengine Ark (first-class thinking + effort tiers)
import { DoubaoModel, DoubaoModels } from "@agentkit-js/model-doubao";
const doubao = new DoubaoModel(DoubaoModels.LATEST, process.env.ARK_API_KEY);
for await (const e of doubao.generate(msgs, { thinking: { mode: "enabled", effort: "high" } })) { ... }
// DeepSeek V4 (thinking:{type} + effort, V4_FLASH available)
import { DeepSeekModel, DeepSeekModels } from "@agentkit-js/model-deepseek";
const ds = new DeepSeekModel(DeepSeekModels.V4_PRO, process.env.DEEPSEEK_API_KEY);
// Kimi K2.6 (reasoning field: delta.reasoning, thinking:{type} via extra_body)
import { MoonshotModel, KimiModels } from "@agentkit-js/model-moonshot";
const kimi = new MoonshotModel(KimiModels.LATEST, process.env.MOONSHOT_API_KEY);
// Qwen3 (enable_thinking + thinking_budget, intl region option)
import { QwenModel, QwenModels } from "@agentkit-js/model-qwen";
const qwen = new QwenModel(QwenModels.QWEN3_MAX, { region: "cn" });
// GLM-5 (Zhipu self-hosted, thinking:{type} via extra_body)
import { ZhipuModel, GLMModels } from "@agentkit-js/model-zhipu";
const glm = new ZhipuModel(GLMModels.GLM_5, process.env.ZHIPU_API_KEY);
// MiniMax M3 (reasoning_split=true -> reasoning_details; or <think> tag parsing)
import { MiniMaxModel, MiniMaxModels } from "@agentkit-js/model-minimax";
const mm = new MiniMaxModel(MiniMaxModels.M3, process.env.MINIMAX_API_KEY);
Provider capability reference:
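Provider | Package | Thinking switch | Reasoning field | Cache strategy | Multi-turn round-trip |
Doubao/Ark | @agentkit-js/model-doubao | | | | tool-turns-only |
DeepSeek V4 | @agentkit-js/model-deepseek | | | | tool-turns-only |
Kimi K2.6 | @agentkit-js/model-moonshot | | | | tool-turns-only |
Qwen3 | @agentkit-js/model-qwen | | | | never |
GLM-5 | @agentkit-js/model-zhipu | | | | never |
MiniMax M3 | @agentkit-js/model-minimax | | | | never |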
Note on multi-turn round-trip: DeepSeek/Doubao/Kimi require reasoning_content to be echoed back in assistant messages containing tool_use (not in text-only turns, where it causes a 400 error). The adapters implement this automatically via reasoningRoundTripPolicy: "tool-turns-only".
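The round-trip rule in the note can be illustrated with a small filter over the outgoing history. This is a sketch under stated assumptions, not the adapters' code: the message shapes below follow the common OpenAI-compatible layout (tool_calls on assistant messages), and applyRoundTripPolicy is a hypothetical name. Only the field name reasoning_content and the policy values "tool-turns-only" and "never" come from the text above.

```typescript
// Assumed message shapes, loosely modelled on OpenAI-compatible chat payloads.
interface ToolCallRef {
  id: string;
  name: string;
  arguments: string;
}
interface AssistantMessage {
  role: "assistant";
  content: string;
  reasoning_content?: string;
  tool_calls?: ToolCallRef[];
}
interface OtherMessage {
  role: "system" | "user" | "tool";
  content: string;
}
type Message = AssistantMessage | OtherMessage;
type RoundTripPolicy = "tool-turns-only" | "never";

// Decide, per assistant turn, whether reasoning_content is sent back to the provider.
function applyRoundTripPolicy(history: Message[], policy: RoundTripPolicy): Message[] {
  return history.map((msg) => {
    if (msg.role !== "assistant") return msg;
    const hasToolCalls = msg.tool_calls !== undefined && msg.tool_calls.length > 0;
    if (policy === "tool-turns-only" && hasToolCalls) return msg; // keep reasoning as-is
    // Text-only turn (or policy "never"): rebuild the message without reasoning_content.
    const stripped: AssistantMessage = { role: "assistant", content: msg.content };
    if (msg.tool_calls !== undefined) stripped.tool_calls = msg.tool_calls;
    return stripped;
  });
}
```

The function returns new message objects for stripped turns and leaves the input array untouched, so the full reasoning stays available locally for logging or display.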
Deploy to Cloudflare Workers
cd packages/cloudflare-worker
cp wrangler.toml.example wrangler.toml # edit account_id and kv_namespaces
wrangler secret put ANTHROPIC_API_KEY
wrangler deploy
The Worker exposes a POST /run endpoint. Session state is stored in KV for cost-efficient prompt caching across requests.
Packages
Package | Description |
@agentkit-js/core | Agent runtime, kernels, models, tools, quality runners, evals, observability, checkpointing, observational memory (A1), skills + lifecycle hooks (A3), judge scorers (A4) |
@agentkit-js/devtools | Time-travel debugger (A2) |
 | Parser for |
 | React components: MarkdownCard, D2Card, CardRenderer, ChatMessage |
 | Composable prompt fragments + |
 | Web search adapters: Tavily, Brave, Perplexity (LRU-cached, readOnly+idempotent) |
 | Browser automation: Playwright session + CDP-bridge session, 5 tools (navigate/click/fill/screenshot/extract) |
 | CPython-in-WASM (Pyodide) |
 | QuickJS WASM kernel |
 | True WASM sandbox via Javy + WASI (requires |
@agentkit-js/cloudflare-worker | Cloudflare Workers HTTP entry point |
@agentkit-js/model-doubao | Doubao / Volcengine Ark adapter (thinking tiers, ark-context cache) |
@agentkit-js/model-deepseek | DeepSeek V4 adapter (thinking:{type}, V4_FLASH) |
@agentkit-js/model-moonshot | Moonshot / Kimi K2.6 adapter (per-version reasoning field) |
@agentkit-js/model-qwen | Qwen3 adapter (enable_thinking, thinking_budget, intl region) |
@agentkit-js/model-zhipu | Zhipu GLM-5 adapter (thinking:{type} via extra_body) |
@agentkit-js/model-minimax | MiniMax M2/M3 adapter (reasoning_split, <think> tag parsing) |
Production APIs
Retry / Resilience (C1)
All model adapters automatically retry 429 / 5xx / network errors with exponential backoff + jitter:
import { AnthropicModel } from "@agentkit-js/core";
const model = new AnthropicModel("claude-sonnet-4-6", {
apiKey: process.env.ANTHROPIC_API_KEY,
retry: { maxRetries: 3, baseDelayMs: 500, maxDelayMs: 30_000 },
});
Evals (B1)
import { runEval, exactMatch, toolCallAccuracy } from "@agentkit-js/core";
const results = await runEval(dataset, async function* (task) {
yield* agent.run(task);
}, [exactMatch, toolCallAccuracy]);
OpenTelemetry Bridge (C2)
import { OtelBridge, InMemorySpanExporter, withOtel } from "@agentkit-js/core";
const exporter = new InMemorySpanExporter(); // swap for OTLP in production
const bridge = new OtelBridge({ exporter });
for await (const ev of withOtel(agent.run(task), bridge)) {
console.log(ev);
}
bridge.flush();
Durable runtime: Checkpoints, SSE resume, HITL
Pick one KvBackend and use it for checkpoints, the SSE event log, and structured memory; there is one canonical contract.
import {
CheckpointableRun,
EventLog,
KvCheckpointer,
resumeFromHuman,
applyHumanResponse,
restoreFromSnapshot,
} from "@agentkit-js/core";
// Pick a backend that matches your runtime.
import { CloudflareKvBackend } from "@agentkit-js/cloudflare-worker";
// Other options: DurableObjectKvBackend (CF), RedisKvBackend (Node/Bun),
// RedisRestKvBackend (Upstash, edge-safe), MapKvBackend (tests).
const kv = new CloudflareKvBackend(env.MY_KV);
const checkpointer = new KvCheckpointer(kv);
const log = new EventLog(kv); // SSE Last-Event-ID resume
const wrapper = new CheckpointableRun({ checkpointer }, agent.assembler);
// Stream + persist + tag every event with a monotonic id.
for await (const { eventId, event } of log.tap(
wrapper.run(agent.run(task), task, traceId),
traceId,
)) {
// emit `id: ${eventId}\nevent: ${event.event}\ndata: ${...}\n\n` over SSE
if (event.event === "await_human_input") {
// Snapshot is already persisted; the worker is free to exit.
return;
}
}
Resume after a worker recycle (different process, possibly different machine):
const lastId = req.headers.get("Last-Event-ID");
for await (const { eventId, event } of log.replay(traceId, lastId)) { /* re-emit */ }
const startSeq = await log.nextSeq(traceId);
for await (const { eventId, event } of log.tap(agent.run(task, traceId), traceId, { startSeq })) { /* live tail */ }
Resume after human approval (could be hours/days later):
// In the /resume HTTP handler: stateless, returns immediately.
await resumeFromHuman(checkpointer, traceId, promptId, response);
// Later, when a worker picks up the trace:
const snap = await checkpointer.load(traceId);
restoreFromSnapshot(snap, agent.assembler);
applyHumanResponse(snap, agent.assembler); // injects user_message into history
// Then continue with `wrapper.run(agent.run(snap.task, traceId), ...)`.
The reference Cloudflare Worker (@agentkit-js/cloudflare-worker) wires all of this for you: bind AGENTKIT_EVENT_LOG and AGENTKIT_CHECKPOINTS in wrangler.toml and you get Last-Event-ID resume + a POST /resume endpoint out of the box. Full guide: docs/guides/durable-runtime.md.
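The replay half of Last-Event-ID resume is easy to picture with an in-memory stand-in. The class below is a sketch for illustration only: InMemoryEventLog and toSseFrame are hypothetical names, ids are assumed to be per-trace integers starting at 1, and nothing here persists to a KvBackend the way the real EventLog is described as doing.

```typescript
interface LoggedEvent<T> {
  eventId: number;
  event: T;
}

// Hypothetical in-memory stand-in for an event log keyed by trace id.
class InMemoryEventLog<T> {
  private readonly traces = new Map<string, LoggedEvent<T>[]>();

  // Tag the event with the next monotonic id for its trace and store it.
  append(traceId: string, event: T): LoggedEvent<T> {
    const events = this.traces.get(traceId) ?? [];
    const logged: LoggedEvent<T> = { eventId: events.length + 1, event };
    events.push(logged);
    this.traces.set(traceId, events);
    return logged;
  }

  // Return only the tail the client has not seen, based on its Last-Event-ID header.
  replay(traceId: string, lastEventId: string | null): LoggedEvent<T>[] {
    const events = this.traces.get(traceId) ?? [];
    const parsed = lastEventId === null ? 0 : Number.parseInt(lastEventId, 10);
    const last = Number.isNaN(parsed) ? 0 : parsed; // unparseable header: replay everything
    return events.filter((e) => e.eventId > last);
  }
}

// Serialise one logged event as a Server-Sent Events frame.
function toSseFrame(e: LoggedEvent<{ event: string; data: unknown }>): string {
  return `id: ${e.eventId}\nevent: ${e.event.event}\ndata: ${JSON.stringify(e.event.data)}\n\n`;
}
```

Because the browser's EventSource sends the last id it received in the Last-Event-ID header on reconnect, a server that writes the id field on every frame gets gap-free resume with no extra client logic.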
React Hook (B2)
import { useAgentRun } from "@agentkit-js/react";
function ChatUI() {
const { messages, isRunning, run } = useAgentRun("/api/run");
return (
<>
{messages.map((m) => <div key={m.id}>{m.content}</div>)}
<button onClick={() => run({ task: "What is 2 + 2?" })} disabled={isRunning}>
Ask
</button>
</>
);
}
Tool Deferred Loading (L1-1)
Exclude large MCP server tool schemas from the context prefix; load on-demand via Anthropic Tool Search. Reduces token usage by up to 85% on servers with many tools.
import { McpToolCollection, ToolCallingAgent, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
// Option A: defer all tools from an MCP server with many tools.
const tools = await McpToolCollection.fromHttp("https://big-mcp-server.example.com");
tools.deferAll(); // marks all tools as deferLoading: true
// Option B: defer individual tools via the ToolDefinition field.
const myTool = {
name: "my_tool",
deferLoading: true, // excluded from system prefix
// ... other fields
};
const agent = new ToolCallingAgent({
tools: tools.list(),
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
});
Tool Use Examples (L1-2)
Provide few-shot examples to improve parameter accuracy from ~72% to ~90%.
const searchTool = {
name: "search",
description: "Search the web for information",
inputSchema: z.object({ query: z.string(), maxResults: z.number().optional() }),
inputExamples: [
{ query: "latest AI research 2026", maxResults: 5 },
{ query: "TypeScript best practices" },
],
// ...
};
Context Editing (L2-1)
Truncate old tool outputs reversibly to reduce context size without breaking conversation structure.
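As a rough mental model of reversible truncation, the sketch below keeps the most recent tool outputs verbatim, shortens older ones, and remembers the originals so they can be restored. It is an illustration, not how editToolResults is implemented: the ToolStep and ToolResultEditor names are invented, and the 4-characters-per-token estimate is a crude assumption standing in for a real tokenizer.

```typescript
// Hypothetical step record; the real assembler tracks richer state.
interface ToolStep {
  id: string;
  output: string;
}

class ToolResultEditor {
  private readonly steps: ToolStep[];
  private readonly originals = new Map<string, string>();

  constructor(steps: ToolStep[]) {
    this.steps = steps;
  }

  // Truncate outputs older than the last `keepRecent` steps; returns how many were cut.
  edit(opts: { maxTokens: number; keepRecent: number }): number {
    const maxChars = opts.maxTokens * 4; // assumption: ~4 characters per token
    const cutoff = Math.max(0, this.steps.length - opts.keepRecent);
    let truncated = 0;
    for (let i = 0; i < cutoff; i++) {
      const step = this.steps[i];
      // Skip short outputs and outputs already truncated by an earlier call.
      if (step.output.length <= maxChars || this.originals.has(step.id)) continue;
      this.originals.set(step.id, step.output);
      step.output = step.output.slice(0, maxChars) + " [truncated]";
      truncated++;
    }
    return truncated;
  }

  // Put every truncated output back exactly as it was.
  restore(): void {
    for (const step of this.steps) {
      const original = this.originals.get(step.id);
      if (original !== undefined) step.output = original;
    }
    this.originals.clear();
  }
}
```

Each step keeps its id and position, so tool-call and tool-result pairs stay matched; only the payload text shrinks.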
import { MessageAssembler, ToolCallingAgent, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
const model = new AnthropicModel(AnthropicModels.SONNET_LATEST);
const assembler = new MessageAssembler({ chunkSizeSteps: 8 });
const agent = new ToolCallingAgent({ tools, model, assembler, maxSteps: 50 });
// After many steps, truncate old tool outputs that are taking too many tokens.
// Keeps the 3 most recent tool steps verbatim; truncates older ones.
const truncated = agent.assembler.editToolResults({ maxTokens: 4096, keepRecent: 3 });
console.log(`Truncated ${truncated} tool outputs`);
Cross-Session Memory Tool (L2-2)
Give agents persistent memory that survives across separate run() calls.
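The reason memory survives across run() calls is that it lives in the key-value store rather than in the agent. A minimal sketch of that idea follows; KvStore, MapStore, and createMemory are hypothetical names for this example, and the real KvBackend contract and createMemoryTool signature may differ.

```typescript
// Hypothetical key-value contract for this sketch; the real KvBackend interface may differ.
interface KvStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<boolean>;
  keys(prefix: string): Promise<string[]>;
}

// In-memory implementation, comparable in spirit to a Map-backed test backend.
class MapStore implements KvStore {
  private readonly data = new Map<string, string>();
  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }
  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
  async delete(key: string): Promise<boolean> {
    return this.data.delete(key);
  }
  async keys(prefix: string): Promise<string[]> {
    return Array.from(this.data.keys()).filter((k) => k.startsWith(prefix));
  }
}

type MemoryCommand =
  | { op: "write"; key: string; value: string }
  | { op: "read"; key: string }
  | { op: "delete"; key: string }
  | { op: "list" };

// Build a memory handler whose state lives entirely in the store, not in the closure.
function createMemory(store: KvStore, namespace = "memory:") {
  return async (cmd: MemoryCommand): Promise<string> => {
    switch (cmd.op) {
      case "write":
        await store.set(namespace + cmd.key, cmd.value);
        return `stored ${cmd.key}`;
      case "read":
        return (await store.get(namespace + cmd.key)) ?? `no entry for ${cmd.key}`;
      case "delete":
        return (await store.delete(namespace + cmd.key))
          ? `deleted ${cmd.key}`
          : `no entry for ${cmd.key}`;
      case "list": {
        const keys = await store.keys(namespace);
        return keys.map((k) => k.slice(namespace.length)).join("\n");
      }
    }
  };
}
```

Two handlers created over the same store see each other's writes, which is the property a second session relies on; swapping MapStore for a networked store would extend that across processes.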
import { createMemoryTool, MapKvBackend, ToolCallingAgent, AnthropicModel, AnthropicModels } from "@agentkit-js/core";
// Use MapKvBackend for in-process use, or KvCheckpointer's backend for persistence.
const memory = createMemoryTool({ backend: new MapKvBackend() });
const agent = new ToolCallingAgent({
tools: [memory, ...otherTools],
model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
});
// Session 1: agent learns something
for await (const ev of agent.run("What's the capital of France? Remember it for later.")) { }
// Session 2: agent recalls it
for await (const ev of agent.run("What did you remember about France's capital?")) {
if (ev.event === "final_answer") console.log(ev.data.answer); // "Paris"
}
Programmatic Tool Calling / Self-Hosted PTC (L3-1)
Execute model-generated orchestration scripts inside a kernel; only the final result enters the context window.
import { ProgrammaticOrchestrator, JsKernel, ToolRegistry } from "@agentkit-js/core";
const kernel = new JsKernel();
const registry = new ToolRegistry();
registry.register(searchTool);
registry.register(calcTool);
const orchestrator = new ProgrammaticOrchestrator(kernel, registry, {
extraCapabilities: ["tool:search", "tool:calc"],
});
// Model-generated script: intermediate results never enter the LLM context.
const script = `
const results = callTool('search', { query: 'AI news 2026' });
const count = callTool('calc', { expr: results.length + ' items' });
count + ' found';
`;
const { finalOutput, toolCallCount } = await orchestrator.run(script);
console.log(finalOutput); // Only this enters the context window.
console.log(toolCallCount); // e.g. 2; intermediate results stayed in the kernel.
Development
pnpm install
pnpm build
pnpm test
pnpm typecheck
# Reproduce every percentage in the "Differentiated" section above.
pnpm bench
# Cloudflare Worker local dev
cd packages/cloudflare-worker && wrangler dev
Examples
Example | What it shows |
| Minimal |
|
|
| RAG-style retrieval tool |
| Checkpoint + SSE resume + HITL across three simulated processes (no model needed) |
| Composite scorer over a small dataset |
| Reproducible verification of every README percentage |
| Production-style Worker deployment |
| OTel bridge with Jaeger backend |
| A1: measure compression ratio of |
| A2: synthetic event trace + |
| A3: three lazily-loaded skills + post-hook chain (redact + truncate) |
| A4: code-based vs LLM-judge scorer divergence on a synthetic trace |
Documentation
docs/guides/durable-runtime.md: checkpoints, SSE Last-Event-ID resume, HITL
docs/kernels/comparison.md: kernel selection decision tree
docs/guides/evals-cookbook.md: eval design patterns (incl. A4 multi-criterion judges)
docs/guides/memory-patterns.md: memory namespace + decay patterns
docs/guides/observational-memory.md: A1 background-observer compression
docs/guides/devtools.md: A2 time-travel debugger + fork-from-step
docs/guides/skills-and-hooks.md: A3 progressive disclosure + post-tool hooks
Environment Variables
Variable | Purpose |
ANTHROPIC_API_KEY | Anthropic model access |
| OpenAI / compatible endpoint |
| CI/CD Worker deployment |
| CI/CD Worker deployment |
Acknowledgements
Inspired by Hugging Face's smolagents. agentkit-js is a ground-up TypeScript reimplementation, not a port, targeting async-first execution, WASM sandboxing, and edge deployment.
License
Apache 2.0