mcp-job-queue
A production MCP server + worker daemon for long-running jobs: durable SQLite queue, isolated child-process execution, per-job timeouts, retries with backoff, and backpressure.
LLM agents are great at deciding to do work and terrible at holding it. The moment a tool call kicks off something slow — render a video, transcode audio, crawl a site, call a flaky API — an in-process tool blocks the conversation, and if the process dies the work vanishes with no record. mcp-job-queue is the durable backbone for that: agents submit_job and get an id back instantly; a separate worker daemon runs each job in its own OS process under a hard timeout, retries transient failures, bounds concurrency, and survives restarts. It's the difference between "the agent ran a script" and "the agent dispatched a job to a system that will actually finish it."
Features
Durable queue: jobs live in SQLite (WAL via the built-in node:sqlite, zero native deps). Submit, crash, restart; nothing is lost.
Decoupled server & worker: the MCP server only enqueues/reads; the worker only claims/runs. Either restarts independently; they meet only at the database file.
Isolated execution: every job runs in its own forked child process, so a handler that hangs, leaks, or segfaults can't take down the worker.
Hard timeouts: a per-job wall-clock timeout that ends in a real SIGKILL, not a hopeful AbortController the job can ignore.
Retries with backoff: failed jobs are re-queued with exponential backoff until maxAttempts is reached, then fail terminally.
Backpressure: a maxConcurrency cap means a flood of submissions can never exhaust CPU/memory; excess work waits in the queue.
Crash recovery: on startup the worker re-queues jobs orphaned mid-run by a previous crash (or fails them if out of attempts).
Allowlisted handlers: clients can only submit a registered job type; there is no arbitrary command execution. This is the worker's security boundary.
Typed errors & JSON logs: tools return structured {code, message, retryable} instead of throwing; the worker emits one structured log line per job.
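The backpressure item can be pictured with a small in-memory sketch. This is not the project's worker pool: `runWithCap` and the `Task` type are hypothetical names chosen for illustration, and the real worker pulls jobs from SQLite rather than from an array. It only shows how a fixed cap keeps excess work waiting instead of starting everything at once.

```typescript
// Hypothetical sketch: run async tasks with at most `maxConcurrency` in flight.
type Task<T> = () => Promise<T>;

async function runWithCap<T>(tasks: Task<T>[], maxConcurrency: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each lane keeps pulling the next waiting task until none remain.
  async function lane(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const laneCount = Math.min(maxConcurrency, tasks.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With a cap of 3 and ten submitted tasks, three start immediately and the other seven wait until a lane frees up.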
Architecture
        submit_job / get_job / list_jobs / cancel_job / get_stats
┌────────────┐                                   ┌──────────────────────┐
│ MCP client │ ──stdio──▶ ┌───────────┐          │    worker daemon     │
│ (Claude…)  │            │    MCP    │          │  ┌────────────────┐  │
└────────────┘            │  server   │          │  │ poll + claim   │  │
                          │ (enqueue/ │          │  │ (BEGIN         │  │
                          │  read)    │          │  │  IMMEDIATE)    │  │
                          └─────┬─────┘          │  └───────┬────────┘  │
                                │                │          │ fork      │
                                ▼                │          ▼           │
                    ┌───────────────────────┐    │    ┌────────────┐    │
                    │  SQLite (WAL) queue   │    │    │ child proc │    │
                    │  jobs: state machine  │◀───│    │  handler   │    │
                    └───────────────────────┘    │    │ (timeout/  │    │
                                ▲                │    │  SIGKILL)  │    │
                                │  artifact +    │    └─────┬──────┘    │
                                │  result/status │          │           │
                                └────────────────┼──────────┤           │
                                                 │          ▼           │
                                                 │      artifacts/      │
                                                 │      <id>.json       │
                                                 └──────────────────────┘

The queue is the only shared state. The server process and the worker process never talk directly; they coordinate entirely through atomic SQLite transactions.
Production handling, not a demo
Five patterns pulled straight from the source.
1. Atomic claim — a job goes to exactly one worker
The claim is wrapped in BEGIN IMMEDIATE, which takes SQLite's write lock up front. Even with several workers polling the same database, no two can grab the same job.
// db.ts
this.db.exec("BEGIN IMMEDIATE");
try {
  const candidate = this.db
    .prepare(
      `SELECT id FROM jobs WHERE status = 'queued' AND next_run_at <= ?
       ORDER BY priority DESC, created_at ASC LIMIT 1`,
    )
    .get(now);
  if (!candidate) {
    this.db.exec("COMMIT");
    return undefined;
  }
  this.db
    .prepare(`UPDATE jobs SET status = 'running', attempts = attempts + 1, ... WHERE id = ?`)
    .run(/* ... */ candidate.id);
  this.db.exec("COMMIT");
  return this.get(candidate.id);
} catch (err) {
  this.db.exec("ROLLBACK");
  throw err;
}
Why it matters: this is what makes the queue safe to scale horizontally and safe against double-execution, the hardest correctness property a job queue has to get right.
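The eligibility and ordering rule inside that SELECT can be modelled in memory. The sketch below is an illustration only: `QueuedJob` and `pickNext` are hypothetical names, and it deliberately leaves out the locking, which in the real code comes from BEGIN IMMEDIATE rather than from anything shown here.

```typescript
// Hypothetical in-memory model of the SELECT used by the claim.
interface QueuedJob {
  id: string;
  status: "queued" | "running" | "succeeded" | "failed" | "cancelled";
  priority: number;
  createdAt: number; // epoch ms
  nextRunAt: number; // epoch ms; a job backing off is not eligible before this
}

function pickNext(jobs: QueuedJob[], now: number): QueuedJob | undefined {
  // WHERE status = 'queued' AND next_run_at <= now
  const eligible = jobs.filter((j) => j.status === "queued" && j.nextRunAt <= now);
  // ORDER BY priority DESC, created_at ASC (oldest first among equal priorities)
  eligible.sort((a, b) => b.priority - a.priority || a.createdAt - b.createdAt);
  // LIMIT 1
  return eligible[0];
}
```

A job that is still waiting out its backoff is skipped even if it has the highest priority, because `nextRunAt` is in the future.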
2. Isolation with a real timeout kill
Each job is a separate process; the timeout ends in SIGKILL, so even a tight CPU loop that ignores cooperative cancellation is stopped.
// runner.ts
const child = spawn(command, [...baseArgs, childScript], { stdio: ["pipe", "pipe", "pipe"] });
const onAbort = () => {
  child.kill("SIGKILL");
  finish({ ok: false, error: "job exceeded timeout and was killed", timedOut: true });
};
signal.addEventListener("abort", onAbort, { once: true });
child.stdin.write(JSON.stringify({ type: job.type, payload: safeParse(job.payload) }));
Why it matters: a worker that can't guarantee it reclaims resources from a stuck job will slowly grind to a halt. Process isolation + SIGKILL is the only reliable answer.
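A self-contained way to see the kill in action is sketched below. It is an assumption-laden simplification, not the project's runner: `runWithTimeout` and `RunResult` are hypothetical names, the child is an inline `node -e` script instead of a handler script, and a plain timer stands in for the abort signal used in the excerpt above.

```typescript
import { spawn } from "node:child_process";

interface RunResult {
  ok: boolean;
  timedOut: boolean;
  exitCode: number | null;
}

// Hypothetical helper: run `node -e <script>` and SIGKILL it after timeoutMs.
function runWithTimeout(script: string, timeoutMs: number): Promise<RunResult> {
  return new Promise((resolve) => {
    const child = spawn(process.execPath, ["-e", script], { stdio: "ignore" });
    let timedOut = false;

    // The parent owns the clock, so the child cannot opt out of the deadline.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);

    child.on("error", () => {
      clearTimeout(timer);
      resolve({ ok: false, timedOut, exitCode: null });
    });
    child.on("exit", (code) => {
      clearTimeout(timer);
      resolve({ ok: !timedOut && code === 0, timedOut, exitCode: code });
    });
  });
}
```

Calling `runWithTimeout("for(;;){}", 200)` resolves with `timedOut: true` even though the child never yields, which is the case cooperative cancellation cannot handle.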
3. Retry with exponential backoff, or terminal failure
On failure the queue decides — re-queue with growing backoff while attempts remain, otherwise fail terminally. One method, one source of truth.
// db.ts
if (job.attempts < job.max_attempts) {
  const backoff = this.backoffMs(job.attempts); // base * 2^(attempts-1) + jitter
  this.db
    .prepare(`UPDATE jobs SET status = 'queued', error = ?, next_run_at = ? ... WHERE id = ?`)
    .run(errorMessage, now + backoff, /* ... */ id);
  return { job: this.get(id)!, retried: true };
}
this.db
  .prepare(`UPDATE jobs SET status = 'failed', error = ?, finished_at = ? ... WHERE id = ?`)
  .run(errorMessage, now, /* ... */ id);
return { job: this.get(id)!, retried: false };
4. Crash recovery on startup
A worker that dies mid-job leaves rows stuck in running. On boot we reclaim them — re-queue if attempts remain, fail otherwise — so a crash never silently strands work.
// db.ts, called once when the worker starts
recoverOrphaned(): number {
  const orphans = this.db.prepare("SELECT * FROM jobs WHERE status = 'running'").all();
  for (const job of orphans) {
    if (job.attempts < job.max_attempts) {
      /* re-queue */
    } else {
      /* terminal fail: "orphaned after worker crash" */
    }
  }
  return orphans.length;
}
Why it matters: most "simple" queues skip this and quietly lose in-flight jobs on every deploy or crash. Recovery is what makes "durable" actually true.
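The two branches elided in the excerpt can be filled in against an in-memory list to make the state transitions concrete. This is a sketch under assumptions: `JobRow` is a hypothetical shape, the array stands in for the jobs table, and the real method issues SQL updates instead of mutating objects.

```typescript
interface JobRow {
  id: string;
  status: "queued" | "running" | "succeeded" | "failed" | "cancelled";
  attempts: number;
  maxAttempts: number;
  error?: string;
}

// Hypothetical in-memory version of the startup sweep.
function recoverOrphaned(jobs: JobRow[]): number {
  const orphans = jobs.filter((j) => j.status === "running");
  for (const job of orphans) {
    if (job.attempts < job.maxAttempts) {
      job.status = "queued"; // attempts remain: make it claimable again
    } else {
      job.status = "failed"; // out of attempts: fail terminally
      job.error = "orphaned after worker crash";
    }
  }
  return orphans.length;
}
```

Rows in any other state are left alone, so running the sweep twice in a row changes nothing the second time.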
5. Allowlisted handlers — no arbitrary execution
A client can only submit a type that exists in the handler registry. There is no path from an MCP message to an arbitrary shell command.
// handlers.ts
export const HANDLERS: Record<string, JobHandler> = {
  echo: async (payload) => ({ echoed: payload }),
  wait: async (payload) => {
    /* sleep: exercise timeouts */
  },
  hash: async (payload) => ({ digest: sha256(payload.text) }),
  fibonacci: async (payload) => ({ value: fib(payload.n).toString() }),
  fail: async (payload) => {
    throw new Error(/* exercise retries */);
  },
};
Why it matters: "let the agent run a job" must never mean "let the agent run anything." Real work (render/transcode/scrape) is added as a new handler here; the queue machinery around it never changes.
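How a submission might be checked against such a registry is sketched below. The names `REGISTRY`, `ToolError`, and `checkJobType` are hypothetical, the two handlers are stand-ins, and `retryable: false` for an unknown type is an assumption rather than something the source states.

```typescript
type JobHandler = (payload: unknown) => Promise<unknown>;

interface ToolError {
  code: string;
  message: string;
  retryable: boolean;
}

// Stand-in registry with two illustrative handlers.
const REGISTRY: Record<string, JobHandler> = {
  echo: async (payload) => ({ echoed: payload }),
  wait: async () => undefined,
};

// Returns an error value for unknown types instead of throwing.
function checkJobType(type: string): ToolError | null {
  // An own-property check avoids matching inherited keys such as "constructor".
  if (Object.prototype.hasOwnProperty.call(REGISTRY, type)) return null;
  return {
    code: "UNKNOWN_JOB_TYPE",
    message: `no handler registered for "${type}"`,
    retryable: false, // assumed: resubmitting the same unknown type cannot succeed
  };
}
```

The own-property check matters because a plain `type in REGISTRY` lookup would accept names inherited from `Object.prototype`.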
Quickstart
Requires Node ≥ 22.5 (the release that introduced the built-in node:sqlite).
git clone https://github.com/tommypj/mcp-job-queue.git
cd mcp-job-queue
npm install
npm run build
The system is two processes that share a queue file. Start the worker:
npm run worker # node --experimental-sqlite dist/worker.js
Then run the MCP server (normally launched by your MCP client, see below):
npm run server # node --experimental-sqlite dist/server.js (stdio)
The --experimental-sqlite flag is required on Node 22 and accepted (harmless) on Node 24+.
For local hacking without a build, use the dev scripts: npm run dev:worker and npm run dev:server (run TypeScript directly via tsx).
Use it in Claude Desktop / Claude Code
Add this to claude_desktop_config.json (mirrors examples/claude_desktop_config.json) and run the worker separately:
{
  "mcpServers": {
    "job-queue": {
      "command": "node",
      "args": ["--experimental-sqlite", "/absolute/path/to/mcp-job-queue/dist/server.js"],
      "env": {
        "JOBQ_DB_PATH": "/absolute/path/to/queue.db",
        "JOBQ_ARTIFACT_DIR": "/absolute/path/to/artifacts"
      }
    }
  }
}
Config file locations:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Claude Code: claude mcp add job-queue -- node --experimental-sqlite /absolute/path/to/mcp-job-queue/dist/server.js
The server and the worker must point at the same JOBQ_DB_PATH.
Tools reference
submit_job(type, payload?, priority?, maxAttempts?, timeoutMs?)
Enqueue a job; returns it in status queued. type must be a registered handler (echo, wait, hash, fibonacci, fail). timeoutMs/maxAttempts are clamped to safe ceilings.
Errors: UNKNOWN_JOB_TYPE.
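Clamping of timeoutMs and maxAttempts could look roughly like the sketch below. The defaults of 30000 ms and 3 attempts match the sample output in this README, but the ceilings (300000 ms and 10 attempts) and the names `LIMITS`, `clamp`, and `normalizeOptions` are hypothetical; the real values come from the JOBQ_ configuration.

```typescript
// Hypothetical limits; only the two defaults are taken from the README's sample output.
const LIMITS = {
  defaultTimeoutMs: 30000,
  maxTimeoutMs: 300000, // assumed ceiling
  defaultMaxAttempts: 3,
  maxAttemptsCeiling: 10, // assumed ceiling
};

function clamp(value: number | undefined, fallback: number, min: number, max: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.floor(value)));
}

function normalizeOptions(opts: { timeoutMs?: number; maxAttempts?: number }) {
  return {
    timeoutMs: clamp(opts.timeoutMs, LIMITS.defaultTimeoutMs, 1, LIMITS.maxTimeoutMs),
    maxAttempts: clamp(opts.maxAttempts, LIMITS.defaultMaxAttempts, 1, LIMITS.maxAttemptsCeiling),
  };
}
```

Clamping rather than rejecting means an over-eager client still gets a job enqueued, just with bounded limits.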
// submit_job { "type": "hash", "payload": { "text": "hello world" } }
{
  "id": "081beee6-…",
  "type": "hash",
  "status": "queued",
  "attempts": 0,
  "maxAttempts": 3,
  "timeoutMs": 30000,
  "payload": { "text": "hello world" }
}
get_job(id)
Fetch one job: status, result, error, artifact path, timestamps. Errors: JOB_NOT_FOUND.
list_jobs(status?, limit?)
Recent jobs (newest first), optionally filtered by status (queued|running|succeeded|failed|cancelled).
cancel_job(id)
Cancel a still-queued job. Errors: JOB_NOT_FOUND, NOT_CANCELLABLE (running/finished jobs can't be cancelled).
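The cancel rule is a one-state transition, which the following sketch models with errors returned as values. `Job`, `CancelResult`, and `cancelJob` are hypothetical names, a `Map` stands in for the database, and the `retryable: false` flags are assumptions.

```typescript
type Status = "queued" | "running" | "succeeded" | "failed" | "cancelled";

interface Job {
  id: string;
  status: Status;
}

type CancelResult =
  | { ok: true; job: Job }
  | { ok: false; error: { code: "JOB_NOT_FOUND" | "NOT_CANCELLABLE"; message: string; retryable: boolean } };

// Hypothetical in-memory version: only queued -> cancelled is allowed.
function cancelJob(jobs: Map<string, Job>, id: string): CancelResult {
  const job = jobs.get(id);
  if (!job) {
    return { ok: false, error: { code: "JOB_NOT_FOUND", message: `no job ${id}`, retryable: false } };
  }
  if (job.status !== "queued") {
    return { ok: false, error: { code: "NOT_CANCELLABLE", message: `job is ${job.status}`, retryable: false } };
  }
  job.status = "cancelled";
  return { ok: true, job };
}
```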
get_stats()
Queue health: counts by status, total, age of the oldest queued job, and active config.
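One way these numbers could be derived from the job rows is sketched below. `StatRow` and `queueStats` are hypothetical names, and measuring the age of the oldest queued job from its creation time is an assumption; the source does not say which timestamp is used.

```typescript
type JobStatus = "queued" | "running" | "succeeded" | "failed" | "cancelled";

interface StatRow {
  status: JobStatus;
  createdAt: number; // epoch ms
}

// Hypothetical aggregation over in-memory rows.
function queueStats(rows: StatRow[], now: number) {
  const countsByStatus: Record<JobStatus, number> = {
    queued: 0,
    running: 0,
    succeeded: 0,
    failed: 0,
    cancelled: 0,
  };
  let oldestQueued: number | null = null;
  for (const row of rows) {
    countsByStatus[row.status]++;
    if (row.status === "queued" && (oldestQueued === null || row.createdAt < oldestQueued)) {
      oldestQueued = row.createdAt;
    }
  }
  return {
    countsByStatus,
    total: rows.length,
    // null when nothing is waiting, otherwise how long the oldest queued job has waited
    oldestQueuedAgeMs: oldestQueued === null ? null : now - oldestQueued,
  };
}
```

A steadily growing `oldestQueuedAgeMs` is the signal that submissions are outpacing the worker.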
{
  "countsByStatus": { "queued": 0, "running": 1, "succeeded": 12, "failed": 1, "cancelled": 0 },
  "total": 14,
  "oldestQueuedAgeMs": null,
  "config": {
    "maxConcurrency": 4,
    "defaultTimeoutMs": 30000,
    "registeredHandlers": ["echo", "wait", "hash", "fibonacci", "fail"]
  }
}
Configuration
Environment variables, all prefixed JOBQ_ (see .env.example). The server and worker must share JOBQ_DB_PATH.
Variable | Default | Description |
JOBQ_DB_PATH | | SQLite (WAL) queue file |
JOBQ_ARTIFACT_DIR | | Where result artifacts are written |
 | | Max jobs a worker runs at once (backpressure) |
 | | Idle poll interval |
 | | Default per-job timeout |
 | | Hard ceiling for a per-job timeout |
 | | Default attempts incl. the first |
 | | Hard ceiling for attempts |
 | | Base delay for exponential backoff |
Testing
npm test # 30 tests (vitest), incl. a real forked-child integration test
npm run lint # eslint + prettier --check
Coverage targets the production paths: atomic claim + priority ordering, retry-vs-terminal transitions, orphan recovery, cancel rules, queue stats, the worker pool (success, timeout-kill, retry, and a strict concurrency-cap assertion), every handler, the real fork runner (spawns a child, captures failure, SIGKILLs on overrun), and the full MCP tool surface through an in-memory client.
Design decisions
node:sqlite, not better-sqlite3: the built-in module means zero native compilation (no node-gyp), which makes the repo trivial to clone and run. WAL mode gives concurrent readers while the worker writes.
Two processes, not one: decoupling the MCP server from the worker is the core design choice. It lets the agent-facing surface and the compute surface scale, deploy, and crash independently. The queue file is the contract.
Child process per job, not a worker thread: a separate OS process is the only isolation strong enough to survive native crashes and guarantee a timeout via SIGKILL. Worker threads run inside the worker's own process, so a native crash in one takes the whole worker down.
Handlers are an allowlist: no arbitrary command execution by design; this is the security boundary and the extension point in one. Plugging in real work is a one-function change.
Errors as values: tools return typed {code, message, retryable} so an agent can branch programmatically, and the server never crashes on bad input.
Cancelling running jobs is intentionally out of scope (v1): it would require the worker to poll a cancel flag and coordinate the kill; queued-cancel covers the common case cleanly. Documented rather than half-built.
Deterministic jitter: retry backoff uses a small deterministic jitter so tests are reproducible while retries still spread; true randomness isn't needed for correctness here.
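A deterministic jitter can be obtained by hashing something stable about the job, as in the sketch below. The source only gives the formula shape (base * 2^(attempts-1) + jitter); deriving the jitter from the job id and attempt number, the FNV-1a-style hash, the 1000 ms base, and the 250 ms jitter window are all assumptions made for illustration.

```typescript
// Small deterministic string hash (FNV-1a style, 32-bit); used only to derive jitter.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// base * 2^(attempts-1), plus a jitter in [0, maxJitterMs) that depends
// only on the job id and the attempt number (assumed inputs).
function backoffMs(jobId: string, attempts: number, baseMs = 1000, maxJitterMs = 250): number {
  const exponential = baseMs * 2 ** (Math.max(1, attempts) - 1);
  const jitter = fnv1a(`${jobId}:${attempts}`) % maxJitterMs;
  return exponential + jitter;
}
```

The same job and attempt always produce the same delay, so tests can assert on it, while different jobs failing at the same moment still land on different retry times.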
License
MIT © Dan Tomescu. See LICENSE.