CotForce-MCP
"Give brains to your small models."
CotForce enforces step-by-step Chain-of-Thought, turning 4B parameter models into methodical reasoners.
Why this exists
A 4-billion-parameter Gemma cannot solve SEND + MORE = MONEY. It's a classic cryptarithmetic puzzle — 8 unique digits, 5 columns, 4 carry values. A bare 4B model guesses randomly. It hallucinates digits. It loses track of carries after column 2.
The same model, with CotForce:
```
Step 1: Analyze the leftmost column. S+M+C3 = MO. Max sum is 19998. ∴ M=1.
Step 2: S+1+C3 = 10+O. With M=1 and carry, O must be 0.
Step 3: D+E = Y+10C1 → C1=1. Now R+C1=9 → C1=0→R=9 (used), C1=1→R=8.
...
Step 11: All digits assigned. 9567 + 1085 = 10652. Verified.
```

11 structured reasoning steps. Zero hallucinations. Correct answer.
CotForce doesn't make small models smarter. It forces them to think before they speak — which is often all they need.
⚡ Two modes — one line of config
CotForce uses the MCP sampling protocol (sampling/createMessage) to call LLMs. If your client supports it (Claude Desktop, Cursor), nothing extra is needed.
If not — or if you're using a local model like Gemma via LMStudio — switch to direct HTTP mode:
```json
{
  "mcpServers": {
    "cotforce": {
      "command": "node",
      "args": ["node_modules/@slbdn/cotforce-mcp/index.js"],
      "env": {
        "MODE": "direct",
        "API_BASE_URL": "http://localhost:1234/v1",
        "MODEL": "gemma-4-e4b-it-mlx"
      }
    }
  }
}
```

That's it. The same 4B Gemma that couldn't solve SEND+MORE=MONEY above — now with CotForce, working locally through LMStudio.
🚀 Features
- **Rigid CoT enforcement** — forces any LLM to output valid JSON `{reasoning, result}` via strict system prompts and few‑shot examples.
- **Adaptive multi‑layer parser** — plug-in architecture with 5 built-in parsers (direct JSON, fenced blocks, XML/labels, brace-balanced, truncated recovery) in a priority-sorted pipeline. Add custom parsers via the `CotParser` interface; select parsers via the `COT_PARSERS` env var.
  - Direct JSON (with code‑fence stripping)
  - JSON inside markdown fenced blocks
  - XML / heuristic label extraction (`<reasoning>`, `Reasoning:`)
  - Brace‑balancing scanner for nested JSON objects
- **Zod runtime validation** — validates tool arguments and parsed CoT output with strict schemas.
- **Automatic retry with temperature increase** — up to 3 attempts (configurable) with increasing temperature and correction suffixes.
- **Per‑request rejection memo** — no global mutable state; safe under concurrent tool calls.
- **Token budgeting with tiktoken** — accurate token counting using OpenAI's `cl100k_base` encoding, with fallback to a character heuristic. Tweak via `REASONING_OVERHEAD`.
- **Configurable model** — set the `MODEL` environment variable to hint a specific model; leave unset for the host default.
- **Model-specific prompts** — automatically selects tuned system prompts for Claude, GPT-4, Gemini, and Grok based on `MODEL`.
- **Universal compatibility** — works with MCP sampling (Claude Desktop) or direct LLM HTTP calls (OpenAI, LMStudio, Ollama, any OpenAI-compatible API). Set `API_KEY` to use direct mode.
- **Structured logging** — timestamped, level‑filtered logs to stderr (supports `LOG_LEVEL`).
- **Output truncation detection** — detects when the LLM response hits the token limit and retries with a conciseness hint (`TRUNCATION_THRESHOLD`).
- **Token usage exposure** — every response includes input / output / budget token counts so callers can optimize.
- **User-supplied result schema** — optional `resultSchema` parameter validates the `result` field against a type map; mismatches trigger a retry.
- **Structured metrics** — in-memory counters for requests, success/fail rates, truncations, retries, latency, and token usage. Logged on shutdown.
- **Comprehensive test suite** — 151 tests covering parser pipeline, token budgeting, metrics, schema validation, retry loop, progress notifications, caching, and MCP server integration.
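The retry-with-increasing-temperature behavior described above can be sketched roughly as follows. This is an illustrative stand-in, not the package's internal API: `solveWithRetries`, `callLlm`, and `tryParse` are hypothetical names, with defaults echoing the documented config values.

```typescript
// Hypothetical sketch of the retry loop: raise temperature each attempt,
// append a correction suffix after a parse failure, and fall back to the
// raw output (soft failure) once retries are exhausted.
type CotResult = { reasoning: string; result: unknown };

async function solveWithRetries(
  prompt: string,
  callLlm: (prompt: string, temperature: number) => Promise<string>,
  tryParse: (raw: string) => CotResult | null,
  maxRetries = 3,
  baseTemp = 0.2,
  tempIncrement = 0.15,
): Promise<CotResult | { raw: string }> {
  let lastRaw = "";
  let suffix = "";
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const temperature = baseTemp + attempt * tempIncrement;
    lastRaw = await callLlm(prompt + suffix, temperature);
    const parsed = tryParse(lastRaw);
    if (parsed !== null) return parsed;
    // Correction suffix nudges the next attempt back toward strict JSON.
    suffix =
      '\n\nYour previous reply was not valid {"reasoning", "result"} JSON. Reply with JSON only.';
  }
  return { raw: lastRaw }; // soft failure: surface the raw output
}
```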
📦 Installation
```bash
npm install @slbdn/cotforce-mcp
# or
git clone https://github.com/islobodan/cotforce-mcp
cd cotforce-mcp
npm install
npm run build
```

Requires Node.js ≥ 18.
Quick start — Claude Desktop
Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "cotforce": {
      "command": "npx",
      "args": ["-y", "@slbdn/cotforce-mcp"],
      "env": {
        "MODEL": "claude-3-5-sonnet"
      }
    }
  }
}
```

No clone, no build. `npx -y` pulls and runs directly from npm.
🔧 Configuration
The server is configured via environment variables (all optional):
| Variable | Default | Description |
| --- | --- | --- |
| `MODEL` | (not set) | Model name hint (e.g. `gpt-4o`); leave unset for the host default. |
| `MAX_RETRIES` | `3` | Number of retry attempts before returning raw output. |
| `BASE_TEMP` | – | Initial sampling temperature. |
| `TEMP_INCREMENT` | – | Temperature added per retry attempt. |
| `TIMEOUT` | `60000` | Sampling timeout in ms (60s). Direct HTTP mode uses a longer default (120s), since local models are slower. |
| – | – | Result cache TTL in ms (default 1 hour). |
| – | – | Maximum cached results before evicting the oldest. |
| `COT_PARSERS` | (all) | Comma-separated parser names to use (e.g. `direct-json,fenced-block`). |
| `TRUNCATION_THRESHOLD` | – | Ratio of output/budget tokens that triggers truncation detection. Attempts truncated-JSON recovery first, then retries with a 1.5× budget. |
| `REASONING_OVERHEAD` | `800` | Fixed token overhead added to the budget formula. Increase for verbose models. |
| `FALLBACK_MODELS` | (not set) | Comma-separated list of fallback models. |
| `LOG_LEVEL` | – | Minimum log level (e.g. `DEBUG`). |
| `API_KEY` | (not set) | LLM API key for direct HTTP mode. Optional for local endpoints (LMStudio, Ollama); required for remote providers (OpenAI, Anthropic, etc.). |
| `API_BASE_URL` | – | Base URL for direct HTTP mode. Change for LMStudio (`http://localhost:1234/v1`). |
| `MODE` | `auto` | One of `auto` or `direct`; `direct` forces the HTTP client. |
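For intuition on how the truncation-detection ratio works, here is a rough sketch. The default threshold value used below is an assumption for illustration, not the package's documented default; the `finish_reason: "length"` signal is described later in this README.

```typescript
// Illustrative truncation check: treat a response as likely cut off when
// the provider reports finish_reason "length", or when output tokens
// consume at least `threshold` of the budget. 0.95 is a made-up default.
function looksTruncated(
  outputTokens: number,
  budgetTokens: number,
  finishReason: string | undefined,
  threshold = 0.95,
): boolean {
  if (finishReason === "length") return true;
  return budgetTokens > 0 && outputTokens / budgetTokens >= threshold;
}
```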
Example
```bash
MODEL=gpt-4o MAX_RETRIES=3 BASE_TEMP=0.2 TEMP_INCREMENT=0.15 LOG_LEVEL=DEBUG npx @slbdn/cotforce-mcp
```

🧪 Usage
As an MCP Tool
Add to your MCP client configuration. A `.mcp.json` file is included in the package for auto-discovery by clients like Cursor, VS Code, and Windsurf. Copy the relevant config below to your client's settings:
With MCP sampling (Claude Desktop):
```json
{
  "mcpServers": {
    "cotforce": {
      "command": "node",
      "args": ["/path/to/cotforce-mcp/index.js"],
      "env": {
        "MODEL": "claude-3-5-sonnet",
        "MAX_RETRIES": "2"
      }
    }
  }
}
```

With direct LLM HTTP (LMStudio, OpenAI, Ollama):
```json
{
  "mcpServers": {
    "cotforce": {
      "command": "node",
      "args": ["/path/to/cotforce-mcp/index.js"],
      "env": {
        "MODE": "direct",
        "API_BASE_URL": "http://localhost:1234/v1",
        "MODEL": "local-model",
        "MAX_RETRIES": "2"
      }
    }
  }
}
```

Note: `API_KEY` is optional for local endpoints like LMStudio or Ollama. It is required for remote providers like OpenAI or Anthropic.
The root `index.js` is a launcher that delegates to `dist/index.js`. It guards against missing builds with a helpful error message.
🩺 Troubleshooting
Response truncated mid-reasoning
What you see: `finish_reason: "length"` in the LLM response. The reasoning cuts off before the `result` field.

Why: The token budget is too tight. Complex reasoning (like SEND+MORE=MONEY) can need 3000+ output tokens; the default budget floor is 4096, but the model-level cap can vary.
Fix: Increase the budget overhead:
```bash
REASONING_OVERHEAD=1600   # default is 800; raise for verbose models
```

Or skip token-heavy parser layers to save budget for reasoning:

```bash
COT_PARSERS=direct-json,fenced-block   # skip heuristic and brace-balanced
```

MCP client timeout
What you see: `MCP error -32001: Request timed out` before the solution appears.
Why: Complex CoT reasoning takes time — 60-90 seconds for local models like Gemma. This error can come from two places:
1. CotForce's own timeout — default 120s for direct HTTP mode. Controlled by the `TIMEOUT` env var.
2. The MCP client's timeout — LM Studio, Claude Desktop, Cursor, etc. each have their own default timeout for tool calls (often 30-60s). This is separate from CotForce's timeout.

Fix — check both sides:

Increase CotForce's timeout:

```bash
TIMEOUT=180000   # 3 minutes
```

Check your MCP client's timeout setting:
LM Studio — add "timeout" to mcp.json (milliseconds):
```json
{
  "mcpServers": {
    "cotforce": {
      "command": "node",
      "args": ["index.js"],
      "env": {
        "TIMEOUT": "180000"
      },
      "timeout": 300000
    }
  }
}
```

Claude Desktop — the tool call timeout is not directly configurable. A workaround is to increase CotForce's `TIMEOUT` so the call completes within the client's window, or use a faster model.

Cursor / VS Code — check the MCP extension or `.vscode/mcp.json` for a `timeout` or `requestTimeout` setting.
Call the Tool
```json
{
  "name": "solve_problem",
  "arguments": {
    "prompt": "What is 7 * 8 + 2?"
  }
}
```

With Result Schema Validation
```json
{
  "name": "solve_problem",
  "arguments": {
    "prompt": "List the prime numbers between 10 and 20",
    "resultSchema": {
      "primes": "object",
      "count": "number"
    }
  }
}
```

If the `result` field doesn't match the schema, the server retries with a correction hint.
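Conceptually, the type-map check can be sketched like this. This is a simplified stand-in (the real server validates with Zod); `matchesResultSchema` is a hypothetical name.

```typescript
// Sketch of a resultSchema type-map check: every declared key must exist
// on the result object and match the declared `typeof` string. Arrays are
// typeof "object", which is why the example above declares
// "primes": "object" rather than an array type.
type TypeMap = Record<string, string>;

function matchesResultSchema(result: unknown, schema: TypeMap): boolean {
  if (typeof result !== "object" || result === null) return false;
  const obj = result as Record<string, unknown>;
  return Object.entries(schema).every(
    ([key, expected]) => key in obj && typeof obj[key] === expected,
  );
}
```

A mismatch (say, `primes` coming back as a string) would fail this check and, per the text above, trigger a retry with a correction hint.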
More Examples
See EXAMPLES.md for 16 diverse examples including:
Logic puzzles, probability, word problems
Code analysis, regex, SQL queries
Creative writing, recipe adaptation
Nested JSON with schema validation
Usage with different models and fallbacks
Example Response
```json
{
  "content": [{
    "type": "text",
    "text": "🤖 Agentic CoT Result:\n\n**Reasoning:** Step 1: Multiply 7 * 8 = 56. Step 2: Add 2 to get 58.\n\n**Answer:** 58\n\n📊 Token Usage: 42 in / 150 out / 4096 budget"
  }]
}
```

If parsing fails after all retries, the server returns the raw LLM output with a warning.
🧩 Custom Parsers
The parser is a priority-sorted pipeline of plugins. Five built-in parsers run in order:
| Priority | Name | What it does |
| --- | --- | --- |
| 10 | `direct-json` | Parses the whole output as JSON (strips code fences first) |
| 20 | `fenced-block` | Extracts JSON from markdown code blocks |
| 30 | `heuristic` | Looks for `<reasoning>` tags or `Reasoning:` labels |
| 40 | `brace-balanced` | Finds the first balanced `{...}` JSON object |
| 50 | – | Salvages reasoning from truncated JSON (hit token limit) |
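The first-match-wins behavior can be sketched as a simple loop. This is a simplified stand-in for the shipped `ParserPipeline`, with hypothetical names:

```typescript
// Simplified priority-sorted, first-match-wins pipeline: try each parser
// in ascending priority order and return the first non-null extraction.
interface SimpleParser {
  name: string;
  priority: number;
  parse(raw: string): { reasoning: string; result: unknown } | null;
}

function runPipeline(parsers: SimpleParser[], raw: string) {
  const ordered = [...parsers].sort((a, b) => a.priority - b.priority);
  for (const parser of ordered) {
    const hit = parser.parse(raw);
    if (hit !== null) return { parser: parser.name, ...hit };
  }
  return null; // no layer could extract a CoT object
}
```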
Filter parsers via COT_PARSERS env var:
```bash
COT_PARSERS=direct-json,fenced-block node index.js
```

Write a custom parser:
```typescript
import { CotParser, AgenticCotSchema } from "@slbdn/cotforce-mcp";

class YamlParser implements CotParser {
  name = "yaml";
  priority = 35; // runs after heuristic, before brace-balanced

  parse(raw: string): { reasoning: string; result: unknown } | null {
    // Custom YAML parsing logic here
    return null; // return null if this output isn't YAML
  }
}
```

Then register it programmatically:
```typescript
import { defaultParserPipeline, ParserPipeline } from "@slbdn/cotforce-mcp";

const pipeline = defaultParserPipeline();
pipeline.addParser(new YamlParser());
const result = pipeline.parse(rawText);
```

📚 API
Tool: solve_problem
Input: `{ prompt: string }` — the problem to solve.

Output: either:

- Success — structured CoT result.
- Soft failure — raw LLM output if parsing fails after all retries.
Sampling / LLM Calling
CotForce supports two modes for calling the LLM:
MCP Sampling (default with compatible clients):

- Uses MCP native `sampling/createMessage`
- Client selects and calls the model
- Requires client support (Claude Desktop, etc.)

Direct HTTP (for clients without sampling support):

- Calls OpenAI-compatible `/v1/chat/completions` directly
- Works with OpenAI, LMStudio, Ollama, and any compatible provider
- Activated automatically in `MODE=auto` when `API_KEY` is set and the client lacks sampling support
- Or forced with `MODE=direct`
Both modes use the same system prompt with few‑shot examples and strict schema constraints.
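The mode decision described above can be sketched roughly like this. Illustrative only — the actual routing lives in `src/index.ts` and may differ in detail; `useDirectHttp` is a hypothetical name.

```typescript
// Illustrative mode routing: MODE=direct forces the HTTP client; MODE=auto
// falls back to direct HTTP only when the client cannot do MCP sampling
// and an API key (or local endpoint) is configured.
function useDirectHttp(
  mode: "auto" | "direct",
  clientSupportsSampling: boolean,
  apiKeySet: boolean,
): boolean {
  if (mode === "direct") return true;
  return !clientSupportsSampling && apiKeySet;
}
```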
🏗️ Architecture
```
cotforce-mcp/
├── src/
│   ├── index.ts            # MCP server, tool handlers, routing logic
│   └── lib/
│       ├── parser.ts       # Parser pipeline: CotParser interface + 5 plugin parsers + Zod schemas
│       ├── tokens.ts       # tiktoken integration + budget computation
│       ├── prompts.ts      # Model-specific system prompts
│       ├── metrics.ts      # In-memory request/performance counters
│       └── llm.ts          # Direct HTTP LLM client (OpenAI-compatible)
├── tests/
│   ├── cache.test.ts       # 10 unit tests for result caching
│   ├── parser.test.ts      # 47 unit tests for parser layers
│   ├── tokens.test.ts      # 23 unit tests for token budgeting
│   ├── schema.test.ts      # 8 unit tests for result schema validation
│   ├── metrics.test.ts     # 9 unit tests for metrics tracking
│   ├── prompts.test.ts     # 12 unit tests for model-specific prompts
│   ├── llm.test.ts         # 6 tests for direct mode detection
│   ├── retry.test.ts       # 4 integration tests for retry loop
│   ├── progress.test.ts    # 5 unit tests for progress notifications
│   └── server.test.ts      # 9 integration tests via @slbdn/mcp-tester
├── index.js                # Root launcher (delegates to dist/)
├── dist/                   # Compiled TypeScript output
└── package.json
```

🧠 How It Works
1. System prompt enforces JSON output with `reasoning` and `result`. Model-specific variants are tuned for Claude, GPT-4, Gemini, and Grok.
2. Parser pipeline runs 5 built-in parsers in priority order (direct JSON, fenced blocks, XML/labels, brace-balanced, truncated recovery). The first valid match wins. Custom parsers can be added via the `COT_PARSERS` env var and the `CotParser` interface.
3. Retry logic — if parsing fails, injects a correction suffix and increases the temperature. Supports fallback models (`FALLBACK_MODELS`) when the primary model refuses.
4. Rejection memory stores a snippet of the last failure to contextualise the next call (scoped per-request, thread-safe).
5. Token budgeting uses `estimateTokens()` (a lightweight heuristic) for budget math and `countTokens()` (tiktoken) for exact counts. Sets `maxTokens` dynamically (between 4096 and 8192) via the formula `overhead + inputTokens × 4`. Detects truncation via `finish_reason: "length"` and attempts JSON recovery before retrying.
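The budget formula above, as a worked sketch (assuming the stated 4096 floor, 8192 cap, and the documented `REASONING_OVERHEAD` default of 800; the function name is illustrative):

```typescript
// maxTokens = clamp(overhead + inputTokens * 4, 4096, 8192), per the
// formula described above. 800 is the documented REASONING_OVERHEAD default.
function computeMaxTokens(inputTokens: number, overhead = 800): number {
  const raw = overhead + inputTokens * 4;
  return Math.min(8192, Math.max(4096, raw));
}
```

For a 200-token prompt this yields 800 + 800 = 1600, clamped up to the 4096 floor; a 2000-token prompt yields 8800, clamped down to the 8192 cap.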
🛠️ Development
```bash
git clone https://github.com/islobodan/cotforce-mcp
cd cotforce-mcp
npm install
npm run build       # compile TypeScript to dist/
npm run dev         # tsc --watch
npm run typecheck   # type-check src/ and tests/
```

Scripts
| Script | Purpose |
| --- | --- |
| `npm run build` | Compile TypeScript to `dist/` |
| `npm run dev` | Watch mode compilation |
| `npm run typecheck` | TypeScript type-checking for source and tests |
| `npm test` | Run full Jest test suite (133 tests) |
| – | Quick smoke test |
| – | List available tools |
Testing
The test suite uses Jest with ts-jest (ESM) and @slbdn/mcp-tester for MCP server integration testing:
- Parser tests (`tests/parser.test.ts`) — 47 unit tests covering all 5 parser plugins, edge cases, and `AgenticCotSchema` validation.
- Token tests (`tests/tokens.test.ts`) — 16 unit tests for `tiktoken` integration, budget computation, and `REASONING_OVERHEAD` tuning.
- Schema tests (`tests/schema.test.ts`) — 8 unit tests for user-supplied `resultSchema` validation.
- Metrics tests (`tests/metrics.test.ts`) — 9 unit tests for request counters, latency tracking, and token usage averages.
- Prompt tests (`tests/prompts.test.ts`) — 10 unit tests for model-specific prompt selection.
- LLM tests (`tests/llm.test.ts`) — 3 unit tests for direct HTTP mode detection.
- Server tests (`tests/server.test.ts`) — 11 integration tests for tool discovery, argument validation, server lifecycle, and concurrent calls.
Custom Jest matchers are available via @slbdn/mcp-tester:
```typescript
expect(tools).toHaveTool("solve_problem");
expect(tools).toHaveToolWithSchema("solve_problem");
expect(result).toReturnTextContaining("Reasoning:");
```

⚠️ Limitations & Honest Assessment
No true production monitoring — only structured logs; no aggregated metrics.
Token budget formula is heuristic — may need tuning for very verbose models.
Model hints are suggestions — the MCP host decides which model to use.
See TODO list for planned improvements.
📄 License
MIT © Slobodan Ivkovic
⭐ Support
If you find CotForce-MCP useful, consider starring the repo and sharing your feedback!