mcp-codex-dev


tdd

Implement code using strict Test-Driven Development methodology by writing tests before production code. Supports iterative TDD cycles with session resume functionality.

Instructions

Invoke Codex CLI to implement code using strict Test-Driven Development (Red-Green-Refactor). Injects TDD methodology template that enforces writing tests before production code. Supports session resume for iterative TDD cycles.

Input Schema


Name	Required	Description
`instruction`	Yes	Detailed description of the feature or bug fix to implement using TDD
`sessionId`	No	Resume an existing TDD session by ID (from previous tdd return value)
`workingDirectory`	No	Working directory path
`planReference`	No	Plan file path or content summary, used as coding context
`taskContext`	No	Task position and context within the plan (e.g. 'Task 3 of 5: Implement validation logic. Depends on Task 2 auth module.')
`testFramework`	No	Test framework hint (e.g. 'jest', 'vitest', 'pytest', 'go test')
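
As an illustration of how these parameters fit together, the sketch below builds the arguments for a first call and for a follow-up call that resumes the same session. The `TddArgs` interface and the `resumeArgs` helper are hypothetical names introduced here; only the field names come from the table above, and the path, plan file, and session ID are placeholders.

```typescript
// Hypothetical shape mirroring the parameter table; not exported by the server.
interface TddArgs {
  instruction: string;
  sessionId?: string;
  workingDirectory?: string;
  planReference?: string;
  taskContext?: string;
  testFramework?: string;
}

// First call: no sessionId, so a new TDD session is started.
const firstCall: TddArgs = {
  instruction: "Add email validation to the signup form handler",
  workingDirectory: "/path/to/project", // placeholder path
  planReference: "docs/plan.md", // placeholder plan file
  taskContext:
    "Task 3 of 5: Implement validation logic. Depends on Task 2 auth module.",
  testFramework: "vitest",
};

// Hypothetical helper: reuse the earlier arguments, swap in a new instruction,
// and pass the sessionId returned by the previous tdd call.
function resumeArgs(
  previous: TddArgs,
  sessionId: string,
  instruction: string
): TddArgs {
  return { ...previous, sessionId, instruction };
}

const secondCall = resumeArgs(
  firstCall,
  "00000000-0000-4000-8000-000000000000", // placeholder; use the ID from the previous result
  "Refactor the validator now that the tests pass"
);
```

Only `instruction` is required, so a minimal call can omit every other field.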

Implementation Reference

src/tools/codex-tdd.ts:54-231 (handler)

The main handler function codexTdd implements the TDD tool logic. It loads the TDD template, builds the context section from the plan reference and task position, runs Codex CLI with the filled template, tracks the session, and returns a structured result whose status is "completed", "needs_review", or "error".

export async function codexTdd(
  params: CodexTddParams,
  extra?: { signal?: AbortSignal }
): Promise<CodexWriteResult> {
  const config = await loadConfig({
    workingDirectory: params.workingDirectory,
  });
  const toolCfg = getToolConfig(config, "tdd");
  const executor = await CodexExecutor.create({
    workingDirectory: params.workingDirectory,
  });

  // Load and fill the TDD template
  const template = await loadTddTemplate({
    tddTemplate: config.tddTemplate,
    workingDirectory: params.workingDirectory,
  });

  // Build context section: plan reference + task position
  let contextValue = "";
  if (params.planReference) {
    contextValue += params.planReference;
  }
  if (params.taskContext) {
    if (contextValue) contextValue += "\n\n";
    contextValue += `**Current Task Position:** ${params.taskContext}`;
  }
  if (!contextValue) {
    contextValue = "(No plan reference provided)";
  }

  // Strip TEST_FRAMEWORK section from template if not provided
  let processedTemplate = template;
  if (!params.testFramework) {
    // Remove "### Test Framework\n\nUse: {TEST_FRAMEWORK}\n" section
    processedTemplate = processedTemplate.replace(
      /### Test Framework\s*\n+Use: \{TEST_FRAMEWORK\}\s*\n*/g,
      ""
    );
  }

  const fullInstruction = fillTemplate(processedTemplate, {
    INSTRUCTION: params.instruction,
    PLAN_REFERENCE: contextValue,
    ...(params.testFramework
      ? { TEST_FRAMEWORK: params.testFramework }
      : {}),
  });

  try {
    const operationId = `tdd-${crypto.randomUUID()}`;
    progressServer.startOperation(
      operationId,
      "write",
      params.instruction.slice(0, 120)
    );

    // Mark session as active before execution (for resume)
    if (params.sessionId) {
      await sessionManager.updateStatus(params.sessionId, "active", {
        workingDirectory: params.workingDirectory,
      });
    }

    let result;
    try {
      result = await executor.executeWrite(fullInstruction, {
        sessionId: params.sessionId,
        workingDirectory: params.workingDirectory,
        model: toolCfg.model,
        sandbox: toolCfg.sandbox,
        timeout: toolCfg.timeout,
        onLine: (line) => {
          const event = mapCodexLineToProgressEvent(line, operationId);
          if (event) progressServer.emit(event);
        },
        signal: extra?.signal,
      });
    } finally {
      progressServer.endOperation(operationId, result?.exitCode === 0);
    }

    // Parse output
    const parsed = executor.parseOutput(result.stdout);
    if (!params.sessionId && !parsed.sessionId) {
      throw {
        code: CodexErrorCode.CODEX_INVALID_OUTPUT,
        message:
          "Codex CLI did not emit a session_id/thread_id in --json output. Ensure Codex CLI is up to date and that --json output is enabled.",
        recoverable: false,
      } satisfies CodexErrorInfo;
    }

    // Determine status
    let status: CodexWriteResult["status"] = "completed";
    if (result.exitCode !== 0) {
      status = "error";
    } else if (
      parsed.filesCreated.length > 0 ||
      parsed.filesModified.length > 0
    ) {
      status = "needs_review";
    }

    // Track session
    const sessionId = parsed.sessionId || params.sessionId;
    if (!sessionId) {
      throw {
        code: CodexErrorCode.CODEX_INVALID_OUTPUT,
        message:
          "Missing sessionId (neither parsed from Codex output nor provided via params.sessionId).",
        recoverable: false,
      } satisfies CodexErrorInfo;
    }
    const trackedStatus = result.exitCode !== 0 ? "abandoned" : "completed";

    if (params.sessionId) {
      // Resuming existing session
      await sessionManager.markResumed(params.sessionId, {
        workingDirectory: params.workingDirectory,
      });
      await sessionManager.updateStatus(params.sessionId, trackedStatus, {
        workingDirectory: params.workingDirectory,
      });
    } else {
      // New session
      await sessionManager.track(
        {
          sessionId,
          type: "write",
          instruction: params.instruction,
          createdAt: new Date().toISOString(),
          status: trackedStatus,
        },
        { workingDirectory: params.workingDirectory }
      );
    }

    return {
      success: result.exitCode === 0,
      sessionId,
      output: {
        summary: parsed.summary,
        filesModified: parsed.filesModified,
        filesCreated: parsed.filesCreated,
      },
      status,
    };
  } catch (error) {
    // Mark a resumed session as "abandoned" so it is not left "active" after a failure
    if (params.sessionId) {
      try {
        await sessionManager.updateStatus(params.sessionId, "abandoned", {
          workingDirectory: params.workingDirectory,
        });
      } catch {
        /* best effort */
      }
    }
    const errorInfo = error as CodexErrorInfo;
    return {
      success: false,
      sessionId: params.sessionId || "",
      output: {
        summary: "",
        filesModified: [],
        filesCreated: [],
      },
      status: "error",
      error: {
        code: errorInfo.code || CodexErrorCode.UNKNOWN_ERROR,
        message: errorInfo.message || String(error),
        recoverable: errorInfo.recoverable ?? false,
        suggestion: errorInfo.suggestion,
      },
    };
  }
}
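
The handler reports two different statuses: the `status` returned to the caller and the coarser status stored for the session. The sketch below isolates both decisions as small pure functions so the branches are easier to see. The function names `deriveStatus` and `deriveTrackedStatus` are hypothetical; the handler above inlines this logic rather than calling helpers like these.

```typescript
// Status values assigned in the handler excerpt above.
type WriteStatus = "completed" | "needs_review" | "error";

// Mirrors the "Determine status" branch: a non-zero exit code wins,
// otherwise any created or modified file flags the run for review.
function deriveStatus(
  exitCode: number,
  filesCreated: string[],
  filesModified: string[]
): WriteStatus {
  if (exitCode !== 0) return "error";
  if (filesCreated.length > 0 || filesModified.length > 0) {
    return "needs_review";
  }
  return "completed";
}

// Mirrors the session-tracking branch: only the exit code matters here.
function deriveTrackedStatus(exitCode: number): "abandoned" | "completed" {
  return exitCode !== 0 ? "abandoned" : "completed";
}
```

One consequence worth noting: a successful run that touched files is returned as "needs_review" while its session is recorded as "completed".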

src/tools/codex-tdd.ts:17-52 (schema)

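CodexTddParamsSchema defines the input validation schema for the TDD tool, covering the instruction, sessionId, workingDirectory, planReference, taskContext, and testFramework parameters. Unlike the inline schema passed at registration, it constrains sessionId to a UUID and notes that testFramework defaults to auto-detection from the project.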

export const CodexTddParamsSchema = z.object({
  instruction: z
    .string()
    .describe(
      "Detailed description of the feature or bug fix to implement using TDD"
    ),
  sessionId: z
    .string()
    .uuid()
    .optional()
    .describe("Resume an existing TDD session by ID"),
  workingDirectory: z
    .string()
    .optional()
    .describe("Working directory path"),
  planReference: z
    .string()
    .optional()
    .describe("Plan file path or content summary, used as coding context"),
  taskContext: z
    .string()
    .optional()
    .describe(
      "Task position and context within the plan " +
        "(e.g. 'Task 3 of 5: Implement validation logic. Depends on Task 2 auth module.')"
    ),
  testFramework: z
    .string()
    .optional()
    .describe(
      "Test framework hint (e.g. 'jest', 'vitest', 'pytest', 'go test'). " +
        "Defaults to auto-detect from project."
    ),
});

export type CodexTddParams = z.infer<typeof CodexTddParamsSchema>;

src/index.ts:81-136 (registration)

Registration of the 'tdd' tool with the MCP server. Lines 82-135 define the tool's description, input parameters, and handler binding that connects the MCP tool to the codexTdd implementation.

// ─── tdd ───────────────────────────────────────────────────────────
if (isToolEnabled(config, "tdd")) {
  server.tool(
    "tdd",
    "Invoke Codex CLI to implement code using strict Test-Driven Development (Red-Green-Refactor). " +
      "Injects TDD methodology template that enforces writing tests before production code. " +
      "Supports session resume for iterative TDD cycles.",
    {
      instruction: z
        .string()
        .describe(
          "Detailed description of the feature or bug fix to implement using TDD"
        ),
      sessionId: z
        .string()
        .optional()
        .describe(
          "Resume an existing TDD session by ID (from previous tdd return value)"
        ),
      workingDirectory: z
        .string()
        .optional()
        .describe("Working directory path"),
      planReference: z
        .string()
        .optional()
        .describe(
          "Plan file path or content summary, used as coding context"
        ),
      taskContext: z
        .string()
        .optional()
        .describe(
          "Task position and context within the plan " +
            "(e.g. 'Task 3 of 5: Implement validation logic. Depends on Task 2 auth module.')"
        ),
      testFramework: z
        .string()
        .optional()
        .describe(
          "Test framework hint (e.g. 'jest', 'vitest', 'pytest', 'go test')"
        ),
    },
    async (params, extra) => {
      const result = await codexTdd(params, extra);
      return {
        content: [
          {
            type: "text" as const,
            text: JSON.stringify(result, null, 2),
          },
        ],
      };
    }
  );
}
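
Because the registration wraps the handler result in a single text item containing pretty-printed JSON, a caller has to parse that text to get at `sessionId` and `status`. The sketch below shows one way to do that. `TddResult`, `ToolResponse`, and `parseTddResponse` are hypothetical names declared locally; the field list is taken from the return statements in the handler excerpt, and typing `error.code` as a string is an assumption, since the `CodexErrorCode` definition is not shown.

```typescript
// Result shape as it appears in the handler excerpt; declared locally for this sketch.
interface TddResult {
  success: boolean;
  sessionId: string;
  output: { summary: string; filesModified: string[]; filesCreated: string[] };
  status: "completed" | "needs_review" | "error";
  error?: {
    code: string; // assumption: CodexErrorCode values serialize as strings
    message: string;
    recoverable: boolean;
    suggestion?: string;
  };
}

// Minimal view of the tool response built in the registration callback.
interface ToolResponse {
  content: { type: string; text: string }[];
}

// Find the text item and parse the JSON payload it carries.
function parseTddResponse(response: ToolResponse): TddResult {
  const first = response.content.find((item) => item.type === "text");
  if (!first) throw new Error("tdd response contained no text content");
  return JSON.parse(first.text) as TddResult;
}

// Example response with made-up values, serialized the same way as the registration does.
const exampleResponse: ToolResponse = {
  content: [
    {
      type: "text",
      text: JSON.stringify(
        {
          success: true,
          sessionId: "00000000-0000-4000-8000-000000000000",
          output: {
            summary: "Added validator and tests",
            filesModified: [],
            filesCreated: ["src/validate.test.ts"],
          },
          status: "needs_review",
        },
        null,
        2
      ),
    },
  ],
};

const tddResult = parseTddResponse(exampleResponse);
```

The parsed `sessionId` is the value to pass back as the `sessionId` parameter when resuming.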

src/tools/codex-tdd.ts:235-305 (helper)

Helper functions for the TDD tool: loadTddTemplate loads the TDD methodology template from config, bundled files, or hardcoded fallback; fillTemplate substitutes placeholders like {INSTRUCTION}, {PLAN_REFERENCE}, and {TEST_FRAMEWORK}.

async function loadTddTemplate(options: {
  tddTemplate?: string;
  workingDirectory?: string;
}): Promise<string> {
  // Priority 1: Config override (file path or inline content)
  if (options.tddTemplate) {
    const templateOverride = options.tddTemplate;
    const candidates: string[] = [];

    if (options.workingDirectory) {
      candidates.push(
        path.resolve(options.workingDirectory, templateOverride)
      );
    }
    candidates.push(path.resolve(templateOverride));

    for (const candidate of candidates) {
      try {
        if (fs.existsSync(candidate)) {
          return await fs.promises.readFile(candidate, "utf-8");
        }
      } catch {
        // continue
      }
    }

    // If it looks like inline template content, use it directly
    if (
      templateOverride.includes("\n") ||
      templateOverride.trimStart().startsWith("#")
    ) {
      return templateOverride;
    }

    console.error(
      `codex-dev: tddTemplate override not found: ${templateOverride}`
    );
  }

  // Priority 2: Bundled template file
  const moduleDir = path.dirname(fileURLToPath(import.meta.url));
  const templatePath = path.join(
    moduleDir,
    "..",
    "..",
    "templates",
    "tdd-developer.md"
  );

  try {
    if (fs.existsSync(templatePath)) {
      return await fs.promises.readFile(templatePath, "utf-8");
    }
  } catch {
    // Fall through to default
  }

  // Priority 3: Hardcoded fallback
  return DEFAULT_TDD_TEMPLATE;
}

function fillTemplate(
  template: string,
  values: Record<string, string>
): string {
  let result = template;
  for (const [key, value] of Object.entries(values)) {
    result = result.replace(new RegExp(`\\{${key}\\}`, "g"), () => value);
  }
  return result;
}
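
To make the placeholder handling concrete, the sketch below runs a local copy of `fillTemplate` against a made-up miniature template, after applying the same regex the handler uses to drop the Test Framework section when no framework is supplied. The template text is an assumption for illustration only; the contents of the bundled `tdd-developer.md` are not shown in this document.

```typescript
// Local copy of fillTemplate from the excerpt above so the sketch runs on its own.
function fillTemplate(
  template: string,
  values: Record<string, string>
): string {
  let result = template;
  for (const [key, value] of Object.entries(values)) {
    // A function replacer keeps "$&" and similar sequences in the value literal.
    result = result.replace(new RegExp(`\\{${key}\\}`, "g"), () => value);
  }
  return result;
}

// Made-up miniature template (not the real bundled template).
const sampleTemplate = [
  "## Task",
  "",
  "{INSTRUCTION}",
  "",
  "### Test Framework",
  "",
  "Use: {TEST_FRAMEWORK}",
  "",
  "## Context",
  "",
  "{PLAN_REFERENCE}",
].join("\n");

// Same regex as the handler: remove the section when testFramework is absent.
const strippedTemplate = sampleTemplate.replace(
  /### Test Framework\s*\n+Use: \{TEST_FRAMEWORK\}\s*\n*/g,
  ""
);

const filledTemplate = fillTemplate(strippedTemplate, {
  INSTRUCTION: "Price is $& per unit",
  PLAN_REFERENCE: "(No plan reference provided)",
});
```

Passing the value through a function replacer, as the original does, matters when an instruction contains sequences such as `$&` or `$1`: with a plain string replacement those would be interpreted as replacement patterns instead of being inserted verbatim.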

Tool Definition Quality

A (3.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes key behaviors: 'Injects TDD methodology template that enforces writing tests before production code' and 'Supports session resume for iterative TDD cycles.' However, it doesn't mention important aspects like whether this tool modifies files (destructive), requires specific permissions, has rate limits, or what the return format looks like. The description adds some behavioral context but leaves significant gaps for a tool that presumably writes code and tests.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately concise with three sentences that each add value. The first sentence states the core purpose, the second explains the TDD enforcement mechanism, and the third mentions session resumption capability. There's no wasted text, and the information is front-loaded with the main purpose stated first. It could be slightly more structured by explicitly separating purpose from features.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given this is a complex code implementation tool with 6 parameters and no annotations or output schema, the description provides basic context about TDD methodology and session management but lacks important details. It doesn't explain what the tool returns, how errors are handled, what file modifications occur, or prerequisites for use. For a tool that presumably creates and modifies code files, more behavioral transparency would be expected.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 6 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema. It mentions 'session resume' which relates to the 'sessionId' parameter, but doesn't provide additional semantics about parameter interactions or usage patterns. The baseline score of 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Invoke Codex CLI to implement code using strict Test-Driven Development (Red-Green-Refactor).' It specifies the exact methodology (TDD with Red-Green-Refactor cycle) and distinguishes it from siblings like 'exec' (general execution) or 'review' (code review). The description goes beyond the tool name 'tdd' by explaining what TDD means in this context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context ('Supports session resume for iterative TDD cycles') but doesn't explicitly state when to use this tool versus alternatives like 'exec' for general code execution or 'session_list' for session management. It mentions session resumption but doesn't provide clear guidance on when to start a new session versus resume an existing one, or when TDD is preferred over other development approaches.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/FYZAFH/mcp-codex-dev'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.