Kilntainers

Official


sandbox_exec

Execute shell commands in an isolated Debian Linux sandbox to run code safely without affecting the host system. Use stdin for data input and set timeouts for long operations.

Instructions

Execute a shell command in an isolated Debian Linux sandbox. Commands run in bash. Each call is independent: shell state (variables, working directory) does not persist between calls, although the filesystem does. Use the working_directory parameter or chain commands with && to control execution context.

To write files or pass data without shell escaping, use the stdin parameter (e.g., command="cat > file.txt" with content in stdin). Commands time out after 120 seconds by default (override with the timeout parameter for long-running operations).
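
As a rough illustration of the two patterns described above, the following sketch builds argument payloads for sandbox_exec. The file path, file content, and the 300-second timeout are invented for the example; only the parameter names come from the tool's schema.

```python
import json

# Hypothetical file content with quotes and a dollar sign, which would need
# careful escaping if embedded directly in a shell command string.
content = 'echo "total: $5"\n'

# Pattern 1: send the content through stdin so no shell escaping is needed.
write_call = {
    "command": "cat > /tmp/report.sh",
    "stdin": content,
}

# Pattern 2: the working directory does not carry over between calls, so the
# directory change and the command are chained in a single call. The file
# written by the first call is still there because the filesystem persists.
run_call = {
    "command": "cd /tmp && wc -c report.sh",
    "timeout": 300,  # assumed value; overrides the 120-second default
}

for call in (write_call, run_call):
    print(json.dumps(call))
```

Passing `"working_directory": "/tmp"` with `"command": "wc -c report.sh"` would be an equivalent way to express the second call.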

Input Schema


Name	Required	Description
`command`	No	Shell command string (mutually exclusive with args).
`args`	No	List of arguments for direct execution (mutually exclusive with command).
`stdin`	No	Content to pipe to stdin.
`working_directory`	No	Working directory for the command (must be absolute).
`timeout`	No	Timeout in seconds (defaults to server config).
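
The difference between `command` and `args` can be sketched as follows. This is an illustration under assumptions: the filename is invented, and the claim that `args` bypasses shell interpretation is inferred from the "direct execution" wording rather than confirmed by the source. `shlex.quote` is from the Python standard library.

```python
import shlex

# Hypothetical filename with a space and an apostrophe, which is awkward to
# embed in a shell string.
filename = "my file's notes.txt"

# args mode: each list element is passed as one argument, so no quoting is
# needed (assuming direct execution means no shell parsing).
args_call = {"args": ["wc", "-l", filename], "working_directory": "/tmp"}

# command mode: the string is interpreted by bash, which allows pipes and
# redirection but requires quoting of untrusted pieces.
command_call = {
    "command": f"wc -l {shlex.quote(filename)} | sort -n",
    "working_directory": "/tmp",
}

# Splitting the command with POSIX shell rules recovers the original filename.
recovered = shlex.split(command_call["command"])[2]
print(recovered == filename)
```

Only one of the two keys may appear in a single call; supplying both is rejected by the server's validation.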

Implementation Reference

src/kilntainers/server.py:276-380 (handler)

The _create_handler function creates and returns sandbox_exec_handler, the core handler that executes the tool logic. The handler sanitizes and validates its inputs, gets the sandbox from the session context (creating it lazily on first use), constructs an ExecRequest, executes the command via sandbox.exec(), and returns the formatted CallToolResult.

def _create_handler(config: ServerConfig) -> Callable[..., Any]:
    """Create the sandbox_exec handler with server config bound via closure.

    Args:
        config: The server configuration containing defaults.

    Returns:
        An async handler function for the sandbox_exec tool.
    """

    async def sandbox_exec_handler(
        command: str | None = None,
        args: list[str] | None = None,
        stdin: str | None = None,
        working_directory: str | None = None,
        timeout: int | None = None,
        ctx: Context[ServerSession, SessionContext] | None = None,
    ) -> CallToolResult:
        """Handle a sandbox_exec tool call.

        Args:
            command: Shell command string (mutually exclusive with args).
            args: List of arguments for direct execution (mutually exclusive with command).
            stdin: Content to pipe to stdin.
            working_directory: Working directory for the command (must be absolute).
            timeout: Timeout in seconds (defaults to server config).
            ctx: FastMCP context object (injected automatically).

        Returns:
            A CallToolResult with the execution result or error.
        """
        # --- Input sanitization ---
        if args is not None and len(args) == 0:
            args = None
        if command is not None and len(command) == 0:
            command = None
        if working_directory is not None and len(working_directory) == 0:
            working_directory = None
        if stdin is not None and len(stdin) == 0:
            stdin = None

        # --- Input validation ---
        error = _validate_inputs(command, args, stdin, working_directory, timeout)
        if error is not None:
            return CallToolResult(
                content=[TextContent(type="text", text=error)],
                isError=True,
            )

        # --- Get sandbox from context ---
        # ctx should always be provided by FastMCP, but handle None for safety
        if ctx is None:
            return CallToolResult(
                content=[
                    TextContent(type="text", text="Internal error: no context provided")
                ],
                isError=True,
            )

        session_context = ctx.request_context.lifespan_context

        # --- Lazy sandbox creation ---
        try:
            sandbox = await session_context.get_or_create_sandbox()
        except BackendError as e:
            return CallToolResult(
                content=[TextContent(type="text", text=str(e))],
                isError=True,
            )

        # --- Construct ExecRequest ---
        request = ExecRequest(
            command=command,
            args=args,
            stdin=stdin,
            working_directory=working_directory,
            timeout=timeout if timeout is not None else config.default_timeout,
            output_limit=config.output_limit,
        )

        # --- Execute ---
        try:
            result = await sandbox.exec(request)
        except SandboxDiedError as e:
            return CallToolResult(
                content=[TextContent(type="text", text=str(e))],
                isError=True,
            )

        # --- Format response ---
        response_json = json.dumps(
            {
                "stdout": result.stdout,
                "stderr": result.stderr,
                "exit_code": result.exit_code,
                "exec_duration_ms": result.exec_duration_ms,
            }
        )

        return CallToolResult(
            content=[TextContent(type="text", text=response_json)],
            isError=False,
        )

    return sandbox_exec_handler
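
One consequence of the handler above is that a command exiting with a non-zero status still comes back with isError=False; isError=True is reserved for validation, backend, and sandbox failures, whose text is a plain message rather than JSON. A client might separate the two cases roughly like this. The `summarize` helper and the sample payloads are hypothetical, not part of Kilntainers.

```python
import json


def summarize(is_error: bool, text: str) -> str:
    """Classify a sandbox_exec response (hypothetical client-side helper)."""
    if is_error:
        # Tool-level failure: the text is a plain error message, not JSON.
        return f"tool error: {text}"
    payload = json.loads(text)
    if payload["exit_code"] != 0:
        # The tool call succeeded, but the command itself failed.
        stderr = payload["stderr"].strip()
        return f"command failed ({payload['exit_code']}): {stderr}"
    return payload["stdout"]


# Invented sample responses shaped like the handler's JSON output.
ok = json.dumps(
    {"stdout": "hi\n", "stderr": "", "exit_code": 0, "exec_duration_ms": 12}
)
failed = json.dumps(
    {"stdout": "", "stderr": "no such file\n", "exit_code": 2, "exec_duration_ms": 8}
)

print(summarize(False, ok))
print(summarize(False, failed))
print(summarize(True, "Must provide either 'command' or 'args'."))
```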

src/kilntainers/backends/base.py:32-76 (schema)

ExecRequest dataclass defines the validated input parameters for command execution, including command/args (mutually exclusive), stdin, working_directory, timeout, and output_limit. Includes __post_init__ validation for mutual exclusivity and constraints.

@dataclass(frozen=True, slots=True, kw_only=True)
class ExecRequest:
    """Validated parameters for a command execution.

    Constructed by the MCP layer after input validation. The MCP layer
    resolves defaults (effective timeout, configured output limit) so
    the backend always receives concrete values.

    kw_only=True forces callers to use keyword arguments, which is
    clearer for a dataclass with many optional fields.
    """

    # Exactly one of command or args must be provided
    command: str | None = None
    args: list[str] | None = None

    # Optional parameters
    stdin: str | None = None
    working_directory: str | None = None

    # Always provided by MCP layer (defaults resolved before reaching backend)
    timeout: int  # seconds
    output_limit: int  # bytes

    def __post_init__(self) -> None:
        # Validate mutual exclusivity of command/args
        if self.command is not None and self.args is not None:
            raise ValueError("command and args are mutually exclusive")
        if self.command is None and self.args is None:
            raise ValueError("either command or args must be provided")

        # Validate working_directory is absolute
        if self.working_directory is not None and not self.working_directory.startswith(
            "/"
        ):
            raise ValueError("working_directory must be an absolute path")

        # Validate timeout
        if self.timeout < 1:
            raise ValueError("timeout must be at least 1 second")

        # Validate output_limit
        if self.output_limit < 1:
            raise ValueError("output_limit must be positive")

src/kilntainers/backends/base.py:15-29 (schema)

ExecResult dataclass defines the output schema for execution results, containing stdout, stderr, exit_code, and exec_duration_ms fields.

@dataclass(frozen=True, slots=True)
class ExecResult:
    """Result of a command execution.

    The return type from every exec call. Immutable, with no optional
    fields — every execution produces all four values.

    Maps directly to the MCP response schema (Functional spec §2.2).
    The MCP layer serializes this to JSON for the tool response.
    """

    stdout: str
    stderr: str
    exit_code: int
    exec_duration_ms: int

src/kilntainers/server.py:455-459 (registration)

Tool registration via mcp.add_tool(), which registers the sandbox_exec wrapper function under the name 'sandbox_exec' with the assembled description.

```
mcp.add_tool(
    sandbox_exec,
    name="sandbox_exec",
    description=description,
)
```

src/kilntainers/server.py:227-270 (helper)

The _validate_inputs helper function validates tool inputs - ensures exactly one of command/args is provided, working_directory is absolute, timeout is positive, and stdin doesn't exceed the 2 MiB limit.

def _validate_inputs(
    command: str | None,
    args: list[str] | None,
    stdin: str | None,
    working_directory: str | None,
    timeout: int | None,
) -> str | None:
    """Validate tool inputs.

    Returns error message or None if valid.

    Args:
        command: The shell command string, if using command mode.
        args: The list of arguments, if using args mode.
        stdin: The stdin content to pipe to the command.
        working_directory: The working directory for the command.
        timeout: The timeout in seconds.

    Returns:
        An error message string if validation fails, None otherwise.
    """
    # Exactly one of command or args
    if command is not None and args is not None:
        return "Cannot provide both 'command' and 'args'. Use 'command' for shell commands or 'args' for direct execution."
    if command is None and args is None:
        return "Must provide either 'command' or 'args'."

    # working_directory must be absolute
    if working_directory is not None and not working_directory.startswith("/"):
        return f"working_directory must be an absolute path, got: {working_directory}"

    # timeout must be positive
    if timeout is not None and timeout < 1:
        return "timeout must be at least 1 second."

    # stdin size limit (D32)
    if stdin is not None and len(stdin.encode("utf-8")) > STDIN_LIMIT:
        return (
            f"stdin content exceeds the 2 MiB limit "
            f"({len(stdin.encode('utf-8'))} bytes). "
            f"Split into smaller chunks or use a different approach."
        )

    return None
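
Note that the stdin check measures encoded bytes, not characters, so text with non-ASCII characters reaches the limit sooner than its character count suggests. A minimal sketch, assuming STDIN_LIMIT equals 2 * 1024 * 1024 (the constant's actual value is not shown in the excerpt; the figure is taken from the "2 MiB" error message):

```python
STDIN_LIMIT = 2 * 1024 * 1024  # assumed value, based on the "2 MiB" message


def stdin_too_large(stdin: str) -> bool:
    """Mirror the byte-length comparison used in _validate_inputs."""
    return len(stdin.encode("utf-8")) > STDIN_LIMIT


# ASCII: one byte per character, so exactly 2 MiB of text is still accepted.
ascii_text = "a" * STDIN_LIMIT

# U+00E9 encodes to two bytes in UTF-8, so about half as many characters
# already exceed the limit.
accented = "\u00e9" * (STDIN_LIMIT // 2 + 1)

print(len(ascii_text), stdin_too_large(ascii_text))
print(len(accented), stdin_too_large(accented))
```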

Tool Definition Quality

Grade A (4.4/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and excels by disclosing the key behavioral traits: the sandbox is isolated, commands run in bash, shell state does not persist between calls while the filesystem does, and the default timeout is 120 seconds. This covers safety, execution environment, and operational constraints comprehensively.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, followed by essential behavioral details and usage tips in a logical flow. Every sentence adds value (state persistence, parameter usage, timeout) with no redundancy. It is efficiently structured and appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (sandboxed execution with multiple parameters), no annotations, and no output schema, the description does an excellent job covering execution environment, behavioral traits, and parameter guidance. However, it lacks details on output format or error handling, which would be beneficial for an agent. It's nearly complete but has a minor gap in output expectations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters well. The description adds some value by explaining the purpose of stdin and working_directory in practical terms (e.g., 'to write files or pass data without shell escaping'), but it doesn't significantly enhance the parameter understanding beyond the schema. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Execute a shell command'), the environment ('in an isolated Debian Linux sandbox'), and the execution context ('Commands run in bash'). It distinguishes this as a sandboxed execution tool with no sibling tools to differentiate from.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use certain parameters (e.g., 'Use the working_directory parameter or chain commands with && to control execution context', 'To write files or pass data without shell escaping, use the stdin parameter'), but since there are no sibling tools, it cannot offer guidance on alternatives. It effectively explains usage scenarios without exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
