
Kilntainers

Official
by Kiln-AI

sandbox_exec

Execute shell commands in an isolated Debian Linux sandbox to run code safely without affecting the host system. Use stdin for data input and set timeouts for long operations.

Instructions

Execute a shell command in an isolated Debian Linux sandbox. Commands run in bash. Each call is independent — no state (shell variables, working directory) persists between calls (however filesystem does persist). Use the working_directory parameter or chain commands with && to control execution context.
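Because shell state is discarded after every call, a `cd` in one call does nothing for the next. The sketch below contrasts the failing pattern with the two approaches the description recommends. The `chain` helper and the `/srv/app` path are hypothetical and used only for illustration; the dictionaries are the tool arguments a client would send.

```python
def chain(*commands: str) -> str:
    """Join shell commands with && so each runs only if the previous one succeeded."""
    return " && ".join(commands)


# Two separate calls: the `cd` in the first call has no effect on the second,
# because each call starts in a fresh shell.
separate_calls = [
    {"command": "cd /srv/app"},
    {"command": "ls"},
]

# Option 1: a single call, so the directory change applies to `ls`.
chained_call = {"command": chain("cd /srv/app", "ls")}

# Option 2: let the tool set the directory through its own parameter.
directory_call = {"command": "ls", "working_directory": "/srv/app"}
```

Files created in either form remain available to later calls, since the filesystem is the one piece of state that persists.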

To write files or pass data without shell escaping, use the stdin parameter (e.g., command="cat > file.txt" with content in stdin). Commands time out after 120 seconds by default (override with the timeout parameter for long-running operations).
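As a rough illustration of the stdin pattern, the sketch below builds two MCP `tools/call` requests: one that writes a file by sending its content through `stdin`, and one that reads it back with a longer timeout. The `make_tool_call` helper, the file path, and the 300-second timeout are assumptions chosen for the example; how the request is transported to the server depends on the client and is not shown.

```python
import json


def make_tool_call(arguments: dict, request_id: int = 1) -> str:
    """Wrap sandbox_exec arguments in a JSON-RPC tools/call request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "sandbox_exec", "arguments": arguments},
    }
    return json.dumps(request)


# Content with quotes and a dollar sign: no shell escaping is needed,
# because it travels through stdin rather than through the command string.
content = "It's a \"quoted\" line with $HOME left untouched\n"

write_call = make_tool_call(
    {"command": "cat > /tmp/notes.txt", "stdin": content}
)

# A later call can read the file back, since the filesystem persists.
read_call = make_tool_call(
    {"command": "wc -c notes.txt", "working_directory": "/tmp", "timeout": 300},
    request_id=2,
)
```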

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| command | No | Shell command string (mutually exclusive with args). | |
| args | No | List of arguments for direct execution (mutually exclusive with command). | |
| stdin | No | Content to pipe to stdin. | |
| working_directory | No | Working directory for the command (must be absolute). | |
| timeout | No | Timeout in seconds (defaults to server config). | |
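The two execution modes can be compared with a filename that contains spaces and parentheses. This sketch assumes that "direct execution" means the `args` list is passed to the program without shell parsing, which the schema wording suggests but does not spell out; the `/data` directory and the filename are made up for the example.

```python
import shlex

filename = "my report (final).txt"

# command mode: the string is interpreted by bash, so the filename must be quoted.
command_mode = {
    "command": f"wc -l {shlex.quote(filename)}",
    "working_directory": "/data",
}

# args mode: each list element is one argument, so no quoting is needed.
args_mode = {
    "args": ["wc", "-l", filename],
    "working_directory": "/data",
}
```

Supplying both `command` and `args` in the same call, or neither, is rejected by the tool's input validation.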

Implementation Reference

  • The _create_handler function creates and returns the sandbox_exec_handler, which is the core handler that executes tool logic. It sanitizes inputs, validates them, gets/creates the sandbox from context, constructs ExecRequest, executes the command via sandbox.exec(), and returns the formatted CallToolResult.
    def _create_handler(config: ServerConfig) -> Callable[..., Any]:
        """Create the sandbox_exec handler with server config bound via closure.
    
        Args:
            config: The server configuration containing defaults.
    
        Returns:
            An async handler function for the sandbox_exec tool.
        """
    
        async def sandbox_exec_handler(
            command: str | None = None,
            args: list[str] | None = None,
            stdin: str | None = None,
            working_directory: str | None = None,
            timeout: int | None = None,
            ctx: Context[ServerSession, SessionContext] | None = None,
        ) -> CallToolResult:
            """Handle a sandbox_exec tool call.
    
            Args:
                command: Shell command string (mutually exclusive with args).
                args: List of arguments for direct execution (mutually exclusive with command).
                stdin: Content to pipe to stdin.
                working_directory: Working directory for the command (must be absolute).
                timeout: Timeout in seconds (defaults to server config).
                ctx: FastMCP context object (injected automatically).
    
            Returns:
                A CallToolResult with the execution result or error.
            """
            # --- Input sanitization ---
            if args is not None and len(args) == 0:
                args = None
            if command is not None and len(command) == 0:
                command = None
            if working_directory is not None and len(working_directory) == 0:
                working_directory = None
            if stdin is not None and len(stdin) == 0:
                stdin = None
    
            # --- Input validation ---
            error = _validate_inputs(command, args, stdin, working_directory, timeout)
            if error is not None:
                return CallToolResult(
                    content=[TextContent(type="text", text=error)],
                    isError=True,
                )
    
            # --- Get sandbox from context ---
            # ctx should always be provided by FastMCP, but handle None for safety
            if ctx is None:
                return CallToolResult(
                    content=[
                        TextContent(type="text", text="Internal error: no context provided")
                    ],
                    isError=True,
                )
    
            session_context = ctx.request_context.lifespan_context
    
            # --- Lazy sandbox creation ---
            try:
                sandbox = await session_context.get_or_create_sandbox()
            except BackendError as e:
                return CallToolResult(
                    content=[TextContent(type="text", text=str(e))],
                    isError=True,
                )
    
            # --- Construct ExecRequest ---
            request = ExecRequest(
                command=command,
                args=args,
                stdin=stdin,
                working_directory=working_directory,
                timeout=timeout if timeout is not None else config.default_timeout,
                output_limit=config.output_limit,
            )
    
            # --- Execute ---
            try:
                result = await sandbox.exec(request)
            except SandboxDiedError as e:
                return CallToolResult(
                    content=[TextContent(type="text", text=str(e))],
                    isError=True,
                )
    
            # --- Format response ---
            response_json = json.dumps(
                {
                    "stdout": result.stdout,
                    "stderr": result.stderr,
                    "exit_code": result.exit_code,
                    "exec_duration_ms": result.exec_duration_ms,
                }
            )
    
            return CallToolResult(
                content=[TextContent(type="text", text=response_json)],
                isError=False,
            )
    
        return sandbox_exec_handler
  • ExecRequest dataclass defines the validated input parameters for command execution, including command/args (mutually exclusive), stdin, working_directory, timeout, and output_limit. Includes __post_init__ validation for mutual exclusivity and constraints.
    @dataclass(frozen=True, slots=True, kw_only=True)
    class ExecRequest:
        """Validated parameters for a command execution.
    
        Constructed by the MCP layer after input validation. The MCP layer
        resolves defaults (effective timeout, configured output limit) so
        the backend always receives concrete values.
    
        kw_only=True forces callers to use keyword arguments, which is
        clearer for a dataclass with many optional fields.
        """
    
        # Exactly one of command or args must be provided
        command: str | None = None
        args: list[str] | None = None
    
        # Optional parameters
        stdin: str | None = None
        working_directory: str | None = None
    
        # Always provided by MCP layer (defaults resolved before reaching backend)
        timeout: int  # seconds
        output_limit: int  # bytes
    
        def __post_init__(self) -> None:
            # Validate mutual exclusivity of command/args
            if self.command is not None and self.args is not None:
                raise ValueError("command and args are mutually exclusive")
            if self.command is None and self.args is None:
                raise ValueError("either command or args must be provided")
    
            # Validate working_directory is absolute
            if self.working_directory is not None and not self.working_directory.startswith(
                "/"
            ):
                raise ValueError("working_directory must be an absolute path")
    
            # Validate timeout
            if self.timeout < 1:
                raise ValueError("timeout must be at least 1 second")
    
            # Validate output_limit
            if self.output_limit < 1:
                raise ValueError("output_limit must be positive")
  • ExecResult dataclass defines the output schema for execution results, containing stdout, stderr, exit_code, and exec_duration_ms fields.
    @dataclass(frozen=True, slots=True)
    class ExecResult:
        """Result of a command execution.
    
        The return type from every exec call. Immutable, with no optional
        fields — every execution produces all four values.
    
        Maps directly to the MCP response schema (Functional spec §2.2).
        The MCP layer serializes this to JSON for the tool response.
        """
    
        stdout: str
        stderr: str
        exit_code: int
        exec_duration_ms: int
  • Tool registration via mcp.add_tool() - registers the sandbox_exec wrapper function with name 'sandbox_exec' and the assembled description.
    mcp.add_tool(
        sandbox_exec,
        name="sandbox_exec",
        description=description,
    )
  • The _validate_inputs helper function validates tool inputs - ensures exactly one of command/args is provided, working_directory is absolute, timeout is positive, and stdin doesn't exceed the 2 MiB limit.
    def _validate_inputs(
        command: str | None,
        args: list[str] | None,
        stdin: str | None,
        working_directory: str | None,
        timeout: int | None,
    ) -> str | None:
        """Validate tool inputs.
    
        Returns error message or None if valid.
    
        Args:
            command: The shell command string, if using command mode.
            args: The list of arguments, if using args mode.
            stdin: The stdin content to pipe to the command.
            working_directory: The working directory for the command.
            timeout: The timeout in seconds.
    
        Returns:
            An error message string if validation fails, None otherwise.
        """
        # Exactly one of command or args
        if command is not None and args is not None:
            return "Cannot provide both 'command' and 'args'. Use 'command' for shell commands or 'args' for direct execution."
        if command is None and args is None:
            return "Must provide either 'command' or 'args'."
    
        # working_directory must be absolute
        if working_directory is not None and not working_directory.startswith("/"):
            return f"working_directory must be an absolute path, got: {working_directory}"
    
        # timeout must be positive
        if timeout is not None and timeout < 1:
            return "timeout must be at least 1 second."
    
        # stdin size limit (D32)
        if stdin is not None and len(stdin.encode("utf-8")) > STDIN_LIMIT:
            return (
                f"stdin content exceeds the 2 MiB limit "
                f"({len(stdin.encode('utf-8'))} bytes). "
                f"Split into smaller chunks or use a different approach."
            )
    
        return None
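The stdin check above counts UTF-8 bytes rather than characters, and its error message suggests splitting oversized content into smaller chunks. One way a client might do that is sketched below. `STDIN_LIMIT` is assumed to equal 2 * 1024 * 1024 based on the "2 MiB" wording in the error message; `split_for_stdin` and `build_write_calls` are hypothetical helpers, not part of the server.

```python
import shlex

# Assumed value: the server's error message names a 2 MiB limit.
STDIN_LIMIT = 2 * 1024 * 1024


def split_for_stdin(text: str, limit: int = STDIN_LIMIT) -> list[str]:
    """Split text into pieces whose UTF-8 encoding is at most `limit` bytes.

    Splits only between characters, so multi-byte characters stay intact.
    """
    if limit < 4:
        raise ValueError("limit must be at least 4 bytes to fit any character")
    chunks: list[str] = []
    current: list[str] = []
    current_bytes = 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if current_bytes + size > limit:
            chunks.append("".join(current))
            current = []
            current_bytes = 0
        current.append(ch)
        current_bytes += size
    if current:
        chunks.append("".join(current))
    return chunks


def build_write_calls(path: str, text: str, limit: int = STDIN_LIMIT) -> list[dict]:
    """Build one set of sandbox_exec arguments per chunk.

    The first call truncates the file, later calls append to it.
    """
    calls = []
    for index, chunk in enumerate(split_for_stdin(text, limit)):
        redirect = ">" if index == 0 else ">>"
        calls.append(
            {"command": f"cat {redirect} {shlex.quote(path)}", "stdin": chunk}
        )
    return calls
```

This relies on the filesystem persisting between calls, so the appended chunks accumulate in the same file.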
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of disclosure, and it does so well: the sandbox is isolated, commands run in bash, shell state does not persist between calls while the filesystem does, and commands time out after 120 seconds by default. This covers safety, execution environment, and operational constraints comprehensively.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, followed by essential behavioral details and usage tips in a logical flow. Every sentence adds value—explaining state persistence, parameter usage, and timeout—with zero waste or redundancy. It's efficiently structured and appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (sandboxed execution with multiple parameters), no annotations, and no output schema, the description does an excellent job covering execution environment, behavioral traits, and parameter guidance. However, it lacks details on output format or error handling, which would be beneficial for an agent. It's nearly complete but has a minor gap in output expectations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
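The output format that the description leaves out can be read from the handler in the Implementation Reference: a successful call returns one text item containing JSON with `stdout`, `stderr`, `exit_code`, and `exec_duration_ms`, while validation and backend failures return a plain-text message with `isError` set. A client-side parser might look like the sketch below; `ParsedResult`, `SandboxToolError`, and `parse_exec_text` are hypothetical names, and the function takes the text and error flag directly rather than assuming any particular client library.

```python
import json
from dataclasses import dataclass


class SandboxToolError(RuntimeError):
    """Raised when the tool itself reports an error (isError is true)."""


@dataclass(frozen=True)
class ParsedResult:
    stdout: str
    stderr: str
    exit_code: int
    exec_duration_ms: int

    @property
    def succeeded(self) -> bool:
        # The command ran and exited with status 0.
        return self.exit_code == 0


def parse_exec_text(text: str, is_error: bool) -> ParsedResult:
    """Turn the text content of a sandbox_exec response into a ParsedResult."""
    if is_error:
        # Error responses carry a plain message, not JSON.
        raise SandboxToolError(text)
    data = json.loads(text)
    return ParsedResult(
        stdout=data["stdout"],
        stderr=data["stderr"],
        exit_code=data["exit_code"],
        exec_duration_ms=data["exec_duration_ms"],
    )
```

In the handler shown, a command that runs but exits with a non-zero status is still returned with `isError` set to false, so callers need to check `exit_code` separately.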

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters well. The description adds some value by explaining the purpose of stdin and working_directory in practical terms (e.g., 'to write files or pass data without shell escaping'), but it doesn't significantly enhance the parameter understanding beyond the schema. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Execute a shell command'), the environment ('in an isolated Debian Linux sandbox'), and the execution context ('Commands run in bash'). It distinguishes this as a sandboxed execution tool with no sibling tools to differentiate from.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use certain parameters (e.g., 'Use the working_directory parameter or chain commands with && to control execution context', 'To write files or pass data without shell escaping, use the stdin parameter'), but since there are no sibling tools, it cannot offer guidance on alternatives. It explains usage scenarios effectively but does not state when the tool should not be used.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Kiln-AI/kilntainers'