Glama
BasisSetVentures

Grok CLI MCP Server

Grok Code Task

grok_code

Generate code or get programming guidance by providing tasks, language hints, and context through the Grok CLI MCP Server.

Instructions

Ask Grok for code or code-related guidance. You can provide a language hint and context (e.g., file snippets or requirements). Returns assistant text by default.

Input Schema

Name        Required  Description                                        Default
task        Yes       Description of what code or help you need          —
language    No        Language hint (e.g., 'python', 'typescript')       —
context     No        Context: repo constraints, file snippets, tests    —
model       No        Grok model name                                    —
raw_output  No        If true, returns structured output                 false
timeout_s   No        Process timeout in seconds                         180.0
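As a sketch, a hypothetical argument payload for this tool might look like the following; parameter names come from the schema above, while the values are purely illustrative:

```python
# Hypothetical grok_code call arguments; only "task" is required.
# Values are illustrative, not taken from the server's documentation.
args = {
    "task": "Write a function that reverses a singly linked list",
    "language": "python",                                      # optional language hint
    "context": "Standard library only; include a unit test.",  # optional context
    "raw_output": False,                                       # default: plain assistant text
    "timeout_s": 180.0,                                        # default process timeout (seconds)
}
assert "task" in args  # the one required field
```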

Output Schema

Name    Required  Description                                                  Default
result  Yes       Assistant text, or structured details when raw_output=true   —
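Based on the handler's raw_output branch, the result is a plain string by default, or a dict with text, messages, raw, and model fields when raw_output=True. A sketch of the two shapes, with made-up payload values:

```python
# Sketch of the two result shapes; field names come from the handler's
# raw_output branch, the payload values here are invented.
plain_result = "def reverse(s):\n    return s[::-1]"

structured_result = {
    "text": plain_result,                                          # collated assistant text
    "messages": [{"role": "assistant", "content": plain_result}],  # parsed messages
    "raw": "(raw CLI stdout)",                                     # unparsed CLI output
    "model": None,                                                 # model name, if one was passed
}
assert set(structured_result) == {"text", "messages", "raw", "model"}
```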

Implementation Reference

  • The grok_code tool handler: registers the tool via @server.tool, builds a code-specific system prompt from the task, language, and context, invokes _run_grok to execute the Grok CLI, and extracts the assistant response.
    @server.tool(
        name="grok_code",
        title="Grok Code Task",
        description=(
            "Ask Grok for code or code-related guidance. You can provide a language hint and context "
            "(e.g., file snippets or requirements). Returns assistant text by default."
        ),
    )
    async def grok_code(
        task: str,
        language: Optional[str] = None,
        context: Optional[str] = None,
        model: Optional[str] = None,
        raw_output: bool = False,
        timeout_s: float = 180.0,
        ctx: Optional[Context] = None,
    ) -> str | dict:
        """
        Ask Grok for code generation or guidance.
    
        Args:
            task: Description of what code/help you need.
            language: Optional language hint (e.g., 'python', 'typescript').
            context: Optional context (repo constraints, file snippets, tests, etc.).
            model: Optional Grok model name.
            raw_output: If true, returns structured output.
            timeout_s: Process timeout in seconds.
            ctx: FastMCP context.
    
        Returns:
            Assistant's text response, or dict with full details if raw_output=True.
        """
        sys_instructions = [
            "You are an expert software engineer.",
            "Respond with clear, correct, directly usable code and concise explanations.",
            "Prefer minimal dependencies and explain tradeoffs when relevant.",
        ]
        if language:
            sys_instructions.append(f"Primary language: {language}")
        if context:
            sys_instructions.append("Context:\n" + context.strip())
    
        prompt = "\n\n".join(
            [
                "\n".join(sys_instructions),
                "Task:",
                task.strip(),
            ]
        )
    
        result = await _run_grok(prompt, model=model, timeout_s=timeout_s, ctx=ctx)
        assistant_text = _collect_assistant_text(result.messages) if result.messages else (result.raw or "")
    
        if raw_output:
            return {
                "text": assistant_text,
                "messages": [m.model_dump() for m in result.messages],
                "raw": result.raw,
                "model": result.model,
            }
    
        return assistant_text
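The prompt assembly above can be reproduced as a standalone sketch; build_prompt is a hypothetical helper name, but the string logic mirrors the handler:

```python
from typing import Optional

# Standalone sketch of grok_code's prompt assembly; build_prompt is a
# hypothetical name, the string construction mirrors the handler above.
def build_prompt(task: str, language: Optional[str] = None,
                 context: Optional[str] = None) -> str:
    sys_instructions = [
        "You are an expert software engineer.",
        "Respond with clear, correct, directly usable code and concise explanations.",
        "Prefer minimal dependencies and explain tradeoffs when relevant.",
    ]
    if language:
        sys_instructions.append(f"Primary language: {language}")
    if context:
        sys_instructions.append("Context:\n" + context.strip())
    return "\n\n".join(["\n".join(sys_instructions), "Task:", task.strip()])

prompt = build_prompt("Reverse a string", language="python")
```

Note that the language hint and context land in the system-instruction block, separated from the task by blank lines, so the CLI receives one flat prompt string.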
  • Helper function _run_grok invoked by grok_code: runs the Grok CLI as a subprocess with the constructed prompt, handles timeout and errors, parses JSON output into structured messages.
    async def _run_grok(
        prompt: str,
        *,
        model: Optional[str],
        timeout_s: float,
        ctx: Optional[Context] = None,
    ) -> GrokParsedOutput:
        """
        Run Grok CLI in headless mode: `grok -p "<prompt>" [-m <model>]`
    
        Parse JSON output and return a structured response.
    
        Args:
            prompt: The prompt to send to Grok.
            model: Optional Grok model name (passed with -m if provided).
            timeout_s: Process timeout in seconds.
            ctx: Optional FastMCP context for logging.
    
        Returns:
            GrokParsedOutput with messages, model, and raw output.
    
        Raises:
            FileNotFoundError: If Grok CLI binary not found.
            TimeoutError: If CLI execution exceeds timeout.
            RuntimeError: If CLI exits with non-zero code.
        """
        grok_bin = _resolve_grok_path()
        if not shutil.which(grok_bin) and not os.path.exists(grok_bin):
            raise FileNotFoundError(
                f"Grok CLI not found. Checked {grok_bin} and PATH. "
                f"Set {ENV_GROK_CLI_PATH} or install grok CLI."
            )
    
        _require_api_key()
    
        args = [grok_bin, "-p", prompt]
        if model:
            # Only pass -m when the caller supplied a model; if the CLI rejects it,
            # the non-zero exit surfaces as a RuntimeError below
            args += ["-m", model]
    
        # os.environ.copy() already carries GROK_API_KEY into the subprocess;
        # _require_api_key() above guarantees it is set.
        env = os.environ.copy()
    
        if ctx:
            model_note = f" with model {model}" if model else ""
            await ctx.info(f"Invoking Grok CLI{model_note}...")
    
        proc = await asyncio.create_subprocess_exec(
            *args, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE, env=env
        )
    
        try:
            stdout_b, stderr_b = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
        except asyncio.TimeoutError:
            try:
                proc.kill()
                await proc.wait()  # reap the killed process to avoid a zombie
            except Exception:
                pass
            raise TimeoutError(f"Grok CLI timed out after {timeout_s:.0f}s") from None
    
        stdout = (stdout_b or b"").decode("utf-8", errors="replace")
        stderr = (stderr_b or b"").decode("utf-8", errors="replace")
    
        if proc.returncode != 0:
            # Grok CLI error; include stderr to help debugging
            raise RuntimeError(f"Grok CLI failed (exit {proc.returncode}): {stderr.strip() or stdout.strip()}")
    
        # Parse JSON payload
        parsed: Any
        try:
            parsed = _extract_json_from_text(stdout)
        except Exception as e:
            # If JSON parse fails, provide raw output in a structured wrapper
            if ctx:
                await ctx.warning(f"Failed to parse Grok JSON output: {e}. Returning raw output.")
            return GrokParsedOutput(messages=[], model=model, raw=stdout)
    
        # Normalize to list of GrokMessage
        messages: list[GrokMessage] = []
        if isinstance(parsed, dict) and "role" in parsed and "content" in parsed:
            messages = [GrokMessage(**parsed)]
        elif isinstance(parsed, list):
            # A list of message dicts; keep only well-formed entries
            for item in parsed:
                if isinstance(item, dict) and "role" in item and "content" in item:
                    messages.append(GrokMessage(**item))
        elif isinstance(parsed, dict) and "messages" in parsed:
            for item in parsed.get("messages", []) or []:
                if isinstance(item, dict) and "role" in item and "content" in item:
                    messages.append(GrokMessage(**item))
        else:
            # Unknown shape: keep raw and empty messages
            if ctx:
                await ctx.warning("Unrecognized JSON shape from Grok CLI. Returning raw output.")
            return GrokParsedOutput(messages=[], model=model, raw=stdout)
    
        return GrokParsedOutput(messages=messages, model=model, raw=stdout)
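The three JSON shapes that _run_grok accepts can be illustrated with a dependency-free sketch, where plain dicts stand in for GrokMessage:

```python
# Dependency-free sketch of _run_grok's shape normalization: a single
# message dict, a list of messages, and a {"messages": [...]} wrapper all
# reduce to the same list of well-formed {role, content} dicts.
def normalize(parsed):
    def ok(m):
        return isinstance(m, dict) and "role" in m and "content" in m
    if isinstance(parsed, dict) and ok(parsed):
        return [parsed]
    if isinstance(parsed, list):
        return [m for m in parsed if ok(m)]
    if isinstance(parsed, dict) and "messages" in parsed:
        return [m for m in parsed.get("messages") or [] if ok(m)]
    return []  # unknown shape: the caller falls back to raw output

msg = {"role": "assistant", "content": "hello"}
```

Malformed entries are silently dropped rather than raising, which matches the function's preference for degrading to raw output over failing.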
  • Helper function _collect_assistant_text used in grok_code to extract and concatenate text from assistant messages in the parsed output.
    def _collect_assistant_text(messages: Sequence[GrokMessage]) -> str:
        """
        Collate assistant message text from a sequence of messages.
    
        Handles:
          - content as a plain string
          - content as a list of blocks with 'type'=='text'
          - content as a dict with 'text' field
    
        Args:
            messages: Sequence of GrokMessage objects.
    
        Returns:
            Concatenated text from all assistant messages.
        """
        chunks: list[str] = []
        for m in messages:
            if m.role != "assistant":
                continue
            c = m.content
            if isinstance(c, str):
                chunks.append(c)
            elif isinstance(c, list):
                for block in c:
                    try:
                        if isinstance(block, dict) and block.get("type") == "text" and "text" in block:
                            chunks.append(str(block["text"]))
                        elif isinstance(block, dict) and "content" in block:
                            chunks.append(str(block["content"]))
                    except Exception:
                        continue
            elif isinstance(c, dict) and "text" in c:
                chunks.append(str(c["text"]))
            else:
                # Fallback: stringify structured content
                try:
                    chunks.append(json.dumps(c, ensure_ascii=False))
                except Exception:
                    chunks.append(str(c))
        return "\n".join([s for s in (s.strip() for s in chunks) if s])
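A dict-based sketch of the same collation rules, with plain dicts standing in for GrokMessage:

```python
# Dict-based sketch of _collect_assistant_text: handles string content,
# lists of {"type": "text"} blocks, and {"text": ...} dicts, skipping
# non-assistant messages.
def collect(messages):
    chunks = []
    for m in messages:
        if m.get("role") != "assistant":
            continue
        c = m.get("content")
        if isinstance(c, str):
            chunks.append(c)
        elif isinstance(c, list):
            chunks.extend(str(b["text"]) for b in c
                          if isinstance(b, dict) and b.get("type") == "text" and "text" in b)
        elif isinstance(c, dict) and "text" in c:
            chunks.append(str(c["text"]))
    return "\n".join(s for s in (s.strip() for s in chunks) if s)

msgs = [
    {"role": "user", "content": "question"},
    {"role": "assistant", "content": "plain"},
    {"role": "assistant", "content": [{"type": "text", "text": "block"}]},
    {"role": "assistant", "content": {"text": "dict"}},
]
```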
  • Pydantic schema for GrokParsedOutput returned by _run_grok and used in grok_code's raw_output mode.
    class GrokParsedOutput(BaseModel):
        """Parsed output from Grok CLI."""
    
        messages: list[GrokMessage] = Field(default_factory=list)
        model: Optional[str] = None
        raw: Optional[str] = None
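A dependency-free sketch of the same shape using dataclasses (the real class is a pydantic BaseModel holding GrokMessage objects, modeled here as plain dicts):

```python
from dataclasses import dataclass, field
from typing import Optional

# Dataclass sketch of GrokParsedOutput; the real class is a pydantic
# BaseModel whose messages are GrokMessage instances.
@dataclass
class GrokParsedOutputSketch:
    messages: list = field(default_factory=list)
    model: Optional[str] = None
    raw: Optional[str] = None

# The JSON-parse-failure path: empty messages, raw stdout preserved.
fallback = GrokParsedOutputSketch(raw="non-JSON CLI output")
```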
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'Returns assistant text by default,' which hints at output behavior, but doesn't disclose critical traits like whether this is a read-only or mutating operation, authentication needs, rate limits, error handling, or what 'raw_output' and 'timeout_s' parameters imply. For a tool with 6 parameters and no annotations, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with the core purpose stated first. Both sentences add value: the first defines the tool's function, and the second clarifies parameters and output. There's no wasted text, making it efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, no annotations, but with an output schema), the description is moderately complete. It covers the basic purpose and hints at some parameters, but since there's an output schema, it doesn't need to detail return values. However, the lack of behavioral disclosure and incomplete parameter semantics make it inadequate for full understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It mentions 'language hint and context (e.g., file snippets or requirements),' which adds meaning for 'language' and 'context' parameters, but doesn't explain 'task,' 'model,' 'raw_output,' or 'timeout_s.' With 6 parameters, this partial coverage leaves significant gaps in understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Ask Grok for code or code-related guidance.' It specifies the verb ('Ask Grok') and resource ('code or code-related guidance'), making it understandable. However, it doesn't explicitly differentiate from sibling tools like grok_chat or grok_query, which likely handle different types of queries.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some implied usage context by mentioning 'code or code-related guidance' and suggesting parameters like language hint and context. However, it lacks explicit guidance on when to use this tool versus alternatives (e.g., grok_chat for general chat, grok_query for non-code queries), and doesn't specify prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/BasisSetVentures/grok-cli-mcp'
