Glama
BasisSetVentures

Grok CLI MCP Server

Grok Chat

grok_chat

Send multi-turn conversations to Grok AI by flattening role-based messages into a single prompt for CLI compatibility.

Instructions

Send a list of role/content messages to Grok by flattening into a single prompt. Useful for multi-turn context when the CLI only supports a single '-p' prompt.

Input Schema

Name         Required   Description   Default
messages     Yes        -             -
model        No         -             -
raw_output   No         -             -
timeout_s    No         -             -

Output Schema

Name     Required   Description   Default
result   Yes        -             -

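Since neither schema documents its fields, a concrete invocation helps. Below is a hypothetical arguments payload for a grok_chat call, sketched from the input schema above; only messages is required, and the other values shown are illustrative, not documented defaults.

```python
import json

# Hypothetical grok_chat arguments payload matching the input schema above.
# Only `messages` is required; `model`, `raw_output`, and `timeout_s` fall
# back to server-side defaults when omitted.
arguments = {
    "messages": [
        {"role": "system", "content": "Answer in one sentence."},
        {"role": "user", "content": "What does MCP stand for?"},
    ],
    "raw_output": False,
    "timeout_s": 60,
}
print(json.dumps(arguments, indent=2))
```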
Implementation Reference

  • The main handler function for the grok_chat tool. It flattens a list of GrokMessage objects into a single prompt string with role prefixes, invokes the Grok CLI using the _run_grok helper, collects the assistant's text response, and returns either the text or a structured dict if raw_output is True.
    async def grok_chat(
        messages: list[GrokMessage],
        model: Optional[str] = None,
        raw_output: bool = False,
        timeout_s: float = 120.0,
        ctx: Optional[Context] = None,
    ) -> str | dict:
        """
        Send multi-turn conversation to Grok.
    
        Args:
            messages: Array of {role, content}; content may be string or structured.
            model: Optional Grok model name.
            raw_output: If true, returns structured output.
            timeout_s: Process timeout in seconds.
            ctx: FastMCP context.
    
        Returns:
            Assistant's text response, or dict with full details if raw_output=True.
        """
        # Flatten messages to a single prompt with role tags
        # This preserves some conversational context even if CLI accepts only single prompt.
        prompt_lines: list[str] = []
        for m in messages:
            content_str: str
            if isinstance(m.content, str):
                content_str = m.content
            else:
                try:
                    content_str = json.dumps(m.content, ensure_ascii=False)
                except Exception:
                    content_str = str(m.content)
            prompt_lines.append(f"{m.role.capitalize()}: {content_str}")
        prompt = "\n".join(prompt_lines)
    
        result = await _run_grok(prompt, model=model, timeout_s=timeout_s, ctx=ctx)
        assistant_text = _collect_assistant_text(result.messages) if result.messages else (result.raw or "")
    
        if raw_output:
            return {
                "text": assistant_text,
                "messages": [m.model_dump() for m in result.messages],
                "raw": result.raw,
                "model": result.model,
            }
    
        return assistant_text
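The flattening step above can be sketched standalone, using plain dicts in place of GrokMessage objects (same logic, no Pydantic dependency):

```python
import json

# Standalone sketch of grok_chat's flattening step. Structured content is
# JSON-encoded so it survives the collapse into a single prompt string.
messages = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": {"question": "2 + 2?"}},
]

prompt_lines = []
for m in messages:
    content = m["content"]
    if not isinstance(content, str):
        content = json.dumps(content, ensure_ascii=False)
    prompt_lines.append(f"{m['role'].capitalize()}: {content}")
prompt = "\n".join(prompt_lines)
print(prompt)
# ->
# System: You are terse.
# User: {"question": "2 + 2?"}
```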
  • Pydantic model defining the structure of a single message (role and content) used as input for the grok_chat tool's 'messages' parameter.
    class GrokMessage(BaseModel):
        """A message in a Grok conversation."""
    
        role: str
        content: Any  # Grok may return str or structured content
  • The @server.tool decorator that registers the grok_chat function as an MCP tool, specifying its name, title, and description.
    @server.tool(
        name="grok_chat",
        title="Grok Chat",
        description=(
            "Send a list of role/content messages to Grok by flattening into a single prompt. "
            "Useful for multi-turn context when the CLI only supports a single '-p' prompt."
        ),
    )
  • Core helper function that runs the Grok CLI binary in headless mode with the given prompt and model, parses the JSON output into GrokMessage objects, handles errors and timeouts, and returns a structured GrokParsedOutput.
    async def _run_grok(
        prompt: str,
        *,
        model: Optional[str],
        timeout_s: float,
        ctx: Optional[Context] = None,
    ) -> GrokParsedOutput:
        """
        Run Grok CLI in headless mode: `grok -p "<prompt>" [-m <model>]`
    
        Parse JSON output and return a structured response.
    
        Args:
            prompt: The prompt to send to Grok.
            model: Optional Grok model name (passed with -m if provided).
            timeout_s: Process timeout in seconds.
            ctx: Optional FastMCP context for logging.
    
        Returns:
            GrokParsedOutput with messages, model, and raw output.
    
        Raises:
            FileNotFoundError: If Grok CLI binary not found.
            TimeoutError: If CLI execution exceeds timeout.
            RuntimeError: If CLI exits with non-zero code.
        """
        grok_bin = _resolve_grok_path()
        if not shutil.which(grok_bin) and not os.path.exists(grok_bin):
            raise FileNotFoundError(
                f"Grok CLI not found. Checked {grok_bin} and PATH. "
                f"Set {ENV_GROK_CLI_PATH} or install grok CLI."
            )
    
        _require_api_key()
    
        args = [grok_bin, "-p", prompt]
        if model:
            # Only pass -m if the caller supplied a model; if the CLI rejects
            # it, the non-zero exit surfaces as a RuntimeError below.
            args += ["-m", model]
    
        env = os.environ.copy()
        # os.environ.copy() already carries GROK_API_KEY into the subprocess;
        # _require_api_key() above guarantees it is set.
    
        if ctx:
            await ctx.info(f"Invoking Grok CLI{' with model ' + model if model else ''}...")
    
        proc = await asyncio.create_subprocess_exec(
            *args, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE, env=env
        )
    
        try:
            stdout_b, stderr_b = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
        except asyncio.TimeoutError:
            try:
                proc.kill()
            except Exception:
                pass
            raise TimeoutError(f"Grok CLI timed out after {timeout_s:.0f}s")
    
        stdout = (stdout_b or b"").decode("utf-8", errors="replace")
        stderr = (stderr_b or b"").decode("utf-8", errors="replace")
    
        if proc.returncode != 0:
            # Grok CLI error; include stderr to help debugging
            raise RuntimeError(f"Grok CLI failed (exit {proc.returncode}): {stderr.strip() or stdout.strip()}")
    
        # Parse JSON payload
        parsed: Any
        try:
            parsed = _extract_json_from_text(stdout)
        except Exception as e:
            # If JSON parse fails, provide raw output in a structured wrapper
            if ctx:
                await ctx.warning(f"Failed to parse Grok JSON output: {e}. Returning raw output.")
            return GrokParsedOutput(messages=[], model=model, raw=stdout)
    
        # Normalize to list of GrokMessage
        messages: list[GrokMessage] = []
        if isinstance(parsed, dict) and "role" in parsed and "content" in parsed:
            messages = [GrokMessage(**parsed)]
        elif isinstance(parsed, list):
            # Either a list of messages or a list with one message
            for item in parsed:
                if isinstance(item, dict) and "role" in item and "content" in item:
                    messages.append(GrokMessage(**item))
        elif isinstance(parsed, dict) and "messages" in parsed:
            for item in parsed.get("messages", []) or []:
                if isinstance(item, dict) and "role" in item and "content" in item:
                    messages.append(GrokMessage(**item))
        else:
            # Unknown shape: keep raw and empty messages
            if ctx:
                await ctx.warning("Unrecognized JSON shape from Grok CLI. Returning raw output.")
            return GrokParsedOutput(messages=[], model=model, raw=stdout)
    
        return GrokParsedOutput(messages=messages, model=model, raw=stdout)
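The three accepted payload shapes can be illustrated with a hypothetical dict-based restatement of the normalization above (plain dicts instead of GrokMessage, same branching):

```python
# Hypothetical restatement of _run_grok's normalization: a single message
# dict, a list of message dicts, or a {"messages": [...]} envelope all
# collapse to the same flat list; anything else yields an empty list.
def normalize(parsed):
    if isinstance(parsed, dict) and "role" in parsed and "content" in parsed:
        return [parsed]
    if isinstance(parsed, list):
        return [m for m in parsed
                if isinstance(m, dict) and "role" in m and "content" in m]
    if isinstance(parsed, dict) and "messages" in parsed:
        return [m for m in parsed.get("messages") or []
                if isinstance(m, dict) and "role" in m and "content" in m]
    return []

msg = {"role": "assistant", "content": "4"}
assert normalize(msg) == [msg]
assert normalize([msg]) == [msg]
assert normalize({"messages": [msg]}) == [msg]
```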
  • Helper function to extract and concatenate text content from assistant messages, handling various content formats (str, list of blocks, dict). Used to produce the final text response.
    def _collect_assistant_text(messages: Sequence[GrokMessage]) -> str:
        """
        Collate assistant message text from a sequence of messages.
    
        Handles:
          - content as a plain string
          - content as a list of blocks with 'type'=='text'
          - content as a dict with 'text' field
    
        Args:
            messages: Sequence of GrokMessage objects.
    
        Returns:
            Concatenated text from all assistant messages.
        """
        chunks: list[str] = []
        for m in messages:
            if m.role != "assistant":
                continue
            c = m.content
            if isinstance(c, str):
                chunks.append(c)
            elif isinstance(c, list):
                for block in c:
                    try:
                        if isinstance(block, dict) and block.get("type") == "text" and "text" in block:
                            chunks.append(str(block["text"]))
                        elif isinstance(block, dict) and "content" in block:
                            chunks.append(str(block["content"]))
                    except Exception:
                        continue
            elif isinstance(c, dict) and "text" in c:
                chunks.append(str(c["text"]))
            else:
                # Fallback: stringify structured content
                try:
                    chunks.append(json.dumps(c, ensure_ascii=False))
                except Exception:
                    chunks.append(str(c))
        return "\n".join([s for s in (s.strip() for s in chunks) if s])
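The content-format rules can be exercised with a hypothetical single-content simplification of the helper above:

```python
import json

# Hypothetical single-content restatement of _collect_assistant_text's rules:
# plain strings pass through, lists of text/content blocks are joined,
# {"text": ...} dicts reduce to their text, and anything else is
# JSON-stringified as a fallback.
def text_of(content):
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, dict) and block.get("type") == "text" and "text" in block:
                parts.append(str(block["text"]))
            elif isinstance(block, dict) and "content" in block:
                parts.append(str(block["content"]))
        return "\n".join(parts)
    if isinstance(content, dict) and "text" in content:
        return str(content["text"])
    return json.dumps(content, ensure_ascii=False)

assert text_of("plain") == "plain"
assert text_of([{"type": "text", "text": "a"}, {"content": "b"}]) == "a\nb"
assert text_of({"text": "c"}) == "c"
```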
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. The description mentions flattening messages into a single prompt, which is useful context. However, it doesn't disclose important behavioral traits like authentication requirements, rate limits, error handling, or what the output looks like (though an output schema exists). For a tool with no annotation coverage, this leaves significant gaps in understanding how the tool behaves.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with just two sentences. The first sentence states the core functionality, and the second sentence provides the key usage context. Every word earns its place with zero waste or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there's an output schema (which handles return values) and no annotations, the description provides adequate basic context about what the tool does and when to use it. However, with 4 parameters and 0% schema description coverage, the lack of parameter guidance in the description creates a significant gap. The description is complete enough for understanding the tool's purpose but insufficient for effective parameter usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning none of the 4 parameters have descriptions in the schema. The tool description doesn't mention any parameters at all, failing to compensate for the lack of schema documentation. While the description implies the 'messages' parameter is central, it provides no guidance on what 'model', 'raw_output', or 'timeout_s' do or how to use them effectively.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Send a list of role/content messages to Grok by flattening into a single prompt.' This specifies the verb ('send'), resource ('messages to Grok'), and transformation ('flattening'). However, it doesn't explicitly differentiate from sibling tools like grok_code or grok_query, which likely handle different types of interactions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: 'Useful for multi-turn context when the CLI only supports a single '-p' prompt.' This explains the specific scenario where this tool is valuable (multi-turn conversations with a CLI limitation). However, it doesn't explicitly mention when NOT to use it or provide alternatives to sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
