How Chat UIs Communicate with MCP Servers

The user interface (UI) of a chatbot is evolving from a simple text-based display to a dynamic client that communicates with a sophisticated backend, often powered by a Model Context Protocol (MCP) agent. This article explores the architecture of a tool-enabled chat interface, focusing on the frontend-to-backend communication flow. We will walk through the process of routing user requests, invoking external tools, and delivering real-time, streamed responses. The goal is to provide a clear understanding of the protocols and design patterns that enable these complex interactions, moving beyond a basic request-response model to a more robust, scalable, and user-friendly system.

Frontend-Backend Communication Flow

A key architectural shift in tool-enabled chatbots is the separation of concerns. The frontend (the chat UI) is responsible for handling user input and rendering responses. The backend (the MCP server) orchestrates the agent's logic, manages the tool context, and interacts with external tools. The communication between these two layers must be robust, efficient, and able to handle multi-turn interactions and real-time updates.

User Request to Agent Action

When a user types a message and hits enter, the UI sends the request to the backend. This is typically an asynchronous call to a server endpoint. The backend’s job is to:

Receive the user's message.
Route the message to the MCP agent.
Initiate the MCP flow, which may involve one or more tool calls.
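Receiving the message usually means validating the request body before anything reaches the agent. The following is a minimal sketch, assuming the `{ message }` JSON body used by the examples in this article. The `parseChatRequest` name and the 4,000-character limit are hypothetical choices and are not part of MCP.

```typescript
// Hypothetical shape of the JSON body posted by the chat UI
interface ChatRequest {
  message: string;
}

// Illustrative limit (assumption); choose one that fits your model's context budget
const MAX_MESSAGE_LENGTH = 4000;

// Validate an untrusted request body before routing it to the agent.
// Throws an Error describing the first problem found.
function parseChatRequest(body: unknown): ChatRequest {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Request body must be a JSON object');
  }
  const message = (body as Record<string, unknown>).message;
  if (typeof message !== 'string') {
    throw new Error('"message" must be a string');
  }
  const trimmed = message.trim();
  if (trimmed.length === 0) {
    throw new Error('"message" must not be empty');
  }
  if (trimmed.length > MAX_MESSAGE_LENGTH) {
    throw new Error(`"message" exceeds ${MAX_MESSAGE_LENGTH} characters`);
  }
  return { message: trimmed };
}
```

An endpoint can call this on the result of `await req.json()` and turn a thrown error into a 400 response. That way, malformed input never starts an agent run.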

Here is a basic example of how the frontend might send a request to the backend using a simple fetch API call.

```typescript
// src/api/chat.ts
interface ChatMessage {
  role: 'user' | 'assistant' | 'tool';
  content: string;
}

export async function sendMessage(message: string): Promise<ChatMessage[]> {
  try {
    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ message }),
    });

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }

    const data: ChatMessage[] = await response.json();
    return data;
  } catch (error) {
    console.error('Failed to send message:', error);
    throw error;
  }
}
```

On the backend, a serverless function or a dedicated API endpoint receives this request. The core logic within this endpoint would then trigger the MCP agent.

```typescript
// src/server/api/chat.ts
import { mcpAgent } from '../mcp/agent';

export async function handleChatRequest(req: Request): Promise<Response> {
  const { message } = await req.json();
  const agentResponse = await mcpAgent.handleRequest(message);

  return new Response(JSON.stringify(agentResponse), {
    headers: { 'Content-Type': 'application/json' },
  });
}
```

This simple request-response model works for basic interactions but falls short when dealing with multi-step tool calls and long-running operations. The user is left waiting, with no feedback on the agent's progress.

Streaming for Real-Time Feedback

To provide a better user experience, modern chat UIs use streaming protocols. This allows the backend to send partial responses as they become available, giving the user real-time feedback on the agent's progress. Two common protocols for this are Server-Sent Events (SSE) and WebSockets¹.
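The practical difference between the two is direction. SSE carries events only from server to client, so the user's message (and anything like a cancel request) has to go out as a separate HTTP request. A WebSocket carries both directions over one connection. The sketch below makes that concrete. It assumes a hypothetical JSON message format (`user_message` and `cancel` from the client, objects with a `type` field from the server). To keep it testable, it takes a plain `send` function instead of a socket object.

```typescript
// Hypothetical client-to-server messages for this sketch
type ClientMessage =
  | { type: 'user_message'; content: string }
  | { type: 'cancel' };

// Hypothetical server-to-client event; only the `type` field is assumed
interface ServerEvent {
  type: string;
  [key: string]: unknown;
}

// Wraps one bidirectional connection. `send` is any function that writes a
// text frame, e.g. `(data) => ws.send(data)` for a browser WebSocket.
class ChatSocketClient {
  private readonly send: (data: string) => void;
  private readonly onEvent: (event: ServerEvent) => void;

  constructor(send: (data: string) => void, onEvent: (event: ServerEvent) => void) {
    this.send = send;
    this.onEvent = onEvent;
  }

  // Client-to-server: the user's message...
  sendMessage(content: string): void {
    this.post({ type: 'user_message', content });
  }

  // ...and, unlike with SSE, a cancel request on the same connection
  cancel(): void {
    this.post({ type: 'cancel' });
  }

  // Server-to-client: call this from the socket's message handler
  handleIncoming(data: string): void {
    this.onEvent(JSON.parse(data) as ServerEvent);
  }

  private post(message: ClientMessage): void {
    this.send(JSON.stringify(message));
  }
}
```

In a browser this could be wired up as `const ws = new WebSocket(url); const client = new ChatSocketClient((d) => ws.send(d), render); ws.onmessage = (e) => client.handleIncoming(e.data);`, where `url` and `render` are placeholders. SSE remains the simpler option when the client only needs to listen, because it runs over plain HTTP and the browser reconnects automatically.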

Using Server-Sent Events (SSE)

SSE is a lightweight, one-way protocol for real-time communication from the server to the client over a standard HTTP connection. It is an excellent choice for chat UIs that need to display agent progress, such as "Searching for flight information..." or "Booking your reservation...". The frontend subscribes to an SSE endpoint, and the server pushes events to the client as they occur².
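On the wire, an SSE response is plain text. Each event consists of one or more `field: value` lines (`event`, `data`, `id`, or `retry`) followed by a blank line. A payload that spans several lines is sent as several `data:` lines, which the browser rejoins with newlines. The helper below sketches that framing. Its name and options object are illustrative and do not belong to any library.

```typescript
// Optional SSE fields; the names follow the field names of the event-stream format
interface SseOptions {
  event?: string; // custom event type; browsers route it to addEventListener(event, ...), not onmessage
  id?: string;    // echoed back by the browser as the Last-Event-ID header on reconnect
}

// Serialize one Server-Sent Event. `data` is usually a JSON string.
// Assumes `event` and `id` contain no line breaks.
function formatSseEvent(data: string, options: SseOptions = {}): string {
  const lines: string[] = [];
  if (options.event !== undefined) lines.push(`event: ${options.event}`);
  if (options.id !== undefined) lines.push(`id: ${options.id}`);
  // Each line of the payload becomes its own `data:` field
  for (const part of data.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${part}`);
  }
  // The trailing blank line tells the client to dispatch the event
  return lines.join('\n') + '\n\n';
}
```

The examples in this article send only `data:` lines and put a `type` field inside the JSON. That keeps every event on the default `message` channel, so a single `onmessage` handler receives all of them.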

On the frontend, the UI subscribes to the event stream:

```tsx
// src/components/ChatStreamComponent.tsx
import { useEffect, useRef } from 'react';

interface ChatStreamProps {
  userId: string;
  message: string;
}

const ChatStreamComponent = ({ userId, message }: ChatStreamProps) => {
  const eventSourceRef = useRef<EventSource | null>(null);

  useEffect(() => {
    // EventSource only issues GET requests, so the user's message travels in
    // the query string; URLSearchParams encodes both parameters
    const params = new URLSearchParams({ userId, message });
    const source = new EventSource(`/api/chat-stream?${params}`);
    eventSourceRef.current = source;

    source.onmessage = (event) => {
      const data = JSON.parse(event.data);
      // Logic to append message data to the chat history state.
      // This allows the UI to update in real time as events come in.
      console.log('Received streamed data:', data);

      // Close once the run is finished. Otherwise EventSource treats the
      // server closing the stream as a dropped connection and reconnects,
      // which would send the message (and start the agent) again.
      if (data.type === 'final_response') {
        source.close();
      }
    };

    source.onerror = (error) => {
      console.error('SSE Error:', error);
      source.close();
    };

    return () => {
      source.close();
    };
  }, [userId, message]);

  // ... rest of the component
};
```
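The "append message data to the chat history state" step can be written as a pure reducer that folds each streamed event into the history. That keeps the update logic testable outside React, and the same function can be passed to `useReducer`. The event shapes mirror the ones the backend handler in this article emits (`status`, `tool_call_event`, `tool_result_event`, `final_response`). The `ChatItem` type is a hypothetical UI model.

```typescript
// Event shapes as emitted by the streaming endpoint in this article
type StreamEvent =
  | { type: 'status'; content: string }
  | { type: 'tool_call_event'; tool_name: string; status: 'in-progress' }
  | { type: 'tool_result_event'; tool_name: string; status: 'completed'; result: unknown }
  | { type: 'final_response'; content: string };

// Hypothetical items rendered by the chat UI
type ChatItem =
  | { kind: 'status'; text: string }
  | { kind: 'tool'; toolName: string; done: boolean; result?: unknown }
  | { kind: 'assistant'; text: string };

// Pure reducer: returns a new array and never mutates `history`
function applyStreamEvent(history: ChatItem[], event: StreamEvent): ChatItem[] {
  switch (event.type) {
    case 'status': {
      // Replace a trailing status line instead of stacking "Thinking..." messages
      const last = history[history.length - 1];
      const base = last?.kind === 'status' ? history.slice(0, -1) : history;
      return [...base, { kind: 'status', text: event.content }];
    }
    case 'tool_call_event':
      // Show a pending indicator for this tool
      return [...history, { kind: 'tool', toolName: event.tool_name, done: false }];
    case 'tool_result_event': {
      // Mark the most recent pending call to this tool as completed
      let index = -1;
      for (let i = history.length - 1; i >= 0; i--) {
        const item = history[i];
        if (item.kind === 'tool' && item.toolName === event.tool_name && !item.done) {
          index = i;
          break;
        }
      }
      const completed: ChatItem = { kind: 'tool', toolName: event.tool_name, done: true, result: event.result };
      if (index === -1) return [...history, completed];
      return history.map((item, i) => (i === index ? completed : item));
    }
    case 'final_response':
      // Drop transient status lines once the answer arrives
      return [...history.filter((item) => item.kind !== 'status'), { kind: 'assistant', text: event.content }];
  }
}
```

Keeping the tool steps in the history, rather than replacing them with the answer, leaves a visible record of what the agent did.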

On the backend, the server keeps the connection open and sends events as the MCP agent progresses through its steps³.

```typescript
// src/server/api/chat-stream.ts
import { mcpAgent } from '../mcp/agent';
import { AgentProgressEvent } from '../mcp/types'; // Define event types

export async function handleStreamRequest(req: Request): Promise<Response> {
  // A Fetch API Request has no `query` property; read the parameters from its URL
  const params = new URL(req.url).searchParams;
  const userId = params.get('userId');
  const message = params.get('message');
  if (!userId || !message) {
    return new Response('Missing userId or message', { status: 400 });
  }

  const encoder = new TextEncoder();
  // Serialize one SSE event: a `data:` line followed by a blank line
  const frame = (payload: unknown) => encoder.encode(`data: ${JSON.stringify(payload)}\n\n`);

  const readableStream = new ReadableStream({
    async start(controller) {
      try {
        // Stream a message indicating the start of the process
        controller.enqueue(frame({ type: 'status', content: 'Thinking...' }));

        const progressEvents = mcpAgent.run(userId, message);
        for await (const event of progressEvents) {
          if (event.type === 'tool_call') {
            // Send a specific event for a tool call
            const data: AgentProgressEvent = { type: 'tool_call_event', tool_name: event.tool_name, status: 'in-progress' };
            controller.enqueue(frame(data));
          } else if (event.type === 'tool_result') {
            // Send a specific event when a tool result is received
            const data: AgentProgressEvent = { type: 'tool_result_event', tool_name: event.tool_name, status: 'completed', result: event.result };
            controller.enqueue(frame(data));
          }
        }

        // Finally, send the final response
        const finalResponse = await progressEvents.getFinalResponse();
        controller.enqueue(frame({ type: 'final_response', content: finalResponse }));
      } catch (err) {
        // Report failures to the client instead of leaving the stream open forever
        controller.enqueue(frame({ type: 'error', content: String(err) }));
      } finally {
        controller.close();
      }
    },
  });

  return new Response(readableStream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  });
}
```
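Because `EventSource` can only issue GET requests, the user's message has to travel in the query string. There it is subject to URL length limits and tends to end up in server logs. A common alternative is to POST the message with `fetch` and read the `text/event-stream` body directly. The sketch below assumes the endpoint also accepts a POST with a JSON body; the handler above reads the query string, so it would need a matching change. The sketch also assumes `\n` line endings, which is what the handler writes, and it handles only `data:` lines. A complete parser would also accept `\r\n` and `\r` and would process the `event`, `id`, and `retry` fields.

```typescript
// Split buffered text into complete SSE events. Returns the `data` payload of
// each finished event plus the incomplete trailing text to keep for next time.
function parseSseFrames(buffer: string): { events: string[]; rest: string } {
  const blocks = buffer.split('\n\n');
  const rest = blocks.pop() ?? ''; // the last block may still be arriving
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines = block
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      // Strip "data:" and one optional leading space
      .map((line) => line.slice(5).replace(/^ /, ''));
    if (dataLines.length > 0) events.push(dataLines.join('\n'));
  }
  return { events, rest };
}

// POST the message and pass each parsed JSON event to `onEvent` as it arrives.
// A POST-capable '/api/chat-stream' endpoint is an assumption of this sketch.
async function streamChat(message: string, onEvent: (event: unknown) => void): Promise<void> {
  const response = await fetch('/api/chat-stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`HTTP error! status: ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters that straddle chunks intact
    buffer += decoder.decode(value, { stream: true });
    const { events, rest } = parseSseFrames(buffer);
    buffer = rest;
    for (const data of events) onEvent(JSON.parse(data));
  }
}
```

This gives up `EventSource`'s automatic reconnection. In exchange, the client controls the HTTP method, the headers (an `Authorization` header, for example), and the request body.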

This streaming model allows the chat UI to display intermediate steps in the conversation flow, which is particularly useful for complex, multi-turn tool interactions where the user might otherwise be waiting for a significant period with no feedback⁴.

Behind the Scenes: The MCP Agent's Role

The MCP agent is the core of this system. It doesn't just process prompts; it orchestrates a series of actions based on a well-defined state. The agent's process includes:

Request Parsing: The agent receives the user's message and the current Tool Context.
Tool Selection: Based on the message and context, the agent decides which tool (if any) is needed and what action to perform.
Tool Call Generation: It generates a structured Tool Call object⁵.
Tool Invocation: The MCP framework executes the Tool Call and awaits the Tool Result.
Context Update: The Tool Result is added to the Tool Context.
Response Generation: The agent uses the updated context to decide its next step: either generating a user-facing response or making another Tool Call.
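In MCP itself, the Tool Call Generation and Tool Invocation steps map onto a JSON-RPC 2.0 exchange. The client sends a `tools/call` request that names the tool and its arguments. The server replies with a result whose `content` array holds the output and whose `isError` flag marks failures inside the tool. The sketch below builds such a request and extracts the text parts of a result. The `search_flights` tool is hypothetical, and content types other than text are skipped.

```typescript
// JSON-RPC 2.0 request carrying an MCP tools/call
interface ToolsCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Subset of a tools/call result: content items plus the isError flag.
// Only text items are read below; other content types are ignored.
interface ToolsCallResult {
  content: Array<{ type: string; text?: string }>;
  isError?: boolean;
}

function buildToolsCallRequest(id: number, name: string, args: Record<string, unknown>): ToolsCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// Flatten the text parts of a result into what gets added to the Tool Context
function toolResultText(result: ToolsCallResult): { text: string; isError: boolean } {
  const text = result.content
    .filter((item) => item.type === 'text' && typeof item.text === 'string')
    .map((item) => item.text as string)
    .join('\n');
  return { text, isError: result.isError === true };
}
```

Returning `isError` alongside the text lets the agent treat a failed tool run as information for its next decision rather than as a transport error.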

This multi-step, stateful process is what enables sophisticated, tool-based workflows. The chat UI, with its streaming capabilities, acts as a window into this dynamic process, displaying each step as it occurs. This transparency is crucial for managing user expectations and for debugging purposes⁶.
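The loop behind these steps can be sketched as an async generator, which also fits the `mcpAgent.run()` iterator that the streaming handler consumes. Everything here is hypothetical. `decide` stands in for the model call that chooses between a tool call and a final answer. The `tools` map stands in for tool invocation, which would go through MCP in a real system. `maxSteps` is an assumed safety limit.

```typescript
// The Tool Context: the conversation so far, including tool results
type ContextEntry =
  | { role: 'user' | 'assistant'; content: string }
  | { role: 'tool'; toolName: string; content: string };

// What the (hypothetical) model step decides to do next
type Decision =
  | { kind: 'tool_call'; toolName: string; args: Record<string, unknown> }
  | { kind: 'respond'; content: string };

// Progress events yielded to the caller, e.g. a streaming endpoint
type AgentEvent =
  | { type: 'tool_call'; tool_name: string }
  | { type: 'tool_result'; tool_name: string; result: string }
  | { type: 'final'; content: string };

type ToolFn = (args: Record<string, unknown>) => Promise<string>;

async function* runAgent(
  message: string,
  decide: (context: ContextEntry[]) => Promise<Decision>,
  tools: Map<string, ToolFn>,
  maxSteps = 5,
): AsyncGenerator<AgentEvent> {
  // Request parsing: seed the Tool Context with the user's message
  const context: ContextEntry[] = [{ role: 'user', content: message }];

  for (let step = 0; step < maxSteps; step++) {
    // Tool selection / response generation: ask the model what to do next
    const decision = await decide(context);
    if (decision.kind === 'respond') {
      context.push({ role: 'assistant', content: decision.content });
      yield { type: 'final', content: decision.content };
      return;
    }

    // Tool call generation and invocation
    yield { type: 'tool_call', tool_name: decision.toolName };
    const tool = tools.get(decision.toolName);
    const result = tool ? await tool(decision.args) : `Unknown tool: ${decision.toolName}`;

    // Context update: the result informs the next decision
    context.push({ role: 'tool', toolName: decision.toolName, content: result });
    yield { type: 'tool_result', tool_name: decision.toolName, result };
  }

  yield { type: 'final', content: 'Stopped after reaching the step limit.' };
}
```

This generator delivers the final answer as its last event. The handler above instead assumes a separate `getFinalResponse()` method, so a real agent and handler would need to agree on one of the two conventions.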

Designing for Interactivity and Transparency

A key part of the chat UI design for MCP agents is making the internal process transparent to the user. Simply receiving chunks of text and displaying them as a single continuous message is not sufficient. The UI should visually represent the agent's actions to build trust and provide clear feedback⁷.

Action Indicators: Use visual cues like icons or labels to show when the agent is performing a specific action, e.g., a "🔍" icon for a search tool or a "📅" for a calendar tool.
Progress Messages: Display short, informative messages like "Searching for flights..." or "Retrieving user data from the database..." as the streaming events arrive.
Contextual Cues: When a tool result is received, the UI can format the output differently (e.g., in a code block or a structured card) to clearly distinguish it from the agent's prose. This is particularly useful for showing data retrieved from an external API or database⁸.
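All three cues can come from a small lookup that maps tool names to an icon and label text, with a neutral fallback for tools the UI does not recognize. The tool names and labels below are hypothetical. The event shape matches the `tool_call_event` and `tool_result_event` objects streamed by the backend handler in this article.

```typescript
// Progress events as streamed by the backend handler in this article
type ToolProgressEvent =
  | { type: 'tool_call_event'; tool_name: string; status: 'in-progress' }
  | { type: 'tool_result_event'; tool_name: string; status: 'completed'; result: unknown };

// What the UI needs to render one tool step
interface ToolIndicator {
  icon: string;        // action indicator
  label: string;       // progress message or result heading
  showAsCard: boolean; // render output as a structured card, apart from the agent's prose
}

// Hypothetical tool names mapped to display text; a Map avoids
// accidental matches on inherited object keys such as "toString"
const TOOL_DISPLAY = new Map<string, { icon: string; running: string; done: string }>([
  ['search_flights', { icon: '🔍', running: 'Searching for flights...', done: 'Flight results' }],
  ['calendar_lookup', { icon: '📅', running: 'Checking your calendar...', done: 'Calendar results' }],
  ['db_query', { icon: '💾', running: 'Retrieving user data from the database...', done: 'User data' }],
]);

function describeToolEvent(event: ToolProgressEvent): ToolIndicator {
  const name = event.tool_name;
  // Generic cue for tools without a custom entry
  const display = TOOL_DISPLAY.get(name) ?? {
    icon: '🔧',
    running: `Running ${name}...`,
    done: `Result from ${name}`,
  };
  return event.type === 'tool_call_event'
    ? { icon: display.icon, label: display.running, showAsCard: false }
    : { icon: display.icon, label: display.done, showAsCard: true };
}
```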

This design philosophy shifts the chat experience from a simple dialogue to a collaborative process where the user is an informed observer of the agent's problem-solving journey⁹.

My Thoughts

The shift to a tool-enabled chatbot architecture demands a re-evaluation of UI design and communication protocols. Simply returning a final response after all tool calls have completed is a poor user experience for complex tasks. It leaves the user in the dark, wondering if the system is working or has stalled. Streaming protocols like SSE address this by bridging the gap between the agent's internal process and the user's perception. This transparency builds user trust and makes complex interactions feel more natural.

Furthermore, a well-designed UI should not only display the final response but also provide visual cues for tool use. For example, showing a loading indicator specific to a tool call can make the agent's actions more transparent. The future of these interfaces lies in even tighter integration between the frontend and the MCP protocol itself. The UI could become more proactive, perhaps suggesting a tool to the user before the agent even decides to use it, based on the user's input. This would shift the UI from a passive display to an active participant in the tool-enabled dialogue, paving the way for truly intuitive and functional conversational systems¹⁰.

References