Scaling Agentic Commerce: Shopify's Implementation of the Interactive Model Context Protocol (MCP UI)
Written by Om-Shree-0709.
- The Core Architecture of Model Context Protocol (MCP)
- MCP UI — The Interactive E-commerce Agent
- Behind the Scenes / How It Works: The Intent System
- My Thoughts
- Acknowledgements
- References
The Model Context Protocol (MCP) is an open standard designed to facilitate secure, bi-directional communication between Large Language Models (LLMs) and external systems, data sources, and tools1. It defines a common language for AI agents (applications powered by LLMs) to discover and invoke capabilities (known as Tools) and retrieve information (known as Resources) from external MCP Servers.
However, the initial success of MCP in domains like coding and data retrieval highlighted a critical limitation in highly visual and interactive industries: the user experience was confined to text-based conversation, often resulting in a wall of text (the "text-wall" problem) when dealing with complex objects like product variants, image galleries, or complex forms2.
For a commerce platform like Shopify, where rich UI is central to the conversion funnel, this limitation posed a direct threat to the usability of its AI agents. The solution was the development and open-sourcing of MCP UI, an extension that allows MCP Servers to return fully interactive web components, bridging the gap between conversational AI and traditional application user interfaces2. This advancement represents a foundational technical lesson in scaling agentic capabilities beyond mere text generation into true, interactive workflow automation3.
The Core Architecture of Model Context Protocol (MCP)
To understand MCP UI, one must first grasp the three fundamental components of the MCP architecture4: the Host, the Client, and the Server. The protocol abstracts away network and LLM specifics, focusing purely on context exchange.
1. The Host and Client
The MCP Host is the AI application or environment (e.g., an LLM-powered IDE, a custom chat interface) that contains the LLM and coordinates the overall interaction. The Host initiates the request and manages the conversation history, user consent, and security policies.
Within the Host resides the MCP Client. The Client acts as the crucial protocol translator and session manager.
Translation: It converts the LLM's request (often a function-call type object generated by the model) into a structured JSON-RPC 2.0 message for the server.
Session Management: The Client maintains a dedicated, stateful connection (1:1 relationship) with a specific MCP Server, managing lifecycle events, capability negotiation, and secure boundaries4.
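To make the Client's translation step concrete, here is a minimal sketch of what the resulting JSON-RPC 2.0 message might look like. The `tools/call` method name and the `name`/`arguments` params shape follow the MCP specification; the `buildToolCallRequest` helper and the `checkout_cart` example call are illustrative, not part of any official SDK.

```typescript
// Sketch: a Client translating an LLM function call into a
// JSON-RPC 2.0 "tools/call" request (helper name is hypothetical).
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params: Record<string, unknown>;
}

let nextId = 0;

function buildToolCallRequest(
  toolName: string,
  args: Record<string, unknown>
): JsonRpcRequest {
  return {
    jsonrpc: '2.0',
    id: ++nextId, // each request gets a unique id for response matching
    method: 'tools/call',
    params: { name: toolName, arguments: args },
  };
}

// An LLM "function call" such as checkout_cart({ cartId: 'c1' }) becomes:
const req = buildToolCallRequest('checkout_cart', { cartId: 'c1' });
```

The Server's response reuses the same `id`, which is how the Client matches asynchronous replies back to the originating request.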
2. The MCP Server and Primitives
The MCP Server is the external service that provides specialized context and capabilities. It operates independently and focuses on a defined functional domain (e.g., a "Product Catalog Server" or a "Git Repository Server"). Servers expose capabilities using three main primitives:
Tools: Executable functions that allow the LLM to perform an action or retrieve data with a side-effect (e.g., checkout_cart, update_inventory). They are defined using a JSON Schema contract for input and output.
Resources: Structured data or content that provides context to the LLM (e.g., a database schema, the contents of a file). Resources are often referenced by URI and can be subscribed to by the Client.
Prompts: Reusable templates or few-shot examples that guide the LLM's use of Tools and Resources for specific, complex workflows.
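As a concrete illustration of the Tool primitive's JSON Schema contract, here is a sketch of how a commerce server might declare checkout_cart. The `name`/`description`/`inputSchema` shape follows the MCP tool contract; the specific fields (`cartId`, `currency`) are hypothetical.

```typescript
// Sketch: a Tool definition as a server might expose it via tools/list.
// Field names are illustrative; the overall shape follows MCP's
// JSON Schema-based tool contract.
const checkoutCartTool = {
  name: 'checkout_cart',
  description: 'Finalize the current cart and create a checkout session.',
  inputSchema: {
    type: 'object',
    properties: {
      cartId: { type: 'string', description: 'Identifier of the cart' },
      currency: { type: 'string', description: 'ISO 4217 currency code' },
    },
    required: ['cartId'],
  },
};
```

Because the schema travels with the tool listing, the Host can validate the LLM's generated arguments before any request reaches the Server.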
3. The Transport Layer: JSON-RPC 2.0
MCP standardizes the data exchange using JSON-RPC 2.0 over two main transport methods3:
Standard Input/Output (STDIO): Ideal for local, synchronous connections (e.g., a file system server running on the same machine as the Host).
Server-Sent Events (SSE) / HTTP POST: Used for remote, asynchronous, event-driven interactions, where the Client uses HTTP POST for requests and the Server uses SSE for streaming responses and notifications.
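On the SSE side, the Client ultimately has to turn a text stream of `data:` lines back into JSON-RPC messages. The following is a minimal sketch of that parsing step; a production client would buffer partial chunks, whereas this version assumes each call receives whole events separated by blank lines.

```typescript
// Sketch: parsing Server-Sent Events into JSON-RPC messages.
// Assumes complete events per chunk; real clients must buffer
// partial data across network reads.
function parseSseChunk(chunk: string): unknown[] {
  const messages: unknown[] = [];
  for (const event of chunk.split('\n\n')) {
    // An SSE event may span several "data:" lines; join them.
    const data = event
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).trim())
      .join('\n');
    if (data) messages.push(JSON.parse(data));
  }
  return messages;
}

// Example: one streamed notification from the server
const msgs = parseSseChunk(
  'data: {"jsonrpc":"2.0","method":"notifications/progress","params":{"progress":1}}\n\n'
);
```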
MCP UI — The Interactive E-commerce Agent
Shopify recognized that while the core MCP allowed an agent to reason about commerce—e.g., "The user wants a large red shirt"—it could not deliver the necessary visual confirmation and complex interaction required to complete the commerce loop. MCP UI addresses this by extending the concept of Resources to include dynamic, interactive web components2.
Technical Challenge: The Complexity of Commerce UI
The challenge in e-commerce is not rendering a static image; it is handling conditional logic that underpins core interactions:
Variant Selection: Changing a size option updates the available colors, inventory count, and price in real-time.
Bundles and Subscriptions: Displaying complex pricing rules or frequency selectors within the flow.
Real-time Constraints: Inventory updates or geo-localized pricing must be reflected instantly.
Forcing the LLM to model and generate this complex UI logic through text tokens is infeasible and error-prone. MCP UI delegates the rendering and state management of this complexity back to the specialized MCP Server, which is connected to the live commerce API.
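A tiny sketch makes the delegation argument concrete: even the simplest variant rule, "selecting a size narrows the purchasable colors", is conditional logic over live inventory that the server can compute trivially but an LLM would have to re-derive token by token. The product data and field names below are illustrative.

```typescript
// Sketch: the conditional variant logic the server keeps out of the LLM.
// Data and field names are illustrative, not Shopify's actual model.
interface Variant {
  size: string;
  color: string;
  price: number;
  inventory: number;
}

const variants: Variant[] = [
  { size: 'L', color: 'red', price: 25, inventory: 3 },
  { size: 'L', color: 'blue', price: 25, inventory: 0 },
  { size: 'M', color: 'red', price: 22, inventory: 7 },
];

// Selecting a size narrows the colors that can actually be bought.
function availableColors(size: string): string[] {
  return variants
    .filter((v) => v.size === size && v.inventory > 0)
    .map((v) => v.color);
}
```

The server already runs this logic against the live commerce API; MCP UI simply lets it ship the resulting interface instead of asking the model to narrate it.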
The UIResource Primitive
MCP UI is built on top of the existing MCP embedded resources specification, introducing a UIResource interface. When an AI agent needs to display an interactive component (e.g., after searching for products), the commerce MCP Server returns a JSON response that includes a reference to this UIResource2.
The UIResource can be delivered through several technical mechanisms:
Inline HTML: Best for lightweight, self-contained components. The HTML/CSS/JS payload is sent directly within the MCP response and rendered by the Client Host inside a secure, sandboxed iframe.

```typescript
// TypeScript/JSON structure for a simple, inline UI component
const inlineResource = {
  uri: 'ui://product-status/12345',
  content: {
    type: 'rawHtml',
    htmlString: '<div><p>Item added to cart successfully!</p></div>'
  },
  encoding: 'text',
  mimeType: 'text/html'
};
```

External URL / Remote Resources: For complex applications like a full cart view or product configurator, the server provides an external URL (e.g., https://shopify-mcp.com/product-selector/id=123) which the Client loads into a sandboxed iframe. This is the pattern used for highly interactive, full-featured components (as discussed in the talk3).
Remote DOM: A more advanced, performant method where the UI structure is sent and rendered client-side, often using a dedicated library or worker processes for maximum security and integration.
In all cases, the use of a sandboxed iframe is non-negotiable, ensuring that the third-party UI code from the MCP Server cannot access the Host application's DOM or sensitive data, prioritizing the security and privacy principles of MCP5.
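To illustrate the sandboxing decision, here is a sketch of how a Host might map a UIResource onto iframe attributes. The `UIResourceLike` shape mirrors the inline example earlier in this section, and `chooseFrameProps` is a hypothetical helper, not part of the mcp-ui library.

```typescript
// Sketch: mapping a UIResource onto sandboxed iframe attributes.
// chooseFrameProps is a hypothetical Host-side helper.
interface UIResourceLike {
  uri: string;
  content:
    | { type: 'rawHtml'; htmlString: string }
    | { type: 'externalUrl'; iframeUrl: string };
}

function chooseFrameProps(resource: UIResourceLike): {
  sandbox: string;
  srcdoc?: string;
  src?: string;
} {
  // Never grant allow-same-origin: the server-provided UI must stay
  // isolated from the Host's DOM, cookies, and storage.
  const sandbox = 'allow-scripts allow-forms';
  if (resource.content.type === 'rawHtml') {
    return { sandbox, srcdoc: resource.content.htmlString };
  }
  return { sandbox, src: resource.content.iframeUrl };
}
```

Keeping `allow-same-origin` out of the sandbox list is what enforces the isolation guarantee described above: scripts may run and forms may submit, but the frame is treated as a foreign origin.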
Behind the Scenes / How It Works: The Intent System
The core technical hurdle of introducing interactive UI is synchronization: how does the Agent (LLM) stay in control of the conversation state when the user is interacting with a UI component running in a separate, isolated iframe?
MCP UI solves this with a robust, intent-based message passing system2.
The Interaction Flow
Agent Invokes Tool: The LLM, based on the user's prompt ("I want a new t-shirt"), decides to call the product_search Tool on the Shopify MCP Server.
Server Returns UI: The MCP Server executes the search and, instead of returning only text, returns an array of search results, each wrapped in a UIResource (e.g., three product cards with images, variants, and an "Add to Cart" button).
User Interaction: The user interacts directly with a product card component within the rendered iframe and clicks the "Add to Cart" button.
Intent Message: The UI component, running in the iframe, does not talk directly to the commerce backend. Instead, it sends a structured postMessage (known as an Intent) to the Host application's MCP Client.
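The last step can be sketched from the iframe's point of view. The `type`/`intent`/`payload` message shape and the `add-to-cart` intent name below are illustrative; consult the mcp-ui specification for the actual schema.

```typescript
// Sketch: the UI component inside the iframe emitting an Intent.
// The message shape and intent name are illustrative.
interface IntentMessage {
  type: 'intent';
  intent: string;
  payload: Record<string, unknown>;
}

function buildIntent(
  intent: string,
  payload: Record<string, unknown>
): IntentMessage {
  return { type: 'intent', intent, payload };
}

// Inside the iframe, on "Add to Cart" click, the component would run:
// window.parent.postMessage(buildIntent('add-to-cart', { productId: 'p1' }), '*');
const msg = buildIntent('add-to-cart', { productId: 'p1' });
```

Note that the component never holds commerce credentials: it only describes what the user did, and the Host decides what happens next.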
The Intent Schema
The Intent message is a structured signal containing the user's explicit action and relevant payload. Common Intent types for commerce include:
| Intent Name | Description | Required Payload Fields |
| --- | --- | --- |
| | User confirms the order and wishes to proceed to final checkout. | |
| | User requests more information about a specific object. | |
| | A non-critical event occurred (e.g., cart quantity updated). | |
| | The iframe needs to request resizing from the host application. | |
This Intent is received by the MCP Client and converted into a standard MCP message that is fed back into the Host's conversation context. The Host LLM can then observe this Intent, reason about the next step ("The user has initiated checkout"), and invoke the next sequential Tool (e.g., process_payment) to continue the workflow. The Agent, not the UI component, remains the ultimate orchestrator of the user journey.
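A simplified sketch of that host-side routing follows. The intent names and the static mapping are hypothetical simplifications: in the architecture described above, the Host feeds the Intent into the LLM's context and the model, not a lookup table, chooses the next Tool.

```typescript
// Sketch: host-side routing of a received Intent to the next Tool call.
// In reality the LLM reasons over the Intent; this table is a stand-in.
type ToolCall = { name: string; arguments: Record<string, unknown> };

function routeIntent(
  intent: string,
  payload: Record<string, unknown>
): ToolCall | null {
  switch (intent) {
    case 'checkout':
      // "The user has initiated checkout" -> invoke process_payment
      return { name: 'process_payment', arguments: payload };
    case 'add-to-cart':
      // Hypothetical cart-update tool name
      return { name: 'update_cart', arguments: payload };
    default:
      // Unknown intents are surfaced to the LLM as plain context
      return null;
  }
}
```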
This architecture fundamentally contrasts with frameworks like ReAct (Reasoning and Acting) or traditional function calling in that the output is not just an action, but a highly contextual, interactive state that can also generate new actions back to the model.
My Thoughts
The advent of MCP UI, driven by real-world enterprise needs at Shopify, is a technical inflection point for agentic systems. By standardizing the communication of visual context, it solves the most significant user experience problem for conversational AI in commerce.
Limitations and Future Improvements
Security and Trust in Generative UI: While sandboxed iframes ensure security isolation, the server-controlled nature of the UI poses a trust challenge. Future iterations of the protocol must standardize mechanisms for the Host to visually audit the UI code or apply cryptographic signatures to the UI payload before rendering, ensuring the component is not malicious.
Performance Overheads: Loading content via
iframeand the latency introduced by the Intent-handling loop (User action -> UI Intent -> Host LLM Reasoning -> New Tool Call) can degrade perceived performance. Optimizing the latency of the Host LLM’s contextual response is paramount.Generative vs. Component-Based UI: MCP UI currently focuses on delivering pre-built, production-ready components. The next leap, Generative UI (where the LLM dynamically constructs novel UI elements based on user intent), remains challenging but could be supported by MCP through highly-structured component schema definitions that the Host's renderer interprets, rather than raw HTML.
MCP UI is a pragmatic, scalable bridge between today's LLMs and tomorrow's fully agentic web. Its success will be measured not just by its technical elegance, but by its widespread adoption across other visually complex domains like financial dashboards and content management systems.
Acknowledgements
Gratitude is extended to Samuel Path, Software Engineer - Shopify and Bret Little, Staff Engineer - Shopify for their work on the MCP UI extension and for sharing their implementation lessons during their talk, "Scaling Commerce Interactivity: Lessons from Shopify's Implementation of MCP UI in AI Agents" at the MCP Developers Summit Europe3. We also thank the broader MCP and AI community for their commitment to developing open standards that drive the future of agentic technology.
References