@arizeai/phoenix-mcp

Official

Overview Schema Related Servers Score Discussions

mustache-template-system.md•34 KiB

# Mustache Template System for Server Evaluators ## What We Built This feature introduces full Mustache template support for Phoenix's server-side evaluators, enabling built-in classification evaluator templates to work seamlessly with structured data while maintaining compatibility with the Python/TypeScript evals libraries. ### Key Capabilities 1. **Native Mustache Rendering**: Replaced regex-based placeholder substitution with proper Mustache parsing and rendering using `pystache` (Python) and `mustache` (TypeScript) 2. **Formatter Overlay System**: A mechanism to inject server-specific Mustache snippets into base templates without modifying the original templates used by the evals libraries 3. **Type-Aware Variable Extraction**: Templates now distinguish between string variables (`{{name}}`) and section variables (`{{#list}}...{{/list}}`), enabling proper input schema generation 4. **Frontend Validation**: Real-time validation of Mustache syntax in the playground using native Mustache.js parser error messages 5. **Frontend/Backend Parity**: Python and TypeScript implementations use identical approaches for variable extraction (simple iteration of top-level tokens) --- ## How We Built It ### Architecture Overview ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ Template Definition Layer │ ├─────────────────────────────────────────────────────────────────────────────┤ │ prompts/classification_evaluator_configs/ │ │ ├── FAITHFULNESS_CLASSIFICATION_EVALUATOR_CONFIG.yaml (base templates) │ │ ├── TOOL_INVOCATION_CLASSIFICATION_EVALUATOR_CONFIG.yaml │ │ └── ... │ │ │ │ prompts/formatters/server.yaml │ │ └── Named Mustache snippets for structured data transformations │ └─────────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ │ Substitution Expansion Layer │ ├─────────────────────────────────────────────────────────────────────────────┤ │ src/phoenix/server/api/helpers/substitutions.py │ │ │ │ At config load time: │ │ 1. Load substitution definitions from server.yaml │ │ 2. For each message, expand {{placeholder}} → full Mustache block │ │ 3. Return expanded template ready for rendering │ └─────────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ │ Variable Extraction Layer │ ├─────────────────────────────────────────────────────────────────────────────┤ │ Python: src/phoenix/utilities/template_formatters.py │ │ TypeScript: app/src/components/templateEditor/.../mustacheLikeTemplating.ts│ │ │ │ parse_with_types() / extractVariablesFromMustacheLike(): │ │ 1. Parse template with native Mustache parser │ │ 2. Iterate only top-level tokens (nested tokens are encapsulated in │ │ section children arrays, so no recursion needed) │ │ 3. Extract top-level variables, tag as "string" or "section" type │ │ 4. For dotted paths (output.tools), extract only root (output) │ └─────────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ │ Input Schema Generation │ ├─────────────────────────────────────────────────────────────────────────────┤ │ src/phoenix/server/api/evaluators.py :: LLMEvaluator.input_schema │ │ │ │ For each extracted variable: │ │ ├── Section variables → {} (empty schema, accepts any JSON type) │ │ └── String variables → {"type": "string"} │ │ │ │ All variables marked as required │ └─────────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ │ Input Mapping Layer │ ├─────────────────────────────────────────────────────────────────────────────┤ │ Frontend: app/src/components/evaluators/EvaluatorInputMapping.tsx │ │ Backend: src/phoenix/server/api/evaluators.py :: apply_input_mapping() │ │ │ │ UI Flow: │ │ 1. Display one row per template variable from input_schema │ │ 2. User configures: path mapping (JSONPath) OR literal value │ │ 3. Mappings stored as { pathMapping: {...}, literalMapping: {...} } │ │ │ │ Resolution Priority: │ │ 1. Literal mappings (override everything) │ │ 2. Path mappings (JSONPath expressions against context) │ │ 3. Direct context keys (fallback if key exists in schema but not mapped) │ └─────────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ │ Template Rendering Layer │ ├─────────────────────────────────────────────────────────────────────────────┤ │ src/phoenix/server/api/evaluators.py :: LLMEvaluator.evaluate() │ │ │ │ Pipeline: │ │ 1. apply_input_mapping() → resolve context to template variables │ │ 2. cast_template_variable_types() → convert non-strings for string vars │ │ 3. validate_template_variables() → JSON Schema validation │ │ 4. template_formatter.format() → pystache.render() with variables │ │ 5. Send formatted messages to LLM │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### Component Roles | Component | Role | | ---------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | | **Base Templates** (`prompts/classification_evaluator_configs/*.yaml`) | Define evaluator prompt templates with simple placeholders. Used by both evals libraries and server. | | **Formatters** (`prompts/formatters/server.yaml`) | Named Mustache snippets that handle structured data (tool calls, available tools). Server-specific. | | **Substitution Expander** (`substitutions.py`) | Expands simple `{{placeholder}}` into full Mustache sections at config load time. | | **Template Formatter** (`template_formatters.py`) | Parses templates, extracts variables with types, renders with pystache. | | **Input Schema** (`LLMEvaluator.input_schema`) | JSON Schema derived from template variables. Section vars → any type, string vars → string. | | **Input Mapping** (`EvaluatorInputMapping.tsx`) | UI component for mapping context fields to template variables. | | **Validation** (`validateMustacheSections()`) | Frontend validation using native Mustache.js parser error messages. | --- ## Key Decisions and Rationale ### 1. Formatter Overlay vs. Modifying Base Templates **Decision:** Create a separate substitution system that injects Mustache snippets at load time rather than modifying the base templates. **Rationale:** - Base templates remain simple string placeholders (`{{input}}`, `{{output}}`), usable by evals libraries without assumptions about data shape - Server can inject complex Mustache sections (loops, conditionals) for structured OpenAI-style data - Templates stay readable and maintainable - Clear separation between library templates and server-specific formatting **Example:** ```yaml # Base template (simple, library-compatible) {{available_tools}} # After substitution expansion (server-specific) {{#output.available_tools}} Tool: {{function.name}} Description: {{function.description}} {{/output.available_tools}} {{^output.available_tools}} No tools available. {{/output.available_tools}} ``` ### 2. Top-Level Variable Extraction Only **Decision:** Only extract variables at depth 0 (not nested inside sections). **Rationale:** - Variables inside `{{#section}}...{{/section}}` are resolved from the section context, not the root context - Asking users to provide `function.name` when they already provide `output.available_tools` would be confusing - Consistent with Mustache semantics: sections change the context for their children **Example:** ```mustache {{#output.available_tools}} {{function.name}} ← NOT extracted (depth 1, resolved from each tool) {{/output.available_tools}} {{query}} ← EXTRACTED (depth 0) ``` ### 3. Root Variable Extraction for Dotted Paths **Decision:** For `{{output.available_tools}}`, extract only `output` as the required variable. **Rationale:** - Mustache traverses nested properties automatically: `context["output"]["available_tools"]` - User only needs to provide the root object; Mustache handles the rest - Simplifies input mapping: map `output` → entire output object, not individual nested fields ### 4. Type-Aware Schema Generation **Decision:** Section variables (`{{#var}}`) get empty schemas (accept any type); string variables (`{{var}}`) get `{"type": "string"}`. **Rationale:** - Sections iterate over arrays or conditionally render based on truthiness → need to accept any JSON type - Simple interpolation expects string values → enforce type constraint - Enables automatic type casting in the pipeline (non-strings → string for string vars) ### 5. Native Mustache Parser for Validation **Decision:** Use native Mustache.js parser exclusively for template validation, surfacing parser exception messages directly to users. **Rationale:** - **Simplicity**: A single validation path is easier to maintain and reason about than a multi-layer regex fallback system - **Spec compliance**: Native parsers correctly handle all Mustache edge cases (comments, delimiter changes, etc.) - **Adequate error messages**: Native parser errors like `Unclosed section "foo" at 13` are sufficiently informative for users to locate and fix issues - **Avoid regex brittleness**: Custom regex-based parsers are difficult to validate exhaustively and can drift from the Mustache spec **Trade-off**: All parse failures are treated as errors (no warning vs. error distinction). Users may see error banners while still typing incomplete sections. This was deemed acceptable for code simplicity. ### 6. Frontend/Backend Parity for Variable Extraction **Decision:** Python and TypeScript both extract only top-level variables, producing identical results. **Rationale:** - Frontend needs to know which variables to display in input mapping UI - Backend generates the canonical input_schema - If results differed, UI could show wrong variables or miss some **Implementation Note:** Both implementations use the same simple approach: - **Python**: Iterates only the top-level nodes of pystache's parse tree - **TypeScript**: Iterates only the top-level tokens from Mustache.parse() Both parsers structure their output such that nested tokens (inside sections) are encapsulated in children arrays of section tokens. This means a simple iteration of the top-level array naturally extracts only top-level variables—no recursion or depth tracking needed. --- ## Variable Identification and Input Mapping Flow ### How the UI Identifies Required Variables ``` ┌──────────────────────────────────────────────────────────────────┐ │ User edits template in Playground │ │ "You are evaluating {{input}}\n{{#output.tools}}..." │ └───────────────────────────────┬──────────────────────────────────┘ │ ▼ ┌──────────────────────────────────────────────────────────────────┐ │ extractVariablesFromMustacheLike() in frontend │ │ │ │ 1. Mustache.parse() → array of top-level tokens │ │ 2. For each token [type, value] in array: │ │ - If type is "#", "^", "name", "&", or "{" │ │ → add getRootVariableName(value) to set │ │ 3. Return: ["input", "output"] │ │ │ │ Note: Nested tokens inside sections are in children arrays, │ │ so iterating the top-level array naturally filters them out. │ └───────────────────────────────┬──────────────────────────────────┘ │ ▼ ┌──────────────────────────────────────────────────────────────────┐ │ EvaluatorInputMapping component renders │ │ │ │ For each variable: │ │ ┌────────────────────────────────────────────────────────────┐ │ │ │ input [Path ▼] [$.attributes.input.value ▼] │ │ │ │ output [Literal] [__________________________] │ │ │ └────────────────────────────────────────────────────────────┘ │ └───────────────────────────────┬──────────────────────────────────┘ │ ▼ ┌──────────────────────────────────────────────────────────────────┐ │ Mapping stored in evaluator state │ │ │ │ { │ │ pathMapping: { "input": "$.attributes.input.value" }, │ │ literalMapping: { "output": "custom value" } │ │ } │ └──────────────────────────────────────────────────────────────────┘ ``` ### How Input Mapping + Mustache Are Combined at Evaluation Time ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ Context (from dataset example or span) │ │ { │ │ "attributes": { │ │ "input": { "value": "What is 2+2?" }, │ │ "output": { "value": "4", "tools": [...] } │ │ } │ │ } │ └─────────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ │ apply_input_mapping() │ │ │ │ Input: │ │ - input_schema: { properties: { input: {type: string}, output: {} } } │ │ - pathMapping: { "input": "$.attributes.input.value" } │ │ - literalMapping: { "output": "override" } │ │ - context: (above) │ │ │ │ Process: │ │ 1. Apply path mappings: input = jsonpath(context, "$.attributes...") │ │ 2. Apply literal mappings: output = "override" │ │ 3. Fallback: any schema key not mapped but present in context │ │ │ │ Output: │ │ { "input": "What is 2+2?", "output": "override" } │ └─────────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ │ cast_template_variable_types() │ │ │ │ For each variable with type: "string" in schema: │ │ If value is not a string → convert to string │ │ │ │ Section variables (empty schema) pass through unchanged │ └─────────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ │ validate_template_variables() │ │ │ │ jsonschema.validate(variables, input_schema) │ │ - Ensures all required variables present │ │ - Validates types match schema │ └─────────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ │ template_formatter.format() │ │ │ │ pystache.render(template, variables) │ │ │ │ Template: "Evaluate: {{input}}" │ │ Variables: { "input": "What is 2+2?" } │ │ Result: "Evaluate: What is 2+2?" │ └─────────────────────────────────────────────────────────────────────────────┘ ``` --- ## Error Handling and UI Feedback ### Template Validation Errors (Surfaced in Playground UI) Errors are detected by the native Mustache.js parser and displayed directly to users: | Error Type | Example | Native Parser Message | | --------------------------- | --------------------------------- | ------------------------------------ | | **Unclosed Section** | `{{#items}}...` (no `{{/items}}`) | `Unclosed section "items" at 13` | | **Mismatched Closing Tag** | `{{#items}}...{{/item}}` | `Unclosed section "items" at 13` | | **Unmatched Closing Tag** | `{{/items}}` without opener | `Unopened section "items" at 0` | | **Invalid Mustache Syntax** | Other parse errors | Parser-specific error message | All validation errors are displayed as red error banners in the UI. The native parser messages include position information to help users locate issues. ### Runtime Errors (During Evaluation) | Error | Cause | Handling | | ------------------------- | ------------------------------------------ | ---------------------------------------------------------------------------------------- | | Missing required variable | Input mapping didn't provide all variables | `validate_template_variables` raises `ValueError`, evaluation fails with error in result | | Type mismatch | Variable doesn't match schema type | `validate_template_variables` raises `ValueError` | | Mustache rendering error | Template has runtime issues | `TemplateFormatterError` raised, evaluation fails | --- ## Edge Cases ### Handled 1. **Dotted paths in sections**: `{{#output.available_tools}}` correctly extracts `output` as the required variable 2. **Nested sections**: Variables inside sections are not extracted as top-level requirements 3. **Inverted sections**: `{{^items}}No items{{/items}}` handled correctly, `items` extracted 4. **Unescaped variables**: `{{{html}}}` and `{{& html}}` both work, extract `html` 5. **Whitespace in tags**: `{{ name }}` works the same as `{{name}}` 6. **Comments**: `{{! comment }}` ignored during extraction 7. **Dotted keys in input mapping**: Keys like `output.value` properly escaped for react-hook-form 8. **HTML escaping disabled**: Mustache normally escapes HTML; we disable this for prompt templates ### Unhandled / Known Limitations 1. **Playground execution with structured inputs**: Templates can be written with Mustache syntax but running them with structured inputs (objects/arrays) in the playground is not yet supported. Only string variable values work for execution. This should be addressed in a follow-up PR. 2. **Section variable type inference on frontend**: The frontend extracts variable names but doesn't distinguish section vs. string types. It relies on the backend's input_schema for type information. 3. **Dynamic delimiter changes**: Mustache supports changing delimiters (`{{=<% %>=}}`), but this is not used or tested. 4. **Partials**: Mustache partials (`{{> partial}}`) are not used in our templates. 5. **Dot-delimited keys in data**: If your data has keys that literally contain dots (e.g., `{"input.query": "hello world"}`), the template `{{input.query}}` will **not** work. Mustache interprets dots as path separators for nested object traversal, so it looks for `context["input"]["query"]` rather than `context["input.query"]`. This is a standard Mustache specification limitation—there is no escape syntax (`{{input\.query}}`) or bracket notation (`{{input["query"]}}`) to access literal dot-containing keys. The Mustache.js maintainers have explicitly declined to add this feature as it would conflict with the existing nested property access syntax. **Workaround**: Restructure data to use nested objects (`{"input": {"query": "hello world"}}`) or avoid dots in key names (use underscores: `{"input_query": "hello world"}`). Note: This was not supported before this change either—it's inherent to Mustache parsing behavior. --- ## Handling Optional Nested Fields A common question: what happens when data has optional nested fields like `tool_calls` that may or may not exist? ### Input Mapping vs. Mustache Responsibilities | Level | Responsibility | Validation | | -------------------------------------------------------------- | -------------------------------- | ----------------------------------------------- | | **Top-level variables** (`output`, `input`) | Input mapping must provide these | Validated as **required** by JSON Schema | | **Nested fields** (`output.messages`, `messages[].tool_calls`) | Mustache handles at render time | **No validation** - Mustache handles gracefully | ### Example: Optional `tool_calls` Given data that sometimes has `tool_calls`: ```json {"output": {"messages": [ {"role": "assistant", "content": "Hello", "tool_calls": [...]}, {"role": "assistant", "content": "World"} // no tool_calls ]}} ``` The `server.yaml` formatter handles this with sections and inverted sections: ```mustache {{#output.messages}} Message: {{role}}: {{content}} {{#tool_calls}} - {{function.name}}({{function.arguments}}) {{/tool_calls}} {{^tool_calls}} No tools called. {{/tool_calls}} {{/output.messages}} ``` - `{{#tool_calls}}...{{/tool_calls}}` - Renders only if `tool_calls` exists and is truthy - `{{^tool_calls}}...{{/tool_calls}}` - Renders when `tool_calls` is missing, `null`, `false`, or empty array ### Key Points 1. **You do NOT need to set `None` defaults for optional nested fields** - Mustache sections handle missing/null/empty values gracefully 2. **Top-level variables MUST be provided** - If your template uses `{{output}}` or `{{#output.messages}}`, then `output` must be mapped via input mapping 3. **Use sections for optional data** - Always wrap potentially-missing nested fields in `{{#field}}...{{/field}}` sections, not bare `{{field}}` references 4. **Inverted sections for fallbacks** - Use `{{^field}}...{{/field}}` to render fallback content when a field is missing --- ## Open Questions 1. **Should the UI show different indicators for section vs. string variables?** - Currently both show the same input mapping UI - Section variables accept any JSON type, which might confuse users expecting string inputs 2. **Should we validate substitution expansions at config load time?** - Currently we log warnings for missing substitutions but don't fail - Could be more strict to catch config errors early 3. **Should frontend extraction also return type information?** - Would enable richer UI (e.g., showing that a variable expects an array) - Currently relies on server's GraphQL response for input_schema --- ## Files Changed (Key Components) ### Python - `src/phoenix/utilities/template_formatters.py` - Core Mustache parsing/rendering with pystache - `src/phoenix/server/api/helpers/substitutions.py` - Substitution expansion logic - `src/phoenix/server/api/evaluators.py` - Input schema generation, input mapping, evaluation pipeline - `prompts/formatters/server.yaml` - Named Mustache snippet definitions - `prompts/classification_evaluator_configs/*.yaml` - Evaluator templates with optional substitutions ### TypeScript - `app/src/components/templateEditor/language/mustacheLike/mustacheLikeTemplating.ts` - Frontend Mustache parsing, variable extraction, validation - `app/src/components/evaluators/EvaluatorInputMapping.tsx` - Input mapping UI - `app/src/pages/playground/PlaygroundChatTemplate.tsx` - Template editor with validation banners ### Dependencies Added - `pystache` (Python) - Native Mustache rendering - `mustache` (TypeScript) - Already present, version 4.2.0

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Arize-ai/phoenix'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

mustache-template-system.md•34 KiB