
Optimizely DXP MCP Server

by JaxonDigital

compare_logs

Analyze production vs deployment slot logs to determine deployment safety. Compares error rates, performance metrics, and health scores to provide proceed/investigate/abort recommendations.

Instructions

🔍 Compare baseline vs slot logs to make deployment decisions. ANALYSIS: <5s. Takes output from two analyze_logs_streaming() calls (baseline=production, slot=deployment slot). Returns safety recommendation (proceed/investigate/abort) with detailed reasoning based on error rate changes, performance degradation, and health score delta. Use in deployment workflow: analyze baseline → deploy → analyze slot → compare → decide to complete or reset. Required: baseline, slot objects. Returns decision and supporting metrics.
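The workflow above can be sketched as a small gate function. This is an illustrative sketch only: `callTool` is a hypothetical helper stubbed for demonstration, not the real MCP client API, and the stubbed return values are invented.

```typescript
// Hypothetical deployment gate following the documented workflow:
// analyze baseline -> deploy -> analyze slot -> compare -> decide.
type ToolResult = Record<string, any>;

async function callTool(name: string, args: ToolResult): Promise<ToolResult> {
    // Stubbed for illustration; a real MCP client would send a tools/call request.
    if (name === 'compare_logs') return { decision: 'safe', recommendation: 'proceed' };
    return { healthScore: 95 };
}

async function deploymentGate(): Promise<string> {
    const baseline = await callTool('analyze_logs_streaming', { environment: 'production' });
    // ... deploy to the slot here ...
    const slot = await callTool('analyze_logs_streaming', { environment: 'slot' });
    const result = await callTool('compare_logs', { baseline, slot });
    return result.recommendation; // 'proceed' | 'investigate' | 'rollback'
}

deploymentGate().then(r => console.log(r)); // prints "proceed" with the stub above
```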

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| baseline | Yes | Baseline log analysis (from analyze_logs_streaming) | (none) |
| slot | Yes | Slot log analysis (from analyze_logs_streaming) | (none) |
| thresholds | No | Threshold overrides | 50% error increase, 20 point score drop, 100ms latency increase |
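Each field of the optional thresholds object falls back to its default independently, so partial overrides are fine. A minimal sketch of that merge (the `ComparisonThresholds` shape is taken from the implementation reference below; `resolveThresholds` is an illustrative name, not part of the published module):

```typescript
// Partial threshold overrides merge with defaults field-by-field.
interface ComparisonThresholds {
    maxErrorIncrease?: number;   // fraction, e.g. 0.5 = 50%
    maxScoreDecrease?: number;   // health-score points
    maxLatencyIncrease?: number; // milliseconds
}

function resolveThresholds(t: ComparisonThresholds = {}) {
    return {
        maxErrorIncrease: t.maxErrorIncrease ?? 0.5,
        maxScoreDecrease: t.maxScoreDecrease ?? 20,
        maxLatencyIncrease: t.maxLatencyIncrease ?? 100,
    };
}

// Overriding only latency leaves the other two defaults intact:
console.log(resolveThresholds({ maxLatencyIncrease: 250 }));
// { maxErrorIncrease: 0.5, maxScoreDecrease: 20, maxLatencyIncrease: 250 }
```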

Implementation Reference

  • The primary handler for the 'compare_logs' MCP tool. Validates baseline and slot log analysis inputs, invokes the compareLogs helper, generates a markdown report with metrics table and reasons, and returns structured data with decision/recommendation.
    static async handleCompareLogs(args: CompareLogsArgs): Promise<any> {
        try {
            const { baseline, slot, thresholds } = args;
    
            // Validate inputs
            if (!baseline || !slot) {
                return ResponseBuilder.error('Both baseline and slot analysis results are required');
            }
    
            // Perform comparison
            const comparison = compareLogs(baseline, slot, thresholds);
    
            // Build human-readable message
            let message = `# 🔍 Log Comparison Report\n\n`;
            message += `**Decision:** ${comparison.decision.toUpperCase()} ${LogAnalysisTools.getDecisionEmoji(comparison.decision)}\n`;
            message += `**Recommendation:** ${comparison.recommendation.toUpperCase()}\n\n`;
    
            message += `## 📊 Metrics Comparison\n\n`;
            message += `| Metric | Baseline | Slot | Delta |\n`;
            message += `|--------|----------|------|-------|\n`;
            message += `| **Errors** | ${comparison.baseline.totalErrors} | ${comparison.slot.totalErrors} | ${LogAnalysisTools.formatDelta(comparison.deltas.errorDelta)} (${LogAnalysisTools.formatPercent(comparison.deltas.errorDeltaPercent)}) |\n`;
            message += `| **Health Score** | ${comparison.baseline.healthScore} | ${comparison.slot.healthScore} | ${LogAnalysisTools.formatDelta(comparison.deltas.scoreDelta)} pts |\n`;
            message += `| **P95 Latency** | ${comparison.baseline.p95Latency}ms | ${comparison.slot.p95Latency}ms | ${LogAnalysisTools.formatDelta(comparison.deltas.latencyDelta)}ms |\n\n`;
    
            if (comparison.reasons.length > 0) {
                message += `## ${comparison.decision === 'safe' ? '✅' : '⚠️'} Analysis\n\n`;
                for (const reason of comparison.reasons) {
                    message += `- ${reason}\n`;
                }
                message += '\n';
            }
    
            message += `## 🎯 Thresholds Applied\n\n`;
            message += `- **Max Error Increase:** ${comparison.thresholdsApplied.maxErrorIncrease}%\n`;
            message += `- **Max Score Decrease:** ${comparison.thresholdsApplied.maxScoreDecrease} points\n`;
            message += `- **Max Latency Increase:** ${comparison.thresholdsApplied.maxLatencyIncrease}ms\n`;
    
            // Return with structured data
            return ResponseBuilder.successWithStructuredData(comparison, message);
    
        } catch (error: any) {
            OutputLogger.error(`Log comparison error: ${error}`);
            return ResponseBuilder.internalError('Failed to compare logs', error.message);
        }
    }
  • TypeScript interface defining the expected input parameters for the compare_logs tool handler.
    interface CompareLogsArgs {
        baseline: any;
        slot: any;
        thresholds?: {
            maxErrorIncrease?: number;
            maxScoreDecrease?: number;
            maxLatencyIncrease?: number;
        };
    }
  • Core helper function implementing the log comparison logic. Extracts metrics from baseline/slot analyses, computes deltas (errors, health score, latency), evaluates against configurable thresholds, determines safety decision and deployment recommendation, and returns structured comparison result.
    function compareLogs(
        baseline: LogAnalysisResult,
        slot: LogAnalysisResult,
        thresholds: ComparisonThresholds = {}
    ): ComparisonResult {
        // Apply default thresholds
        const maxErrorIncrease = thresholds.maxErrorIncrease ?? 0.5;  // 50%
        const maxScoreDecrease = thresholds.maxScoreDecrease ?? 20;   // 20 points
        const maxLatencyIncrease = thresholds.maxLatencyIncrease ?? 100; // 100ms
    
        // Extract key metrics
        const baselineErrors = baseline.errors?.total ?? 0;
        const slotErrors = slot.errors?.total ?? 0;
        const baselineScore = baseline.summary?.healthScore ?? 100;
        const slotScore = slot.summary?.healthScore ?? 100;
        const baselineLatency = baseline.performance?.p95ResponseTime ?? 0;
        const slotLatency = slot.performance?.p95ResponseTime ?? 0;
    
        // Calculate deltas
        const errorDelta = slotErrors - baselineErrors;
        const errorDeltaPercent = baselineErrors > 0
            ? errorDelta / baselineErrors
            : (slotErrors > 0 ? 1.0 : 0);  // If baseline=0, slot>0 = 100% increase
        const scoreDelta = slotScore - baselineScore;
        const latencyDelta = slotLatency - baselineLatency;
    
        // Evaluate thresholds
        const reasons: string[] = [];
        let decision: 'safe' | 'warning' | 'critical' = 'safe';
    
        // Check error rate increase
        if (errorDeltaPercent > maxErrorIncrease) {
            const percentText = (errorDeltaPercent * 100).toFixed(1);
            reasons.push(`Error rate increased by ${percentText}% (${baselineErrors} → ${slotErrors})`);
            decision = 'critical';
        }
    
        // Check health score decrease
        if (scoreDelta < 0 && Math.abs(scoreDelta) > maxScoreDecrease) {
            reasons.push(`Health score dropped from ${baselineScore} to ${slotScore} (${scoreDelta} points)`);
            decision = decision === 'critical' ? 'critical' : 'warning';
        }
    
        // Check latency increase
        if (latencyDelta > maxLatencyIncrease) {
            reasons.push(`P95 latency increased by ${latencyDelta}ms (${baselineLatency}ms → ${slotLatency}ms)`);
            decision = decision === 'critical' ? 'critical' : 'warning';
        }
    
        // If no issues found, add positive reasons
        if (reasons.length === 0) {
            if (scoreDelta > 0) {
                reasons.push(`Health score improved from ${baselineScore} to ${slotScore}`);
            }
            if (errorDelta <= 0) {
                reasons.push(`Error rate maintained or decreased (${baselineErrors} → ${slotErrors})`);
            }
            if (latencyDelta <= 0) {
                reasons.push(`Latency maintained or improved (${baselineLatency}ms → ${slotLatency}ms)`);
            }
        }
    
        // Make recommendation
        let recommendation: 'proceed' | 'investigate' | 'rollback';
        if (decision === 'safe') {
            recommendation = 'proceed';
        } else if (decision === 'warning') {
            recommendation = 'investigate';
        } else {
            recommendation = 'rollback';
        }
    
        return {
            decision,
            recommendation,
            baseline: {
                totalErrors: baselineErrors,
                healthScore: baselineScore,
                avgLatency: baseline.performance?.avgResponseTime ?? 0,
                p95Latency: baselineLatency
            },
            slot: {
                totalErrors: slotErrors,
                healthScore: slotScore,
                avgLatency: slot.performance?.avgResponseTime ?? 0,
                p95Latency: slotLatency
            },
            deltas: {
                errorDelta,
                errorDeltaPercent: parseFloat((errorDeltaPercent * 100).toFixed(2)),
                scoreDelta,
                latencyDelta
            },
            reasons,
            thresholdsApplied: {
                maxErrorIncrease: maxErrorIncrease * 100,  // Convert to percentage for display
                maxScoreDecrease,
                maxLatencyIncrease
            }
        };
    }
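The three threshold checks above can be distilled into a standalone ranking function. This is a simplified sketch for illustration, not the actual module: it collapses the error/score/latency checks into one decision, matching the escalation order of `compareLogs` (an error spike is critical; a score drop or latency regression alone is a warning).

```typescript
// Simplified rank of the compareLogs threshold checks.
type Decision = 'safe' | 'warning' | 'critical';

function decide(
    errorDeltaPercent: number,  // fraction, e.g. 1.0 = errors doubled
    scoreDelta: number,         // slot score minus baseline score
    latencyDelta: number,       // slot p95 minus baseline p95, in ms
    max = { errorIncrease: 0.5, scoreDecrease: 20, latencyIncrease: 100 }
): Decision {
    if (errorDeltaPercent > max.errorIncrease) return 'critical';
    if ((scoreDelta < 0 && -scoreDelta > max.scoreDecrease) ||
        latencyDelta > max.latencyIncrease) return 'warning';
    return 'safe';
}

console.log(decide(1.0, -5, 10));  // 'critical' — errors doubled
console.log(decide(0.1, -25, 10)); // 'warning' — score dropped 25 points
console.log(decide(0.0, 5, -20));  // 'safe'
```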
  • Output schema defining the structure returned by the compareLogs helper, used in the tool response.
    interface ComparisonResult {
        decision: 'safe' | 'warning' | 'critical';
        recommendation: 'proceed' | 'investigate' | 'rollback';
        baseline: {
            totalErrors: number;
            healthScore: number;
            avgLatency: number | null;
            p95Latency: number | null;
        };
        slot: {
            totalErrors: number;
            healthScore: number;
            avgLatency: number | null;
            p95Latency: number | null;
        };
        deltas: {
            errorDelta: number;
            errorDeltaPercent: number;
            scoreDelta: number;
            latencyDelta: number;
        };
        reasons: string[];
        thresholdsApplied: {
            maxErrorIncrease: number;
            maxScoreDecrease: number;
            maxLatencyIncrease: number;
        };
    }
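For concreteness, here is a hypothetical `ComparisonResult` a healthy deployment might produce; all values are invented for illustration and conform to the interface above (errorDeltaPercent and maxErrorIncrease are expressed as display percentages, per the implementation).

```typescript
// Invented example of a passing comparison (all values illustrative).
const example = {
    decision: 'safe',
    recommendation: 'proceed',
    baseline: { totalErrors: 12, healthScore: 92, avgLatency: 180, p95Latency: 420 },
    slot:     { totalErrors: 10, healthScore: 94, avgLatency: 175, p95Latency: 410 },
    deltas:   { errorDelta: -2, errorDeltaPercent: -16.67, scoreDelta: 2, latencyDelta: -10 },
    reasons: [
        'Health score improved from 92 to 94',
        'Error rate maintained or decreased (12 → 10)',
        'Latency maintained or improved (420ms → 410ms)'
    ],
    thresholdsApplied: { maxErrorIncrease: 50, maxScoreDecrease: 20, maxLatencyIncrease: 100 }
};
console.log(example.recommendation); // prints "proceed"
```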
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: execution time ('ANALYSIS: <5s'), input requirements ('Takes output from two analyze_logs_streaming() calls'), return values ('Returns safety recommendation... with detailed reasoning'), and decision criteria ('based on error rate changes, performance degradation, and health score delta'). It doesn't mention error handling or rate limits, but covers most essential behavioral aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the core purpose, followed by key behavioral details. A few phrases are redundant with the schema (e.g., 'Required: baseline, slot objects'), but nearly every sentence earns its place. The emoji and formatting (ANALYSIS, Returns) enhance readability without adding bulk.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (3 parameters with nested objects, no output schema, no annotations), the description does well to explain the workflow, input sources, decision logic, and return format. It covers the essential context needed to use the tool correctly in the deployment workflow. The main gap is the lack of output schema, but the description compensates by detailing what the tool returns (decision and supporting metrics).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds minimal parameter semantics beyond the schema: it clarifies that baseline and slot objects come from analyze_logs_streaming calls and that they're required. It mentions the thresholds parameter indirectly through 'error rate changes, performance degradation, and health score delta' but doesn't add syntax or format details beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: compare baseline vs slot logs to make deployment decisions. It specifies the verb 'compare' and resources 'baseline vs slot logs', distinguishing it from sibling tools like analyze_logs_streaming (which provides input) and complete_deployment/reset_deployment (which execute decisions). The description explicitly mentions it returns a safety recommendation with reasoning.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'Use in deployment workflow: analyze baseline → deploy → analyze slot → compare → decide to complete or reset.' It names the required sibling tool (analyze_logs_streaming) and specifies when to use this tool (after analyzing both baseline and slot logs). It also indicates the tool's role in the decision-making process (proceed/investigate/abort).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
