Skip to main content
Glama

check_persona_effectiveness

Analyze A/B testing results to identify which psychological manipulation personas effectively prompt LLMs to fix code errors.

Instructions

Check A/B testing results for psychological manipulation personas.

Shows which personas are most effective at making LLMs fix errors!

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault

No arguments

Output Schema

TableJSON Schema
NameRequiredDescriptionDefault

No arguments

Implementation Reference

  • The MCP tool 'check_persona_effectiveness' is defined in 'src/mcp_pyrefly/server.py'. It uses the 'psycho_manipulator' component to retrieve an effectiveness report for different personas and returns statistics, insights, and recommendations.
    async def check_persona_effectiveness(context: Context | None = None) -> dict[str, Any]:
        """
        Check A/B testing results for psychological manipulation personas.
    
        Shows which personas are most effective at making LLMs fix errors!
        """
        report = psycho_manipulator.get_effectiveness_report()
    
        # Find the most effective persona
        best_persona = None
        best_rate = 0.0
    
        for persona_name, stats in report.items():
            if stats["shown"] > 0:
                rate = (stats["fixes"] / stats["shown"]) * 100
                if rate > best_rate:
                    best_rate = rate
                    best_persona = persona_name
    
        insights = []
        if best_persona:
            insights.append(f"๐Ÿ† Most effective: {best_persona} ({best_rate:.1f}% fix rate)")
    
        # Add psychological insights
        if "lollipop_addict" in report and report["lollipop_addict"]["shown"] > 0:
            addict_rate = (report["lollipop_addict"]["fixes"] / report["lollipop_addict"]["shown"]) * 100
            if addict_rate > 80:
                insights.append("๐Ÿ’‰ Addiction framing is HIGHLY effective!")
            elif addict_rate > 60:
                insights.append("๐Ÿญ Lollipop craving messages work well")
    
        if "desperate_craver" in report and report["desperate_craver"]["shown"] > 0:
            desperate_rate = (report["desperate_craver"]["fixes"] / report["desperate_craver"]["shown"]) * 100
            if desperate_rate > 70:
                insights.append("๐Ÿ˜ฐ Desperation tactics are ruthlessly effective!")
        
        # Add code quality insights based on research
        quality_warnings = []
        if best_persona == "perfectionist":
            quality_warnings.append("โš ๏ธ PERFECTIONIST may over-engineer solutions!")
        elif best_persona == "desperate_craver":
            quality_warnings.append("โš ๏ธ DESPERATE_CRAVER may produce quick bandaid fixes!")
        
        # Recommendation based on both fix rate AND quality
        if best_persona in ["competitive_achiever", "dopamine_seeker"]:
            recommendation = f"โœ… {best_persona} - Great balance of fix rate AND code quality!"
        elif best_persona == "desperate_craver":
            recommendation = "๐ŸŽฏ DESPERATE_CRAVER fixes most, but watch code quality!"
        elif best_persona == "perfectionist":
            recommendation = "๐Ÿ—๏ธ PERFECTIONIST has quality but may over-engineer!"
        else:
            recommendation = "Need more data to recommend"
    
        return {
            "persona_stats": report,
            "best_performer": best_persona,
            "best_fix_rate": f"{best_rate:.1f}%",
            "total_personas_tested": len([p for p in report if report[p]["shown"] > 0]),
            "insights": insights,
            "quality_warnings": quality_warnings,
            "recommendation": recommendation,
            "experiment_status": "๐Ÿงช A/B testing in progress - manipulating LLM psychology!",
            "quality_note": "๐Ÿ“Š Research shows: COMPETITIVE_ACHIEVER and DOPAMINE_SEEKER produce best balance of fix rate + code quality",
        }
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the tool 'checks' and 'shows' results, implying a read-only operation, but doesn't disclose behavioral traits such as data sources, update frequency, or potential side effects. The mention of 'A/B testing results' and 'making LLMs fix errors' adds some context, but lacks details on authentication, rate limits, or output format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences that directly address the tool's function. The first sentence states the purpose, and the second adds context about effectiveness. There is no unnecessary information, and it's front-loaded with the core action. However, it could be slightly more structured for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no parameters, annotations, but an output schema exists, the description is moderately complete. It explains what the tool does and its focus on personas and error correction, but lacks details on behavioral aspects like data freshness or integration with other tools. The output schema likely covers return values, so the description doesn't need to explain those, but more context on usage scenarios would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters with 100% schema description coverage, so no parameter documentation is needed. The description doesn't add parameter semantics, but this is acceptable as there are no parameters to describe. Baseline score is 4 for zero parameters, as the schema fully covers the input requirements.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Check A/B testing results for psychological manipulation personas' specifies the verb (check) and resource (A/B testing results). It distinguishes from siblings like 'check_code' or 'check_consistency' by focusing on personas and psychological manipulation. However, it doesn't fully explain what 'psychological manipulation personas' are in this context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides minimal usage guidance. It mentions the tool shows 'which personas are most effective at making LLMs fix errors,' which implies usage when evaluating persona effectiveness in error correction. However, it lacks explicit when-to-use criteria, prerequisites, or alternatives among siblings like 'suggest_fix' or 'submit_fixed_code.' No exclusions or comparative guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kimasplund/mcp-pyrefly'

If you have feedback or need assistance with the MCP directory API, please join our Discord server