---
applyTo: "llm-tests/**"
---
# LLM Testing Philosophy
> **⚠️ CORE PRINCIPLE: Tests simulate real users. Failures expose product gaps, not test gaps.**
## What Are LLM Tests?
LLM tests use an AI coding agent (with its own default system prompt) to exercise our CLI and MCP tools. The agent receives a **natural language prompt** — the same kind a real user would type — and must figure out how to accomplish the task using our tools.
These tests do NOT test the LLM. They test **our product's usability surface**:
- Skill documentation (SKILL.md)
- CLI `--help` output
- MCP tool descriptions (C# XML `/// <summary>` docs)
- Error messages and recovery hints
- Parameter naming and discoverability
- Workflow coherence
## The Golden Rule
> **If the LLM can't figure it out, fix the product — never fix the test.**
When an LLM test fails, the root cause is ALWAYS one of:
1. Our **skill docs** don't explain the workflow clearly enough
2. Our **tool descriptions** are misleading or incomplete
3. Our **CLI `--help`** output doesn't show the right examples
4. Our **error messages** don't guide recovery
5. Our **parameter names** are confusing or undiscoverable
6. The test itself is unreasonable (rare — only fix if a human couldn't do it either)
## What NEVER Belongs in a Test
### ❌ xfail or skip Markers
**NEVER use `@pytest.mark.xfail` or `@pytest.mark.skip` to hide failing tests.**
Tests either **pass or fail**. There is no middle ground.
- `xfail` masks real failures and creates a false sense of progress
- `skip` hides broken code instead of fixing it
- If a test fails, **fix the product** — the test is exposing a real problem (Golden Rule)
- If a test is flaky, fix the flakiness — don't paper over it with xfail
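For illustration, a sketch of the anti-pattern and its fix (the test name and reason string are hypothetical):
```python
import pytest

# ❌ WRONG: The marker hides a real product gap
@pytest.mark.xfail(reason="LLM can't discover the row-layout flag")
async def test_pivottable_compact_layout():
    ...

# ✅ CORRECT: no marker; let the failure drive a fix to the skill
# docs, --help text, or error messages until the LLM succeeds
async def test_pivottable_compact_layout():
    ...
```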
### ❌ CLI Command Guidance in Prompts
A real user doesn't know our CLI syntax. Neither should the test prompt.
```python
# ❌ WRONG: Teaching the LLM how to use our CLI
prompt = """
Create a PivotTable then set layout to Compact using
'excelcli pivottablecalc' (run --help to see options).
The compact layout uses row-layout value 0.
"""
# ✅ CORRECT: Natural user request
prompt = """
Create a PivotTable with Compact layout showing
Department and Team as rows, Hours as values.
"""
```
### ❌ Tool-Specific Hints in Prompts
```python
# ❌ WRONG: Directing the LLM to a specific command
prompt = """
Use the 'excelcli chartconfig' command to change the chart title.
"""
# ✅ CORRECT: What the user wants
prompt = """
Change the chart title to "Q1 Sales Report" without
deleting and recreating the chart.
"""
```
### ❌ System Prompt Engineering
The system prompt belongs to the agent, not to us. We don't control what system prompt VS Code, Cursor, Claude Desktop, or any other host uses.
```python
# ❌ WRONG: Adding our own guidance to the system prompt
agent = Agent(
    system_prompt=(
        "Run 'excelcli <command> --help' when unsure about parameter names\n"
        "Always use -q flag for clean JSON output"
    ),
)
# ✅ CORRECT: No system prompt (use skill only) or minimal role context
agent = Agent(
    skill=excel_cli_skill,  # Our product — this IS the right place for guidance
)
```
### ❌ Session Recovery Instructions
```python
# ❌ WRONG: Teaching the LLM our session model
prompt = """
IMPORTANT: First, run 'excelcli session list' to confirm
which file is open, then use that file path.
"""
# ✅ CORRECT: If session discovery is hard, fix the product:
# - Better error messages: "No active session. Run 'excelcli session list' to see open files."
# - Better skill docs: Add "Session Management" section with recovery patterns
# - Better --help: Show session workflow in command help
```
## What DOES Belong in a Test
### Natural Language Prompts
Write prompts as a knowledgeable Excel user would. They know Excel concepts but NOT our specific CLI/MCP tool syntax.
```python
prompt = f"""
Create a new Excel file at {unique_path('sales-analysis')}
Enter this sales data:
Region, Product, Sales
North, Widget, 15000
South, Gadget, 12000
Create a PivotTable showing Region as rows and Sum of Sales as values.
Add a slicer for the Region field.
Filter to show only North region.
Save and close the file.
"""
```
### Reasonable Assertions
Assert on **outcomes** the user cares about, not implementation details:
```python
# ✅ Good: Did the operation succeed?
assert result.success
assert_cli_exit_codes(result)
# ✅ Good: Did the LLM report key results?
assert_regex(result.final_response, r"(?i)(pivot|region|north)")
# ⚠️ Fragile: Exact numeric values across 5 conversation turns
assert_regex(result.final_response, r"\$?43,500\.00") # Requires perfect execution of ALL prior steps
```
### Appropriate Complexity
Match test complexity to what a real user would attempt in one conversation:
| Complexity | Turns | Example |
|-----------|-------|---------|
| Simple | 1 | Create file → write data → read back → close |
| Medium | 1-2 | Create file → build table → add chart → save |
| Complex | 2-3 | Build data model → create measures → analyze |
| Unreasonable | 5+ | 13-step workflow with exact numeric assertions on final state |
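For reference, a Simple-complexity test might look like the sketch below. It is not a prescribed template: it assumes the helpers used throughout this guide (`Agent`, `Provider`, `aitest_run`, `assert_cli_exit_codes`, `assert_regex`, `unique_path`, the server/skill fixtures, and the `DEFAULT_*` constants) are importable, and the test name and data are hypothetical.
```python
# Hypothetical Simple-complexity test: one turn, outcome-level assertions
async def test_cli_create_and_read_back():
    agent = Agent(
        name="create-and-read-back-cli",
        provider=Provider(model=f"azure/{DEFAULT_MODEL}", rpm=DEFAULT_RPM, tpm=DEFAULT_TPM),
        cli_servers=[excel_cli_server],
        skill=excel_cli_skill,
        max_turns=DEFAULT_MAX_TURNS,
    )
    prompt = f"""
    Create a new Excel file at {unique_path('inventory')}
    Enter this data:
    Item, Quantity
    Bolts, 40
    Washers, 25
    Read the data back to confirm it was written, then close the file.
    """
    result = await aitest_run(agent, prompt)

    # Outcome-level checks only: success, clean exits, key results reported
    assert result.success
    assert_cli_exit_codes(result)
    assert_regex(result.final_response, r"(?i)(bolts|washers)")
```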
## Where to Fix Failures
When a test fails, investigate in this order:
### 1. Skill Documentation (`skills/excel-cli/SKILL.md`, `skills/excel-mcp/SKILL.md`)
The skill IS our product's interface to LLMs. If the LLM doesn't know how to:
- Set a PivotTable layout → Add workflow patterns to the skill
- Use `--values-file` instead of `--values` → Document when to use file params
- Discover `chartconfig` commands → Add chart modification patterns
```markdown
## Chart Modification Patterns
To change chart properties WITHOUT deleting the chart:
- Title: `chartconfig set-title`
- Type: `chartconfig set-type`
```
### 2. CLI `--help` Output
The `--help` text is what the LLM sees when it runs `excelcli <command> --help`. If a parameter is hard to discover, improve the help text.
Look at:
- `[Description]` attributes on Settings properties
- Parameter ordering and grouping
- Whether common workflows are clear from help alone
### 3. MCP Tool Descriptions and Skill References
For MCP tests, the tool description IS the API documentation. If the LLM picks wrong tools or wrong parameters, improve the XML docs.
For detailed guidance (workflows, quirks, best practices), update `skills/shared/*.md`. These are **auto-synced** to MCP prompts at build time — Claude Desktop and skill-based clients get identical guidance.
### 4. Error Messages
When a CLI command fails, does the error message tell the LLM how to fix it?
```
# ❌ Bad error message
"Parameter 'mCode' is required"
# ✅ Good error message
"Parameter 'mCode' is required for 'create' action. Provide --m-code with inline M code or --m-code-file with a file path."
```
### 5. Parameter Naming
If the LLM consistently gets a parameter name wrong, the name might be confusing. Consider renaming or adding aliases.
## Test Structure Conventions
### Agent Configuration
```python
agent = Agent(
    name="descriptive-test-name",
    provider=Provider(model=f"azure/{DEFAULT_MODEL}", rpm=DEFAULT_RPM, tpm=DEFAULT_TPM),
    cli_servers=[excel_cli_server],  # CLI tests
    # OR
    mcp_servers=[excel_mcp_server],  # MCP tests
    skill=excel_cli_skill,           # Our product documentation
    max_turns=DEFAULT_MAX_TURNS,     # Always set explicitly
)
```
- **Always set `max_turns`** explicitly — relying on defaults creates silent failures
- **No custom `system_prompt`** unless testing a specific user persona (rare)
- **Always include `skill`** — this is our product's documentation
### Multi-Turn Tests
Each turn should be a natural continuation of the conversation:
```python
# Turn 1: Set up
result = await aitest_run(agent, "Create file and enter data...")
messages = result.messages
# Turn 2: Analyze (natural continuation)
result = await aitest_run(agent, "Now create a PivotTable from that data...", messages=messages)
```
Keep multi-turn tests to **2-3 turns maximum**. If you need 5 turns, the test is testing too many features at once — split it into separate tests.
## MCP/CLI Test Sync Rule (CRITICAL)
**Every test scenario MUST exist for BOTH CLI and MCP.** The two entry points are equal citizens — if a scenario is worth testing for one, it's worth testing for the other.
### Current Test Mapping
| Test Scenario | CLI File | MCP File |
|---------------|----------|----------|
| calculation_mode | `test_cli_calculation_mode.py` | `test_mcp_calculation_mode.py` |
| chart | `test_cli_chart.py` | `test_mcp_chart.py` |
| chart_positioning | `test_cli_chart_positioning.py` | `test_mcp_chart_positioning.py` |
| file_worksheet | `test_cli_file_worksheet.py` | `test_mcp_file_worksheet.py` |
| financial_report_automation | `test_cli_financial_report_automation.py` | `test_mcp_financial_report_automation.py` |
| modification_patterns | `test_cli_modification_patterns.py` | `test_mcp_modification_patterns.py` |
| pivottable_layout | `test_cli_pivottable_layout.py` | `test_mcp_pivottable_layout.py` |
| powerquery_datamodel | `test_cli_powerquery_datamodel.py` | `test_mcp_powerquery_datamodel.py` |
| range | `test_cli_range.py` | `test_mcp_range.py` |
| sales_report_workflow | `test_cli_sales_report_workflow.py` | `test_mcp_sales_report_workflow.py` |
| slicer | `test_cli_slicer.py` | `test_mcp_slicer.py` |
| table | `test_cli_table.py` | `test_mcp_table.py` |
### Rules for Creating / Updating / Deleting Tests
1. **Creating a new test:** ALWAYS create BOTH the CLI and MCP version. Name them identically except for the prefix (`test_cli_` vs `test_mcp_`). The prompt text should be identical — only the agent configuration differs (`cli_servers` vs `mcp_servers`, `excel_cli_skill` vs `excel_mcp_skill`).
2. **Updating a test:** Update BOTH the CLI and MCP version. If you change the prompt or assertions in one, apply the same change to the other. The test scenario must remain equivalent.
3. **Deleting a test:** Delete BOTH versions. Never leave an orphaned test in only one entry point.
4. **Test parity check:** Before committing, verify that `llm-tests/cli/` and `llm-tests/mcp_tests/` contain matching test files: same scenarios, same count, filenames differing only in the `test_cli_`/`test_mcp_` prefix (see the sketch below).
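Rule 4 is easy to automate. A minimal sketch using only the standard library; the directory names come from rule 4, and the function name is hypothetical:
```python
from pathlib import Path

def check_test_parity(root: Path = Path("llm-tests")) -> set[str]:
    """Return scenario names that exist for only one entry point."""
    cli = {p.name.removeprefix("test_cli_") for p in (root / "cli").glob("test_cli_*.py")}
    mcp = {p.name.removeprefix("test_mcp_") for p in (root / "mcp_tests").glob("test_mcp_*.py")}
    return cli ^ mcp  # symmetric difference = orphaned scenarios

if __name__ == "__main__":
    orphans = check_test_parity()
    assert not orphans, f"Orphaned test scenarios: {sorted(orphans)}"
```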
### Agent Configuration Differences
The ONLY differences between CLI and MCP versions of a test should be:
```python
# CLI version
agent = Agent(
    name="test-name-cli",
    cli_servers=[excel_cli_server],
    skill=excel_cli_skill,
    ...
)

# MCP version
agent = Agent(
    name="test-name-mcp",
    mcp_servers=[excel_mcp_server],
    skill=excel_mcp_skill,
    ...
)
```
Prompts, assertions, and test logic should be **identical**.
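One way to guarantee this is to keep the shared pieces in one place. A sketch, assuming the suite allows a shared module (the `shared/scenarios.py` path and constant name are hypothetical):
```python
# llm-tests/shared/scenarios.py: single source of truth for the prompt text
CHART_TITLE_PROMPT = """
Change the chart title to "Q1 Sales Report" without
deleting and recreating the chart.
"""
```
Both `test_cli_chart.py` and `test_mcp_chart.py` then import the same constant, so a prompt change in one place updates both entry points automatically.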
## SKILL.md and MCP Prompt Generation Awareness (CRITICAL)
**SKILL.md files and MCP skill prompts are auto-generated. Never edit them directly.**
### Generation Pipelines
**SKILL.md (for skill-based clients like VS Code, Cursor):**
```
C# Interfaces (XML /// docs)
→ Roslyn Source Generator (ServiceRegistryGenerator)
→ _SkillManifest.g.cs (JSON const)
→ MSBuild Task (GenerateSkillFile.cs)
→ Scriban Templates (.sbn)
→ Generated SKILL.md
```
**MCP Skill Prompts (for Claude Desktop and MCP-only clients):**
```
skills/shared/*.md (source of truth)
→ MSBuild EmbeddedResource with Link (embedded in assembly)
→ MSBuild GenerateSkillPromptsClass inline task
→ ExcelSkillPrompts.g.cs (14 [McpServerPrompt] methods)
→ Claude Desktop sees identical guidance as skill clients
```
### Where to Fix What
| Problem | Fix Location | NOT Here |
|---------|-------------|----------|
| Wrong tool/command description | `I*Commands.cs` XML `/// <summary>` | `SKILL.md` |
| Wrong parameter docs | `I*Commands.cs` XML `/// <param>` | `SKILL.md` |
| Wrong skill prose/rules/workflows | `skills/templates/SKILL.cli.sbn` or `SKILL.mcp.sbn` | `SKILL.md` |
| Wrong reference doc content | `skills/shared/*.md` | `skills/excel-*/references/*.md` |
| Wrong MCP prompt content | `skills/shared/*.md` | `Prompts/Content/Skills/` |
| Wrong Tool Selection table (MCP) | `skills/templates/SKILL.mcp.sbn` | `SKILL.md` |
| New skill reference needed | Add `.md` to `skills/shared/` + description in `.csproj` | Don't create separate prompt |
### Key Files
- **Templates:** `skills/templates/SKILL.cli.sbn`, `skills/templates/SKILL.mcp.sbn`
- **Reference docs (source of truth):** `skills/shared/*.md` → auto-synced to BOTH skill refs AND MCP prompts
- **Generated files (NEVER edit):** `skills/excel-cli/SKILL.md`, `skills/excel-mcp/SKILL.md`, `obj/.../ExcelSkillPrompts.g.cs`
- **Description overrides:** `src/ExcelMcp.McpServer/ExcelMcp.McpServer.csproj` → `GenerateSkillPromptsClass` task
- **Build command:** `dotnet build -c Release` regenerates SKILL.md, copies references, and generates prompt class
### Testing Impact
When a test fails because the LLM misuses a tool or parameter:
1. Check the SKILL template (`skills/templates/SKILL.*.sbn`) — is the guidance correct?
2. Check the reference doc (`skills/shared/*.md`) — are the parameter values correct?
3. Check the C# interface XML docs — is the tool description accurate?
4. Fix in the source file, rebuild (`dotnet build -c Release`), then re-run tests.
**Never fix a SKILL.md directly** — the fix will be lost on the next build.
## Quick Checklist
Before submitting an LLM test PR:
- [ ] Prompts read like natural user requests, not CLI tutorials
- [ ] No CLI command names, flag names, or syntax in prompts
- [ ] No system prompt (or minimal role-setting only)
- [ ] `max_turns` set explicitly on Agent
- [ ] Assertions check outcomes, not implementation details
- [ ] At most 2-3 conversation turns
- [ ] If a test needs "hints" to pass, the hint belongs in the skill/help/tool docs instead
- [ ] **BOTH CLI and MCP versions of the test exist** (sync rule)
- [ ] **No direct edits to generated SKILL.md files** (template awareness)