Server Configuration

Describes the environment variables used to configure the server. All of them are optional.

| Name | Required | Description | Default |
| ---- | -------- | ----------- | ------- |
| `test_path` | No | Custom path to the directory containing test cases (passed as `--test-path`). | `tests/` |
| `OPENAI_API_KEY` | No | API key for OpenAI, used for semantic similarity scoring, LLM-as-judge evaluation, and embeddings. | |
| `ANTHROPIC_API_KEY` | No | API key for Anthropic, used for running skill tests with the default provider. | |
| `SKILL_TEST_API_KEY` | No | API key specifically for the skill test provider, if different from the global keys. | |
| `SKILL_TEST_BASE_URL` | No | Base URL for the skill test provider (e.g., for the DeepSeek, Groq, or Together API). | |
| `SKILL_TEST_PROVIDER` | No | Provider for skill tests (e.g., `openai`, `anthropic`). | |
| `EVALVIEW_SLACK_WEBHOOK` | No | Webhook URL for Slack alerts used during production monitoring. | |
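For example, pointing skill tests at an OpenAI-compatible third-party provider might look like the following. Every value is illustrative; only the variable names come from the table above.

```shell
# Illustrative values only; the variable names are the ones documented above.
export OPENAI_API_KEY="sk-..."                               # scoring, LLM-as-judge, embeddings
export SKILL_TEST_PROVIDER="openai"                          # speak the OpenAI-compatible protocol
export SKILL_TEST_BASE_URL="https://api.groq.com/openai/v1"  # e.g. Groq's endpoint (assumption)
export SKILL_TEST_API_KEY="gsk_..."                          # key for that provider only
```

With no `SKILL_TEST_*` variables set, skill tests fall back to the default provider and the global keys.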

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| ---------- | ------- |
| tools | `{}` |

Tools

Functions exposed to the LLM to take actions

create_test

Create a new EvalView test case YAML file for an agent. Call this when the user asks to add a test, or when you want to capture expected agent behavior. After creating a test, call run_snapshot to establish the baseline. No YAML knowledge required — just describe the test. IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' directory in the current project. If found, use it. Otherwise use 'tests'.

run_check

Check for regressions against the golden baseline. Returns a diff summary for each test: PASSED, OUTPUT_CHANGED, TOOLS_CHANGED, or REGRESSION. REGRESSION means the score dropped significantly — treat this as a blocking failure. TOOLS_CHANGED / OUTPUT_CHANGED are warnings: the agent's behavior shifted but may be intentional. Use this after any code change (prompt, model, tools) to confirm nothing broke. If you see a regression, show the diff to the user and offer to fix it before moving on. IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' directory in the current project. If it exists, pass it as test_path. If the project has a custom test location, use that instead.

run_snapshot

Run tests and save passing results as the new golden baseline. Use this to establish or update the expected behavior after an intentional change. Future run_check calls will compare against this snapshot. Call this: (1) after creating a new test with create_test, (2) after confirming a behavioral change is intentional, (3) before making large refactors so you have a clean rollback point. Only passing tests are saved — failing tests are skipped with a warning. IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' directory in the current project. If it exists, pass it as test_path.

list_tests

List all available golden baselines in this EvalView project. Shows test names, variant counts, and when each baseline was last updated.

validate_skill

Validate a SKILL.md file for correct structure, naming conventions, and completeness. Call this after writing or editing a SKILL.md before running tests. Returns a list of issues found and whether the skill is valid.

generate_skill_tests

Auto-generate test cases from a SKILL.md file. Call this when the user asks to create tests for a skill — it reads the skill definition and generates a ready-to-run YAML test suite covering explicit, implicit, contextual, and negative test categories. After generating, call run_skill_test to execute them.

run_skill_test

Run a skill test suite against a SKILL.md. Executes two evaluation phases: Phase 1 (deterministic) checks tool calls, file operations, commands run, output content, and token budgets. Phase 2 (rubric) uses LLM-as-judge to score output quality against a defined rubric. Call this after writing skill tests or after any change to the skill or agent. Use --no-rubric for fast Phase 1-only checks with no LLM cost.

generate_visual_report

Generate a self-contained HTML visual report from the latest EvalView check or run results. The report opens automatically in the browser. Call this after run_check or run_snapshot to give the user a visual breakdown of traces, diffs, scores, and timelines. Returns the absolute path to the generated HTML file.
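The create_test, run_check, and run_snapshot descriptions above all apply the same test_path auto-detection rule. As a minimal shell sketch (the function name is ours, not part of the server):

```shell
# Sketch of the test_path auto-detection rule described above:
# prefer tests/evalview/ when it exists in the current project,
# otherwise fall back to tests/.
detect_test_path() {
  if [ -d "tests/evalview" ]; then
    echo "tests/evalview"
  else
    echo "tests"
  fi
}
```

The detected value is what the tools pass as `test_path`; setting the `test_path` environment variable overrides it for projects with a custom test location.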

Prompts

Interactive templates invoked by user choice


No prompts

Resources

Contextual data attached and managed by the client


No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

`curl -X GET 'https://glama.ai/api/mcp/v1/servers/hidai25/eval-view'`

If you have feedback or need assistance with the MCP directory API, please join our Discord server.