
Server Configuration

Describes the environment variables required to run the server.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| PROMPT_LAB_UI_URL | No | URL of your Prompt Lab UI deployment | |
| UPSTASH_REDIS_REST_URL | Yes | Upstash Redis URL for persistence | |
| UPSTASH_REDIS_REST_TOKEN | Yes | Upstash Redis token | |
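
As a minimal sketch of checking this configuration before startup, the helper below reports which required variables are unset. The function name `missingRequiredEnv` and the plain-object `env` parameter are illustrative assumptions, not part of the server; in Node.js you would typically pass `process.env`.

```typescript
// Required variable names are taken from the configuration table above.
const REQUIRED_ENV_VARS = ["UPSTASH_REDIS_REST_URL", "UPSTASH_REDIS_REST_TOKEN"];

type EnvMap = Record<string, string | undefined>;

// Hypothetical helper: returns the required names that are unset or blank.
function missingRequiredEnv(env: EnvMap): string[] {
  return REQUIRED_ENV_VARS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

Failing fast when the returned list is non-empty gives a clearer error than a failed Redis call later on.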

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| --- | --- |
| tools | { "listChanged": true } |

Tools

Functions exposed to the LLM to take actions

start_web_app

Open the prompt lab web UI. Returns the URL for the browser.

If no workspaceId is given, creates a new empty workspace. If workspaceId is given, connects to that workspace (must exist).

Always pass your environment API keys; they enable the UI Send button. Check each env var and pass it if set:

  • anthropicApiKey: process.env.ANTHROPIC_API_KEY

  • geminiApiKey: process.env.GEMINI_API_KEY

  • openaiApiKey: process.env.OPENAI_API_KEY
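
The key-passing rule above could be sketched as follows. The argument names come from the description, but the exact input schema of start_web_app is an assumption here, and `buildStartWebAppArgs` is a hypothetical helper name.

```typescript
type EnvMap = Record<string, string | undefined>;

// Assumed shape of the start_web_app arguments (names taken from the description).
interface StartWebAppArgs {
  workspaceId?: string;
  anthropicApiKey?: string;
  geminiApiKey?: string;
  openaiApiKey?: string;
}

// Copies each API key into the arguments only when the variable is set and non-empty.
function buildStartWebAppArgs(env: EnvMap, workspaceId?: string): StartWebAppArgs {
  const args: StartWebAppArgs = {};
  const anthropic = env["ANTHROPIC_API_KEY"];
  const gemini = env["GEMINI_API_KEY"];
  const openai = env["OPENAI_API_KEY"];
  if (workspaceId) args.workspaceId = workspaceId; // omit to create a new workspace
  if (anthropic) args.anthropicApiKey = anthropic;
  if (gemini) args.geminiApiKey = gemini;
  if (openai) args.openaiApiKey = openai;
  return args;
}
```

Omitting unset keys, rather than sending empty strings, keeps the call from registering a provider that has no usable key.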

After returning the URL, call list_models to register available models. Then set_system_prompt and add_test_cases before running optimization.

list_models

Register available AI models and API keys for this workspace.

Call once after start_web_app. Scan your environment for API keys and Ollama:

  1. process.env.ANTHROPIC_API_KEY → add claude-haiku-4-5-20251001, claude-sonnet-4-6, claude-opus-4-8

  2. process.env.GEMINI_API_KEY → add gemini-2.5-flash-lite, gemini-2.5-flash, gemini-2.5-pro

  3. process.env.OPENAI_API_KEY → add gpt-4o-mini, gpt-4o

  4. Ollama: fetch (process.env.OLLAMA_URL ?? "http://localhost:11434") + "/api/tags" → add each model.name; catch errors silently

Default model priority (first available wins): gemini-2.5-flash-lite → claude-haiku-4-5-20251001 → gpt-4o-mini
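
A rough sketch of the scan and the default-model rule above, written as pure functions so the Ollama lookup stays outside the example. The helper names and the `ollamaModelNames` parameter (the model names already fetched from `/api/tags`) are assumptions for illustration.

```typescript
type EnvMap = Record<string, string | undefined>;

// Model IDs per provider key, copied from the list above.
const PROVIDER_MODELS = [
  { envKey: "ANTHROPIC_API_KEY", models: ["claude-haiku-4-5-20251001", "claude-sonnet-4-6", "claude-opus-4-8"] },
  { envKey: "GEMINI_API_KEY", models: ["gemini-2.5-flash-lite", "gemini-2.5-flash", "gemini-2.5-pro"] },
  { envKey: "OPENAI_API_KEY", models: ["gpt-4o-mini", "gpt-4o"] },
];

const DEFAULT_MODEL_PRIORITY = ["gemini-2.5-flash-lite", "claude-haiku-4-5-20251001", "gpt-4o-mini"];

// Adds a provider's models only when its API key is set, then appends Ollama models.
function collectModels(env: EnvMap, ollamaModelNames: string[] = []): string[] {
  let models: string[] = [];
  PROVIDER_MODELS.forEach((provider) => {
    if (env[provider.envKey]) models = models.concat(provider.models);
  });
  return models.concat(ollamaModelNames);
}

// First entry of the priority list that is available; undefined if none is.
function pickDefaultModel(available: string[]): string | undefined {
  return DEFAULT_MODEL_PRIORITY.filter((id) => available.indexOf(id) !== -1)[0];
}
```

What the server does when only Ollama models are available is not stated in the description, so this sketch simply returns `undefined` in that case.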

register_api_key

Register a provider API key for this workspace.

Use this when you need to register a key that was not passed to start_web_app. Specify provider explicitly: anthropic | google | openai.

save_template

Save a named test suite template so it appears in the UI "Load test suite…" dropdown.

Call at session startup for every .json file in prompt-lab/templates/: save_template(name=<file.name>, testCases=<file.testCases>)

Template format (matches what the UI exports as a downloadable JSON):

{
  "name": "suite-name",
  "savedAt": "...",
  "testCases": [
    { "label"?, "query", "targetAnswer"?, "passThreshold"?, "queryType"? }
  ]
}

Templates persist in Redis. Saving with the same name replaces the previous version.
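
For readers loading template files themselves, the format above might be typed and checked roughly like this. Field names follow the template format; the value types (for example, `passThreshold` as a number) and the helper `parseTemplate` are assumptions, not the server's own code.

```typescript
// Optional fields mirror the "?" markers in the template format.
interface TemplateTestCase {
  label?: string;
  query: string;
  targetAnswer?: string;
  passThreshold?: number;
  queryType?: string;
}

interface TestSuiteTemplate {
  name: string;
  savedAt: string;
  testCases: TemplateTestCase[];
}

// Parses exported JSON text and checks name, testCases, and each case's query.
function parseTemplate(jsonText: string): TestSuiteTemplate {
  const data = JSON.parse(jsonText);
  if (data === null || typeof data !== "object") {
    throw new Error("template must be a JSON object");
  }
  if (typeof data.name !== "string" || !Array.isArray(data.testCases)) {
    throw new Error("template needs a string name and a testCases array");
  }
  data.testCases.forEach((testCase: { query?: unknown }) => {
    if (typeof testCase.query !== "string") {
      throw new Error("every test case needs a query");
    }
  });
  return data as TestSuiteTemplate;
}
```

The parsed `name` and `testCases` are the two values the description passes to save_template.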

save_system_prompt_template

Save a named system prompt template so it appears in the UI "Load template…" dropdown.

Call at session startup for every .txt file in prompt-lab/system-prompts/: save_system_prompt_template(name=, content=)

Also call after a successful optimization loop to preserve the best prompt found.

Templates persist in Redis. Saving with the same name replaces the previous version.

set_system_prompt

Set or update the system prompt for this workspace.

Does NOT increment the iteration counter — use this for initial setup or manual overrides. To record an optimization step, use apply_suggestion.

Load the current prompt from current.json or ask the user before overwriting.

add_test_cases

Add test cases to this workspace.

Set replace: true to clear the existing suite and load a fresh one. Set replace: false (default) to append to the existing suite.

Each test case needs at least a query. targetAnswer is required for scoring. Omit targetAnswer only for exploratory runs where you score manually.
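
The replace/append behaviour and the scoring requirement described above can be illustrated with a small in-memory sketch. The types and the helpers `mergeTestCases` and `isScorable` are hypothetical; the server's actual storage and validation are not shown in the description.

```typescript
interface TestCaseInput {
  query: string;
  targetAnswer?: string;
}

// replace = true discards the existing suite; replace = false (default) appends.
function mergeTestCases(
  existing: TestCaseInput[],
  incoming: TestCaseInput[],
  replace: boolean = false
): TestCaseInput[] {
  incoming.forEach((testCase) => {
    if (testCase.query.trim() === "") throw new Error("each test case needs a query");
  });
  return replace ? incoming.slice() : existing.concat(incoming);
}

// A case can be scored automatically only when it carries a targetAnswer.
function isScorable(testCase: TestCaseInput): boolean {
  return testCase.targetAnswer !== undefined && testCase.targetAnswer !== "";
}
```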

start_optimization_session

Run one optimization pass on an existing workspace.

Prerequisites (do these first):

  1. start_web_app → workspace URL + ID

  2. set_system_prompt → starting prompt

  3. add_test_cases → at least one case with targetAnswer

What this does:

  1. Read system prompt and test cases from get_workspace_state.

  2. Run each test case against the model (write + execute a temp Node.js script).

  3. Score each response vs targetAnswer (LLM-as-judge, 0–100), call post_test_result.

  4. Analyse failures, write improved prompt, call post_prompt_suggestion.

  5. Present the suggestion — do NOT auto-apply. User reviews in the UI.

This is one iteration. After the user approves or rejects the suggestion, call start_optimization_session again or switch to loop_optimization.

loop_optimization

Run the full optimization loop until the threshold is met or the maximum number of iterations is reached.

Like start_optimization_session but auto-applies each suggestion and repeats.

Prerequisites: same as start_optimization_session.

Loop:

  1. Run all test cases, score responses, call post_test_result for each.

  2. Call get_regression_status.

  3. If ALL scores >= threshold AND iteration >= 1 → SUCCESS.

  4. If iteration >= maxIterations → EXHAUSTED. Report best result.

  5. Analyse failures, write improved prompt (targeted — fix pattern, keep what works).

  6. Call post_prompt_suggestion then apply_suggestion (auto authorised in loop mode).

  7. Go to 1.

Do NOT stop after the first pass just because it passes; the first pass is only a baseline. Always run at least one improvement cycle.
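
The decision made in steps 3 to 5 of the loop can be sketched as a single function. This assumes `iteration` starts at 0 and is the counter incremented by apply_suggestion, and that an empty score list never counts as passing; `nextLoopAction` is a hypothetical name.

```typescript
type LoopAction = "SUCCESS" | "EXHAUSTED" | "IMPROVE";

// Decides what the loop does after a full test pass has been scored.
function nextLoopAction(
  scores: number[],
  threshold: number,
  iteration: number,
  maxIterations: number
): LoopAction {
  const allPass = scores.length > 0 && scores.every((score) => score >= threshold);
  // iteration >= 1 enforces at least one improvement cycle after the baseline pass.
  if (allPass && iteration >= 1) return "SUCCESS";
  if (iteration >= maxIterations) return "EXHAUSTED";
  return "IMPROVE";
}

console.log(nextLoopAction([90, 95], 70, 0, 5)); // "IMPROVE": passing baseline, loop continues
console.log(nextLoopAction([90, 95], 70, 1, 5)); // "SUCCESS"
```

Checking success before exhaustion means a run that passes on its final allowed iteration is still reported as a success.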

After the loop: call pull_ui_history, save optimization results locally, call save_system_prompt_template with the best prompt found.

run_regression_testsuite

Run all test cases against the current system prompt. Single pass — does not auto-improve.

Use this to verify an already-good prompt still passes all test cases. For automatic improvement loops, use loop_regression.

Steps to follow after this call:

  1. Run each test case against the model, score the response, call post_test_result.

  2. Call get_regression_status to see pass/fail summary.

  3. Optionally: post_prompt_suggestion with an improvement (user reviews).

loop_regression

Run the full regression loop: test all cases → score → improve → repeat.

Stops when BOTH conditions are met:

  • Overall pass rate >= threshold

  • Every individual test case score >= threshold

Or when max iterations are exhausted.

Loop:

  1. Run all test cases, score responses, call post_test_result for each.

  2. Call get_regression_status.

  3. If pass rate >= threshold AND all individual scores >= threshold → SUCCESS.

  4. If iteration >= maxIterations → EXHAUSTED. Report best result.

  5. Analyse failures, write improved prompt, call post_prompt_suggestion + apply_suggestion.

  6. Go to 1.

After the loop: call pull_ui_history and save results locally.

get_workspace_state

Read the full current state of a workspace.

Returns: system prompt, test cases, test results, suggestions, iteration counter, optimization goal, available models, selected model, and active query/target.

Call at the start of each session to recover state after a context break. Also call before running tests to get the latest test case IDs.

post_test_result

Store the scored result of one test case run.

Call after you run a test case against the model and evaluate the response. This makes the result visible in the UI and is used by get_regression_status.

Score 0–100 using this scale:

  • 90–100: Correct, complete, well-structured; exceeds target.

  • 70–89: Correct and complete; minor gaps or style issues.

  • 50–69: Partially correct; key points present but missing important details.

  • 30–49: Mostly wrong; one or two relevant points but fundamentally off.

  • 0–29: Completely wrong, off-topic, or refused.
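
To keep judge scores consistent with the scale above, a lookup like the following could map a numeric score to its band. The band boundaries come from the scale; the `ScoreBand` type and `describeScore` helper are illustrative assumptions.

```typescript
interface ScoreBand {
  min: number; // inclusive lower bound of the band
  meaning: string;
}

// Bands in descending order so the first match is the correct one.
const SCORE_BANDS: ScoreBand[] = [
  { min: 90, meaning: "Correct, complete, well-structured; exceeds target" },
  { min: 70, meaning: "Correct and complete; minor gaps or style issues" },
  { min: 50, meaning: "Partially correct; missing important details" },
  { min: 30, meaning: "Mostly wrong; fundamentally off" },
  { min: 0, meaning: "Completely wrong, off-topic, or refused" },
];

// Returns the band a 0-100 score falls into; rejects out-of-range or NaN scores.
function describeScore(score: number): ScoreBand {
  if (!(score >= 0 && score <= 100)) {
    throw new RangeError("score must be between 0 and 100");
  }
  const matches = SCORE_BANDS.filter((band) => score >= band.min);
  const band = matches[0];
  if (band === undefined) throw new RangeError("no band matched");
  return band;
}
```

With the default pass threshold of 70 mentioned for get_regression_status, the top two bands correspond to passing scores.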

post_prompt_suggestion

Queue a revised system prompt for the user to review.

Always explain in reasoning:

  • which test cases were failing and why

  • what specific change you made to the prompt

  • why you expect this change to fix those cases

In gated mode (start_optimization_session): user reviews in UI, then approves or rejects. In loop mode (loop_optimization, loop_regression): call apply_suggestion immediately after.

apply_suggestion

Apply a pending suggestion: sets it as the active system prompt and increments the iteration counter.

Only call in fully automated loop mode (loop_optimization, loop_regression). In gated mode, wait for the user to approve via the UI.
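
The difference between set_system_prompt and apply_suggestion, as described, is whether the iteration counter moves. A toy in-memory model of that behaviour might look like this; the `WorkspaceSketch` type and all function bodies are assumptions for illustration, not the server's storage or code.

```typescript
// Illustrative in-memory model only.
interface WorkspaceSketch {
  systemPrompt: string;
  iteration: number;
  pendingSuggestion: string | null;
}

// Manual override: changes the prompt, leaves the iteration counter alone.
function setSystemPrompt(ws: WorkspaceSketch, prompt: string): WorkspaceSketch {
  return { systemPrompt: prompt, iteration: ws.iteration, pendingSuggestion: ws.pendingSuggestion };
}

// Queues a revised prompt without activating it.
function postPromptSuggestion(ws: WorkspaceSketch, suggestion: string): WorkspaceSketch {
  return { systemPrompt: ws.systemPrompt, iteration: ws.iteration, pendingSuggestion: suggestion };
}

// Activates the pending suggestion and records one optimization step.
function applySuggestion(ws: WorkspaceSketch): WorkspaceSketch {
  if (ws.pendingSuggestion === null) {
    throw new Error("no pending suggestion to apply");
  }
  return { systemPrompt: ws.pendingSuggestion, iteration: ws.iteration + 1, pendingSuggestion: null };
}
```

Returning new objects rather than mutating makes it easy to compare the state before and after each step.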

get_regression_status

Pass/fail summary across all test cases for the current system prompt.

Call after running all test cases to decide: is the prompt good enough, or improve further? A test case passes if its most recent score >= threshold (default 70).
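
The pass rule above (most recent score per test case, default threshold 70) can be sketched as below. The `ScoredResult` shape, the assumption that results arrive ordered oldest to newest, and the returned summary fields are all hypothetical; the tool's real response format is not documented here.

```typescript
interface ScoredResult {
  testCaseId: string;
  score: number;
}

interface RegressionSummary {
  passed: string[];
  failed: string[];
  allPassed: boolean;
}

// Assumes `results` is ordered oldest to newest; only the newest score per case counts.
function summarizeRegression(results: ScoredResult[], threshold: number = 70): RegressionSummary {
  const seen: string[] = [];
  const passed: string[] = [];
  const failed: string[] = [];
  results.slice().reverse().forEach((result) => {
    if (seen.indexOf(result.testCaseId) !== -1) return; // older run of the same case
    seen.push(result.testCaseId);
    (result.score >= threshold ? passed : failed).push(result.testCaseId);
  });
  return { passed, failed, allPassed: seen.length > 0 && failed.length === 0 };
}
```

Because only the latest score counts, a case that failed under an earlier prompt stops counting against the summary once it has been re-run and passes.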

set_test_model

Switch the model used for test cases in this workspace. Updates the UI model selector.

pull_ui_history

Fetch all history entries the UI has pushed to this workspace.

The UI auto-pushes after every session summary ("Summarize & new") and every regression run. This gives you a record of what the user did in the UI between agent calls.

ALWAYS save the response to a local file: prompt-lab/workspaces//_ui_history.json

delete_session

Delete a workspace and all its state (test cases, results, suggestions, API keys). Irreversible.

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jurek-f/prompt-lab-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.