Confident AI

Official

by confident-ai

Confident AI MCP Server

License: MIT · Python 3.12+ · MCP · Confident AI

The Confident AI MCP Server connects AI-powered tools to Confident AI, a platform to evaluate, observe, and iterate on AI quality. It gives you full control over your resources directly from your editor:

Cloud evaluations and metric collections
Evaluation datasets
Prompt versioning and management
Production tracing and observability
Human annotations and feedback

For users of DeepEval, Confident AI is also the native backend and persistence layer for your evaluation results. This MCP server gives you the ability to iterate on your AI application by bringing all of that data directly into tools like Cursor and Claude Code.

WARNING

This MCP server is currently in beta. Everyone is welcome to try it out, but please reach out to the Confident AI team first to avoid any surprises in functionality.

Use Cases

Built for developers who want to iterate faster on their AI applications from inside editors like Cursor, Claude Code, and Windsurf — from simple queries to fully automated improvement workflows:

10x your iteration speed. Run an eval and check whether a set of prompts is better, all in one continuous workflow instead of scattered across tools. What used to take an hour of tab-switching now takes one conversation.
Go from eval results to action plan automatically. Your AI assistant can pull eval results, read what failed and why, and draft a plan for what to improve next — no manual analysis needed.
Use production traces for iteration. Pull the trace, see what went in and what came out, read what users said — and fix it before anyone else notices.
Let human feedback drive your next iteration. Pull annotation data your team left on production traces and have your AI assistant use it to decide what to fix and how.

Every time you leave your editor to check eval results, tweak a prompt in a dashboard, or look up what your team annotated — you lose context and iteration speed.

How is this different from the platform?

Confident AI has a full web UI where you can do all of this with a mouse. This MCP server is the same platform, accessed from your editor instead. Think: AWS web console vs. AWS CLI — same resources, different interface.

The server speaks the Model Context Protocol (MCP), so any compatible client connects out of the box. The web UI isn't going anywhere. This is just another way in.

Jump Ahead

Prerequisites — What you need before you start
Quickstart — Get up and running in under a minute
- Cursor · Claude Code (or Desktop) · Windsurf · Run Locally
Configuration — Environment variables for regions, on-prem, and advanced setup
Available Tools — Full reference of all 30 tools
License

Prerequisites

A Confident AI API key.
An MCP-compatible client — Cursor, Claude, Windsurf, or any other client that supports the Model Context Protocol.

Quickstart

Confident AI hosts the MCP server for you. Pick your region:

Region	MCP Server URL
US (default)	`https://mcp.confident-ai.com/mcp`
EU	`https://eu.mcp.confident-ai.com/mcp`
Self-hosted	Use your own deployment URL

TIP

The examples below use the US server URL. For other regions, swap the URL:

EU: https://eu.mcp.confident-ai.com/mcp
Self-hosted / On-prem: If you're running your own instance of Confident AI, you can run this MCP server yourself and point it at your deployment. See Running the Server Locally for setup instructions.

🖥️ Cursor

Add the following to your .cursor/mcp.json file:

{
  "mcpServers": {
    "confident-ai": {
      "url": "https://mcp.confident-ai.com/mcp",
      "headers": {
        "Authorization": "Bearer <YOUR_CONFIDENT_API_KEY>"
      }
    }
  }
}

🤖 Claude Code (or Desktop)

Claude Code — run the following command in your terminal:

claude mcp add --transport http confident-ai https://mcp.confident-ai.com/mcp --header "Authorization: Bearer <YOUR_CONFIDENT_API_KEY>"

Claude Desktop — add the following to your claude_desktop_config.json file:

{
  "mcpServers": {
    "confident-ai": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.confident-ai.com/mcp",
        "--header",
        "Authorization: Bearer <YOUR_CONFIDENT_API_KEY>"
      ]
    }
  }
}

🏄 Windsurf

Add the following to your Windsurf MCP configuration:

{
  "mcpServers": {
    "confident-ai": {
      "serverUrl": "https://mcp.confident-ai.com/mcp",
      "headers": {
        "Authorization": "Bearer <YOUR_CONFIDENT_API_KEY>"
      }
    }
  }
}
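
The three client configurations above differ mainly in the key that carries the server URL (Cursor uses `url`, Windsurf uses `serverUrl`) and in Claude Desktop going through `npx mcp-remote`. As a minimal sketch, the Python helper below generates each shape for a given region, so the URL and API key are set in one place. The function name `build_client_config` and the `REGION_URLS` mapping are hypothetical; only the URLs and JSON shapes come from this README.

```python
import json

# Hosted MCP server URLs, copied from the region table in this README.
REGION_URLS = {
    "US": "https://mcp.confident-ai.com/mcp",
    "EU": "https://eu.mcp.confident-ai.com/mcp",
}


def build_client_config(client: str, api_key: str, region: str = "US") -> dict:
    """Return the MCP config dict for 'cursor', 'windsurf', or 'claude-desktop'.

    Hypothetical helper: it only mirrors the JSON snippets shown above.
    """
    url = REGION_URLS[region]
    auth = f"Bearer {api_key}"

    if client == "cursor":
        entry = {"url": url, "headers": {"Authorization": auth}}
    elif client == "windsurf":
        entry = {"serverUrl": url, "headers": {"Authorization": auth}}
    elif client == "claude-desktop":
        # Claude Desktop reaches the remote server through the mcp-remote bridge.
        entry = {
            "command": "npx",
            "args": ["-y", "mcp-remote", url, "--header", f"Authorization: {auth}"],
        }
    else:
        raise ValueError(f"unknown client: {client!r}")

    return {"mcpServers": {"confident-ai": entry}}


if __name__ == "__main__":
    config = build_client_config("cursor", "<YOUR_CONFIDENT_API_KEY>", region="EU")
    print(json.dumps(config, indent=2))
```

For a self-hosted deployment, you would add your own deployment URL to the mapping rather than using one of the hosted entries.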

🛠️ Running the Server Locally

If you're self-hosting or contributing to this project, you can run the server from source.

Prerequisites: Python >= 3.12, Poetry

poetry install
poetry run python server.py

The server will start on http://0.0.0.0:8081. It uses the Streamable HTTP transport — a single /mcp endpoint that handles both GET and POST.

The /mcp endpoint requires a Bearer token in the Authorization header (your Confident AI API key).

When running locally, point your MCP client to http://localhost:8081/mcp instead of the hosted URLs above.
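
To make the transport description concrete, here is a minimal sketch of the first request a client would send to a locally running server: a JSON-RPC `initialize` call POSTed to `/mcp` with the API key as a Bearer token. It only constructs the request; nothing is sent unless you call `urlopen`. The helper name, the client name `example-client`, and the protocol version string are assumptions, and MCP clients normally perform this handshake for you.

```python
import json
import urllib.request

LOCAL_MCP_URL = "http://localhost:8081/mcp"  # local endpoint from the text above


def build_initialize_request(api_key: str, url: str = LOCAL_MCP_URL) -> urllib.request.Request:
    """Build (but do not send) an MCP `initialize` request over Streamable HTTP."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; use whichever your client and server support.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {api_key}",
        },
    )


if __name__ == "__main__":
    req = build_initialize_request("<YOUR_CONFIDENT_API_KEY>")
    print(req.get_method(), req.full_url)
    # To send it against a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
```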

To run in stdio mode instead (for MCP clients that communicate over stdin/stdout), uncomment the relevant block at the bottom of server.py:

if __name__ == "__main__":
    mcp.run(transport="stdio")

Configuration

NOTE

This section is only relevant if you're running the server locally. If you're using the hosted server, the only thing you need is your API key in the quickstart configs above.

The server is configured through environment variables. You can set these in a .env file in the project root.

Variable	Description	Default
`CONFIDENT_API_KEY`	Your Confident AI API key	Required
`CONFIDENT_ENVIRONMENT`	`LOCAL`, `PROD`, or `ON_PREM`	`LOCAL`
`CONFIDENT_REGION`	`US`, `EU`, or `AU` (only used when `CONFIDENT_ENVIRONMENT=PROD`)	`US`
`CONFIDENT_BACKEND_LOCAL_URL`	Backend URL for local development	—
`CONFIDENT_BACKEND_US_PROD_URL`	US production backend URL	—
`CONFIDENT_BACKEND_EU_PROD_URL`	EU production backend URL	—
`CONFIDENT_BACKEND_AU_PROD_URL`	AU production backend URL	—
`CONFIDENT_BACKEND_ON_PREM_URL`	On-prem backend URL (required when `CONFIDENT_ENVIRONMENT=ON_PREM`)	—
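
One way to read the table above is as a lookup: `CONFIDENT_ENVIRONMENT` (and, for `PROD`, `CONFIDENT_REGION`) selects which backend URL variable applies. The sketch below expresses that reading in Python. It is an illustration only, not the server's actual code: the function name `resolve_backend_url` is hypothetical, and raising an error when the selected URL is unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Backend URL variable for each production region, from the table above.
PROD_URL_VARS = {
    "US": "CONFIDENT_BACKEND_US_PROD_URL",
    "EU": "CONFIDENT_BACKEND_EU_PROD_URL",
    "AU": "CONFIDENT_BACKEND_AU_PROD_URL",
}


def resolve_backend_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the backend URL implied by the environment variables (hypothetical)."""
    env = os.environ if env is None else env

    if not env.get("CONFIDENT_API_KEY"):
        raise RuntimeError("CONFIDENT_API_KEY is required")

    # Defaults follow the table: environment LOCAL, region US.
    environment = env.get("CONFIDENT_ENVIRONMENT", "LOCAL").upper()
    if environment == "LOCAL":
        var = "CONFIDENT_BACKEND_LOCAL_URL"
    elif environment == "ON_PREM":
        var = "CONFIDENT_BACKEND_ON_PREM_URL"
    elif environment == "PROD":
        region = env.get("CONFIDENT_REGION", "US").upper()
        if region not in PROD_URL_VARS:
            raise ValueError(f"unsupported CONFIDENT_REGION: {region}")
        var = PROD_URL_VARS[region]
    else:
        raise ValueError(f"unsupported CONFIDENT_ENVIRONMENT: {environment}")

    url = env.get(var)
    if not url:
        # Assumption: treat a missing URL for the selected environment as an error.
        raise RuntimeError(f"{var} must be set when CONFIDENT_ENVIRONMENT={environment}")
    return url
```

Passing a plain dictionary instead of reading `os.environ` makes it easy to check a `.env` file's contents before starting the server.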

Available Tools

Manage prompt templates with full version control — pull, push, version, and interpolate.

Tool	Description
`pull_prompt`	Fetch a prompt by alias, version, label, or commit hash
`push_prompt`	Create or update a prompt template on Confident AI
`interpolate_prompt`	Locally render a prompt template by replacing placeholders with values
`create_prompt_version`	Assign a version string to a specific prompt commit
`list_prompt_versions`	List all formal versions of a prompt
`list_prompt_commits`	List the full commit history of a prompt
`list_prompts`	List all prompts in your project
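
Under MCP, a client invokes any of these tools with a JSON-RPC `tools/call` request. The sketch below builds such a payload for `pull_prompt`. The helper name `build_tool_call` is hypothetical, and the argument names (`alias`, `version`) are guesses based on the description in the table, not a confirmed schema; the authoritative input schema is whatever the server reports through `tools/list`.

```python
import itertools
import json
from typing import Any, Dict

# Monotonically increasing JSON-RPC request ids.
_request_ids = itertools.count(1)


def build_tool_call(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build the JSON-RPC payload for an MCP `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Argument names here are assumptions for illustration only.
    call = build_tool_call("pull_prompt", {"alias": "my-prompt", "version": "1.0.0"})
    print(json.dumps(call, indent=2))
```

In practice your editor's MCP client builds and sends this request for you; the sketch only shows what travels over the wire.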

Pull evaluation datasets for use in local test runs or agent workflows, with full version control to pin runs to immutable snapshots of goldens.

Tool	Description
`pull_dataset`	Fetch a dataset (single-turn or multi-turn) by alias, optionally pinned to a `version`
`push_dataset`	Create or update datasets by adding new goldens, optionally onto a specific `version`
`list_datasets`	List all datasets in your project
`create_dataset_version`	Snapshot the current dataset state as a new immutable version
`list_dataset_versions`	List all versions of a dataset (newest first)

Trigger cloud evaluations and simulate multi-turn conversations.

Tool	Description
`run_llm_evals`	Run cloud evaluations on a batch of test cases against a metric collection
`simulate_conversation`	Simulate the next turn of a multi-turn conversation using a scenario and expected outcome

Browse, inspect, and evaluate production observability data at every level of your LLM pipeline.

Tool	Description
`list_traces`	List traces with filtering by environment, time range, and sort order
`get_trace`	Get full details of a specific trace, including all spans
`list_threads`	List conversation threads with filtering and pagination
`get_thread`	Get full details of a thread, including all traces and thread-level metrics
`list_spans`	List spans with filtering by type, error state, prompt version, and more
`get_span`	Get full details of a span, including I/O, cost, metrics, and annotations
`evaluate_trace`	Trigger a cloud evaluation on a specific trace
`evaluate_thread`	Trigger a cloud evaluation on a conversation thread
`evaluate_span`	Trigger a cloud evaluation on a specific span

Create and manage human feedback on traces, spans, and threads.

Tool	Description
`list_annotations`	List annotations with filtering by target, type, and rating range
`get_annotation`	Get full details of a specific annotation
`create_annotation`	Create a new annotation (thumbs rating or star rating) on a trace, span, or thread
`update_annotation`	Update an existing annotation's rating, explanation, or expected output

Inspect past evaluation runs and their results.

Tool	Description
`list_test_runs`	List test runs with filtering by status, time range, and multi-turn type
`get_test_run`	Get full details of a test run, including per-test-case metric scores and reasoning

Discover available metric collections before triggering evaluations.

Tool	Description
`list_metric_collections`	List all metric collections, including their metrics and thresholds

Public Endpoint

CAUTION

The hosted /mcp endpoint is strictly for internal development and experimental use. It is not designed for public consumption. The API and its underlying data structures are unstable and subject to change, breaking updates, or removal at any time without prior notice. Do not build production applications or rely on this public endpoint for any critical workflows.

License

This project is licensed under the terms of the MIT License.
