Langfuse MCP Server

License: MIT · Python 3.10+

Model Context Protocol server for Langfuse observability. Query traces, analyze accuracy, detect failures, track costs, debug latency, manage prompts and datasets.

56 tools across data access and analytics. Multi-project support so one instance can serve several Langfuse projects. Works with Claude Code, Codex CLI, Cursor, and any MCP-compatible client.

Why this MCP server?

Comparison with official Langfuse MCP (as of March 2026):

| Capability | This server | Official Langfuse MCP |
| --- | --- | --- |
| Traces & Observations | Yes | No |
| Sessions & Users | Yes | No |
| Exception Tracking | Yes | No |
| Prompt Management | Yes | Yes |
| Dataset Management | Yes | No |
| Annotation Queues | Yes | No |
| Scores v2 API | Yes | No |
| Score Write-back | Yes | No |
| Multi-project support | Yes | No |
| Accuracy Metrics | Yes | No |
| Failure Detection | Yes | No |
| Token Percentiles | Yes | No |
| Cost Breakdown | Yes | No |
| Latency Analysis | Yes | No |
| Session Analytics | Yes | No |
| Context Breach Scanning | Yes | No |
| User Group Aggregation | Yes | No |

The official MCP focuses on prompt management. This server provides a full observability and analytics toolkit — traces, observations, sessions, scores, exceptions, prompts, datasets, annotation queues, plus 9 built-in analytics tools that compute insights server-side and return LLM-sized summaries. Multi-project routing lets a single instance serve several Langfuse projects behind one connector URL.


Quick Start

1. Get your API keys

  • Langfuse Cloud: cloud.langfuse.com → Settings → API Keys

  • Self-hosted: Your Langfuse instance → Settings → API Keys. Set LANGFUSE_HOST to your instance URL (e.g., https://langfuse.yourcompany.com)

2. Add the MCP server

Claude Code

claude mcp add \
  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \
  -e LANGFUSE_SECRET_KEY=sk-lf-... \
  -e LANGFUSE_HOST=https://cloud.langfuse.com \
  --scope project \
  langfuse-mcp -- uvx langfuse-mcp-server

Codex CLI

codex mcp add langfuse-mcp \
  --env LANGFUSE_PUBLIC_KEY=pk-lf-... \
  --env LANGFUSE_SECRET_KEY=sk-lf-... \
  --env LANGFUSE_HOST=https://cloud.langfuse.com \
  -- uvx langfuse-mcp-server

Cursor

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "langfuse-mcp": {
      "command": "uvx",
      "args": ["langfuse-mcp-server"],
      "env": {
        "LANGFUSE_PUBLIC_KEY": "pk-lf-...",
        "LANGFUSE_SECRET_KEY": "sk-lf-...",
        "LANGFUSE_HOST": "https://cloud.langfuse.com"
      }
    }
  }
}

3. Verify

Restart your CLI, then test with /mcp (Claude Code) or codex mcp list (Codex).

Manual install (alternative to uvx)

pip install langfuse-mcp-server
langfuse-mcp-server

Hosting as a remote service

Run as a long-lived HTTP service so multiple users connect to a single instance — required for Claude.ai custom Connectors, and useful for team-wide access without distributing Langfuse API keys per user.

Enabled via env vars; no code changes.

Minimum setup

MCP_TRANSPORT=streamable-http
MCP_BASE_URL=https://mcp.yourcompany.com
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://your-langfuse-instance.example

Without OAuth env vars, the endpoint is unauthenticated — suitable only for local testing. See Google OAuth setup below for production.

Docker

A production-ready Dockerfile is checked into the repo (non-root user, pinned base, .dockerignore to prevent secret leakage). Each tagged release auto-publishes a multi-arch image to GitHub Container Registry via .github/workflows/docker-publish.yml.

Pull the published image:

docker pull ghcr.io/drishtantkaushal/langfusemcp:latest

Or build from source:

docker build -t langfuse-mcp .

Run (all secrets injected via -e, never baked into the image):

docker run -d \
  --name langfuse-mcp \
  --restart unless-stopped \
  -p 8000:8000 \
  -e MCP_TRANSPORT=streamable-http \
  -e MCP_BASE_URL=https://mcp.yourcompany.com \
  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \
  -e LANGFUSE_SECRET_KEY=sk-lf-... \
  -e LANGFUSE_HOST=https://cloud.langfuse.com \
  -e GOOGLE_CLIENT_ID=... \
  -e GOOGLE_CLIENT_SECRET=... \
  -e ALLOWED_EMAIL_DOMAINS=yourcompany.com \
  ghcr.io/drishtantkaushal/langfusemcp:latest

Reverse proxy

Terminate TLS at a reverse proxy in front of the server (nginx, Caddy, Cloudflare). The MCP endpoint is /mcp/ (note the trailing slash). Because responses stream, the proxy must:

  • Disable response buffering — nginx: proxy_buffering off;

  • Allow read timeout ≥ 5 minutes — some analytics queries legitimately run several minutes

  • Speak HTTP/1.1 with keepalive upstream

Google OAuth

In your Google Cloud project:

  1. APIs & Services → OAuth consent screen

    • User type: Internal (restricts sign-in to your Google Workspace domain)

    • Scopes: openid, https://www.googleapis.com/auth/userinfo.email

  2. Credentials → Create OAuth client ID → Web application

    • Authorized redirect URI: https://{your-base-url}/auth/callback

    • Copy the Client ID and Client Secret

Set:

GOOGLE_CLIENT_ID=....apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=GOCSPX-...

OAuth activates when GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, and MCP_BASE_URL are all set. With an Internal consent screen, Google rejects non-Workspace sign-ins at the identity layer — the server never sees those attempts.

Optional email allowlist

For narrower control than "anyone in the Workspace":

# either, or both
ALLOWED_EMAIL_DOMAINS=yourcompany.com
ALLOWED_EMAILS=alice@yourcompany.com,bob@yourcompany.com

When set, every tool call verifies the caller's email_verified claim and checks membership before proceeding. When unset, the server trusts whatever the OAuth provider returns.
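The allowlist check described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the function names `parse_csv` and `is_caller_allowed` are hypothetical, and treating the two lists as a union (a match on either is enough) is an assumption based on the "either, or both" comment.

```python
def parse_csv(value):
    """Split a comma-separated env value into a lowercase set."""
    return frozenset(part.strip().lower() for part in value.split(",") if part.strip())


def is_caller_allowed(claims, allowed_emails=frozenset(), allowed_domains=frozenset()):
    """Return True if the OAuth claims pass the optional allowlist."""
    # No allowlist configured: defer entirely to the OAuth provider.
    if not allowed_emails and not allowed_domains:
        return True
    # An unverified address must never match an allowlist entry.
    if claims.get("email_verified") is not True:
        return False
    email = str(claims.get("email", "")).strip().lower()
    if "@" not in email:
        return False
    domain = email.rsplit("@", 1)[1]
    # Assumption: a match on either list is sufficient.
    return email in allowed_emails or domain in allowed_domains
```

Comparing in lowercase avoids rejecting `Alice@YourCompany.com` when the allowlist was written in lowercase.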

Adding to Claude.ai

Once hosted at https://mcp.yourcompany.com:

  1. Claude.ai → Settings → Connectors → Add custom connector

  2. Remote MCP server URL: https://mcp.yourcompany.com/mcp/

  3. Leave the OAuth Client ID / Secret fields empty — the server uses Dynamic Client Registration; those fields are for a different deployment pattern.

  4. Click Add → Google sign-in popup → done.

Verifying the deploy

Auth enabled, expect 401:

curl -i -X POST https://mcp.yourcompany.com/mcp/ \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"curl","version":"0"}}}'

OAuth metadata endpoint returns JSON (used by Claude.ai to auto-register):

curl https://mcp.yourcompany.com/.well-known/oauth-authorization-server

Liveness/readiness probe — unauthenticated GET /health returns HTTP 200 with {"status": "ok"}, suitable for Kubernetes probes:

curl -i https://mcp.yourcompany.com/health

Multi-project support

A single server instance can route to multiple Langfuse projects. Every tool accepts an optional project argument; when omitted, the server-configured default is used. Call list_projects to discover what's available.

Configuring projects

Declare each project via indexed env vars. Project names are data, not part of variable names — use whatever scheme you like.

LANGFUSE_PROJECT_1_NAME=production
LANGFUSE_PROJECT_1_PUBLIC_KEY=pk-lf-...
LANGFUSE_PROJECT_1_SECRET_KEY=sk-lf-...
LANGFUSE_PROJECT_1_HOST=https://cloud.langfuse.com

LANGFUSE_PROJECT_2_NAME=staging
LANGFUSE_PROJECT_2_PUBLIC_KEY=pk-lf-...
LANGFUSE_PROJECT_2_SECRET_KEY=sk-lf-...
LANGFUSE_PROJECT_2_HOST=https://cloud.langfuse.com

LANGFUSE_DEFAULT_PROJECT=production

Usage from the client

Claude: "Show me failing traces in production today."
→ fetch_traces(project="production", ...) routed to project 1's credentials.

Claude: "Compare that with staging."
→ fetch_traces(project="staging", ...) routed to project 2's credentials.

Each project has its own cache, rate limiter, and connection pool. Claude.ai sees one connector; users authenticate once via OAuth and can query any configured project within the session.

Single-project (legacy) mode

If LANGFUSE_PROJECT_1_NAME is not set, the server falls back to the legacy LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST vars and registers them as a project called default. Existing deployments keep working without changes.
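To make the routing rules concrete, here is a rough sketch of how indexed variables and the legacy fallback could be resolved. The helper names `load_projects` and `resolve_project` are hypothetical, and the sketch assumes project indices are contiguous starting at 1; the server's real parsing may differ.

```python
DEFAULT_HOST = "https://cloud.langfuse.com"


def load_projects(env):
    """Build {name: credentials} from LANGFUSE_PROJECT_{N}_* variables."""
    projects = {}
    n = 1
    # Assumption: indices are contiguous, so stop at the first missing one.
    while f"LANGFUSE_PROJECT_{n}_NAME" in env:
        prefix = f"LANGFUSE_PROJECT_{n}_"
        projects[env[prefix + "NAME"]] = {
            "public_key": env[prefix + "PUBLIC_KEY"],
            "secret_key": env[prefix + "SECRET_KEY"],
            "host": env.get(prefix + "HOST", DEFAULT_HOST),
        }
        n += 1
    if not projects:
        # Legacy single-project mode: register the un-indexed vars as "default".
        projects["default"] = {
            "public_key": env["LANGFUSE_PUBLIC_KEY"],
            "secret_key": env["LANGFUSE_SECRET_KEY"],
            "host": env.get("LANGFUSE_HOST", DEFAULT_HOST),
        }
    # Default project: the explicit setting, else the first one configured.
    default = env.get("LANGFUSE_DEFAULT_PROJECT") or next(iter(projects))
    return projects, default


def resolve_project(projects, default, requested=None):
    """Pick credentials for a tool call's optional `project` argument."""
    name = requested or default
    if name not in projects:
        raise KeyError(f"unknown project {name!r}; configured: {sorted(projects)}")
    return projects[name]
```

In practice `env` would be `os.environ`; passing a plain dict keeps the sketch easy to test.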


Configuration

| Env Variable | Default | Description |
| --- | --- | --- |
| LANGFUSE_PUBLIC_KEY | (required in single-project mode) | Langfuse public API key |
| LANGFUSE_SECRET_KEY | (required in single-project mode) | Langfuse secret API key |
| LANGFUSE_HOST | https://cloud.langfuse.com | Langfuse instance URL (cloud or self-hosted) |
| LANGFUSE_INTERNAL_DOMAINS | "" | Comma-separated internal domains to exclude from analytics (e.g., mycompany.com,test.com). Applies when using group_by='domain'. |
| LANGFUSE_MCP_READ_ONLY | false | Disable write operations (score_traces, create_dataset, etc.) |
| LANGFUSE_PAGE_LIMIT | 100 | Traces per API page |
| LANGFUSE_PROJECT_{N}_NAME | (unset) | Multi-project: name for project N (e.g. production). See Multi-project support. |
| LANGFUSE_PROJECT_{N}_PUBLIC_KEY | (unset) | Public key for project N. |
| LANGFUSE_PROJECT_{N}_SECRET_KEY | (unset) | Secret key for project N. |
| LANGFUSE_PROJECT_{N}_HOST | https://cloud.langfuse.com | Host URL for project N. |
| LANGFUSE_DEFAULT_PROJECT | first configured | Default project name used when a tool call omits project. |
| MCP_TRANSPORT | stdio | stdio or streamable-http. HTTP mode listens on a port instead of stdin/stdout. See Hosting as a remote service. |
| MCP_HOST | 0.0.0.0 | Bind address when MCP_TRANSPORT=streamable-http. |
| MCP_PORT | 8000 | Port when MCP_TRANSPORT=streamable-http. |
| MCP_BASE_URL | (unset) | Public base URL of the hosted server. Required for Google OAuth. |
| GOOGLE_CLIENT_ID | (unset) | Google OAuth client ID. OAuth activates when GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, and MCP_BASE_URL are all set. |
| GOOGLE_CLIENT_SECRET | (unset) | Google OAuth client secret. |
| ALLOWED_EMAILS | (unset) | Comma-separated emails allowed to call tools. Requires OAuth. |
| ALLOWED_EMAIL_DOMAINS | (unset) | Comma-separated email domains allowed to call tools. Requires OAuth. |


Tools

Analytics (9 tools)

Tools that compute insights server-side and return compact summaries. These go beyond raw data access — they aggregate, detect patterns, and compute statistics so the LLM can reason over results without hitting context window limits.

| Tool | Description | Key Parameters |
| --- | --- | --- |
| aggregate_by_group | Aggregate trace metrics by user group. Returns per-group: trace count, unique sessions, unique users, accuracy rate, average latency, total cost. | group_by (name/userId/domain/tag), time_range, top_n, exclude_internal |
| compute_accuracy | Compute accuracy from feedback scores. Accuracy = correct / (correct + incorrect). Supports grouping and time bucketing for trend analysis. | group_by, bucket_by (week/day), score_name, time_range |
| detect_failures | Detect LLM output quality failures using pattern matching ("unable to", "I can't", etc.) and negative feedback scores. Not Python exceptions; use find_exceptions for those. | group_by, include_examples, max_examples, time_range |
| compute_token_percentiles | Compute token usage percentiles (TP50/TP90/TP95/TP99) at trace level. Fetches generation observations for accurate per-trace token counts. | group_by, percentiles, time_range |
| detect_context_breaches | Scan for traces where any single generation exceeds a token threshold. Catches context window overflow causing degraded LLM performance or silent truncation. | threshold (default 256000), check_per_generation, time_range |
| analyze_sessions | Analyze multi-turn session behavior. Returns session count, depth distribution (single vs multi-turn), engagement metrics, and session-level cost/latency. | group_by, time_range |
| estimate_costs | Compute cost breakdown using Langfuse's built-in totalCost field (model-aware, computed by Langfuse). Groups by user, agent, or time bucket. | group_by, bucket_by (week/day), time_range |
| analyze_latency | Analyze latency distribution at trace level and optionally per LLM generation. Identifies which model is the bottleneck. | group_by, percentiles, include_per_generation, time_range |
| score_traces | Write scores back to Langfuse. Use after analysis to annotate traces with findings: tag failures for review, mark high-quality traces for dataset creation. | trace_ids, score_name, score_value, comment |
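As an illustration of the accuracy formula used by compute_accuracy, the sketch below applies correct / (correct + incorrect) per group. It is not the server's implementation: the function names are hypothetical, and it assumes each feedback score is a dict whose numeric `value` is 1 for correct and 0 for incorrect.

```python
from collections import defaultdict


def accuracy(correct, incorrect):
    """correct / (correct + incorrect), or None when there is no feedback."""
    total = correct + incorrect
    return correct / total if total else None


def accuracy_by_group(scores, group_key="name"):
    """Group scores by `group_key` and compute accuracy for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [correct, incorrect]
    for score in scores:
        # Assumption: value 1 means correct, value 0 means incorrect.
        if score["value"] == 1:
            counts[score[group_key]][0] += 1
        elif score["value"] == 0:
            counts[score[group_key]][1] += 1
    return {group: accuracy(c, i) for group, (c, i) in counts.items()}
```

Returning None instead of 0 for a group with no feedback keeps "no data" distinguishable from "always wrong".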

Data Access (46 tools)

Full Langfuse API coverage for querying and managing your observability data.

Traces

| Tool | Description |
| --- | --- |
| fetch_traces | List traces with filters: user ID, name, tags, time range, ordering. Returns paginated results. |
| fetch_trace | Get a single trace by ID with full details including all observations (spans, generations, events). |
| diff_traces | Compare two traces side-by-side (name, user, latency, cost, tags, release, version). |

Observations

| Tool | Description |
| --- | --- |
| fetch_observations | List observations with filters: trace ID, type (GENERATION/SPAN/EVENT), name, time range. |
| fetch_observation | Get a single observation by ID. Returns input/output, token usage, model, latency, and cost. |

Sessions

| Tool | Description |
| --- | --- |
| fetch_sessions | List sessions with optional time filters. |
| get_session_details | Get full details of a session including all its traces. |
| get_user_sessions | Get sessions for a specific user. Fetches user's traces and extracts unique sessions. |

Errors

| Tool | Description |
| --- | --- |
| find_exceptions | Find observations with error status. For LLM output quality issues, use detect_failures instead. |
| get_exception_details | Get full error details for a trace. Returns all observations with error status highlighted. |
| get_error_count | Get total error count within a time period. |

Scores

| Tool | Description |
| --- | --- |
| fetch_scores | List scores/evaluations with filters: trace ID, score name, time range. |
| list_scores_v2 | v2 Scores API with richer filters (session ID, dataset run ID, queue ID, config ID, operator/value, etc.). |
| get_score_v2 | Get a single score by ID via the v2 Scores API. |

Prompts

| Tool | Description |
| --- | --- |
| list_prompts | List all prompts in the project with optional name filter. |
| get_prompt | Fetch a specific prompt by name, version, or label. |
| get_prompt_unresolved | Fetch a prompt with placeholders/dependencies intact (debugging prompt composition). |
| create_text_prompt | Create a new text prompt version with optional labels and model config. |
| create_chat_prompt | Create a new chat prompt version with message array and optional config. |
| update_prompt_labels | Update labels for a specific prompt version (e.g., promote to "production"). |

Datasets

| Tool | Description |
| --- | --- |
| list_datasets | List all datasets in the project. |
| get_dataset | Get metadata for a specific dataset. |
| list_dataset_items | List items in a dataset with pagination. |
| get_dataset_item | Get a single dataset item by ID. |
| create_dataset | Create a new dataset with optional description and metadata. |
| create_dataset_item | Create or upsert a dataset item. Supports linking to source traces. |
| delete_dataset_item | Delete a dataset item by ID. |

Annotation Queues

| Tool | Description |
| --- | --- |
| list_annotation_queues | List all annotation queues in the project. |
| create_annotation_queue | Create a new annotation queue with attached score configs. |
| get_annotation_queue | Get a queue by ID. |
| list_annotation_queue_items | List items in a queue (optionally filtered by status). |
| get_annotation_queue_item | Get a queue item by ID. |
| create_annotation_queue_item | Add a trace or observation to a queue for review. |
| update_annotation_queue_item | Change a queue item's status (PENDING / COMPLETED). |
| delete_annotation_queue_item | Remove an item from a queue. |
| create_annotation_queue_assignment | Assign a reviewer to a queue. |
| delete_annotation_queue_assignment | Remove a reviewer from a queue. |

Metrics

| Tool | Description |
| --- | --- |
| get_daily_metrics | Langfuse's pre-aggregated daily rollup (trace count, cost, tokens per day). Faster than per-trace aggregation for long windows. |

Users

| Tool | Description |
| --- | --- |
| list_users | Top users by trace count over a time window (defaults to last 30 days). Wraps the Langfuse /metrics query API. |

Comments

| Tool | Description |
| --- | --- |
| list_comments | List comments attached to traces/observations/sessions/prompts, with filters. |
| get_comment | Get a single comment by ID. |
| create_comment | Create a markdown comment on a trace/observation/session/prompt. |

Models

| Tool | Description |
| --- | --- |
| list_models | List model definitions in Langfuse's models registry (pricing + tokenizer config). |
| get_model | Get a single model definition by ID. |

Projects

| Tool | Description |
| --- | --- |
| list_projects | Discovery: returns the list of configured Langfuse projects and the default project. |

Schema

| Tool | Description |
| --- | --- |
| get_data_schema | Get the data schema for the Langfuse project: available fields and types for traces, observations, scores, sessions. |


Sample Questions

Once connected, ask your AI assistant questions like these:

Agent & Pipeline Health

  • "Which agents failed the most this week?"

  • "What's the failure rate by agent name?"

  • "Which agent has the worst accuracy?"

  • "Show me the top 5 agents by trace volume"

  • "Are any agents consistently slower than others?"

  • "Compare all agents by accuracy, latency, and cost"

Accuracy & Quality

  • "What's our overall accuracy this week?"

  • "What's the accuracy trend by week for the last 30 days?"

  • "Compare accuracy across different agents"

  • "What's the daily accuracy breakdown?"

  • "Which users are getting the worst accuracy?"

  • "What percentage of traces have feedback scores?"

Failures & Debugging

  • "Show me failure examples from today"

  • "What are the most common failure patterns?"

  • "Which users are seeing the most failures?"

  • "What's the failure rate by agent?"

  • "Are failures increasing or decreasing this week vs last?"

  • "Show me traces where the LLM said 'unable to' or 'I can't'"

Token Usage

  • "What are the P90 and P99 token usage stats?"

  • "Which agents consume the most tokens?"

  • "Compare token usage across user groups"

  • "Are any users hitting unusually high token counts?"

Context Window Breaches

  • "Are any generations exceeding the 128K context window?"

  • "Show me traces with token usage above 200K per generation"

  • "What's the breach severity distribution?"

  • "Which users trigger the most context window breaches?"

Sessions & Engagement

  • "What's our multi-turn rate?"

  • "How deep are sessions on average?"

  • "Which users have the deepest sessions?"

  • "How many single-turn vs multi-turn sessions this week?"

  • "What's the average session cost?"

Cost

  • "How much are we spending per day this week?"

  • "What's the weekly cost trend for the last 30 days?"

  • "Which agent is the most expensive?"

  • "Which users are costing the most?"

  • "What's the average cost per trace?"

Latency

  • "What's the P95 latency?"

  • "Is latency getting worse over time?"

  • "Which model is the slowest?"

  • "Compare latency across agents"

  • "Show me per-generation latency breakdown by model"

  • "Which users are experiencing the highest latency?"

Annotation & Write-back

  • "Score all failing traces from today with 'needs-review'"

  • "Tag these trace IDs as 'high-quality' for dataset creation"

  • "Mark trace abc-123 with a score of 0 and comment 'hallucinated output'"

Lookups & Exploration

  • "Fetch the last 20 traces"

  • "Show me trace abc-123 with all its observations"

  • "List sessions for user alice@example.com"

  • "What errors happened in the last 24 hours?"

  • "How many errors occurred this week?"

  • "Show me all prompts in the project"

  • "List all datasets"

  • "What fields are available on traces and observations?"


Grouping Options

The group_by parameter controls how traces are segmented in analytics tools:

| Value | What it groups by | When to use |
| --- | --- | --- |
| name | Trace/agent name (default) | Compare performance across different agents or pipelines |
| userId | Per-user breakdown | Identify users with issues or high usage |
| domain | Email domain extracted from userId | Multi-tenant apps where users have email-based IDs (e.g., user@acme.com → acme.com) |
| tag | Trace tags | Compare across tagged environments, versions, or experiments |
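For the domain grouping, the extraction step might look roughly like this. The helper names are hypothetical, traces are assumed to be dicts carrying a `userId` field, and the handling of non-email IDs (skipped here) is an assumption rather than documented behavior.

```python
from collections import Counter


def email_domain(user_id):
    """Return the lowercase domain of an email-style user ID, else None."""
    if not user_id or "@" not in user_id:
        return None
    domain = user_id.rsplit("@", 1)[1].strip().lower()
    return domain or None


def count_by_domain(traces, internal_domains=frozenset()):
    """Count traces per email domain, skipping internal domains."""
    counts = Counter()
    for trace in traces:
        domain = email_domain(trace.get("userId"))
        # Assumption: traces without an email-style userId are left out.
        if domain is None or domain in internal_domains:
            continue
        counts[domain] += 1
    return dict(counts)
```

The `internal_domains` argument mirrors what LANGFUSE_INTERNAL_DOMAINS is described as doing: keeping your own staff's traffic out of per-tenant numbers.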


Selective Tool Loading

Load only the tool groups you need to reduce token overhead:

# Only load traces and analytics tools
LANGFUSE_TOOLS=traces,analytics langfuse-mcp-server

# Only load prompts and datasets
LANGFUSE_TOOLS=prompts,datasets langfuse-mcp-server

# In Claude Code
claude mcp add \
  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \
  -e LANGFUSE_SECRET_KEY=sk-lf-... \
  -e LANGFUSE_TOOLS=traces,observations,analytics \
  langfuse-mcp -- uvx langfuse-mcp-server

Available groups:

| Group | Tools | Count |
| --- | --- | --- |
| traces | fetch_traces, fetch_trace | 2 |
| observations | fetch_observations, fetch_observation | 2 |
| sessions | fetch_sessions, get_session_details, get_user_sessions | 3 |
| errors | find_exceptions, get_exception_details, get_error_count | 3 |
| scores | fetch_scores | 1 |
| prompts | list_prompts, get_prompt, create_text_prompt, create_chat_prompt, update_prompt_labels | 5 |
| datasets | list_datasets, get_dataset, list_dataset_items, get_dataset_item, create_dataset, create_dataset_item, delete_dataset_item | 7 |
| annotation_queues | All 10 annotation queue tools | 10 |
| metrics | get_daily_metrics | 1 |
| users | list_users | 1 |
| comments | list_comments, get_comment, create_comment | 3 |
| models | list_models, get_model | 2 |
| projects | list_projects | 1 |
| schema | get_data_schema | 1 |
| analytics | All 9 analytics tools | 9 |

If LANGFUSE_TOOLS is not set, all 56 tools are loaded.
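One way to picture the group filter is the sketch below. It uses only a subset of the groups for brevity, the function name `select_tools` is hypothetical, and rejecting unknown group names with an error is an assumption; the server may ignore them instead.

```python
# Subset of the documented groups, for illustration only.
TOOL_GROUPS = {
    "traces": ["fetch_traces", "fetch_trace"],
    "observations": ["fetch_observations", "fetch_observation"],
    "scores": ["fetch_scores"],
    "projects": ["list_projects"],
}


def select_tools(env_value, groups=TOOL_GROUPS):
    """Return the tool names to register for a LANGFUSE_TOOLS value."""
    # Unset or empty: load every group.
    if not env_value or not env_value.strip():
        return [tool for tools in groups.values() for tool in tools]
    requested = [name.strip() for name in env_value.split(",") if name.strip()]
    unknown = [name for name in requested if name not in groups]
    if unknown:
        # Assumption: fail loudly rather than silently dropping a typo.
        raise ValueError(f"unknown tool group(s): {', '.join(unknown)}")
    return [tool for name in requested for tool in groups[name]]
```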


Read-Only Mode

Disable write operations (score_traces, create_dataset, create_dataset_item, delete_dataset_item, create_text_prompt, create_chat_prompt):

LANGFUSE_MCP_READ_ONLY=true
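A read-only switch like this is typically applied when tools are registered. The sketch below shows the idea with hypothetical helper names; the set of write tools is the one listed above, and the accepted truthy spellings ("1", "true", "yes") are an assumption.

```python
WRITE_TOOLS = frozenset({
    "score_traces",
    "create_dataset",
    "create_dataset_item",
    "delete_dataset_item",
    "create_text_prompt",
    "create_chat_prompt",
})


def is_read_only(env):
    """Interpret LANGFUSE_MCP_READ_ONLY; defaults to false when unset."""
    value = env.get("LANGFUSE_MCP_READ_ONLY", "false")
    # Assumption: these spellings count as enabled.
    return value.strip().lower() in {"1", "true", "yes"}


def registrable_tools(all_tools, env):
    """Drop write tools from the registration list in read-only mode."""
    if not is_read_only(env):
        return list(all_tools)
    return [tool for tool in all_tools if tool not in WRITE_TOOLS]
```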

How it Compares

vs Official Langfuse MCP

| Capability | This server | Official Langfuse MCP |
| --- | --- | --- |
| Traces & Observations | Yes | No |
| Sessions & Users | Yes | No |
| Exception Tracking | Yes | No |
| Prompt Management | Yes | Yes |
| Dataset Management | Yes | No |
| Score Write-back | Yes | No |
| Selective Tool Loading | Yes | No |
| Accuracy Metrics | Yes | No |
| Failure Detection | Yes | No |
| Token Percentiles | Yes | No |
| Cost Breakdown | Yes | No |
| Latency Analysis | Yes | No |
| Session Analytics | Yes | No |
| Context Breach Scanning | Yes | No |
| User Group Aggregation | Yes | No |

The official Langfuse MCP (5 tools) focuses on prompt management. This server provides full observability coverage plus 9 analytics tools.

vs Other Langfuse MCP Implementations

| Capability | This server | Others |
| --- | --- | --- |
| Data access (traces, observations, sessions) | Yes | Yes |
| Prompt & dataset management | Yes | Yes |
| Exception tracking | Yes | Yes |
| Annotation queues | Yes | Partial |
| Selective tool loading | Yes | Yes |
| Multi-project support | Yes | No |
| Accuracy metrics | Yes | No |
| LLM failure detection | Yes | No |
| Token percentiles (TP50/TP90/TP95/TP99) | Yes | No |
| Cost breakdown by group/time | Yes | No |
| Latency analysis with per-model breakdown | Yes | No |
| Multi-turn session analytics | Yes | No |
| Context window breach scanning | Yes | No |
| User/tenant group aggregation | Yes | No |
| Score write-back | Yes | No |

Other implementations provide data access (fetching raw traces, observations, sessions) using synchronous HTTP clients. This server adds a compute layer (analytics tools that aggregate, detect patterns, and compute statistics server-side) plus an async architecture that fetches pages and observations concurrently instead of one at a time.

| Architecture | This server | Others |
| --- | --- | --- |
| Async HTTP client | Yes (httpx.AsyncClient) | No (sync requests/httpx) |
| Concurrent observation fetching | Yes (asyncio.gather) | No (sequential per-trace) |
| TTL caching | Yes (live 5min, historical 1hr) | No |
| Adaptive rate limiting | Yes (token bucket, 429 backoff) | No (fixed sleep) |
| Batch observation queries | Yes (with auto-fallback) | No (N+1 per-trace) |
| Claude Code sub-agent | Yes (.claude/agents/) | No |

vs Platform-Embedded AI (Braintrust Loop, LangSmith Insights, Arize Alyx)

| Capability | This server | Platform AI assistants |
| --- | --- | --- |
| Open source | Yes | No |
| Works with any MCP client | Yes | Platform-locked |
| Self-hosted Langfuse support | Yes | N/A |
| Real-time conversational | Yes | Varies (some batch-only) |
| Custom grouping/segmentation | Yes | Limited |
| Write-back to Langfuse | Yes | Platform-specific |
| Free | Yes | Paid tiers |


Architecture

Why async httpx instead of the Langfuse SDK?

The Langfuse Python SDK is excellent for writing traces (it batches and sends asynchronously in the background). But for reading traces at scale — which is what an analytics MCP server does — the SDK has a limitation: its read API is synchronous, built on the requests library.

This server uses httpx.AsyncClient instead, which enables:

  • Concurrent observation fetching: fetch observations for 100 traces simultaneously via asyncio.gather, not one by one

  • Non-blocking pagination: paginate through thousands of traces without blocking the event loop

  • Rate-limited concurrency: asyncio.Semaphore plus a token bucket control throughput without blocking on time.sleep()

Measured impact: analyze_latency with per-generation breakdown dropped from 110s to 20s (about 5.5x faster) on a self-hosted instance with 2.4M daily observations.
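The concurrency pattern named above, asyncio.gather bounded by an asyncio.Semaphore, can be sketched as follows. The helper `gather_bounded` and the stand-in worker are hypothetical; a real worker would issue an httpx.AsyncClient request instead of sleeping.

```python
import asyncio


async def gather_bounded(items, worker, limit=10):
    """Run worker(item) for every item, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item):
        # The semaphore caps how many workers are in flight at once.
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(run(item) for item in items))


async def fake_fetch_observations(trace_id):
    """Stand-in for an HTTP call; returns a placeholder payload."""
    await asyncio.sleep(0.01)
    return {"traceId": trace_id, "observations": []}


if __name__ == "__main__":
    results = asyncio.run(
        gather_bounded([f"trace-{i}" for i in range(100)], fake_fetch_observations)
    )
    print(len(results), "traces fetched")
```

Because waiting happens inside `await`, other coroutines keep running while a request is pending, which is the difference from a sequential loop with time.sleep().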

Caching strategy

Two-tier in-memory TTL cache using cachetools.TTLCache:

| Data age | TTL | Rationale |
| --- | --- | --- |
| Today's data | 5 minutes | Still changing, short cache |
| Historical data (before today) | 1 hour | Won't change, cache aggressively |

The cache operates at the API page level. If you call aggregate_by_group then compute_accuracy for the same time range, the second call hits cache for all trace pages — only scores are fetched fresh.

Configure via LANGFUSE_CACHE_TTL and LANGFUSE_CACHE_TTL_HISTORICAL (seconds).
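The two-tier idea can be illustrated with a small standard-library cache. This is a stand-in written for clarity, not the server's cachetools-based code: the class name `TwoTierCache` is hypothetical, and choosing the tier by comparing the data's day against today is an assumption about how "live" and "historical" are told apart.

```python
import time
from datetime import date


class TwoTierCache:
    """Page cache with a short TTL for today's data and a long TTL for older data."""

    def __init__(self, live_ttl=300, historical_ttl=3600, clock=time.monotonic):
        self._live_ttl = live_ttl              # seconds, 5 minutes by default
        self._historical_ttl = historical_ttl  # seconds, 1 hour by default
        self._clock = clock
        self._entries = {}                     # key -> (expires_at, value)

    def put(self, key, value, data_day, today=None):
        today = today or date.today()
        # Today's data is still changing, so it gets the short TTL.
        ttl = self._live_ttl if data_day >= today else self._historical_ttl
        self._entries[key] = (self._clock() + ttl, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller refetches.
            del self._entries[key]
            return None
        return value
```

Keying entries by request (endpoint plus query parameters) is what would let a second analytics call over the same time range reuse the trace pages fetched by the first.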

Rate limiting

A global token bucket rate limiter respects Langfuse API limits:

| Instance type | Default RPM | Behavior |
| --- | --- | --- |
| Self-hosted | Unlimited (0) | No artificial throttling. Full speed, limited only by your server. |
| Langfuse Cloud (Hobby) | 30 req/min | Conservative default for Hobby tier |
| Langfuse Cloud (Pro/Team) | Set LANGFUSE_RATE_LIMIT_RPM=1000 | Higher throughput for paid plans |

On HTTP 429 responses, the limiter automatically halves the RPM and reads the Retry-After header. This means the server adapts to any rate limit — cloud or self-hosted — without manual configuration.
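A token bucket with 429 backoff could be sketched like this. The class `AdaptiveRateLimiter` is hypothetical and simplified: it returns the delay a caller should wait rather than sleeping itself, it assumes Retry-After arrives as a number of seconds, and leaving an unlimited (0) setting unchanged on 429 is an assumption.

```python
import time


class AdaptiveRateLimiter:
    """Token bucket in requests per minute; halves its rate on HTTP 429."""

    def __init__(self, rpm, clock=time.monotonic):
        self.rpm = rpm                  # 0 means unlimited
        self._clock = clock
        self._tokens = float(rpm)       # start with a full bucket
        self._last = clock()

    def _refill(self):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Tokens accrue at rpm/60 per second, capped at the bucket size.
        self._tokens = min(float(self.rpm), self._tokens + elapsed * self.rpm / 60.0)

    def delay_before_next(self):
        """Reserve one request slot; return seconds to wait before sending."""
        if self.rpm == 0:
            return 0.0
        self._refill()
        self._tokens -= 1.0
        if self._tokens >= 0:
            return 0.0
        # Negative balance: wait until the deficit has been refilled.
        return -self._tokens * 60.0 / self.rpm

    def on_429(self, retry_after=None):
        """Halve the rate and return how long to pause before retrying."""
        if self.rpm > 0:
            self.rpm = max(1, self.rpm // 2)
            self._tokens = min(self._tokens, float(self.rpm))
        if retry_after is not None:
            return float(retry_after)   # assumes the delta-seconds form
        return 60.0 / self.rpm if self.rpm else 1.0
```

In async code the returned delay would be passed to `asyncio.sleep`, so waiting for a slot never blocks the event loop.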

Observation fetching: batch vs concurrent

Analytics tools that need per-generation data (token percentiles, context breaches, latency breakdown) face the N+1 problem: one API call per trace to fetch its observations.

This server uses a two-step strategy:

  1. Try batch fetch — fetch ALL observations for the time range in one paginated call, group by traceId in memory

  2. If volume is too high (>5000 pages / 500K+ observations) — fall back to concurrent per-trace fetch using asyncio.gather with semaphore-controlled concurrency

This means the server handles both small projects (batch is faster) and large-scale deployments (concurrent targeted fetching avoids downloading millions of irrelevant observations).
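The decision between the two strategies, and the in-memory grouping used by the batch path, might look roughly like this. Both helper names are hypothetical, and the sketch assumes the total observation count is known before fetching (for example from pagination metadata); the 5000-page cutoff and the page size of 100 come from the text and the LANGFUSE_PAGE_LIMIT default.

```python
from collections import defaultdict


def choose_strategy(total_observations, page_limit=100, max_pages=5000):
    """Return 'batch' when the full scan fits in max_pages, else 'concurrent'."""
    pages = -(-total_observations // page_limit)  # ceiling division
    return "batch" if pages <= max_pages else "concurrent"


def group_by_trace(observations):
    """Group a flat list of observations by their traceId, preserving order."""
    grouped = defaultdict(list)
    for observation in observations:
        grouped[observation["traceId"]].append(observation)
    return dict(grouped)
```

With the defaults, 5000 pages of 100 observations gives the 500K threshold mentioned above; beyond that, per-trace fetching only downloads observations for the traces an analysis actually needs.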

Context isolation via sub-agent

The server ships with a Claude Code custom agent at .claude/agents/langfuse-analyst.md. When a user asks a Langfuse-related question, Claude Code can delegate to this agent, which:

  • Only loads Langfuse MCP tools (not other tools in the session)

  • Has a specialized system prompt with tool taxonomy and workflow patterns

  • Runs in an isolated context window, keeping the main conversation clean

  • Returns a summary to the parent conversation

This prevents 33 tool schemas (~5000 tokens) from polluting every conversation.


Contributing

See CONTRIBUTING.md for development setup, code style guidelines, and areas for contribution.

Security

See SECURITY.md for the security policy, vulnerability reporting, and API key handling.

Code of Conduct

See CODE_OF_CONDUCT.md.

License

MIT — see LICENSE.
