TANTIOPE

Datadog MCP Server

Server Configuration

Describes the environment variables required to run the server.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| DD_SITE | No | Datadog site (default: datadoghq.com). Use datadoghq.eu for EU, etc. | datadoghq.com |
| MCP_HOST | No | HTTP host when using http transport (default: 0.0.0.0) | 0.0.0.0 |
| MCP_PORT | No | HTTP port when using http transport (default: 3000) | 3000 |
| DD_API_KEY | Yes | Your Datadog API key | |
| DD_APP_KEY | Yes | Your Datadog Application key | |
| MCP_TRANSPORT | No | Transport mode (stdio or http, default: stdio) | stdio |
| MCP_DEFAULT_LIMIT | No | General tools default limit (default: 50) | 50 |
| MCP_DEFAULT_LOG_LINES | No | Logs tool default limit (default: 200) | 200 |
| MCP_DEFAULT_TIME_RANGE | No | Default time range in hours (default: 24) | 24 |
| MCP_DEFAULT_METRIC_POINTS | No | Metrics timeseries data points (default: 1000) | 1000 |
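
As a rough illustration of how the variables above fit together, the following sketch reads them from an environment mapping and applies the documented defaults. The `ServerConfig` class and `load_config` function are hypothetical names introduced here for illustration; the server's actual startup code may look quite different.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical container for the documented settings."""
    api_key: str
    app_key: str
    site: str
    transport: str
    host: str
    port: int
    default_limit: int
    default_log_lines: int
    default_time_range_hours: int
    default_metric_points: int


def load_config(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env

    # DD_API_KEY and DD_APP_KEY are the only required variables.
    missing = [k for k in ("DD_API_KEY", "DD_APP_KEY") if not env.get(k)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))

    transport = env.get("MCP_TRANSPORT", "stdio")
    if transport not in ("stdio", "http"):
        raise ValueError(f"MCP_TRANSPORT must be 'stdio' or 'http', got {transport!r}")

    return ServerConfig(
        api_key=env["DD_API_KEY"],
        app_key=env["DD_APP_KEY"],
        site=env.get("DD_SITE", "datadoghq.com"),
        transport=transport,
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "3000")),
        default_limit=int(env.get("MCP_DEFAULT_LIMIT", "50")),
        default_log_lines=int(env.get("MCP_DEFAULT_LOG_LINES", "200")),
        default_time_range_hours=int(env.get("MCP_DEFAULT_TIME_RANGE", "24")),
        default_metric_points=int(env.get("MCP_DEFAULT_METRIC_POINTS", "1000")),
    )
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to try out, for example `load_config({"DD_API_KEY": "...", "DD_APP_KEY": "...", "DD_SITE": "datadoghq.eu"})`.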

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| --- | --- |
| tools | { "listChanged": true } |

Tools

Functions exposed to the LLM to take actions

monitors

Manage Datadog monitors. Actions: list, get, search, create, update, delete, mute, unmute, top, history, preview, test_notification. Filters: name, tags, groupStates (alert/warn/ok/no data). get/create/update return the full options object so callers can safely read-then-patch.

create/update accept a config object validated against a typed schema covering the documented Datadog Monitor fields:

  • Top-level: name, type, query, message, tags, priority (1-5, nullable), restrictedRoles, multi, options.

  • options.* validated keys grouped by category:

    • notification: notifyNoData, noDataTimeframe, notifyAudit, notificationPresetName.

    • evaluation/delay: newHostDelay, newGroupDelay, evaluationDelay, requireFullWindow, onMissingData.

    • renotification: renotifyInterval (nullable), renotifyOccurrences, renotifyStatuses, escalationMessage.

    • lifecycle: timeoutH (nullable), includeTags, locked, silenced (record of timestamps/null), groupRetentionDuration.

    • thresholds: thresholds (critical/warning/ok/criticalRecovery/warningRecovery/unknown), thresholdWindows.

    • scheduling: schedulingOptions.

Unknown keys (top-level or under options) are forwarded to Datadog as-is and surfaced via an optional warnings array on the response, so the schema does not lag the API. snake_case aliases are accepted on input and normalized to camelCase before validation. Validation errors short-circuit before any HTTP call and surface with the EINVALID_MONITOR_CONFIG error code. Reference: https://docs.datadoghq.com/api/latest/monitors/
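
The alias handling and unknown-key behaviour described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the server's code: the function names, the warning wording, and the error message text after the EINVALID_MONITOR_CONFIG code are all hypothetical, and only the top level and `options` are handled (nested objects such as `thresholds` are left untouched).

```python
import re

# Key names taken from the field list above.
KNOWN_TOP_LEVEL = {
    "name", "type", "query", "message", "tags", "priority",
    "restrictedRoles", "multi", "options",
}
KNOWN_OPTIONS = {
    "notifyNoData", "noDataTimeframe", "notifyAudit", "notificationPresetName",
    "newHostDelay", "newGroupDelay", "evaluationDelay", "requireFullWindow",
    "onMissingData", "renotifyInterval", "renotifyOccurrences",
    "renotifyStatuses", "escalationMessage", "timeoutH", "includeTags",
    "locked", "silenced", "groupRetentionDuration", "thresholds",
    "thresholdWindows", "schedulingOptions",
}


def to_camel(key):
    """Convert snake_case to camelCase, e.g. notify_no_data -> notifyNoData."""
    return re.sub(r"_([a-z0-9])", lambda m: m.group(1).upper(), key)


def _normalize(mapping, known, scope, warnings):
    out = {}
    for key, value in mapping.items():
        camel = to_camel(key)
        if camel in known:
            out[camel] = value          # known key or snake_case alias
        else:
            out[key] = value            # unknown key: forwarded unchanged
            warnings.append(f"unknown {scope} key forwarded as-is: {key}")
    return out


def normalize_monitor_config(config):
    """Return (normalized_config, warnings); raise before any HTTP call on bad input."""
    warnings = []
    normalized = _normalize(config, KNOWN_TOP_LEVEL, "top-level", warnings)
    options = normalized.get("options")
    if isinstance(options, dict):
        normalized["options"] = _normalize(options, KNOWN_OPTIONS, "options", warnings)

    # priority is documented as 1-5, nullable. The message text is hypothetical.
    priority = normalized.get("priority")
    if priority is not None and priority not in range(1, 6):
        raise ValueError("EINVALID_MONITOR_CONFIG: priority must be 1-5 or null")
    return normalized, warnings
```

With this sketch, `{"restricted_roles": [], "options": {"notify_no_data": True, "brand_new_flag": 1}}` comes back with `restrictedRoles` and `notifyNoData` renamed, `brand_new_flag` left as-is, and one warning naming it.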

top: Ranked monitors by alert frequency with real monitor names and context breakdown.

  • Returns: {rank, monitor_id, name (with {{template.vars}}), message (template), total_count, by_context}

  • Perfect for weekly/daily alert reports

  • Gets real monitor names from monitors API (not event titles)

  • WARNING: total_count is the raw alert-event count and INCLUDES renotifies/re-evaluations. For monitors stuck in Alert state, Datadog emits a renotify event every renotify_interval minutes, which inflates this count well beyond the number of real fires. When the question is "how many times did this monitor actually fire", use action=history instead.

history: Count and list real state transitions for one monitor over a time window.

  • Inputs: id (required, monitor ID), from/to (optional time range), transitionType (optional filter, defaults to ["alert","alert recovery"]), group (optional multi-alert group filter).

  • Returns: {transitions: [{timestamp, monitorId, monitorName, group, fromState, toState, transitionType, eventId}], count, meta}

  • count = transitions.length — the number of REAL state changes (fires + recoveries by default), NOT the renotify-inflated count returned by action=top or events action=search.

  • Backed by Datadog v2 events search with a hardcoded source:alert + @monitor.transition.transition_type filter that excludes renotifies by default. To include renotifies, pass transitionType including "renotify".
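
To make the difference between the raw alert-event count (action=top) and the transition count (action=history) concrete, here is a small sketch that filters a list of event records by transition type. The event dictionaries are made-up sample data and `summarize_history` is a hypothetical helper; only the field name `transitionType` and the default filter come from the description above.

```python
# Default filter documented for action=history.
DEFAULT_TRANSITION_TYPES = ("alert", "alert recovery")


def summarize_history(events, transition_types=DEFAULT_TRANSITION_TYPES):
    """Keep only events whose transitionType is in the requested set."""
    wanted = set(transition_types)
    transitions = [e for e in events if e.get("transitionType") in wanted]
    return {"transitions": transitions, "count": len(transitions)}


# Made-up sample: one fire, three renotifies while stuck in Alert, one recovery.
sample_events = [
    {"eventId": "e1", "transitionType": "alert"},
    {"eventId": "e2", "transitionType": "renotify"},
    {"eventId": "e3", "transitionType": "renotify"},
    {"eventId": "e4", "transitionType": "renotify"},
    {"eventId": "e5", "transitionType": "alert recovery"},
]

raw_count = len(sample_events)                        # what a raw alert-event count would report
real = summarize_history(sample_events)               # fires + recoveries only
with_renotify = summarize_history(
    sample_events, ("alert", "alert recovery", "renotify")
)
```

In this sample the raw count is 5 while the default history-style count is 2, which is the inflation the warning on action=top refers to.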

preview: Render a Datadog monitor message template against a context (read-only safe).

  • Inputs: either inline 'message' OR 'monitor_id' (or existing 'id'); plus optional 'context' { variables, conditionals }.

  • Supported syntax: {{variable.name}} substitution and conditional blocks {{#name}}...{{/name}} / {{^name}}...{{/name}} where name is one of: is_alert, is_warning, is_no_data, is_recovery, is_alert_to_warning, is_warning_to_alert.

  • Missing variables render as {{undefined:name}} markers and are reported in 'variablesMissing'.

  • Loops ({{#each ...}}) and partials ({{> ...}}) return EUNSUPPORTED_TEMPLATE_SYNTAX.

  • Allowed under --read-only (no mutation; at most a getMonitor load).
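
The preview rules above can be sketched in a few lines. This is a simplified illustration, not the server's renderer: the `preview` function and the `rendered` result key are hypothetical, nested conditional blocks are not handled, and blocks with names outside the documented list are simply left untouched (the source does not say how they are treated).

```python
import re

CONDITIONALS = (
    "is_alert", "is_warning", "is_no_data", "is_recovery",
    "is_alert_to_warning", "is_warning_to_alert",
)

# {{#name}}...{{/name}} or {{^name}}...{{/name}} for the documented names only.
_BLOCK = re.compile(
    r"\{\{([#^])(" + "|".join(CONDITIONALS) + r")\}\}(.*?)\{\{/\2\}\}",
    re.DOTALL,
)
_VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")
_UNSUPPORTED = re.compile(r"\{\{#each\b|\{\{>")


def preview(message, variables=None, conditionals=None):
    """Render a message template against a context of variables and conditionals."""
    variables = variables or {}
    conditionals = conditionals or {}

    # Loops and partials are documented as unsupported.
    if _UNSUPPORTED.search(message):
        raise ValueError("EUNSUPPORTED_TEMPLATE_SYNTAX")

    def render_block(match):
        sigil, name, body = match.groups()
        active = bool(conditionals.get(name, False))
        keep = active if sigil == "#" else not active
        return body if keep else ""

    rendered = _BLOCK.sub(render_block, message)

    missing = []

    def render_var(match):
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        if name not in missing:
            missing.append(name)
        return "{{undefined:" + name + "}}"

    rendered = _VAR.sub(render_var, rendered)
    return {"rendered": rendered, "variablesMissing": missing}
```

For example, rendering `{{#is_alert}}High CPU on {{host.name}}{{/is_alert}}` with `variables={"host.name": "web-1"}` and `conditionals={"is_alert": True}` yields `High CPU on web-1` with an empty `variablesMissing` list.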

test_notification: KNOWN LIMITATION — always returns ENOT_SUPPORTED.

  • Datadog's public REST API exposes no monitor test-notification endpoint at v1 or v2 (audited against the official OpenAPI specs). The v1 SDK has no notifyMonitor / testMonitor method.

  • Allowed under --read-only because no Datadog HTTP call is attempted.

  • If Datadog publishes such an endpoint in future, this action will be reimplemented to invoke it.

  • Workaround: use the 'Test Notifications' button in the Datadog monitor UI.

For generic event grouping (deployments, configs), use events tool instead. Note that the events tool's action=search with source:alert ALSO includes renotifies; use its transitionType filter (or this action=history) for fires-only counts.

dashboards

Access Datadog dashboards and visualizations.

Actions:

  • list: Filter dashboards by name/tags

  • get: Retrieve full dashboard config including widgets (useful for learning patterns)

  • create: Create new dashboard

  • update: Modify existing dashboard

  • delete: Remove dashboard

  • validate: Test dashboard config without creating (helps debug widget definitions)

Widget formats supported:

  • Simple: { "type": "timeseries", "requests": [{ "q": "avg:metric{*}" }] }

  • Advanced: { "type": "timeseries", "requests": [{ "queries": [...], "formulas": [...] }] }

Tags must use key:value format (e.g., ["team:ops", "env:prod"]).
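
A quick way to check inputs before calling create or validate is sketched below. The helper names and the exact tag pattern are assumptions made for illustration; the server's own validation may be stricter or looser. Only the simple widget format is built here, since the advanced `queries`/`formulas` contents are not spelled out above.

```python
import re

# Assumed reading of "key:value": a non-empty key without colons or spaces,
# a colon, then a non-empty value without spaces.
_TAG = re.compile(r"^[^:\s]+:\S+$")


def invalid_tags(tags):
    """Return the tags that do not look like key:value."""
    return [tag for tag in tags if not _TAG.match(tag)]


def simple_timeseries(query):
    """Build a widget in the simple format shown above."""
    return {"type": "timeseries", "requests": [{"q": query}]}


widget = simple_timeseries("avg:metric{*}")
bad = invalid_tags(["team:ops", "env:prod", "prod"])
```

Here `bad` contains only `"prod"`, the one tag without a key.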

logs

Search Datadog logs with grep-like text filtering. Actions: search (find logs), aggregate (count/group). Key filters: keyword (text grep), pattern (regex), service, host, status (error/warn/info). Time ranges: "1h", "3d@11:45:23".

CORRELATION: Logs contain dd.trace_id in attributes for linking to traces and APM metrics.

SAMPLING: Use sample:"diverse" for error investigation (dedupes by message pattern), sample:"spread" for time distribution.

TOKEN TIP: Use compact:true to reduce payload size (strips heavy fields) when querying large volumes.

logs_pipelines

Manage Datadog Logs pipelines (parsing & processor chains). Actions: list, get, create, update, delete, reorder, get_order. Pipelines run sequentially on incoming logs; reorder changes the structure of downstream data. Mutations are blocked when the server is in read-only mode. Unknown processor types in 'config.processors' are forwarded to Datadog unchanged.

logs_indexes

Manage Datadog Logs indexes (filters, retention, exclusion filters, daily limits). Actions: list, get, update, reorder, get_order. Datadog identifies indexes by 'name', not 'id'. Note: create/delete are UI-only per Datadog and not supported through the API. Mutations (update, reorder) are blocked when the server is in read-only mode.

logs_archives

Manage Datadog Logs archives (long-term log retention to S3 / GCS / Azure Blob). Actions: list, get, create, update, delete, reorder, get_order. Archives accept destinations of type 's3', 'gcs', or 'azure_storage'; per-provider credential and integration fields (S3 IAM role ARN, GCS service account, Azure tenant/secret) are forwarded unchanged. Mutations (create, update, delete, reorder) are blocked when the server is in read-only mode.

metrics

Query Datadog metrics. Actions:

  • query: Get timeseries data (requires from/to time range, PromQL query)

  • search: Find metrics by name (grep-like, NO time param needed)

  • list: Get recently active metrics (last 24h, optionally filter by tag)

  • metadata: Get metric details (unit, type, description)

APM METRICS (auto-generated from traces): Keyed by OPERATION name (e.g. express.request, pg.query), NOT service name. Filter by service using tags: {service:my-service}

PERCENTILES (p50/p75/p90/p95/p99) — use the ROOT metric (distribution type): p95:trace.express.request{service:my-service}

AVG/SUM/MIN/MAX — use the .duration SUFFIX (pre-aggregated gauge): avg:trace.express.request.duration{service:my-service}

Other trace metrics (gauges):

  • trace.<operation>.hits - Request count

  • trace.<operation>.errors - Error count

  • trace.<operation>.apdex - Apdex score

To discover operation names for a service, use: traces tool with action "services"
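
The rule for choosing between the root metric and the .duration suffix can be captured in a small helper. `apm_latency_query` is a hypothetical function written for illustration; it only assembles the query strings shown above and does not call Datadog.

```python
PERCENTILES = {"p50", "p75", "p90", "p95", "p99"}
SIMPLE_AGGREGATORS = {"avg", "sum", "min", "max"}


def apm_latency_query(aggregator, operation, service):
    """Build a latency query for an APM operation, filtered by service tag."""
    scope = "{service:" + service + "}"
    if aggregator in PERCENTILES:
        # Percentiles use the root (distribution) metric.
        return f"{aggregator}:trace.{operation}{scope}"
    if aggregator in SIMPLE_AGGREGATORS:
        # avg/sum/min/max use the pre-aggregated .duration gauge.
        return f"{aggregator}:trace.{operation}.duration{scope}"
    raise ValueError(f"unsupported aggregator: {aggregator!r}")


p95 = apm_latency_query("p95", "express.request", "my-service")
avg = apm_latency_query("avg", "express.request", "my-service")
```

`p95` and `avg` reproduce the two example queries given above for the express.request operation.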

traces

Analyze APM traces for request flow and latency debugging. Actions: search (find spans), aggregate (group stats), services (list APM services). Key filters: minDuration/maxDuration ("500ms", "2s"), httpStatus ("5xx", ">=400"), status (ok/error), errorMessage (grep). APM METRICS: Traces auto-generate metrics in the trace.<operation>.* namespace (e.g. trace.express.request). Use the metrics tool to query them: avg:trace.express.request.duration{service:my-service}. For percentiles (p95), use the root metric WITHOUT the .duration suffix: p95:trace.express.request{service:my-service}

events

Track Datadog events. Actions: list, get, create, search, aggregate, top, timeseries, incidents, discover, histogram. For monitor alerts, use tags: ["source:alert"].

IMPORTANT — re-evaluation vs transition:

  • source:alert events INCLUDE renotifies and re-evaluations (every Datadog re-evaluation of an alerting monitor emits an event). A "how many times did monitor X fire" question answered with source:alert alone over-counts.

  • To restrict to real state transitions, pass transitionType (e.g. ["alert","alert recovery"]). This appends @monitor.transition.transition_type:(...) to the query and matches the design's live investigation.

  • For a fires-only numeric count rooted in a single monitor ID, prefer the higher-level primitive monitors action=history — it returns {transitions, count, meta} with the same filter applied for you.

transitionType: Optional array of monitor transition types (alert, alert recovery, warning, warning recovery, no data, no data recovery, renotify). Empty array is treated as undefined.

top: Generic event grouping by any fields (groupBy parameter). Returns groups ranked by count with optional context breakdown.

  • Example: {groupBy: ["service"], message: "...", service: "api", total_count: 50, by_context: [{context: "queue:X", count: 30}]}

  • Use for deployments, configs, custom events, or monitor alerts

  • Returns "message" field (event title), NOT monitor name (use monitors tool for real names)

  • total_count includes renotifies when source:alert is used without transitionType; see monitors action=history for fires-only counts.

discover: Returns available tag prefixes from events.

aggregate: Custom groupBy, returns pipe-delimited keys.

search: Full event details.

timeseries: Time-bucketed trends with interval.

incidents: Deduplicate alerts with dedupeWindow.

histogram: Bucket events by local hour_of_day / day_of_week / day_of_month in the requested IANA timezone (DST-safe). Pass bucket_by (required) and optional timezone (default UTC) and cursor (for continuation). Caps at limits.maxEventsForHistogram (default 5000); when reached returns bucketCountIncomplete:true + nextCursor.
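
The idea behind DST-safe local bucketing can be sketched with the standard library: convert each event timestamp to the requested IANA zone first, then read the local hour or day. The `histogram` function below is a hypothetical illustration, not the server's implementation; the ISO 8601 input format and the 1 to 7 (Monday to Sunday) encoding for day_of_week are assumptions, and cursors and the event cap are left out.

```python
from collections import Counter
from datetime import datetime
from zoneinfo import ZoneInfo


def histogram(timestamps, bucket_by, timezone="UTC"):
    """Count ISO 8601 timestamps (with UTC offset) per local-time bucket."""
    tz = ZoneInfo(timezone)
    counts = Counter()
    for ts in timestamps:
        # astimezone applies the zone's rules for that instant, including DST.
        local = datetime.fromisoformat(ts).astimezone(tz)
        if bucket_by == "hour_of_day":
            key = local.hour
        elif bucket_by == "day_of_week":
            key = local.isoweekday()   # assumed encoding: 1 = Monday .. 7 = Sunday
        elif bucket_by == "day_of_month":
            key = local.day
        else:
            raise ValueError(f"unsupported bucket_by: {bucket_by!r}")
        counts[key] += 1
    return dict(counts)


# Two events one hour apart in UTC, straddling the 2024 US spring-forward.
events = ["2024-03-10T06:30:00+00:00", "2024-03-10T07:30:00+00:00"]
by_hour_ny = histogram(events, "hour_of_day", "America/New_York")
by_hour_utc = histogram(events, "hour_of_day")
```

In New York the two events land in local hours 1 and 3, because 02:00 to 03:00 does not exist on that date; in UTC they land in hours 6 and 7.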

incidents

Manage Datadog incidents for incident response. Actions: list, get, search, create, update, delete. Use for: incident management, on-call response, postmortems, tracking MTTR/MTTD.

slos

Manage Datadog Service Level Objectives. Actions: list (with SLI status & error budget), get, create, update, delete, history. SLO types: metric-based, monitor-based. Each list/get/create/update response includes a url field deep-linking to the Datadog UI. Use for: reliability tracking, error budgets, SLA compliance, performance targets.

synthetics

Manage Datadog Synthetic tests (API and Browser). Actions: list, get, create, update, delete, trigger, results. Use for: uptime monitoring, API testing, user journey testing, performance testing, canary deployments.

hosts

Manage Datadog infrastructure hosts. Actions: list (with filters), totals (counts), mute (silence alerts), unmute. Use for: infrastructure inventory, host health, silencing noisy hosts during maintenance.

downtimes

Manage Datadog scheduled downtimes for maintenance windows. Actions: list, get, create, update, cancel, listByMonitor. Use for: scheduling maintenance, preventing false alerts during deployments, managing recurring maintenance windows.

rum

Query Datadog Real User Monitoring (RUM) data. Actions: applications (list RUM apps), events (search RUM events), aggregate (group and count events), performance (Core Web Vitals: LCP, FCP, CLS, FID, INP), waterfall (session timeline with resources/actions/errors). Use for: frontend performance, user sessions, page views, errors, resource loading.

security

Query Datadog Security Monitoring. Actions: rules (list detection rules), signals (search security signals), findings (list security findings). Use for: threat detection, compliance, security posture, incident investigation.

notebooks

Manage Datadog Notebooks. Actions: list (search notebooks), get (by ID with cells), create (new notebook), update (modify notebook), delete (remove notebook). Use for: runbooks, incident documentation, investigation notes, dashboards as code.

users

Manage Datadog users. Actions: list (with filters), get (by ID). Use for: access management, user auditing, team organization.

teams

Manage Datadog teams. Actions: list (with filters), get (by ID), members (list team members). Use for: team organization, access management, collaboration.

tags

Manage Datadog host tags. Actions: list (all host tags), get (tags for specific host), add (create tags), update (replace tags), delete (remove all tags). Use for: infrastructure organization, filtering, grouping.

usage

Query Datadog usage metering data. Actions: summary (overall usage), hosts (infrastructure), logs, custom_metrics, indexed_spans, ingested_spans. Use for: cost management, capacity planning, usage tracking, billing analysis.

auth

Validate Datadog API credentials. Use this to verify that the API key and App key are correctly configured before performing other operations.

schema

Get valid enum values for Datadog API fields. Returns palettes, widget types, aggregators, comparators, time spans, and other valid values for constructing dashboards, monitors, metrics queries, and SLOs. Use this to discover valid options before creating or updating Datadog resources.

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/TANTIOPE/datadog-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.