monitors
Manage alert configurations: list, search, create, update, delete, mute, unmute, and view top alerts. Also check monitor history and preview message templates.
Instructions
Manage Datadog monitors. Actions: list, get, search, create, update, delete, mute, unmute, top, history, preview, test_notification. Filters: name, tags, groupStates (alert/warn/ok/no data). get/create/update return the full options object so callers can safely read-then-patch.
create/update accept a config object validated against a typed schema covering the documented Datadog Monitor fields:
Top-level: name, type, query, message, tags, priority (1-5, nullable), restrictedRoles, multi, options.
options.* validated keys grouped by category:
notification: notifyNoData, noDataTimeframe, notifyAudit, notificationPresetName.
evaluation/delay: newHostDelay, newGroupDelay, evaluationDelay, requireFullWindow, onMissingData.
renotification: renotifyInterval (nullable), renotifyOccurrences, renotifyStatuses, escalationMessage.
lifecycle: timeoutH (nullable), includeTags, locked, silenced (record of timestamps/null), groupRetentionDuration.
thresholds: thresholds (critical/warning/ok/criticalRecovery/warningRecovery/unknown), thresholdWindows.
scheduling: schedulingOptions.
Unknown keys (top-level or under options) are forwarded to Datadog as-is and surfaced via an optional warnings array on the response, so the schema does not lag the API.
snake_case aliases are accepted on input and normalized to camelCase before validation.
Validation errors short-circuit before any HTTP call and surface as 'EINVALID_MONITOR_CONFIG: : '.
Reference: https://docs.datadoghq.com/api/latest/monitors/
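The alias normalization and unknown-key passthrough described above can be sketched roughly as follows. This is an illustration only, not the tool's actual implementation: the function names, the warning text, and the abbreviated key sets are assumptions, and nested objects such as thresholds are not handled.

```python
from typing import Any, Dict, List, Set, Tuple

# Abbreviated, hypothetical subsets of the validated keys listed above.
KNOWN_TOP_LEVEL: Set[str] = {"name", "type", "query", "message", "tags",
                             "priority", "restrictedRoles", "multi", "options"}
KNOWN_OPTIONS: Set[str] = {"notifyNoData", "noDataTimeframe", "evaluationDelay",
                           "renotifyInterval", "timeoutH", "thresholds"}


def snake_to_camel(key: str) -> str:
    """Convert snake_case to camelCase; keys without underscores are unchanged."""
    head, *rest = key.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def _normalize_level(obj: Dict[str, Any], known: Set[str], prefix: str,
                     warnings: List[str]) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for key, value in obj.items():
        camel = snake_to_camel(key)
        if camel in known:
            out[camel] = value      # recognized key, stored under its camelCase name
        else:
            out[key] = value        # unknown key, forwarded exactly as given
            warnings.append(f"unknown key forwarded as-is: {prefix}{key}")
    return out


def normalize_config(config: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (normalized config, warnings) without making any HTTP call."""
    warnings: List[str] = []
    top = _normalize_level(config, KNOWN_TOP_LEVEL, "", warnings)
    if isinstance(top.get("options"), dict):
        top["options"] = _normalize_level(top["options"], KNOWN_OPTIONS,
                                          "options.", warnings)
    return top, warnings


config = {
    "name": "High CPU",
    "type": "metric alert",
    "query": "avg(last_5m):avg:system.cpu.user{*} > 90",
    "options": {"notify_no_data": True, "renotify_interval": 30, "brand_new_flag": 1},
}
normalized, warnings = normalize_config(config)
print(normalized["options"])
print(warnings)
```

Here `notify_no_data` and `renotify_interval` are renamed to their camelCase forms, while the made-up `brand_new_flag` is passed through untouched and reported in the warnings list.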
top: Ranked monitors by alert frequency with real monitor names and context breakdown.
Returns: {rank, monitor_id, name (with {{template.vars}}), message (template), total_count, by_context}
Suited to weekly or daily alert reports.
Monitor names come from the monitors API, not from event titles.
WARNING: total_count is the raw alert-event count and INCLUDES renotifies/re-evaluations. For monitors stuck in Alert state, Datadog emits a renotify event every renotify_interval minutes, which inflates this count well beyond the number of real fires. When the question is "how many times did this monitor actually fire", use action=history instead.
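To get a feel for how far renotifies can inflate total_count, here is a back-of-the-envelope estimate. It assumes one initial alert event plus one renotify per full interval, and treats a missing or zero interval as renotification disabled. The function name is hypothetical, and exact event timing at interval boundaries may differ in practice.

```python
from typing import Optional


def estimated_total_count(minutes_in_alert: int,
                          renotify_interval_min: Optional[int]) -> int:
    """Rough event count for one uninterrupted Alert period (one real fire)."""
    if not renotify_interval_min:
        return 1  # assumption: no renotification, only the initial alert event
    return 1 + minutes_in_alert // renotify_interval_min


# A monitor stuck in Alert for 24 hours, renotifying every 60 minutes:
print(estimated_total_count(24 * 60, 60))  # 25 events for a single real fire
```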
history: Count and list real state transitions for one monitor over a time window.
Inputs: id (required, monitor ID), from/to (optional time range), transitionType (optional filter, defaults to ["alert","alert recovery"]), group (optional multi-alert group filter).
Returns: {transitions: [{timestamp, monitorId, monitorName, group, fromState, toState, transitionType, eventId}], count, meta}
count = transitions.length — the number of REAL state changes (fires + recoveries by default), NOT the renotify-inflated count returned by action=top or events action=search.
Backed by Datadog v2 events search with a hardcoded source:alert + @monitor.transition.transition_type filter that excludes renotifies by default. To include renotifies, pass a transitionType list that includes "renotify".
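The effect of the transitionType filter on count can be illustrated with a small client-side sketch. The real action applies the filter in the events search query. The event dictionaries below are a simplified, hypothetical subset of the documented transition shape, and `count_transitions` is not part of the tool.

```python
from typing import Any, Dict, Iterable, List, Sequence

# Default documented for action=history: real fires and recoveries only.
DEFAULT_TRANSITION_TYPES = ("alert", "alert recovery")


def count_transitions(events: Iterable[Dict[str, Any]],
                      transition_types: Sequence[str] = DEFAULT_TRANSITION_TYPES
                      ) -> Dict[str, Any]:
    """Keep only events whose transitionType is wanted; count = len(transitions)."""
    wanted = set(transition_types)
    transitions: List[Dict[str, Any]] = [
        e for e in events if e["transitionType"] in wanted
    ]
    return {"transitions": transitions, "count": len(transitions)}


sample_events = [
    {"timestamp": "2024-05-01T10:00:00Z", "transitionType": "alert"},
    {"timestamp": "2024-05-01T11:00:00Z", "transitionType": "renotify"},
    {"timestamp": "2024-05-01T12:00:00Z", "transitionType": "renotify"},
    {"timestamp": "2024-05-01T12:30:00Z", "transitionType": "alert recovery"},
]

print(count_transitions(sample_events)["count"])             # 2: one fire, one recovery
print(count_transitions(sample_events, ["alert"])["count"])  # 1: fires only
print(count_transitions(
    sample_events, ["alert", "alert recovery", "renotify"])["count"])  # 4: full audit
```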
preview: Render a Datadog monitor message template against a context (read-only safe).
Inputs: either inline 'message' OR 'monitor_id' (or existing 'id'); plus optional 'context' { variables, conditionals }.
Supported syntax: {{variable.name}} substitution and conditional blocks {{#name}}...{{/name}} / {{^name}}...{{/name}} where name is one of: is_alert, is_warning, is_no_data, is_recovery, is_alert_to_warning, is_warning_to_alert.
Missing variables render as {{undefined:name}} markers and are reported in 'variablesMissing'.
Loops ({{#each ...}}) and partials ({{> ...}}) return EUNSUPPORTED_TEMPLATE_SYNTAX.
Allowed under --read-only (no mutation; at most a getMonitor load).
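The preview rules above can be approximated with a minimal renderer. This is a sketch under assumptions, not the tool's code: the `rendered` output key, the flat variables map keyed by dotted name, the exception class, and the handling of unrecognized block names are all hypothetical, and nested blocks of the same name are not supported.

```python
import re
from typing import Any, Dict, List, Optional

CONDITIONALS = {"is_alert", "is_warning", "is_no_data", "is_recovery",
                "is_alert_to_warning", "is_warning_to_alert"}


class UnsupportedTemplateSyntax(ValueError):
    """Stands in for the EUNSUPPORTED_TEMPLATE_SYNTAX error code."""


_UNSUPPORTED = re.compile(r"\{\{\s*(#each\b|>)")          # loops and partials
_BLOCK = re.compile(r"\{\{([#^])(\w+)\}\}(.*?)\{\{/\2\}\}", re.DOTALL)
_VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def preview(message: str,
            variables: Optional[Dict[str, Any]] = None,
            conditionals: Optional[Dict[str, bool]] = None) -> Dict[str, Any]:
    variables = variables or {}
    conditionals = conditionals or {}
    if _UNSUPPORTED.search(message):
        raise UnsupportedTemplateSyntax("EUNSUPPORTED_TEMPLATE_SYNTAX")

    def render_block(match: "re.Match[str]") -> str:
        sigil, name, body = match.groups()
        if name not in CONDITIONALS:
            return match.group(0)  # assumption: unrecognized blocks are left as-is
        active = bool(conditionals.get(name, False))
        keep = active if sigil == "#" else not active
        return body if keep else ""

    rendered = _BLOCK.sub(render_block, message)

    missing: List[str] = []

    def render_var(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        if name not in missing:
            missing.append(name)
        return "{{undefined:" + name + "}}"

    rendered = _VAR.sub(render_var, rendered)
    return {"rendered": rendered, "variablesMissing": missing}


template = ("{{#is_alert}}CPU high on {{host.name}} ({{value}}){{/is_alert}}"
            "{{^is_alert}}ok{{/is_alert}}")
result = preview(template, {"host.name": "web-1"}, {"is_alert": True})
print(result["rendered"])          # CPU high on web-1 ({{undefined:value}})
print(result["variablesMissing"])  # ['value']
```

Because nothing here touches the network or mutates state, the sketch mirrors why preview is safe under --read-only.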
test_notification: KNOWN LIMITATION — always returns ENOT_SUPPORTED.
Datadog's public REST API exposes no monitor test-notification endpoint at v1 or v2 (audited against the official OpenAPI specs). The v1 SDK has no notifyMonitor / testMonitor method.
Allowed under --read-only because no Datadog HTTP call is attempted.
If Datadog publishes such an endpoint in future, this action will be reimplemented to invoke it.
Workaround: use the 'Test Notifications' button in the Datadog monitor UI.
For generic event grouping (deployments, configs), use the events tool instead. Note that the events tool's action=search with source:alert ALSO includes renotifies; use its transitionType filter (or this tool's action=history) for fires-only counts.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Action to perform | |
| id | No | Monitor ID (required for get/update/delete/mute/unmute) | |
| query | No | Search query (for search action) | |
| name | No | Filter by name (for list action) | |
| tags | No | Filter by tags | |
| groupStates | No | Filter multi-alert monitors by group states (e.g., alert by host). Does NOT filter by overall monitor status. Values: alert, warn, no data, ok | |
| limit | No | Maximum number of monitors to return (default: 50) | |
| config | No | Monitor configuration (for create/update) | |
| message | No | Mute message (for mute action) OR inline template source for the preview action. For preview, supply either this inline string or `monitor_id` (or the existing `id` field) so the action can load the monitor message via getMonitor. | |
| end | No | Mute end timestamp (for mute action) | |
| monitor_id | No | Numeric monitor ID used by the preview action when no inline `message` is supplied. Equivalent to passing the existing `id` field as a numeric string. | |
| context | No | Substitution context for monitors.preview (variables + conditionals). | |
| from | No | Start time (ISO 8601, relative like "1h", or Unix timestamp) | |
| to | No | End time (ISO 8601, relative like "1h", or Unix timestamp) | |
| contextTags | No | Tag prefixes for context breakdown in top action (default: queue, service, ingress, pod_name, kube_namespace, kube_container_name) | |
| maxEvents | No | Maximum events to fetch for top action (default: 5000, max: 5000) | |
| transitionType | No | For history action: filter by monitor state transition types. Default: ["alert","alert recovery"] (real fires + recoveries, excludes renotifies). Pass ["alert"] for fires only, or include "renotify" for full chronological audit. | |
| group | No | For history action: filter transitions to a specific multi-alert monitor group (e.g., "pod_name:foo,kube_namespace:bar"). Optional; omit for all groups. | |
| dry_run | No | With action=create and dry_run=true, validates the monitor body via POST /api/v1/monitor/validate without creating it. Allowed under --read-only because no monitor is created. Returns { valid, dryRun, monitor }. 400 responses surface verbatim, as they would for a failed create. | |
| timezone | No | Optional IANA timezone (e.g. "UTC", "Europe/Paris"). When supplied on get/list, the response adds sibling createdLocal/modifiedLocal ISO 8601 strings next to created/modified. Omit for byte-identical legacy shape. Invalid zones return EINVALID_TIMEZONE. | |