kafka_consumer_lag
Monitor consumer group lag and diagnose operational issues, detecting Flink crashes, down consumers, or uneven lag distribution.
Instructions
Monitor consumer group lag with operational diagnosis.
Checks committed offsets vs log-end-offsets for each partition and generates diagnosis based on actual incident patterns:
Flink groups with "no active members" is NORMAL (checkpoint-based offset management)
Flink groups with no active members AND growing lag = likely Flink crash (matches 2026-02-21 incident: MySQL DELETE → tombstone → NPE → 50h outage)
Non-Flink groups with no active members = consuming application is down
Uneven lag distribution = possible hot partition or stuck consumer
Args: group: Consumer group ID. Empty = all groups.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| group | No |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes |