Alert rules are constantly updated - causing alert state reset #1899

Open
@lupa95

Description

Describe the bug
I am using grafana-operator to deploy Grafana alert rules.

The alert rule groups are synced to the Grafana instances every 10 minutes (the default). Sometimes the alert rules are updated in a way that changes their fingerprint. This resets the alert rule state, which in turn causes a lot of noise: alerts that were already firing trigger new notifications once the pending period has elapsed.

I tried to debug the issue by checking whether there is constant drift between the alert rules in the Grafana operator CRs and those in the Grafana instances themselves, but I cannot find anything wrong. I also checked the alert_rule_version table in the Grafana DB. The only columns that differ between the alert rule versions are:

  • id (makes sense)
  • parent_version (makes sense)
  • version (makes sense)
  • created (makes sense)
  • rule_group_idx (not sure what that is, some ID for the whole rule group that changes?)
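
The column comparison above can be automated. The following is a minimal sketch, assuming the rows of alert_rule_version have already been fetched as dictionaries keyed by column name; the helper name `changed_columns` is hypothetical, and the sample rows are an abbreviated copy of the first two rows shown under "Additional context" (only a subset of columns is included).

```python
from typing import Any


def changed_columns(rows: list[dict[str, Any]]) -> list[tuple[int, int, list[str]]]:
    """For each pair of consecutive versions, list the columns whose values differ."""
    ordered = sorted(rows, key=lambda r: r["version"])
    result = []
    for prev, curr in zip(ordered, ordered[1:]):
        # A column counts as changed if its value differs from the previous version.
        diff = sorted(col for col in curr if prev.get(col) != curr[col])
        result.append((prev["version"], curr["version"], diff))
    return result


# Abbreviated rows (subset of columns) taken from the table in this issue.
rows = [
    {"id": 664168, "parent_version": 21, "version": 22,
     "created": "2025-03-14 12:54:15",
     "title": "KubernetesNodeMemoryPressure", "rule_group_idx": 1},
    {"id": 664230, "parent_version": 22, "version": 23,
     "created": "2025-03-14 13:04:15",
     "title": "KubernetesNodeMemoryPressure", "rule_group_idx": 0},
]

for old, new, cols in changed_columns(rows):
    print(f"v{old} -> v{new}: {', '.join(cols)}")
```

Columns that are expected to change on every write (id, parent_version, version, created) can be filtered out of the result, so that only the remaining ones, here rule_group_idx, stand out.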

How can I debug this further? Any ideas as to why the alert rules are updated constantly?

Version
Grafana operator: v5.17.0
Grafana: v11.5.2

To Reproduce
Steps to reproduce the behavior:

  1. Create alert rule groups containing multiple alert rules with grafana-operator
  2. Check alert rule history in Grafana to see them being updated constantly

Expected behavior
Alert rules should only be updated when they actually change.

Additional context
Output of the Grafana alert_rule_version table for one specific alert rule that has this problem:

   id   | rule_org_id |          rule_uid            |          rule_namespace_uid          | rule_group   | parent_version | restored_from | version |       created       |            title             | condition |                                                data| interval_seconds | no_data_state | exec_err_state |     for      |                                          annotations                                                                                                                                  |                                  labels                   | rule_group_idx | is_paused |     notification_settings | record |                                                    metadata

 664168 |           1 | example-dev-k8s-mem-pressure | 477b7cd3-2fee-4393-a40f-be43fc80ef4c | example-dev  |             21 |             0 |      22 | 2025-03-14 12:54:15 | KubernetesNodeMemoryPressure | B         | [{"refId":"A","queryType":"","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"prometheus-example-dev","model":{"datasource":{"type":"prometheus","uid":"prometheus-example-dev"},"editorMode":"code","expr":"kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"}","instant":true,"intervalMs":1000,"legendFormat":"__auto","maxDataPoints":43200,"range":false,"refId":"A"}},{"refId":"B","queryType":"","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"__expr__","model":{"conditions":[{"evaluator":{"params":[0],"type":"gt"},"operator":{"type":"and"},"query":{"params":["C"]},"reducer":{"params":[],"type":"last"},"type":"query"}],"datasource":{"type":"__expr__","uid":"__expr__"},"expression":"A","intervalMs":1000,"maxDataPoints":43200,"refId":"B","type":"threshold"}}] |              600 | NoData        | Error          | 600000000000 | {"description":"Node {{ $labels.node }} has MemoryPressure condition\nVALUE = {{ $value }}\nLABELS = {{ $labels }}","summary":"Kubernetes memory pressure (node {{ $labels.node }})"} | {"project":"example","severity":"critical","stage":"dev"} |              1 | f         | [{"receiver":"opsgenie"}] |        | {"editor_settings":{"simplified_query_and_expressions_section":false,"simplified_notifications_section":false}}
 664230 |           1 | example-dev-k8s-mem-pressure | 477b7cd3-2fee-4393-a40f-be43fc80ef4c | example-dev  |             22 |             0 |      23 | 2025-03-14 13:04:15 | KubernetesNodeMemoryPressure | B         | [{"refId":"A","queryType":"","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"prometheus-example-dev","model":{"datasource":{"type":"prometheus","uid":"prometheus-example-dev"},"editorMode":"code","expr":"kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"}","instant":true,"intervalMs":1000,"legendFormat":"__auto","maxDataPoints":43200,"range":false,"refId":"A"}},{"refId":"B","queryType":"","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"__expr__","model":{"conditions":[{"evaluator":{"params":[0],"type":"gt"},"operator":{"type":"and"},"query":{"params":["C"]},"reducer":{"params":[],"type":"last"},"type":"query"}],"datasource":{"type":"__expr__","uid":"__expr__"},"expression":"A","intervalMs":1000,"maxDataPoints":43200,"refId":"B","type":"threshold"}}] |              600 | NoData        | Error          | 600000000000 | {"description":"Node {{ $labels.node }} has MemoryPressure condition\nVALUE = {{ $value }}\nLABELS = {{ $labels }}","summary":"Kubernetes memory pressure (node {{ $labels.node }})"} | {"project":"example","severity":"critical","stage":"dev"} |              0 | f         | [{"receiver":"opsgenie"}] |        | {"editor_settings":{"simplified_query_and_expressions_section":false,"simplified_notifications_section":false}}
 664238 |           1 | example-dev-k8s-mem-pressure | 477b7cd3-2fee-4393-a40f-be43fc80ef4c | example-dev  |             23 |             0 |      24 | 2025-03-14 13:04:16 | KubernetesNodeMemoryPressure | B         | [{"refId":"A","queryType":"","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"prometheus-example-dev","model":{"datasource":{"type":"prometheus","uid":"prometheus-example-dev"},"editorMode":"code","expr":"kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"}","instant":true,"intervalMs":1000,"legendFormat":"__auto","maxDataPoints":43200,"range":false,"refId":"A"}},{"refId":"B","queryType":"","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"__expr__","model":{"conditions":[{"evaluator":{"params":[0],"type":"gt"},"operator":{"type":"and"},"query":{"params":["C"]},"reducer":{"params":[],"type":"last"},"type":"query"}],"datasource":{"type":"__expr__","uid":"__expr__"},"expression":"A","intervalMs":1000,"maxDataPoints":43200,"refId":"B","type":"threshold"}}] |              600 | NoData        | Error          | 600000000000 | {"description":"Node {{ $labels.node }} has MemoryPressure condition\nVALUE = {{ $value }}\nLABELS = {{ $labels }}","summary":"Kubernetes memory pressure (node {{ $labels.node }})"} | {"project":"example","severity":"critical","stage":"dev"} |              1 | f         | [{"receiver":"opsgenie"}] |        | {"editor_settings":{"simplified_query_and_expressions_section":false,"simplified_notifications_section":false}}
 664300 |           1 | example-dev-k8s-mem-pressure | 477b7cd3-2fee-4393-a40f-be43fc80ef4c | example-dev  |             24 |             0 |      25 | 2025-03-14 13:14:16 | KubernetesNodeMemoryPressure | B         | [{"refId":"A","queryType":"","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"prometheus-example-dev","model":{"datasource":{"type":"prometheus","uid":"prometheus-example-dev"},"editorMode":"code","expr":"kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"}","instant":true,"intervalMs":1000,"legendFormat":"__auto","maxDataPoints":43200,"range":false,"refId":"A"}},{"refId":"B","queryType":"","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"__expr__","model":{"conditions":[{"evaluator":{"params":[0],"type":"gt"},"operator":{"type":"and"},"query":{"params":["C"]},"reducer":{"params":[],"type":"last"},"type":"query"}],"datasource":{"type":"__expr__","uid":"__expr__"},"expression":"A","intervalMs":1000,"maxDataPoints":43200,"refId":"B","type":"threshold"}}] |              600 | NoData        | Error          | 600000000000 | {"description":"Node {{ $labels.node }} has MemoryPressure condition\nVALUE = {{ $value }}\nLABELS = {{ $labels }}","summary":"Kubernetes memory pressure (node {{ $labels.node }})"} | {"project":"example","severity":"critical","stage":"dev"} |              0 | f         | [{"receiver":"opsgenie"}] |        | {"editor_settings":{"simplified_query_and_expressions_section":false,"simplified_notifications_section":false}}
 664308 |           1 | example-dev-k8s-mem-pressure | 477b7cd3-2fee-4393-a40f-be43fc80ef4c | example-dev  |             25 |             0 |      26 | 2025-03-14 13:14:17 | KubernetesNodeMemoryPressure | B         | [{"refId":"A","queryType":"","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"prometheus-example-dev","model":{"datasource":{"type":"prometheus","uid":"prometheus-example-dev"},"editorMode":"code","expr":"kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"}","instant":true,"intervalMs":1000,"legendFormat":"__auto","maxDataPoints":43200,"range":false,"refId":"A"}},{"refId":"B","queryType":"","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"__expr__","model":{"conditions":[{"evaluator":{"params":[0],"type":"gt"},"operator":{"type":"and"},"query":{"params":["C"]},"reducer":{"params":[],"type":"last"},"type":"query"}],"datasource":{"type":"__expr__","uid":"__expr__"},"expression":"A","intervalMs":1000,"maxDataPoints":43200,"refId":"B","type":"threshold"}}] |              600 | NoData        | Error          | 600000000000 | {"description":"Node {{ $labels.node }} has MemoryPressure condition\nVALUE = {{ $value }}\nLABELS = {{ $labels }}","summary":"Kubernetes memory pressure (node {{ $labels.node }})"} | {"project":"example","severity":"critical","stage":"dev"} |              1 | f         | [{"receiver":"opsgenie"}] |        | {"editor_settings":{"simplified_query_and_expressions_section":false,"simplified_notifications_section":false}}
(5 rows)
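
To make the timing in these five rows easier to see, here is a small sketch that prints the gap between consecutive versions together with the change in rule_group_idx. The values are copied from the rows above; the function name `transitions` is hypothetical, and the result only describes these five rows, it is not a diagnosis of the cause.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (version, created, rule_group_idx) copied from the five rows above.
history = [
    (22, "2025-03-14 12:54:15", 1),
    (23, "2025-03-14 13:04:15", 0),
    (24, "2025-03-14 13:04:16", 1),
    (25, "2025-03-14 13:14:16", 0),
    (26, "2025-03-14 13:14:17", 1),
]


def transitions(history):
    """Return (version, seconds since previous version, old idx, new idx) per step."""
    out = []
    for (_, t0, i0), (v1, t1, i1) in zip(history, history[1:]):
        gap = datetime.strptime(t1, FMT) - datetime.strptime(t0, FMT)
        out.append((v1, int(gap.total_seconds()), i0, i1))
    return out


for version, gap, old_idx, new_idx in transitions(history):
    print(f"v{version}: +{gap}s, rule_group_idx {old_idx} -> {new_idx}")
```

In this sample, rule_group_idx goes from 1 to 0 after a 600-second gap, which matches the 10-minute sync interval, and returns from 0 to 1 one second later.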

Metadata


    Labels

    grafana-upstream: Issues non-operator related, should be logged in the grafana product repo
    triage/needs-information: Indicates an issue needs more information in order to work on it.
