Failure

Failure alerts indicate that the system’s configuration has deviated from its desired state. For example, when replicas are configured in a Redis database, there must be at least one master instance. When there are none, Redis is not operating as configured. These kinds of problems are reported as failure alerts.

Here’s an example of an alert rule to report this failure:

# Redis Master Missing
# Note this covers both cluster mode and HA mode, thus we are counting by redis_mode
- alert: RedisMissingMaster
  expr: |-
    count by (job, service, redis_mode, namespace, asserts_env, asserts_site) (
      redis_instance_info{role="master"}
    ) == 0
  for: 1m
  labels:
    asserts_severity: critical
    asserts_entity_type: Service
    asserts_alert_category: failure
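
One caveat is worth checking, assuming standard Prometheus semantics: an aggregation such as `count` over an empty instant vector returns no series rather than `0`, so a bare `count(...) == 0` comparison may never fire once every `role="master"` series has disappeared. The following is a minimal sketch of one common workaround, not the rule that ships with the product. It uses `unless` to select the groups that report at least one Redis instance but no master. The group name `redis-failure-sketch` and the alert name `RedisMissingMasterSketch` are hypothetical, and the enclosing `groups:` layout is the standard Prometheus rule-file format rather than something stated in the original.

```yaml
groups:
  - name: redis-failure-sketch              # hypothetical group name
    rules:
      - alert: RedisMissingMasterSketch     # hypothetical alert name
        expr: |-
          # Groups that report any Redis instance at all...
          count by (job, service, redis_mode, namespace, asserts_env, asserts_site) (
            redis_instance_info
          )
          # ...minus the groups that still report at least one master.
          unless
          count by (job, service, redis_mode, namespace, asserts_env, asserts_site) (
            redis_instance_info{role="master"}
          )
        for: 1m
        labels:
          asserts_severity: critical
          asserts_entity_type: Service
          asserts_alert_category: failure
```

This sketch assumes that masters and replicas of the same Redis deployment share identical values for every label in the `by` clause. If they do not, the two sides of `unless` will not match and the alert may fire spuriously. It also stays silent when no Redis instance reports at all, which is a separate condition from a missing master.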

| Asserts Meta Label | Description |
| --- | --- |
| `asserts_env` | Used by the knowledge graph to identify the environment. All discovered entities and observed metrics are automatically scoped to an environment. |
| `asserts_site` | Used by the knowledge graph to identify the region or site within an environment. For example, a `prod` environment could span multiple regions, such as `us-east-1` and `us-west-2`; this label captures the region information. Whether it is present depends on how environment information is encoded in the metrics. When both the environment and the region are encoded in a single label value, the `asserts_env` label contains that value, and `asserts_site` may be absent. |
| `asserts_entity_type` | Used by the knowledge graph to identify the level at which the metric is being observed. `workload`, `service`, and `job` are special labels that the knowledge graph uses to identify the `Service`. These labels are also used to discover the `Service` entity in the knowledge graph entity model. In this example, these labels are retained during aggregation, so the metric is observed for the corresponding `Service` entity. |
| `asserts_severity` | Indicates the severity of the problem as either `warning` (yellow) or `critical` (red). |
| `asserts_alert_category` | The knowledge graph groups all alerts into the following categories: Saturation, Amend (configuration changes to the system), Anomaly, Failure, and Error. In this example, `asserts_alert_category` marks the alert as a Failure. |
