Grafana Mimir components on Grafana Labs

Grafana Mimir compactor

Wed, 03 Jun 2026 09:01:40 +0200

The compactor increases query performance and reduces long-term storage usage by combining blocks.

The compactor is the component responsible for:

Compacting multiple blocks of a given tenant into a single, optimized larger block. This deduplicates chunks and reduces the size of the index, resulting in reduced storage costs. Querying fewer blocks is faster, so it also increases query speed.
Keeping the per-tenant bucket index updated. The bucket index is used by queriers, store-gateways, and rulers to discover both new blocks and deleted blocks in the storage.
Deleting blocks that are no longer within a configurable retention period.

The compactor is stateless.

How compaction works

Compaction occurs on a per-tenant basis.

The compactor runs at regular, configurable intervals.

Vertical compaction merges all the blocks of a tenant uploaded by ingesters for the same time range (2-hour ranges by default) into a single block. It also deduplicates samples that were originally written to N blocks as a result of replication. Vertical compaction reduces the number of blocks for a single time range from the number of ingesters down to one block per tenant.

Horizontal compaction triggers after a vertical compaction. It compacts several blocks with adjacent range periods into a single larger block. The total size of the associated block chunks does not change after horizontal compaction. The horizontal compaction may significantly reduce the size of the index and the index-header kept in memory by store-gateways.

Scaling

Compaction can be tuned for clusters with large tenants. Because the compactor works on a per-tenant basis, you can configure both how it scales vertically within one instance and how it scales horizontally across instances.

Vertical scaling
The setting -compactor.compaction-concurrency configures the maximum number of concurrent compactions running in a single compactor instance. Each compaction uses one CPU core.
Horizontal scaling
By default, tenant blocks can be compacted by any Grafana Mimir compactor. When you enable compactor shuffle sharding by setting -compactor.compactor-tenant-shard-size (or its respective YAML configuration option) to a value higher than 0 and lower than the number of available compactors, only the specified number of compactors are eligible to compact blocks for a given tenant.

Compaction algorithm

Mimir uses a sophisticated compaction algorithm called split-and-merge.

By design, the split-and-merge algorithm overcomes time series database (TSDB) index limitations, and it avoids situations in which compacted blocks grow indefinitely for a very large tenant at any compaction stage.

This compaction strategy is a two-stage process: split and merge. The default configuration disables the split stage.

In the split stage, for the first level of compaction (for example, 2h), the compactor divides all source blocks into N (-compactor.split-groups) groups. For each group, the compactor compacts the blocks, but instead of producing a single result block, it outputs M (-compactor.split-and-merge-shards) blocks, known as split blocks. Each split block contains only a subset of the series belonging to a given shard out of M shards. At the end of the split stage, the compactor produces N * M blocks with a reference to their respective shard in the block’s meta.json file.

The compactor merges the split blocks for each shard. This compacts all N split blocks of a given shard. The merge reduces the number of blocks from N * M to M. For a given compaction time range, there will be a compacted block for each of the M shards.

The merge then runs on other configured compaction time ranges, for example 12h and 24h. It compacts blocks belonging to the same shard.
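The two stages can be illustrated with a minimal Python sketch. It is not Mimir's implementation: blocks are modeled as plain sets of series names, the round-robin grouping of source blocks and the CRC32-based shard function are assumptions chosen for illustration, and all function names are hypothetical.

```python
from collections import defaultdict
from zlib import crc32


def split_stage(source_blocks, split_groups, shards):
    """Split stage: divide blocks into N groups, emit M split blocks per group."""
    # Assumption: source blocks are assigned to groups round-robin.
    groups = [source_blocks[i::split_groups] for i in range(split_groups)]
    split_blocks = []
    for group_id, group in enumerate(groups):
        per_shard = defaultdict(set)
        for block in group:
            for series in block:
                # Hypothetical shard function: stable hash of the series modulo M.
                per_shard[crc32(series.encode()) % shards].add(series)
        for shard_id in range(shards):
            # Each split block records its shard, like the reference in meta.json.
            split_blocks.append(
                {"group": group_id, "shard": shard_id, "series": per_shard[shard_id]}
            )
    return split_blocks


def merge_stage(split_blocks, shards):
    """Merge stage: combine the N split blocks of each shard into one block."""
    merged = {shard_id: set() for shard_id in range(shards)}
    for block in split_blocks:
        merged[block["shard"]] |= block["series"]
    return merged


# Four source blocks of 50 series each, N = 2 groups, M = 4 shards.
source_blocks = [{f"series_{i}" for i in range(j, j + 50)} for j in range(0, 200, 50)]
split_blocks = split_stage(source_blocks, split_groups=2, shards=4)
merged = merge_stage(split_blocks, shards=4)
print(len(split_blocks), len(merged))  # N * M split blocks, then M merged blocks
```

Because a series always hashes to the same shard, the merged blocks of different shards never share a series, which is what lets later merges run per shard.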

This strategy is suitable for clusters with large tenants. The number of shards M is configurable on a per-tenant basis using -compactor.split-and-merge-shards, and it can be adjusted based on the number of series of each tenant. The more a tenant grows in terms of series, the more you can grow the configured number of shards. Doing so improves compaction parallelization and keeps each per-shard compacted block size under control. We recommend 1 shard per every 8 million active series in a tenant. For example, for a tenant with 100 million active series, use approximately 12 shards. Use an even number for the count.
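As a rough illustration of the sizing guidance above, the following Python sketch derives a shard count from the number of active series. The function name is hypothetical, and the rounding policy (round down, then bump odd results up to the next even number) is an assumption; the text only asks for approximately one shard per 8 million series and an even count.

```python
SERIES_PER_SHARD = 8_000_000  # one shard per 8 million active series


def recommended_shards(active_series: int) -> int:
    """Return an even shard count for a tenant with the given active series."""
    shards = active_series // SERIES_PER_SHARD
    if shards % 2 == 1:
        # Assumption: odd counts are rounded up to the next even number.
        shards += 1
    return shards


print(recommended_shards(100_000_000))  # 100M / 8M = 12.5, rounded down to 12
```

Under this policy a tenant below 8 million active series gets 0 shards, which corresponds to leaving the split stage disabled.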

The number of split groups, N, can also be adjusted per tenant using the -compactor.split-groups option. Increasing this value produces more compaction jobs with fewer blocks during the split stage. This allows multiple compactors to work on these jobs, and finish the splitting stage faster. However, increasing this value also generates more intermediate blocks during the split stage, which will only be reduced later in the merge stage.

If the configuration of -compactor.split-and-merge-shards changes during compaction, the change will affect only the compaction of blocks which have not yet been split. Already split blocks will use the original configuration when merged. The original configuration is stored in the meta.json of each split block.

Splitting and merging can be horizontally scaled. Non-conflicting and non-overlapping jobs will be executed in parallel.

Compactor sharding

The compactor shards compaction jobs, either from a single tenant or multiple tenants. The compaction of a single tenant can be split and processed by multiple compactor instances.

Whenever the pool of compactors grows or shrinks, tenants and jobs are resharded across the available compactor instances without any manual intervention.

Compactor sharding uses a hash ring. At startup, a compactor generates random tokens and registers itself to the compactor hash ring. While running, it scans the storage bucket at every interval defined by -compactor.compaction-interval to discover the list of tenants in storage, and it compacts blocks for each tenant whose hash falls within the token ranges assigned to that instance in the hash ring.
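The idea of token-based ownership can be sketched with a generic consistent-hash ring in Python. This is an illustration of the technique only: the class, the number of tokens per instance, and the CRC32 hash are assumptions, and Mimir's actual ring differs in its details.

```python
import bisect
import random
from zlib import crc32


class Ring:
    """A toy hash ring: each instance registers random 32-bit tokens."""

    def __init__(self, tokens_per_instance=4, seed=None):
        self._rng = random.Random(seed)
        self._tokens = []  # sorted list of (token, instance) pairs
        self._tokens_per_instance = tokens_per_instance

    def register(self, instance):
        for _ in range(self._tokens_per_instance):
            bisect.insort(self._tokens, (self._rng.getrandbits(32), instance))

    def owner(self, tenant):
        """Return the instance owning the first token at or after the tenant hash."""
        tenant_hash = crc32(tenant.encode())
        index = bisect.bisect_left(self._tokens, (tenant_hash, ""))
        # Wrap around to the first token when the hash is past the last one.
        return self._tokens[index % len(self._tokens)][1]


ring = Ring(seed=1)
for name in ("compactor-0", "compactor-1", "compactor-2"):
    ring.register(name)

# Each compactor would compact only the tenants it owns.
owned = {}
for tenant in (f"tenant-{i}" for i in range(10)):
    owned.setdefault(ring.owner(tenant), []).append(tenant)
```

When an instance registers or leaves, only the tenants whose hashes fall in the affected token ranges change owner, which is why resharding needs no manual intervention.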

To configure the compactors’ hash ring, refer to configuring hash rings.

Waiting for a stable hash ring at startup

A cluster cold start, or adding two or more compactor instances at the same time, may result in each new compactor instance starting at a slightly different time. Then, each compactor runs its first compaction based on a different state of the hash ring. This is not an error condition, but it may be inefficient, because multiple compactor instances may start compacting the same tenant at nearly the same time.

To mitigate the issue, compactors can be configured to wait for a stable hash ring at startup. A ring is considered stable if no instance is added to or removed from the hash ring for at least -compactor.ring.wait-stability-min-duration. The maximum time the compactor will wait is controlled by the flag -compactor.ring.wait-stability-max-duration (or the respective YAML configuration option). Once the compactor has finished waiting, either because the ring stabilized or because the maximum wait time was reached, it will start up normally.

The default value of zero for -compactor.ring.wait-stability-min-duration disables waiting for ring stability.
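The waiting rule can be modeled with a small Python sketch. The function and its parameters are hypothetical; times are seconds since the compactor started, and treating startup itself as the first ring change is an assumption.

```python
def startup_delay(ring_changes, min_stability, max_wait):
    """Return how long the compactor waits before starting normally.

    ring_changes: times (seconds since startup) at which an instance was
    added to or removed from the ring.
    """
    if min_stability <= 0:
        return 0  # waiting for ring stability is disabled
    last_change = 0  # assumption: startup counts as the first change
    for change in sorted(ring_changes):
        # Stop once the ring has been quiet long enough or the cap is reached.
        if change >= last_change + min_stability or change >= max_wait:
            break
        last_change = change
    return min(last_change + min_stability, max_wait)


# Ring changes at 5s and 20s keep resetting a 30s stability window,
# so the ring is considered stable at 20s + 30s.
print(startup_delay([5, 20, 100], min_stability=30, max_wait=300))
```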

Compaction jobs order

You can configure the order of compaction jobs with the -compactor.compaction-jobs-order flag (or its respective YAML configuration option). The configured ordering defines which compaction jobs run first. The following values of -compactor.compaction-jobs-order are supported:

smallest-range-oldest-blocks-first (default)

This ordering gives priority to smallest range, oldest blocks first.

For example, with compaction ranges 2h, 12h, 24h, the compactor will compact the 2h ranges first, and among them give priority to the oldest blocks. Once all blocks in the 2h range have been compacted, it moves to the 12h range, and finally to the 24h one.

All split jobs are moved to the front of the work queue, because finishing all split jobs in a given time range unblocks the merge jobs.

newest-blocks-first

This ordering gives priority to the most recent time ranges first, regardless of their compaction level.

For example, with compaction ranges 2h, 12h, 24h, the compactor compacts the most recent blocks first (up to the 24h range), and then moves to older blocks. This policy favours the most recent blocks, assuming they are queried the most frequently.
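The two orderings can be approximated with sort keys, as in the following Python sketch. The Job type and its fields are hypothetical, and the exact tie-breaking rules are assumptions; the sketch only mirrors the priorities described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    start_h: int  # start of the job's time range, in hours
    range_h: int  # compaction range: 2, 12, or 24
    is_split: bool = False


def smallest_range_oldest_blocks_first(jobs):
    # Split jobs first, then smaller ranges, then older time ranges.
    return sorted(jobs, key=lambda j: (not j.is_split, j.range_h, j.start_h))


def newest_blocks_first(jobs):
    # Most recent end time first, regardless of the compaction level.
    return sorted(jobs, key=lambda j: j.start_h + j.range_h, reverse=True)


jobs = [
    Job(0, 24),
    Job(0, 12),
    Job(2, 2),
    Job(0, 2),
    Job(24, 2, is_split=True),
]
default_order = smallest_range_oldest_blocks_first(jobs)
newest_order = newest_blocks_first(jobs)
```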

Blocks deletion

Following a successful compaction, the original blocks are deleted from the storage. Block deletion is not immediate; it follows a two-step process:

An original block is marked for deletion. This is a soft delete.
Once a block has been marked for deletion for longer than the configurable -compactor.deletion-delay, the block is deleted from storage. This is a hard delete.

The compactor is responsible for both marking blocks and for hard deletion. Soft deletion is based on a small deletion-mark.json file stored within the block location in the bucket.

The soft delete mechanism gives queriers, rulers, and store-gateways time to discover the new compacted blocks before the original blocks are deleted. If those original blocks were immediately hard deleted, some queries involving the compacted blocks could temporarily fail or return partial results.
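The two-step process amounts to a small state check, sketched below in Python. The function name and state labels are hypothetical, and the 12-hour delay is only an example value for -compactor.deletion-delay.

```python
from datetime import datetime, timedelta, timezone


def block_state(marked_at, now, deletion_delay):
    """Classify a block given when its deletion mark was written (or None)."""
    if marked_at is None:
        return "live"
    if now - marked_at > deletion_delay:
        return "hard-delete"  # eligible to be removed from storage
    return "soft-deleted"  # still readable while readers discover new blocks


now = datetime(2026, 6, 3, 12, 0, tzinfo=timezone.utc)
delay = timedelta(hours=12)  # example value only
recent = block_state(now - timedelta(hours=1), now, delay)
old = block_state(now - timedelta(hours=13), now, delay)
```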

Blocks retention

The compactor is responsible for enforcing the storage retention, deleting the blocks that contain samples that are older than the configured retention period from the long-term storage. The storage retention is disabled by default, and no data will be deleted from the long-term storage unless you explicitly configure the retention period.

For more information, refer to Configure metrics storage retention.

Compactor scratch storage volume

Each compactor uses a storage device mounted at -compactor.data-dir to temporarily store:

files downloaded from object storage used as input to compaction
block files produced by the compactor to be uploaded to object storage

Note
While the compactor is a stateless service, it’s recommended that you configure the compactor to store its temporary files somewhere other than the root volume. This avoids I/O contention with other workloads running on the system. Common volume types include a local SSD or a cloud provider’s block storage service.

In Kubernetes, run compactors as a StatefulSet so that each Pod has a dedicated volume.

Compactor disk utilization

Large tenants may require a lot of disk space. Assuming max_compaction_range_blocks_size is the total block size for the largest tenant during the longest -compactor.block-ranges period, the expression that estimates the minimum disk space required is:

compactor.compaction-concurrency * max_compaction_range_blocks_size * 2

Alternatively, assuming the largest -compactor.block-ranges is 24h (the default), you could estimate needing 150GB of disk space for every 10M active series owned by the largest tenant. For example, if your largest tenant has 30M active series and -compactor.compaction-concurrency=1, we would recommend having a disk with at least 450GB available.
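Both estimates are simple arithmetic, shown here as a Python sketch. The function names are hypothetical. The series-based rule of thumb is stated above for a compaction concurrency of 1, so the sketch does not scale it by concurrency.

```python
def min_disk_gb_from_block_size(compaction_concurrency, max_range_blocks_gb):
    """compaction-concurrency * max_compaction_range_blocks_size * 2."""
    return compaction_concurrency * max_range_blocks_gb * 2


def min_disk_gb_from_series(active_series, gb_per_10m_series=150):
    """Rule of thumb: 150GB per 10M active series of the largest tenant."""
    return active_series / 10_000_000 * gb_per_10m_series


print(min_disk_gb_from_series(30_000_000))  # 3 * 150GB = 450.0
```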

Compactor configuration

Refer to the compactor block section and the limits block section for details of compaction-related configuration.

The alertmanager and ruler components can also use object storage to store their configurations and rules uploaded by users. In that case a separate bucket should be created to store alertmanager configurations and rules: using the same bucket between ruler/alertmanager and blocks will cause issues with the compactor.

Grafana Mimir distributor

Wed, 03 Jun 2026 09:01:40 +0200

The distributor is a stateless component that acts as the entry point for the Grafana Mimir write path. It receives incoming write requests containing time series data, validates the data for correctness, enforces tenant-specific limits, and then ingests the data into Mimir.

To scale beyond the limits of a single node, distributors shard incoming series across a pool of partitions or ingesters. The details of how sharding, replication, and ingestion work differ between the ingest storage and classic architectures. For more information, refer to the series sharding documentation for each architecture.

Supported protocols

The distributor supports the following metrics ingestion protocols:

Prometheus remote write v1
Prometheus remote write v2
OpenTelemetry Protocol (OTLP)
Influx

Validation

The distributor cleans and validates incoming data before ingesting it into Mimir.

A single request can include both valid and invalid metrics, samples, metadata, or exemplars. The distributor ingests only the valid data and rejects any invalid entries.

If invalid data is detected:

When using Prometheus remote write or Influx, the distributor returns an HTTP 400 status code.
When using OTLP, it returns an HTTP 200 status code, following the OTLP specification for partial ingestion.

In both cases, the response body contains details about the first invalid item encountered. The returned error is typically logged by the agent sending metrics to Mimir, such as Prometheus, Grafana Alloy, or the OpenTelemetry Collector.

The distributor data cleanup includes the following transformation:

The metric metadata help is truncated to fit in the length defined via the -validation.max-metadata-length flag.

The distributor validation includes the following checks:

The metric metadata and labels conform to the Prometheus exposition format.
The metric metadata (name and unit) are not longer than what is defined via the -validation.max-metadata-length flag.
The number of labels of each metric is not higher than -validation.max-label-names-per-series.
Each metric label name is not longer than -validation.max-length-label-name.
Each metric label value is not longer than -validation.max-length-label-value.
Each sample timestamp is not further in the future than the current time plus -validation.create-grace-period.
Each exemplar has a timestamp and at least one non-empty label name and value pair.
Each exemplar has no more than 128 labels.
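A few of the label checks, together with the partial-ingestion behavior, can be sketched in Python as follows. The function names and the limit values in the usage example are hypothetical, and whether the metric name counts toward the label limit is an assumption of this sketch.

```python
def validate_labels(labels, max_label_names, max_name_length, max_value_length):
    """Return a list of validation errors for one series (empty if valid)."""
    errors = []
    if len(labels) > max_label_names:
        errors.append(f"too many labels: {len(labels)} > {max_label_names}")
    for name, value in labels.items():
        if len(name) > max_name_length:
            errors.append(f"label name too long: {name[:20]}")
        if len(value) > max_value_length:
            errors.append(f"label value too long for: {name[:20]}")
    return errors


def partition_series(series_list, **limits):
    """Keep the valid series and collect the rejected ones with their errors."""
    valid, rejected = [], []
    for labels in series_list:
        errors = validate_labels(labels, **limits)
        if errors:
            rejected.append((labels, errors))
        else:
            valid.append(labels)
    return valid, rejected


# Illustrative limits only; real values come from the -validation.* flags.
limits = dict(max_label_names=3, max_name_length=10, max_value_length=10)
request = [
    {"__name__": "up", "job": "api"},
    {"__name__": "up", "job": "api", "zone": "a", "pod": "p-1"},
]
valid, rejected = partition_series(request, **limits)
```

The valid series are ingested while the rejected ones are reported back, matching the mixed valid and invalid request described earlier.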

Note
For each tenant, you can override the validation checks by modifying the overrides section of the runtime configuration.

Rate limiting

The distributor includes two different types of rate limiters that apply to each tenant.

Request rate
The maximum number of requests per second that can be served across the Grafana Mimir cluster for each tenant.
Ingestion rate
The maximum samples per second that can be ingested across the Grafana Mimir cluster for each tenant.

If any of these rates is exceeded, the distributor drops the request and returns an HTTP 429 response code.

Internally, these limits are implemented using a per-distributor local rate limiter. The local rate limiter for each distributor is configured with a limit of limit / N, where N is the number of healthy distributor replicas. The distributor automatically adjusts the request and ingestion rate limits if the number of distributor replicas changes.

This design uses a per-distributor local rate limiter and requires that write requests be evenly distributed across the pool of distributors.
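The following Python sketch shows why even distribution matters. The function names are hypothetical, and the model ignores burst sizes: each distributor simply accepts traffic up to its local limit of limit / N.

```python
def local_limit(tenant_limit, healthy_distributors):
    """Per-distributor share of a tenant's cluster-wide limit."""
    if healthy_distributors < 1:
        raise ValueError("at least one healthy distributor is required")
    return tenant_limit / healthy_distributors


def accepted_rate(tenant_limit, load_per_distributor):
    """Total rate accepted when each distributor enforces only its local limit."""
    limit = local_limit(tenant_limit, len(load_per_distributor))
    return sum(min(load, limit) for load in load_per_distributor)


# A tenant limit of 100,000 samples/s across 4 distributors is 25,000 each.
even = accepted_rate(100_000, [25_000, 25_000, 25_000, 25_000])
skewed = accepted_rate(100_000, [70_000, 10_000, 10_000, 10_000])
```

With the same total load of 100,000 samples/s, the even case is fully accepted, while the skewed case is throttled on the busiest distributor even though the tenant is within its cluster-wide limit.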

Configure rate limits

Use the following flags to configure the rate limits:

-distributor.request-rate-limit: Request rate limit, which is per tenant, and which is in requests per second
-distributor.request-burst-size: Request burst size (in number of requests) allowed, which is per tenant
-distributor.ingestion-rate-limit: Ingestion rate limit, which is per tenant, and which is in samples per second
-distributor.ingestion-burst-size: Ingestion burst size (in number of samples) allowed, which is per tenant

Note
You can override rate limiting on a per-tenant basis by setting request_rate, ingestion_rate, request_burst_size and ingestion_burst_size in the overrides section of the runtime configuration.

Note
By default, Prometheus remote write doesn’t retry requests on 429 HTTP response status code. To modify this behavior, use retry_on_http_429: true in the Prometheus remote_write configuration.

Distributor ring and rate limiting

Since distributor rate limits are implemented locally, each distributor must know how many healthy distributors are running in the Mimir cluster. To achieve this, each distributor joins a hash ring used for service discovery, which tracks the number of healthy distributor instances.

By default, the distributors’ ring uses memberlist as its backend. If you want to configure a different backend, for example, consul or etcd, you can use the following CLI flags (and their respective YAML configuration options) to configure the distributors’ ring KV store:

-distributor.ring.store: The backend storage to use.
-distributor.ring.consul.*: The Consul client configuration. Set this flag only if consul is the configured backend storage.
-distributor.ring.etcd.*: The etcd client configuration. Set this flag only if etcd is the configured backend storage.

For more information, refer to Configure hash rings.

Load balancing across distributors

As a best practice, uniformly distribute write requests across all distributor instances by placing a load balancer in front of them. The preferred approach is a Layer 7 load balancer, which balances individual HTTP requests across distributors.

Note
If you run Grafana Mimir in a Kubernetes cluster and use a Kubernetes Service as the ingress for distributors, a Kubernetes Service balances TCP connections across endpoints but doesn’t balance HTTP requests within a single TCP connection.

If you enable HTTP persistent connections, also known as HTTP keep-alive, Prometheus reuses the same TCP connection for each remote-write HTTP request of a remote-write shard. This can cause distributors to receive an uneven distribution of remote-write HTTP requests.

To improve the balancing of requests between distributors, consider increasing min_shards in the Prometheus remote_write configuration block.

High-availability tracker

You can configure remote write agents, such as Prometheus or Grafana Alloy, in pairs, which means that metrics continue to be scraped and written to Grafana Mimir even when one of the remote write agents is down for maintenance or is unavailable due to a failure. We refer to this configuration as high-availability (HA) pairs.

The distributor includes an HA tracker. When the HA tracker is enabled, the distributor deduplicates incoming series from Prometheus HA pairs. This enables you to have multiple Prometheus HA replicas of the same server writing identical series to Mimir, which the distributor then deduplicates.

For more information about HA deduplication and how to configure it, refer to Configure high-availability deduplication.

Grafana Mimir ingester

Wed, 03 Jun 2026 09:01:40 +0200

The ingester is a stateful component that processes the most recently ingested samples and makes them available for querying. Queriers read recent data from ingesters and older data from long-term object storage via store-gateways.

The ingester stores data both in memory and on disk for a configurable retention period. In-memory series are periodically compacted into an on-disk format called a TSDB block and then uploaded to object storage. This process happens every two hours by default. When the local retention period expires and the data has been compacted and successfully uploaded, the ingester removes the local copy.

At that point, queriers retrieve the data from object storage through the store-gateways.

To recover its in-memory state after a crash or restart, the ingester maintains both a write-ahead log (WAL) and a write-behind log (WBL). The WBL is used only when out-of-order sample ingestion is enabled.

Series ingestion and querying

How ingesters receive series and how queriers read from them differs between the ingest storage and classic architectures.

Series ingestion and querying in ingest storage architecture

Note
This guidance applies to ingest storage architecture. For more information about the supported architectures in Grafana Mimir, refer to Grafana Mimir architecture.

In ingest storage architecture, distributors shard incoming series across Kafka partitions, writing each series to a single partition. A write request is considered successful once all series in the request are committed to Kafka.

Ingesters are not involved in the write path, so their availability does not impact writes. Effectively, ingesters act as pure read-path components.

Each ingester owns a single Kafka partition and continuously consumes series data from that partition, making it available for querying. As a result, data from a completed write request is not immediately queryable, but becomes available shortly afterward once the ingester has processed it. In steady state, this latency is typically below one second.

Kafka ensures high availability and durability for the most recent data. When an ingester restarts or crashes, it quickly catches up with the backlog of series written to Kafka during its downtime. This ensures that ingesters have gap-free series data once they are caught up.

This behavior is key because it allows a read quorum of one per partition. Queriers reading the most recent data from ingesters need to query only a single ingester for the relevant partition to guarantee consistency. Even if two random ingesters are unavailable, the read path remains healthy as long as there is at least one healthy ingester for each partition.

An ingester owns exactly one Kafka partition, but a partition can be assigned to multiple ingesters for high availability. As a best practice, assign two ingesters per partition. This corresponds to a replication factor of two.

For more information about sharding, refer to Series sharding in ingest storage architecture.

Series ingestion and querying in classic architecture

Note
This guidance applies to classic architecture. For more information about the supported architectures in Grafana Mimir, refer to Grafana Mimir architecture.

In classic architecture, distributors shard and replicate incoming series across ingesters, writing each series to RF different ingesters, where RF is the replication factor, three, by default. A write request is considered successful once all series are written to a quorum of ingesters, calculated based on the configured RF. For example, with an RF of three, a write request succeeds if each series is successfully written to at least two ingesters.

In this architecture, ingesters are involved in both the write and read paths. If two or more ingesters become unavailable, the write and read quorum may be lost, potentially causing a full outage.

When an ingester restarts or crashes, it cannot catch up with series that were written to its shard during downtime. This can result in gaps in the ingester’s data. Consequently, queriers also require a read quorum to guarantee consistency. A query succeeds if each series can be read from at least a quorum of ingesters that own that series. For example, with an RF of three, a read request succeeds if each series is successfully read from at least two ingesters.
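The quorum arithmetic in the examples above can be written as a short Python sketch. The function names are hypothetical, and the majority formula floor(RF / 2) + 1 is an assumption that matches the RF = 3 example in the text.

```python
def quorum(replication_factor):
    """Majority quorum: more than half of the replicas."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas of a series that can be unavailable without losing quorum."""
    return replication_factor - quorum(replication_factor)


print(quorum(3), tolerated_failures(3))  # 2 1
```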

For more information about sharding, refer to Series sharding in classic architecture.

Differences in read path availability between ingest storage and classic architecture

Because the read quorum behavior differs between the two architectures, ingest storage architecture is significantly more resilient to ingester failures.

The following chart models the probability of a read-path outage given a variable number of random unhealthy ingesters. We assumed 100 ingesters per zone, with three zones for the classic architecture (RF = 3) and two zones for the ingest storage architecture (RF = 2).

Even with a lower replication factor, and therefore, fewer ingesters, ingest storage architecture is more resilient to random failures:

In classic architecture, an outage occurs as soon as two ingesters in two different zones are unhealthy.
In the ingest storage architecture, an outage only occurs if two ingesters owning the same partition are simultaneously unhealthy.
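The two outage conditions can be turned into a simplified combinatorial model, sketched below in Python. This is an approximation under stated assumptions, not necessarily the method behind the original chart: failures are drawn uniformly at random, every pair of zones in classic architecture is assumed to share at least one series, and each partition in ingest storage architecture is owned by exactly two ingesters. Function names are hypothetical.

```python
from math import comb


def classic_outage_probability(unhealthy, per_zone=100, zones=3):
    """Outage as soon as unhealthy ingesters span two different zones."""
    if unhealthy == 0:
        return 0.0
    total = per_zone * zones
    # No outage only if all unhealthy ingesters are in the same zone.
    same_zone = zones * comb(per_zone, unhealthy)
    return 1 - same_zone / comb(total, unhealthy)


def ingest_storage_outage_probability(unhealthy, partitions=100):
    """Outage only if both owners of some partition are unhealthy."""
    total = partitions * 2
    # No outage if the unhealthy ingesters hit distinct partitions:
    # choose the partitions, then one of the two owners in each.
    distinct = comb(partitions, unhealthy) * 2**unhealthy
    return 1 - distinct / comb(total, unhealthy)


for k in (1, 2, 5, 10):
    print(k, classic_outage_probability(k), ingest_storage_outage_probability(k))
```

Under these assumptions, two random unhealthy ingesters cause a classic outage about two times out of three, while in ingest storage architecture the same event has a probability of about 0.5%.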

Zone-aware replication

Zone-aware replication ensures that the ingester replicas for a given time series are distributed across different zones. Zones can represent logical or physical failure domains, for example, different availability zones in the cloud. Dividing replicas across multiple zones helps prevent data loss and service interruptions during a zone-wide outage.

To set up multi-zone replication, refer to Configure Grafana Mimir zone-aware replication.

Shuffle sharding

Shuffle sharding is a technique used by Grafana Mimir to minimize the impact tenants have on each other. It works by isolating each tenant’s data across different partitions or ingesters, reducing overlap between tenants.

For more information on shuffle sharding, refer to Configuring shuffle sharding.

Ingesters hash ring

Ingesters join a dedicated hash ring. In classic architecture, the ring is used for sharding and service discovery. In ingest storage architecture, it’s used for service discovery only.

Regardless of the architecture, each ingester in the ring has a state that changes throughout its lifecycle:

State	Description
`PENDING`	The ingester has started but its bootstrap phase hasn’t begun. In this state, it does not ingest series or serve read requests.
`JOINING`	The ingester is bootstrapping and preparing to serve read requests. In classic architecture, it replays the WAL and WBL. In ingest storage architecture, it replays the WAL and WBL, then catches up with the backlog of series accumulated in its Kafka partition since the previous shutdown.
`ACTIVE`	The ingester is fully operational. It ingests series and serves read requests.
`LEAVING`	The ingester is shutting down. It stops ingesting series and serving read requests.
`UNHEALTHY`	A meta state derived from heartbeat monitoring. An ingester is considered `UNHEALTHY` if it fails to update its heartbeat timestamp within a configurable timeout. Other components avoid contacting ingesters in this state. In particular, queriers skip `UNHEALTHY` ingesters when reading. In classic architecture, distributors avoid writing to them.

To configure the ingesters hash ring, refer to Configure Grafana Mimir hash rings.

Read-only mode

Note
This feature is used exclusively in classic architecture. For more information about the supported architectures in Grafana Mimir, refer to Grafana Mimir architecture.

Ingesters have an additional property in the ring called “read-only” mode. This information is stored separately from the ingester’s instance state, so an ingester can be in any supported state, for example, ACTIVE or LEAVING, while also being in read-only mode.

When an ingester is in read-only mode, it stops receiving write requests from distributors but continues to serve read requests. In the write path, read-only ingesters are excluded from the shard computation used for distributing tenant writes.

Read-only mode is particularly useful during downscaling or as preparation for an ingester’s shutdown. Ingesters can be placed in read-only mode using the Prepare instance ring downscale API endpoint.

Write-ahead and write-behind logs

The ingester uses a write-ahead log (WAL) and a write-behind log (WBL) to recover its in-memory state after a restart or crash. These logs protect against data loss caused by process restarts or failures but do not protect against disk failures. They also do not improve availability. Replication is still required for high availability.

Write-ahead log

The write-ahead log (WAL) records all incoming series to persistent disk until those series are uploaded to long-term storage. If an ingester fails, it replays the WAL on restart to restore the in-memory series and samples.

Write-behind log

The write-behind log (WBL) functions similarly to the WAL but records only out-of-order samples to persistent storage until they are uploaded to long-term storage. The WBL is used only when out-of-order ingestion is enabled via the CLI flag -ingester.out-of-order-time-window.

Grafana Mimir uses a separate log for out-of-order samples because the WAL is optimized for in-order appends. When an ingester receives a sample, it first tries to append it to in-memory data structures. If the sample is out-of-order and out-of-order ingestion is enabled, the ingester still appends it in memory and writes it to the dedicated write-behind log.
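The choice of log can be sketched as a per-series decision in Python. The function name is hypothetical, timestamps are plain numbers, and two points are assumptions of this sketch: a sample at or after the newest timestamp of its series is treated as in-order, and a sample older than the out-of-order window is rejected.

```python
def route_sample(timestamp, newest_timestamp, out_of_order_window):
    """Return which log records a sample for a series, or 'rejected'."""
    if timestamp >= newest_timestamp:
        return "wal"  # in-order append
    if out_of_order_window > 0 and timestamp >= newest_timestamp - out_of_order_window:
        return "wbl"  # out-of-order sample within the allowed window
    return "rejected"  # out-of-order ingestion disabled, or sample too old


in_order = route_sample(1_000, 900, out_of_order_window=300)
late = route_sample(800, 1_000, out_of_order_window=300)
too_old = route_sample(500, 1_000, out_of_order_window=300)
disabled = route_sample(800, 1_000, out_of_order_window=0)
```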

For more information about out-of-order samples ingestion, refer to Configure out of order samples ingestion.

Grafana Mimir querier

Wed, 03 Jun 2026 09:01:40 +0200

The querier is a stateless component that evaluates PromQL expressions by fetching time series and labels on the read path.

The querier uses the store-gateway component to query the long-term storage and the ingester component to query recently written data.

How it works

To find the correct blocks to look up at query time, queriers lazily download the bucket index when they receive the first query for a given tenant. The querier caches the bucket index in memory and periodically refreshes it.

The bucket index contains a list of blocks and block deletion marks of a tenant. The querier later uses the list of blocks and block deletion marks to locate the set of blocks that need to be queried for the given query.

Anatomy of a query request

When a querier receives a query range request, the request contains the following parameters:

query: the PromQL query expression (for example, rate(node_cpu_seconds_total[1m]))
start: the start time
end: the end time
step: the query resolution (for example, 30 yields one data point every 30 seconds)

For each query, the querier analyzes the start and end time range to compute a list of all known blocks containing at least one sample within the time range. For each list of blocks per query, the querier computes a list of store-gateway instances holding the blocks. The querier sends a request to each matching store-gateway instance to fetch all samples for the series matching the query within the start and end time range.

The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were queried. This list of block IDs might be a subset of the requested blocks, for example, when a blocks-resharding event occurred within the last few seconds.

The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried. If the expected blocks have not been queried, the querier retries fetching samples from missing blocks from different store-gateways up to -store-gateway.sharding-ring.replication-factor (defaults to 3) times or maximum 3 times, whichever is lower.

If the consistency check fails after all retry attempts, the query execution fails. Query failure due to the querier not querying all blocks ensures the correctness of query results.
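The consistency check and retry loop can be sketched in Python. All names are hypothetical, the store-gateways are simulated with a dictionary, and counting the first request as one of the attempts is an assumption of this sketch.

```python
def fetch_with_consistency_check(expected_blocks, replicas_for, query,
                                 replication_factor=3):
    """Query blocks until all expected block IDs have been reported as queried.

    replicas_for(block) lists the store-gateways that may hold the block.
    query(gateway, blocks) returns the subset of block IDs actually queried.
    """
    max_attempts = min(replication_factor, 3)
    missing = set(expected_blocks)
    tried = {block: set() for block in missing}
    for _ in range(max_attempts):
        plan = {}
        for block in missing:
            # Pick a store-gateway that has not been tried for this block.
            candidates = [g for g in replicas_for(block) if g not in tried[block]]
            if not candidates:
                raise RuntimeError(f"no store-gateway left to try for {block}")
            plan.setdefault(candidates[0], set()).add(block)
            tried[block].add(candidates[0])
        for gateway, blocks in plan.items():
            missing -= query(gateway, blocks)
        if not missing:
            return True
    raise RuntimeError(f"consistency check failed, missing blocks: {sorted(missing)}")


# Simulated store-gateways and the blocks each one currently serves.
holdings = {"sg-1": {"A"}, "sg-2": {"A", "B"}, "sg-3": {"B"}}
replicas = lambda block: ["sg-1", "sg-2", "sg-3"]
run_query = lambda gateway, blocks: blocks & holdings[gateway]

ok = fetch_with_consistency_check({"A", "B"}, replicas, run_query)
```

In this run, sg-1 reports only block A, so block B is retried on sg-2 and the check passes on the second attempt.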

If the query time range overlaps with the -querier.query-ingesters-within duration, the querier also sends the request to ingesters. The request to the ingesters fetches samples that have not yet been uploaded to the long-term storage or are not yet available for querying through the store-gateway.

The configured period for -querier.query-ingesters-within should be greater than both:

-querier.query-store-after
the estimated minimum amount of time for the oldest samples stored in a block uploaded by an ingester to be discovered and available for querying. When running Grafana Mimir with the default configuration, the estimated minimum amount of time for the oldest sample in an uploaded block to be available for querying is 3h.
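The interaction between the two settings can be sketched in Python. The function name is hypothetical and times are expressed in hours. The sketch assumes that -querier.query-store-after means store-gateways are queried only for data older than that duration, and the 13h and 12h values are examples only.

```python
def query_targets(start, end, now, query_ingesters_within, query_store_after):
    """Return which components a query for [start, end] is sent to."""
    if query_ingesters_within <= query_store_after:
        # Otherwise some recent time range would be served by neither component.
        raise ValueError("query-ingesters-within must exceed query-store-after")
    targets = []
    if end >= now - query_ingesters_within:
        targets.append("ingesters")
    if start <= now - query_store_after:
        targets.append("store-gateways")
    return targets


now = 100  # hours, arbitrary reference point
recent = query_targets(99, 100, now, query_ingesters_within=13, query_store_after=12)
old = query_targets(50, 60, now, query_ingesters_within=13, query_store_after=12)
both = query_targets(80, 100, now, query_ingesters_within=13, query_store_after=12)
```

The one-hour overlap between the two windows in this example means data of that age is read from both ingesters and store-gateways, so nothing is missed while blocks are being uploaded and discovered.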

After all samples have been fetched from both the store-gateways and the ingesters, the querier runs the PromQL engine to execute the query and sends back the result to the client.

Connecting to store-gateways

You must configure the queriers with the same -store-gateway.sharding-ring.* flags (or their respective YAML configuration parameters) that you use to configure the store-gateways so that the querier can access the store-gateway hash ring and discover the addresses of the store-gateways.

Connecting to ingesters

You must configure the querier with the same -ingester.ring.* flags (or their respective YAML configuration parameters) that you use to configure the ingesters so that the querier can access the ingester hash ring and discover the addresses of the ingesters.

Caching

The querier supports the following cache:

Metadata cache

Caching is optional, but highly recommended in a production environment.

Metadata cache

Store-gateways and queriers can use Memcached to cache the following bucket metadata:

List of tenants
List of blocks per tenant
Block meta.json existence and content
Block deletion-mark.json existence and content
Tenant bucket-index.json.gz content

Using the metadata cache reduces the number of API calls to long-term storage and eliminates API calls that scale linearly with the number of querier and store-gateway replicas.

To enable the metadata cache, set -blocks-storage.bucket-store.metadata-cache.backend.

Note
Currently, Mimir supports caching with the memcached backend.

The Memcached client includes additional configuration available via flags that begin with the prefix -blocks-storage.bucket-store.metadata-cache.memcached.*.

Additional flags for configuring the metadata cache begin with the prefix -blocks-storage.bucket-store.metadata-cache.*. Setting the TTL of an item type to zero or a negative value disables caching for that item type.

Note
You should use the same Memcached backend cluster for both the store-gateways and queriers.

Querier configuration

For details about querier configuration, refer to querier.

Grafana Mimir query-frontend

Wed, 03 Jun 2026 09:01:40 +0200

The query-frontend is a stateless component that provides a Prometheus-compatible API with a number of features to accelerate the read path. The query-frontend is the primary entry point for the read path of Mimir. The query-scheduler and queriers are required within the cluster to execute the queries.

We recommend that you run at least two query-frontend replicas for high-availability reasons.

The following flow describes how a query moves through a Grafana Mimir cluster:

The query-frontend receives queries, and then either splits and shards them, or serves them from the cache.
The query-frontend enqueues the queries into a query-scheduler.
The query-scheduler stores the queries in an in-memory queue where they wait for a querier to pick them up.
Queriers pick up the queries and execute them.
The querier sends results back to query-frontend, which then forwards the results to the client.

Functions

This section describes the functions of the query-frontend.

Splitting

The query-frontend can split long-range queries into multiple queries. By default, the split interval is 24 hours. The query-frontend executes these queries in parallel in downstream queriers and combines the results together. Splitting prevents large multi-day or multi-month queries from causing out-of-memory errors in a querier and accelerates query execution.
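
As a rough illustration of splitting, the sketch below cuts a query range into sub-ranges at multiples of the split interval. Aligning the cut points to interval boundaries and ignoring the query step are simplifying assumptions, and `split_by_interval` is a hypothetical name rather than a Mimir function.

```python
from typing import List, Tuple

DAY_MS = 24 * 60 * 60 * 1000  # default split interval from the text, in milliseconds


def split_by_interval(start_ms: int, end_ms: int, interval_ms: int = DAY_MS) -> List[Tuple[int, int]]:
    """Split [start_ms, end_ms) into half-open sub-ranges cut at interval boundaries."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    if end_ms < start_ms:
        raise ValueError("end_ms must not be before start_ms")
    ranges = []
    cursor = start_ms
    while cursor < end_ms:
        # Next multiple of the interval strictly after the cursor.
        next_boundary = (cursor // interval_ms + 1) * interval_ms
        ranges.append((cursor, min(next_boundary, end_ms)))
        cursor = next_boundary
    return ranges


parts = split_by_interval(DAY_MS // 2, 2 * DAY_MS + DAY_MS // 4)
```

Each resulting sub-range can then run as an independent query, which is what allows parallel execution and bounds the memory needed by any single querier.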

Caching

The query-frontend caches query results and reuses them on subsequent queries. If the cached results are incomplete, the query-frontend calculates the required partial queries and executes them in parallel on downstream queriers. The query-frontend can optionally align queries with their step parameter to improve the cacheability of the query results. The result cache is backed by Memcached.

Although aligning the step parameter to the query time range increases the performance of Grafana Mimir, it violates the PromQL conformance of Grafana Mimir. If PromQL conformance is not a priority to you, you can enable step alignment by setting -query-frontend.align-queries-with-step=true or the equivalent per-tenant setting align_queries_with_step.
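
One plausible way to picture step alignment is rounding the start and end timestamps down to a multiple of the step, as in the following sketch. The rounding rule and the function name are assumptions made for illustration.

```python
from typing import Tuple


def align_with_step(start_ms: int, end_ms: int, step_ms: int) -> Tuple[int, int]:
    """Round start and end down to the nearest multiple of the step."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    return start_ms - start_ms % step_ms, end_ms - end_ms % step_ms


# Two requests issued a few milliseconds apart map to the same aligned range,
# so the second request can reuse the cached result of the first.
first = align_with_step(1005, 2007, 10)
second = align_with_step(1001, 2003, 10)
```

The aligned range differs from the range the client asked for, which is why the text notes that alignment departs from strict PromQL conformance.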

About query sharding

The query-frontend also provides query sharding.

DNS configuration and readiness

When a query-frontend starts up, it is not immediately able to serve queries. The /ready endpoint reports an HTTP 200 status code only after the query-frontend connects to at least one query-scheduler, and is then ready to serve queries. Configure the /ready endpoint as a healthcheck in your load balancer; otherwise, a query-frontend scale-out event might result in failed queries or high latency until the query-frontend connects to a query-scheduler.

Grafana Mimir query-scheduler

Wed, 03 Jun 2026 09:01:40 +0200

The query-scheduler is a stateless component that retains a queue of queries to execute, and distributes the workload to available queriers.

The following flow describes how a query moves through a Grafana Mimir cluster:

The query-frontend receives queries, and then either splits and shards them, or serves them from the cache.
The query-frontend enqueues the queries into a query-scheduler.
The query-scheduler stores the queries in an in-memory queue where they wait for a querier to pick them up.
Queriers pick up the queries and execute them.
The querier sends results back to query-frontend, which then forwards the results to the client.

Benefits of using the query-scheduler

Query-scheduler enables the scaling of query-frontends by moving queuing of requests to a separate component. The query-scheduler prevents queries that only require ingesters from being affected by degradation of store-gateways and vice versa. The query-scheduler ensures tenant fairness using a simple round-robin between all tenants with active queries.
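
The round-robin fairness mentioned above can be pictured with a small in-memory queue. This is a hedged sketch only: `TenantFairQueue` is a hypothetical class, and the real query-scheduler queue has more concerns than shown here.

```python
from collections import OrderedDict, deque
from typing import Deque, Optional, Tuple


class TenantFairQueue:
    """Serve one query per tenant in turn, so a busy tenant cannot starve others."""

    def __init__(self) -> None:
        # Insertion order of the dict doubles as the round-robin order.
        self._queues: "OrderedDict[str, Deque[str]]" = OrderedDict()

    def enqueue(self, tenant: str, query: str) -> None:
        self._queues.setdefault(tenant, deque()).append(query)

    def dequeue(self) -> Optional[Tuple[str, str]]:
        if not self._queues:
            return None
        tenant, queue = next(iter(self._queues.items()))
        query = queue.popleft()
        del self._queues[tenant]
        if queue:
            # Tenant still has pending queries: move it to the back of the line.
            self._queues[tenant] = queue
        return tenant, query


fair_queue = TenantFairQueue()
for q in ("q1", "q2", "q3"):
    fair_queue.enqueue("tenant-a", q)
fair_queue.enqueue("tenant-b", "q1")
order = [fair_queue.dequeue() for _ in range(4)]
```

In this example, tenant-b's single query is served second, even though tenant-a enqueued three queries first.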

Configuration

Query-frontends and queriers need to discover the addresses of query-scheduler instances. The query-scheduler supports two service discovery mechanisms:

DNS-based service discovery
Ring-based service discovery

DNS-based service discovery

To use the query-scheduler with DNS-based service discovery, configure the query-frontends and queriers to connect to the query-scheduler:

Query-frontend: -query-frontend.scheduler-address
Querier: -querier.scheduler-address

Note
The configured query-scheduler address should be in the host:port format.

If multiple query-schedulers are running, the host should be a DNS name resolving to all query-scheduler instances.

Ring-based service discovery

To use the query-scheduler with ring-based service discovery, configure the query-schedulers to join their hash ring, and the query-frontends and queriers to discover query-scheduler instances via the ring:

Configure the hash ring for the query-scheduler.
Set -query-scheduler.service-discovery-mode=ring (or its respective YAML configuration parameter) on the query-scheduler, query-frontend, and querier.
Set the -query-scheduler.ring.* flags (or their respective YAML configuration parameters) on the query-scheduler, query-frontend, and querier.

Migrate from DNS-based to ring-based service discovery

To migrate the query-scheduler from DNS-based service discovery to ring-based service discovery, perform the following steps:

Configure the query-scheduler instances to join a ring:

-query-scheduler.service-discovery-mode=ring

# Configure the query-scheduler ring backend (e.g. "memberlist").
-query-scheduler.ring.store=<backend>

# If the configured <backend> is "memberlist", then ensure memberlist is configured for the query-scheduler.
-memberlist.join=<same as other Mimir components>

# If the configured <backend> is "consul" or "etcd", then set their backend configuration
# for the query-scheduler ring:
# - Consul: -query-scheduler.ring.consul.*
# - Etcd:   -query-scheduler.ring.etcd.*

Wait until the query-scheduler instances have completed rolling out.
Ensure the changes have been successfully applied; open the query-scheduler ring status page and ensure all query-scheduler instances are registered to the ring. At this point, queriers and query-frontend are still discovering query-schedulers via DNS.

Configure query-frontend and querier instances to discover query-schedulers via the ring:

-query-scheduler.service-discovery-mode=ring

# Remove the DNS-based service discovery configuration:
# -query-frontend.scheduler-address
# -querier.scheduler-address

# Configure the query-scheduler ring backend (e.g. "memberlist").
-query-scheduler.ring.store=<backend>

# If the configured <backend> is "memberlist", then ensure memberlist is configured for the query-scheduler.
-memberlist.join=<same as other Mimir components>

# If the configured <backend> is "consul" or "etcd", then set their backend configuration
# for the query-scheduler ring:
# - Consul: -query-scheduler.ring.consul.*
# - Etcd:   -query-scheduler.ring.etcd.*

Note
If you deploy your Mimir cluster with Jsonnet, refer to Migrate query-scheduler from DNS-based to ring-based service discovery.

Operational considerations

For high availability, run two query-scheduler replicas.

If you’re running a Grafana Mimir cluster with a very high query throughput, you can add more query-scheduler replicas. If you scale the query-scheduler, ensure that the number of replicas is less than or equal to the configured -querier.max-concurrent.

Grafana Mimir store-gateway

Wed, 03 Jun 2026 09:01:40 +0200

The store-gateway component, which is stateful, queries blocks from long-term storage. On the read path, the querier and the ruler use the store-gateway to handle queries, whether a query comes from a user or from a rule evaluation.

Bucket index

To find the right blocks to look up at query time, the store-gateway requires a view of the bucket in long-term storage. The store-gateway keeps the bucket view updated by periodically downloading the bucket index.

To discover each tenant’s blocks and block deletion marks, at startup, store-gateways fetch the bucket index from long-term storage for each tenant that belongs to their shard.

For each discovered block, the store-gateway downloads the index header to the local disk. During this initial bucket-synchronization phase, the store-gateway’s /ready readiness probe endpoint reports a not-ready status.

For more information about the bucket index, refer to bucket index.

Store-gateways periodically re-download the bucket index to obtain an updated view of the long-term storage and discover new blocks uploaded by ingesters and compactors, or deleted by compactors.

It is possible that the compactor might have deleted blocks or marked others for deletion since the store-gateway last checked the block. The store-gateway downloads the index header for new blocks, and offloads (deletes) the local copy of index header for deleted blocks. You can configure the -blocks-storage.bucket-store.sync-interval flag to control the frequency with which the store-gateway checks for changes in the long-term storage.

When a query executes, the store-gateway downloads chunks, but it does not fully download the whole block; the store-gateway downloads only the portions of index and chunks that are required to run a given query. To avoid the store-gateway having to re-download the index header during subsequent restarts, we recommend running the store-gateway with a persistent disk. For example, if you’re running the Grafana Mimir cluster in Kubernetes, you can use a StatefulSet with a PersistentVolumeClaim for the store-gateways.

For more information about the index-header, refer to Binary index-header documentation.

Blocks sharding and replication

The store-gateway uses blocks sharding to horizontally scale blocks in a large cluster.

Blocks are replicated across multiple store-gateway instances based on a replication factor configured via -store-gateway.sharding-ring.replication-factor. The blocks replication is used to protect from query failures caused by some blocks not loaded by any store-gateway instance at a given time, such as in the event of a store-gateway failure or while restarting a store-gateway instance (for example, during a rolling update).

Store-gateway instances build a hash ring and shard and replicate blocks across the pool of store-gateway instances registered in the ring.

Store-gateways continuously monitor the ring state. When the ring topology changes, for example, when a new instance is added or removed, or the instance becomes healthy or unhealthy, each store-gateway instance resynchronizes the blocks assigned to its shard. The store-gateway resynchronization process uses the block ID hash that matches the token ranges assigned to the instance within the ring.
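
To illustrate how a block ID hash can be matched against token ranges, the following sketch builds a toy hash ring and picks the replicas for a block. The hash function, the number of tokens per instance, and the `Ring` class are assumptions for illustration and do not reflect Mimir's actual ring implementation.

```python
import bisect
import hashlib
from typing import Iterable, List


def _hash32(value: str) -> int:
    """Map a string to a 32-bit token (SHA-256 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, instances: Iterable[str], tokens_per_instance: int = 64) -> None:
        # Each instance registers several tokens so blocks spread evenly.
        self._tokens = sorted(
            (_hash32(f"{name}-{i}"), name)
            for name in instances
            for i in range(tokens_per_instance)
        )
        self._keys = [token for token, _ in self._tokens]

    def owners(self, block_id: str, replication_factor: int = 3) -> List[str]:
        """Walk the ring clockwise from the block's hash, collecting distinct instances."""
        owners: List[str] = []
        start = bisect.bisect_left(self._keys, _hash32(block_id))
        for offset in range(len(self._tokens)):
            _, name = self._tokens[(start + offset) % len(self._tokens)]
            if name not in owners:
                owners.append(name)
            if len(owners) == replication_factor:
                break
        return owners


ring = Ring([f"store-gateway-{i}" for i in range(5)])
replicas = ring.owners("01ARZ3NDEKTSV4RRFFQ69G5FAV")
```

When an instance joins or leaves, only the blocks whose hash falls in the affected token ranges change owners, which is why each store-gateway resynchronizes after a topology change.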

The store-gateway loads the index-header of each block that belongs to its store-gateway shard. After the store-gateway loads a block’s index header, the block is ready to be queried by queriers. When the querier queries blocks via a store-gateway, the response contains the list of queried block IDs. If a querier attempts to query a block that the store-gateway has not loaded, the querier retries the query on a different store-gateway up to the -store-gateway.sharding-ring.replication-factor value, which by default is 3. The query fails if the block can’t be successfully queried from any replica.

Note
You must configure the hash ring via the -store-gateway.sharding-ring.* flags or their respective YAML configuration parameters.

Sharding strategy

The store-gateway uses shuffle-sharding to divide the blocks of each tenant across a subset of store-gateway instances.

Note
When shuffle-sharding is in use, only a subset of store-gateway instances load the blocks of a tenant.

This confines the blast radius of issues introduced by the tenant’s workload to its shard instances.

The -store-gateway.tenant-shard-size flag (or its respective YAML configuration parameter) determines the default number of store-gateway instances per tenant. The store_gateway_tenant_shard_size in the limits overrides can override the shard size on a per-tenant basis.

The default -store-gateway.tenant-shard-size value is 0, which means that a tenant’s blocks are sharded across all store-gateway instances.
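
The effect of the shard size can be sketched with a deterministic per-tenant selection. This is a simplified stand-in: seeding a random generator from the tenant ID is an assumption used only for illustration and is not the selection algorithm Mimir uses.

```python
import hashlib
import random
from typing import Iterable, List


def tenant_shard(tenant_id: str, instances: Iterable[str], shard_size: int = 0) -> List[str]:
    """Pick a stable subset of instances for a tenant; 0 means all instances."""
    ordered = sorted(instances)
    if shard_size <= 0 or shard_size >= len(ordered):
        return ordered
    # Derive the seed from the tenant ID so every caller computes the same subset.
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    return sorted(random.Random(seed).sample(ordered, shard_size))


gateways = [f"store-gateway-{i}" for i in range(10)]
shard = tenant_shard("tenant-a", gateways, shard_size=3)
everything = tenant_shard("tenant-a", gateways, shard_size=0)
```

Because different tenants usually land on different subsets, a problematic tenant affects only the instances in its own shard.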

For more information about shuffle sharding, refer to configure shuffle sharding.

Auto-forget

Store-gateways include an auto-forget feature that removes an instance from the store-gateway ring when that instance does not shut down properly. Under normal conditions, when a store-gateway instance shuts down, it automatically unregisters from the ring. However, in the event of a crash or node failure, the instance might not properly unregister, which can leave a spurious entry in the ring.

The auto-forget feature works as follows: when a healthy store-gateway instance identifies an instance in the ring that is unhealthy for longer than 10 times the configured -store-gateway.sharding-ring.heartbeat-timeout value, the healthy instance removes the unhealthy instance from the ring.
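
The auto-forget rule can be expressed as a small check over heartbeat timestamps. The sketch assumes the period is measured from an instance's last heartbeat, and `instances_to_forget` is a hypothetical helper name.

```python
from typing import Dict, List

FORGET_MULTIPLIER = 10  # "10 times the configured heartbeat timeout" from the text


def instances_to_forget(
    last_heartbeats: Dict[str, float],
    now: float,
    heartbeat_timeout: float,
) -> List[str]:
    """Return the instances whose last heartbeat is older than the forget period."""
    forget_period = FORGET_MULTIPLIER * heartbeat_timeout
    return sorted(
        name for name, seen in last_heartbeats.items() if now - seen > forget_period
    )


now = 10_000.0
heartbeats = {
    "store-gateway-1": now - 30,   # healthy
    "store-gateway-2": now - 500,  # unhealthy, but not long enough to forget
    "store-gateway-3": now - 700,  # past 10 x 60s, so it is removed
}
stale = instances_to_forget(heartbeats, now, heartbeat_timeout=60)
```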

The store-gateway auto-forget feature can be disabled by setting -store-gateway.sharding-ring.auto-forget-enabled=false.

Zone-awareness

Store-gateway replication optionally supports zone-awareness. When you enable zone-aware replication and the blocks replication factor is greater than 1, each block is replicated across store-gateway instances located in different availability zones.

To enable zone-aware replication for the store-gateways:

Configure the availability zone for each store-gateway via the -store-gateway.sharding-ring.instance-availability-zone CLI flag or its respective YAML configuration parameter.
Enable blocks zone-aware replication via the -store-gateway.sharding-ring.zone-awareness-enabled CLI flag or its respective YAML configuration parameter. Set this zone-aware replication flag on store-gateways, queriers, and rulers.
To apply the new configuration, roll out store-gateways, queriers, and rulers.

Waiting for stable ring at startup

If a cluster cold starts or scales up by two or more store-gateway instances simultaneously, the store-gateways could start at different times. As a result, each store-gateway might run the initial blocks synchronization based on a different state of the hash ring.

For example, in the event of a cold start, the first store-gateway that joins the ring might load all blocks because the sharding logic runs based on the current state of the ring, which contains one single store-gateway.

To reduce the likelihood that store-gateways synchronize blocks based on different ring states, you can configure the store-gateway to wait for a stable ring at startup. A ring is considered stable when no instance is added to or removed from the ring for the minimum duration specified in the -store-gateway.sharding-ring.wait-stability-min-duration flag. If the ring continues to change after reaching the maximum duration specified in the -store-gateway.sharding-ring.wait-stability-max-duration flag, the store-gateway stops waiting for a stable ring and proceeds with startup.

To enable waiting for the ring to be stable at startup, start the store-gateway with -store-gateway.sharding-ring.wait-stability-min-duration=1m, which is the recommended value for production systems.
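
The interaction between the minimum and maximum durations can be sketched as a polling loop. This is an illustrative sketch under assumptions: `wait_for_stable_ring`, the polling interval, and the injected clock functions are hypothetical and exist only to make the logic testable.

```python
import time
from typing import Callable, Iterable


def wait_for_stable_ring(
    observe_ring: Callable[[], Iterable[str]],
    min_duration: float,
    max_duration: float,
    poll_interval: float = 1.0,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the ring stayed unchanged for min_duration, False on timeout."""
    started = last_change = now()
    previous = frozenset(observe_ring())
    while True:
        current_time = now()
        if current_time - last_change >= min_duration:
            return True
        if current_time - started >= max_duration:
            return False  # stop waiting and proceed with startup anyway
        sleep(poll_interval)
        current = frozenset(observe_ring())
        if current != previous:
            # Membership changed: restart the stability timer.
            previous, last_change = current, now()
```

Every membership change resets the stability timer, so the maximum duration is what guarantees that startup eventually continues in a ring that never settles.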

Blocks index-header

The index-header is a subset of the block index that the store-gateway downloads from long-term storage and keeps on the local disk. Keeping the index-header on the local disk makes query execution faster.

Index-header lazy loading

By default, a store-gateway downloads the index-headers to disk and doesn’t load them to memory until required. When required by a query, index-headers are loaded and automatically released by the store-gateway after the amount of inactivity time you specify in -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout has passed.

Grafana Mimir provides a configuration flag -blocks-storage.bucket-store.index-header.lazy-loading-enabled=false to disable index-header lazy loading. When disabled, the store-gateway loads all index-headers at startup, which provides faster access to the data in the index-header when querying at the cost of longer startup times. However, in a cluster with a large number of blocks, each store-gateway might have a large amount of loaded index-headers, regardless of how frequently they are used at query time.

When the index-header is loaded, only a portion of it is kept in memory to reduce memory usage. The rest of the index-header is read from disk as required. This requires that store-gateways have memory available to be used by the operating system for caching disk accesses.

Caching

The store-gateway supports the following types of caches:

Index cache
Chunks cache
Metadata cache

We recommend that you use caching in a production environment. For more information about configuring the cache, refer to production tips.

Index cache

The store-gateway can use a cache to accelerate series and label lookups from block indexes. The store-gateway supports the following backends:

inmemory
memcached

In-memory index cache

By default, the inmemory index cache is enabled.

Consider the following trade-offs of using the in-memory index cache:

Pros: There is no latency.
Cons: When the replication factor is > 1, then the data that resides in the memory of the store-gateway will be duplicated among different instances. This leads to an increase in overall memory usage and a reduced cache hit ratio.

You can configure the index cache max size using the -blocks-storage.bucket-store.index-cache.inmemory.max-size-bytes flag or its respective YAML configuration parameter.

Memcached index cache

The memcached index cache uses Memcached as the cache backend.

Consider the following trade-offs of using the Memcached index cache:

Pros: You can scale beyond a single node’s memory by creating a Memcached cluster that is shared by multiple store-gateway instances.
Cons: The system experiences higher latency in the cache round trip compared to the latency experienced when using in-memory cache.

The Memcached client uses a jump hash algorithm to shard cached entries across a cluster of Memcached servers. Because of this, ensure that the Memcached servers are not located behind a load balancer, and configure the addresses of the Memcached servers so that servers are added to or removed from the end of the list whenever a scale up or scale down occurs.
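
To show why the order of the server list matters, here is a sketch of the publicly described jump consistent hash algorithm. How cache keys are converted to 64-bit integers is an assumption (SHA-256 is used here for illustration), and `server_index` is a hypothetical helper.

```python
import hashlib


def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: map a 64-bit key to a bucket in [0, num_buckets)."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    bucket, candidate = -1, 0
    while candidate < num_buckets:
        bucket = candidate
        # 64-bit linear congruential step, kept within unsigned 64-bit range.
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        candidate = int((bucket + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return bucket


def server_index(cache_key: str, num_servers: int) -> int:
    """Pick the position in the server list that holds the given cache key."""
    key64 = int.from_bytes(hashlib.sha256(cache_key.encode()).digest()[:8], "big")
    return jump_hash(key64, num_servers)


# Growing the list from 4 to 5 servers: a key either stays where it was
# or moves to the newly appended server at index 4.
moved = sum(
    1 for i in range(1000) if server_index(f"key-{i}", 4) != server_index(f"key-{i}", 5)
)
```

Because a bucket is only a position in the list, inserting or removing a server in the middle shifts the positions of later servers and remaps many keys, whereas changes at the end of the list keep most entries on the same server.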

For example, if you’re running Memcached in Kubernetes, you might:

Deploy your Memcached cluster using a StatefulSet.
Create a headless service for Memcached StatefulSet.
Configure Mimir’s Memcached client address using the dnssrvnoa+ service discovery.

To configure the Memcached backend:

Use -blocks-storage.bucket-store.index-cache.backend=memcached.
Use the -blocks-storage.bucket-store.index-cache.memcached.addresses flag to set the address of the Memcached service.

DNS service discovery resolves the addresses of the Memcached servers.

Chunks cache

The store-gateway can also use a cache to store chunks that are fetched from long-term storage. Chunks contain actual samples, and can be reused if a query hits the same series for the same time range. Chunks can only be cached in Memcached.

To enable chunks cache, set -blocks-storage.bucket-store.chunks-cache.backend=memcached. You can configure the Memcached client via flags that include the prefix -blocks-storage.bucket-store.chunks-cache.memcached.*.

Note
There are additional low-level flags that begin with the prefix -blocks-storage.bucket-store.chunks-cache.* that you can use to configure chunks cache.

Metadata cache

Store-gateways and queriers can use Memcached to cache the following bucket metadata:

List of tenants
List of blocks per tenant
Block meta.json existence and content
Block deletion-mark.json existence and content
Tenant bucket-index.json.gz content

Using the metadata cache reduces the number of API calls to long-term storage and eliminates API calls that scale linearly as the number of querier and store-gateway replicas increases.

To enable metadata cache, set -blocks-storage.bucket-store.metadata-cache.backend.

Note
Mimir only supports the memcached backend for the metadata cache.

The Memcached client includes additional configuration available via flags that begin with the prefix -blocks-storage.bucket-store.metadata-cache.memcached.*.

Additional flags for configuring the metadata cache begin with the prefix -blocks-storage.bucket-store.metadata-cache.*. Setting the TTL of an item type to zero or a negative value disables caching for that item type.

Note
You should use the same Memcached backend cluster for both the store-gateways and queriers.

Store-gateway HTTP endpoints

GET /store-gateway/ring
Displays the status of the store-gateways ring, including the tokens owned by each store-gateway and an option to remove (or forget) instances from the ring.

Store-gateway configuration

For more information about store-gateway configuration, refer to store_gateway.

(Optional) Grafana Mimir Alertmanager

Wed, 03 Jun 2026 09:01:40 +0200

The Mimir Alertmanager adds multi-tenancy support and horizontal scalability to the Prometheus Alertmanager. The Mimir Alertmanager is an optional component that accepts alert notifications from the Mimir ruler. The Alertmanager deduplicates and groups alert notifications, and routes them to a notification channel, such as email, PagerDuty, or OpsGenie.

Note
To run Mimir Alertmanager as a part of monolithic deployment, run Mimir with the option -target=all,alertmanager.

Multi-tenancy

Like other Mimir components, multi-tenancy in the Mimir Alertmanager uses the tenant ID header. Each tenant has an isolated alert routing configuration and Alertmanager UI.

Tenant configurations

Each tenant has an Alertmanager configuration that defines notifications receivers and alerting routes. The Mimir Alertmanager uses the same configuration file that the Prometheus Alertmanager uses.

Note
The Mimir Alertmanager exposes the configuration API according to the path set by the -server.path-prefix flag. It doesn’t use the path set by the -http.alertmanager-http-prefix flag.

If you run Mimir with the default configuration, where -server.path-prefix is /, set only the hostname for the --address flag of the mimirtool command; don’t set a path-specific address. For example, / is correct, and /alertmanager is incorrect.

You can validate a configuration file using the mimirtool command:

mimirtool alertmanager verify <ALERTMANAGER CONFIGURATION FILE>

The following sample command shows how to upload a tenant’s Alertmanager configuration using mimirtool:

mimirtool alertmanager load <ALERTMANAGER CONFIGURATION FILE> \
  --address=<ALERTMANAGER URL> \
  --id=<TENANT ID>

The following sample command shows how to retrieve a tenant’s Alertmanager configuration using mimirtool:

mimirtool alertmanager get \
  --address=<ALERTMANAGER URL> \
  --id=<TENANT ID>

The following sample command shows how to delete a tenant’s Alertmanager configuration using mimirtool:

mimirtool alertmanager delete \
  --address=<ALERTMANAGER URL> \
  --id=<TENANT ID>

After the tenant uploads an Alertmanager configuration, the tenant can access the Alertmanager UI at the /alertmanager endpoint.

Fallback configuration

When a tenant doesn’t have an Alertmanager configuration, the Grafana Mimir Alertmanager uses a fallback configuration. By default, there is always a fallback configuration set. You can overwrite the default fallback configuration via the -alertmanager.configs.fallback command-line flag.

Warning
Without a fallback configuration or a tenant-specific configuration, the Alertmanager UI is inaccessible and ruler notifications for that tenant fail.

Tenant limits

The Grafana Mimir Alertmanager has a number of per-tenant limits documented in limits. Each Mimir Alertmanager limit configuration parameter has an alertmanager prefix.

Alertmanager UI

The Mimir Alertmanager exposes the same web UI as the Prometheus Alertmanager at the /alertmanager endpoint.

When running Grafana Mimir with multi-tenancy enabled, the Alertmanager requires that any HTTP request include the tenant ID header. Tenants only see alerts sent to their Alertmanager.

For a complete reference of the tenant ID header and Alertmanager endpoints, refer to HTTP API.

You can configure the HTTP path prefix for the UI and the HTTP API:

-http.alertmanager-http-prefix configures the path prefix for Alertmanager endpoints.
-alertmanager.web.external-url configures the source URLs generated in Alertmanager alerts and from where to fetch web assets.

Note
Unless you are using a reverse proxy in front of the Alertmanager API that rewrites routes, the path prefix set in -alertmanager.web.external-url must match the path prefix set in -http.alertmanager-http-prefix, which is /alertmanager by default.

If the path prefixes don’t match, HTTP request routing might not work as expected.
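
A quick way to sanity-check the two settings before a rollout is to compare the path of the external URL with the HTTP prefix. The helper below is a hypothetical sketch, and the example hostnames are placeholders.

```python
from urllib.parse import urlparse


def prefixes_match(external_url: str, http_prefix: str = "/alertmanager") -> bool:
    """Check that the external URL's path equals the configured HTTP prefix."""
    # Ignore a trailing slash on either side before comparing.
    return urlparse(external_url).path.rstrip("/") == http_prefix.rstrip("/")


ok = prefixes_match("https://mimir.example.com/alertmanager")
mismatch = prefixes_match("https://mimir.example.com/am")
```

The check only applies when no reverse proxy rewrites the routes, as described in the note above.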

Using a reverse proxy

When using a reverse proxy, use the following settings when you configure the HTTP path:

Set -http.alertmanager-http-prefix to match the proxy path in your reverse proxy configuration.
Set -alertmanager.web.external-url to the URL served by your reverse proxy.

Templating

The Mimir Alertmanager adds some custom template functions to the default ones of the Prometheus Alertmanager.

Function	Params	Description
`tenantID`	-	Returns the ID of the tenant the alert belongs to.
`queryFromGeneratorURL`	`generator_url`	Returns the URL-decoded query from the `GeneratorURL` of an alert set by Prometheus. Example: `{{ queryFromGeneratorURL (index .Alerts 0).GeneratorURL }}`
`grafanaExploreURL`	`grafana_URL`,`datasource`,`from`,`to`,`expr`	Returns a link to Grafana Explore with a range query based on the input parameters. Example: `{{ grafanaExploreURL "https://foo.bar" "xyz" "now-12h" "now" (queryFromGeneratorURL (index .Alerts 0).GeneratorURL) }}`

Sharding and replication

The Alertmanager shards and replicates alerts by tenant. Sharding requires that the number of Alertmanager replicas is greater than or equal to the replication factor configured by the -alertmanager.sharding-ring.replication-factor flag.

Grafana Mimir Alertmanager replicas use a hash ring that is stored in the KV store to discover their peers. This means that any Mimir Alertmanager replica can respond to any API or UI request for any tenant. If the Mimir Alertmanager replica receiving the HTTP request doesn’t own the tenant to which the request belongs, the request is internally routed to the appropriate replica.

To configure the Alertmanagers’ hash ring, refer to configuring hash rings.

Note
When running with a single tenant, scaling the number of replicas to be greater than the replication factor offers no benefits as the Mimir Alertmanager shards by tenant and not individual alerts.

State

The Mimir Alertmanager stores the alerts state on local disk at the location configured using -alertmanager.storage.path.

Warning
When running the Mimir Alertmanager without replication, ensure persistence of the -alertmanager.storage.path directory to avoid losing alert state.

The Mimir Alertmanager also periodically stores the alert state in the storage backend configured with -alertmanager-storage.backend. When an Alertmanager starts, it attempts to load the alerts state for a given tenant from other Alertmanager replicas. If the load from other Alertmanager replicas fails, the Alertmanager falls back to the state that is periodically stored in the storage backend.

In the event of a cluster outage, this fallback mechanism recovers the backup of the previous state. Because backups are taken periodically, this fallback mechanism does not guarantee that the most recent state is restored.
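
The state recovery order described above can be sketched as follows. The function and callback names are hypothetical, and the sketch treats any exception from a replica as a failed load, which is a simplifying assumption.

```python
from typing import Any, Callable, Iterable, Tuple


def load_tenant_state(
    tenant: str,
    replicas: Iterable[str],
    read_from_replica: Callable[[str, str], Any],
    read_from_storage: Callable[[str], Any],
) -> Tuple[Any, str]:
    """Return the tenant state and where it came from."""
    for replica in replicas:
        try:
            # Peers hold the most recent state, so they are tried first.
            return read_from_replica(replica, tenant), f"replica:{replica}"
        except Exception:
            continue
    # All replicas failed: fall back to the periodic snapshot, which may be stale.
    return read_from_storage(tenant), "storage"
```

The second element of the result makes it visible when the possibly older snapshot from the storage backend was used.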

Ruler configuration

You must configure the ruler with the addresses of Alertmanagers via the -ruler.alertmanager-url flag.

Point the address to Alertmanager’s API. You can configure Alertmanager’s API prefix via the -http.alertmanager-http-prefix flag, which defaults to /alertmanager. For example, if Alertmanager is listening at http://mimir-alertmanager.namespace.svc.cluster.local and it is using the default API prefix, set -ruler.alertmanager-url to http://mimir-alertmanager.namespace.svc.cluster.local/alertmanager.

Enable UTF-8

In an effort to support alerts from OpenTelemetry (OTel) data, Prometheus Alertmanager has added support for UTF-8. This is supported as an opt-in feature for the Grafana Mimir Alertmanager in Mimir versions 2.12 and later.

Warning
Enabling and then disabling UTF-8 strict mode can break existing tenant configurations if tenants added UTF-8 characters to their Alertmanager configuration while it was enabled. Once enabled, disable UTF-8 strict mode with caution.

For new Mimir installations, enable support for UTF-8 before creating any tenant configurations. You can do this by changing utf8-strict-mode-enabled to true.

For existing Mimir installations, there are a number of breaking changes that might affect existing tenant configurations. Follow these instructions to ensure all existing tenant configurations are compatible with UTF-8 before enabling it.

What are the breaking changes?

In order to support UTF-8, Alertmanager has added a new parser for label matchers (often abbreviated as matchers), which has a number of breaking changes.

Note
If you are unfamiliar with what matchers are or how they are used in a tenant configuration, you can find more information about them in the Prometheus Alertmanager documentation.

Grafana Mimir provides a number of tools to help you identify whether any existing tenant configurations are affected by these breaking changes, and to migrate any affected tenant configurations in a way that is backwards-compatible, doesn’t change the behavior of existing matchers, and works even in Mimir installations that do not have UTF-8 enabled.

Identify affected tenant configurations

To identify affected tenant configurations, take the following steps:

Make sure Mimir is running version 2.12 or later.
Enable utf8-migration-logging-enabled and set log_level to debug. You must restart Mimir for the changes to take effect.
To identify any tenant configurations that are incompatible with UTF-8 (meaning the tenant configuration fails to load and the fallback configuration is used instead), search Mimir server logs for lines containing Alertmanager is moving to a new parser for labels and matchers, and this input is incompatible. Each log line includes the invalid matcher from the tenant configuration and the ID of the affected tenant. For example:
```
msg="Alertmanager is moving to a new parser for labels and matchers, and this input is incompatible. Alertmanager has instead parsed the input using the classic matchers parser as a fallback. To make this input compatible with the UTF-8 matchers parser please make sure all regular expressions and values are double-quoted. If you are still seeing this message please open an issue." input="foo=" err="end of input: expected label value" suggestion="foo=\"\"" user="1"
```
In this example, the tenant with user ID 1 has an incompatible matcher, foo=, in their tenant configuration. Change it to the suggested foo="".
To identify any tenant configurations that are compatible with UTF-8 but contain matchers that might change in behavior when it's enabled, search Mimir server logs for lines containing Matchers input has disagreement. Disagreement occurs when a matcher is valid, but due to adding support for UTF-8, it can behave differently when UTF-8 is enabled. For example:
```
msg="Matchers input has disagreement" input="foo=\"\\xf0\\x9f\\x99\\x82\"" user="1"
```
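The log search described in the steps above can be sketched in Python. This is a minimal sketch, assuming log lines in the key="value" form shown in the two examples; the helper names `classify` and `scan` are hypothetical and are not part of Mimir or mimirtool.

```python
import re

# Message fragments quoted in the steps above.
INCOMPATIBLE = "this input is incompatible"
DISAGREEMENT = "Matchers input has disagreement"

# Matches key="value" pairs, allowing backslash-escaped characters in the value.
PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def unescape(value):
    # Remove one level of backslash escaping from a quoted log value.
    return re.sub(r"\\(.)", r"\1", value)


def classify(line):
    # Return (kind, tenant, input, suggestion) for a migration log line, or None.
    fields = {key: unescape(value) for key, value in PAIR.findall(line)}
    msg = fields.get("msg", "")
    if INCOMPATIBLE in msg:
        kind = "incompatible"
    elif DISAGREEMENT in msg:
        kind = "disagreement"
    else:
        return None
    return kind, fields.get("user"), fields.get("input"), fields.get("suggestion")


def scan(lines):
    # Collect every classified migration log line from an iterable of lines.
    return [hit for hit in map(classify, lines) if hit is not None]
```

Grouping the results of `scan` by tenant gives a list of tenant configurations to fix (incompatible) and a list to review case by case (disagreement).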

Note
It is possible for a tenant configuration to be both incompatible with UTF-8 and have disagreement, as an individual tenant configuration can contain a large number of matchers across different routes and inhibition rules.

Fix tenant configurations

To fix any identified tenant configurations, take the following steps:

Use the migrate-utf8 command in mimirtool to fix any tenant configurations that are incompatible with UTF-8. This command can migrate existing tenant configurations in a way that is backwards-compatible, doesn’t change the behavior of existing matchers, and works even in Mimir installations that don’t have UTF-8 enabled. If you cannot use mimirtool, you can edit tenant configurations by hand through applying each suggestion from the Mimir server logs.
You must look at tenant configurations that have disagreement on a case-by-case basis. Depending on the nature of the disagreement, you might not need to fix a matcher with disagreement. For example \xf0\x9f\x99\x82 is the byte sequence for the 🙂 emoji. If the intention is to match a literal 🙂 emoji then no change is required. However, if the intention is to match the literal \xf0\x9f\x99\x82 then you need to change the matcher to use \\xf0\\x9f\\x99\\x82 instead.
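The difference between the two readings in the example above can be illustrated in plain Python. This sketch only shows how the byte sequence relates to the emoji and to the literal text; it does not reproduce the Alertmanager matchers parser, and the `describe` helper is hypothetical.

```python
# The four bytes named in the example above, decoded as UTF-8.
emoji = bytes([0xF0, 0x9F, 0x99, 0x82]).decode("utf-8")

# The same text taken literally: a backslash, "x", and two hex digits, four times.
literal = r"\xf0\x9f\x99\x82"


def describe(value):
    # Summarize a string as (number of characters, number of UTF-8 bytes).
    return len(value), len(value.encode("utf-8"))


if __name__ == "__main__":
    print(describe(emoji))    # one character stored in four bytes
    print(describe(literal))  # sixteen ASCII characters
```

The two strings are different values, which is why a matcher written with one intention can behave differently under the other interpretation.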

Note
It’s rare to find cases of disagreement in a tenant configuration, as most tenants do not need to match alerts that contain literal UTF-8 byte sequences in their labels.

Final steps

After identifying and fixing all affected tenant configurations, check the Mimir server logs again to make sure you haven’t missed any tenant configurations.
To enable UTF-8, set utf8-strict-mode-enabled to true. You must restart Mimir for the changes to take effect.
To confirm UTF-8 is enabled, search for Starting Alertmanager in UTF-8 strict mode in the Mimir server logs. If you find Starting Alertmanager in classic mode instead then UTF-8 is not enabled.
Any incompatible tenant configurations will fail to load. To identify if any tenant configurations are failing to load, search the Mimir server logs for lines containing error applying config, or query the cortex_alertmanager_config_last_reload_successful gauge for 0.
You can disable utf8-migration-logging-enabled and set log_level back to its previous value.

(Optional) Grafana Mimir overrides-exporter

Grafana Mimir supports applying overrides on a per-tenant basis. A number of overrides configure limits that prevent a single tenant from using too many resources. The overrides-exporter component exposes limits as Prometheus metrics so that operators can understand how close tenants are to their limits.

For more information about configuring overrides, refer to Runtime configuration file.

Running the overrides-exporter

The overrides-exporter must be explicitly enabled.

Warning
The metrics emitted by the overrides-exporter have high cardinality. It’s recommended to run only a single replica of the overrides-exporter to limit that cardinality.

With a runtime.yaml file as follows:

```
# file: runtime.yaml
# In this example, we're overriding ingestion limits for a single tenant.
overrides:
  "user1":
    ingestion_burst_size: 350000
    ingestion_rate: 350000
    max_global_series_per_metric: 300000
    max_global_series_per_user: 300000
```

Run the overrides-exporter by providing the -target and -runtime-config.file flags:

```
mimir -target=overrides-exporter -runtime-config.file=runtime.yaml
```

After the overrides-exporter starts, you can use curl to inspect the tenant overrides:

```
curl -s http://localhost:8080/metrics | grep cortex_limits_overrides
```

The output metrics look similar to the following:

```
# HELP cortex_limits_overrides Resource limit overrides applied to tenants
# TYPE cortex_limits_overrides gauge
cortex_limits_overrides{limit_name="ingestion_burst_size",user="user1"} 350000
cortex_limits_overrides{limit_name="ingestion_rate",user="user1"} 350000
cortex_limits_overrides{limit_name="max_global_series_per_metric",user="user1"} 300000
cortex_limits_overrides{limit_name="max_global_series_per_user",user="user1"} 300000
```

With these metrics, you can set up alerts to know when tenants are close to hitting their limits before they exceed them.
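As a rough illustration of the comparison such an alert performs, the following Python sketch parses overrides-exporter output and flags limits that a tenant is close to. It assumes the label order shown in the sample output, and the `usage` mapping is a hypothetical input that you would fill from your own usage metrics; neither helper is part of Mimir.

```python
import re

# Matches samples in the form shown in the sample output (limit_name first, then user).
SAMPLE = re.compile(
    r'^cortex_limits_overrides\{limit_name="([^"]+)",user="([^"]+)"\}\s+(\S+)$'
)


def parse_overrides(metrics_text):
    # Return {(user, limit_name): value} from overrides-exporter output.
    limits = {}
    for line in metrics_text.splitlines():
        match = SAMPLE.match(line.strip())
        if match:
            limit_name, user, value = match.groups()
            limits[(user, limit_name)] = float(value)
    return limits


def near_limit(limits, usage, threshold=0.8):
    # Return sorted (user, limit_name) keys whose usage is at least threshold * limit.
    return sorted(
        key
        for key, limit in limits.items()
        if limit > 0 and usage.get(key, 0.0) / limit >= threshold
    )
```

In practice the same ratio is usually expressed as an alerting rule that divides a usage metric by the matching cortex_limits_overrides series; the threshold of 0.8 here is an arbitrary example value.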

(Optional) Grafana Mimir ruler

The ruler is an optional component that evaluates PromQL expressions defined in recording and alerting rules. Each tenant has a set of recording and alerting rules and can group those rules into namespaces.

Evaluating rules generates new samples. Those samples are then passed to an in-process distributor to be ingested and made available for further queries. To configure the built-in distributor, use the distributor's configuration parameters.

Operational modes

The ruler supports two different rule evaluation modes:

Internal

This is the default mode. The ruler internally runs a querier, and evaluates recording and alerting rules in the ruler process itself. To evaluate rules, the ruler connects directly to ingesters and store-gateways, and writes any resulting series to the ingesters.

To configure the built-in querier, use the querier's configuration parameters.

Note
When you use the internal mode, the ruler uses no query acceleration techniques. Evaluating very high cardinality queries can take longer than the evaluation interval, which can lead to missing data points in the evaluated recording rules.

Remote

In this mode the ruler delegates rules evaluation to the query-frontend. When enabled, the ruler leverages all the query acceleration techniques employed by the query-frontend, such as query sharding. To enable the remote operational mode, set the -ruler.query-frontend.address CLI flag or its respective YAML configuration parameter for the ruler. Communication between ruler and query-frontend is established over gRPC, so you can make use of client-side load balancing by prefixing the query-frontend address URL with dns:///.

Remote over HTTP/HTTPS

When the query-frontend address set via the -ruler.query-frontend.address CLI flag or its respective YAML configuration parameter starts with http:// or https://, the ruler delegates rule evaluation to a Prometheus-compatible server. One use case for this feature is to use a proxy to federate data from multiple Mimir instances.
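The way the value of -ruler.query-frontend.address selects an evaluation mode can be summarized with a small Python sketch. The function and the mode labels it returns are hypothetical names used only for illustration; the logic restates the rules described above and is not taken from the Mimir source code.

```python
def evaluation_mode(query_frontend_address):
    # No address configured: the ruler evaluates rules itself (the default mode).
    if not query_frontend_address:
        return "internal"
    # An http:// or https:// address means a Prometheus-compatible server over HTTP(S).
    if query_frontend_address.startswith(("http://", "https://")):
        return "remote-http"
    # Any other address is treated as a gRPC target, for example one prefixed with dns:///.
    return "remote-grpc"


if __name__ == "__main__":
    # The addresses below are placeholders, not real endpoints.
    for address in ("", "dns:///query-frontend.example:9095", "https://proxy.example"):
        print(repr(address), "->", evaluation_mode(address))
```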

Recording rules

The ruler evaluates the expressions in the recording rules at regular intervals and writes the results back to the ingesters.

Alerting rules

The ruler evaluates the expressions in alerting rules at regular intervals and if the result includes any series, the alert becomes active. If an alerting rule has a defined for duration, it enters the PENDING (pending) state. After the alert has been active for the entire for duration, it enters the FIRING (firing) state. The ruler then notifies Alertmanagers of any FIRING (firing) alerts.
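The state transitions described above can be sketched as a small Python function. This is a simplified model, assuming times are plain numbers of seconds and that `active_since` is the time at which the expression first started returning series; the function name and the "inactive" label for an alert that is not active are illustrative choices.

```python
def alert_state(active_since, now, for_duration):
    # The expression currently returns no series, so the alert is not active.
    if active_since is None:
        return "inactive"
    # Active for the entire for duration (or no for duration set): firing.
    if now - active_since >= for_duration:
        return "firing"
    # Active, but not yet for the entire for duration.
    return "pending"


if __name__ == "__main__":
    print(alert_state(None, now=100, for_duration=300))
    print(alert_state(100, now=200, for_duration=300))
    print(alert_state(100, now=400, for_duration=300))
```

The sketch ignores the evaluation interval: the ruler only observes the expression at each evaluation, so real transitions happen at evaluation times rather than continuously.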

Configure the addresses of Alertmanagers with the -ruler.alertmanager-url flag. This flag supports the DNS service discovery format. For more information about DNS service discovery, refer to Supported discovery modes.

If you’re using Mimir’s Alertmanager, point the address to Alertmanager’s API. You can configure Alertmanager’s API prefix via the -http.alertmanager-http-prefix flag, which defaults to /alertmanager. For example, if Alertmanager is listening at http://mimir-alertmanager.namespace.svc.cluster.local and it is using the default API prefix, set -ruler.alertmanager-url to http://mimir-alertmanager.namespace.svc.cluster.local/alertmanager.

Federated rule groups

A federated rule group is a rule group with a non-empty source_tenants.

The source_tenants field allows aggregating data from multiple tenants while evaluating a rule group. The expressions of each rule in the group will be evaluated against the data of all tenants in source_tenants. If source_tenants is empty or omitted, then the tenant under which the group is created will be treated as the source_tenant.

The following example shows what a federated rule group looks like:

```
name: MyGroupName
source_tenants: ["tenant-a", "tenant-b"]
rules:
  - record: sum:metric
    expr: sum(metric)
```

In this example, the rules in MyGroupName are evaluated against the data of tenant-a and tenant-b.

Federated rule groups are skipped during evaluation by default. This feature depends on the cross-tenant query federation feature. To enable federated rules, set the -ruler.tenant-federation.enabled=true and -tenant-federation.enabled=true CLI flags (or their respective YAML config options).

During evaluation, query limits applied to single tenants are also applied to each query in the rule group. For example, if tenant-a has a federated rule group with source_tenants: [tenant-b, tenant-c], then query limits for tenant-b and tenant-c will be applied. If any of these limits is exceeded, the whole evaluation will fail. No partial results will be saved. The same “no partial results” guarantee applies to queries failing for other reasons (for example, ingester unavailability).

The time series used during evaluation of federated rules will have the __tenant_id__ label, similar to how it is present on series returned with cross-tenant query federation.

Note
Federated rule groups allow data from multiple source tenants to be written into a single destination tenant. This makes the separation of tenants’ data less clear.

For example, tenant-a has a federated rule group that aggregates over tenant-b’s data like sum(metric_b) and writes the result back into tenant-a’s storage as the metric sum:metric_b. Now tenant-a contains some of tenant-b’s data.

Keep this in mind when configuring the access control layer in front of Mimir and when enabling federated rules via -ruler.tenant-federation.enabled.

Sharding

The ruler supports multi-tenancy and horizontal scalability. To achieve horizontal scalability, the ruler shards the execution of rules by rule groups. Ruler replicas form their own hash ring stored in the KV store to divide the work of executing the rules.

To configure the rulers’ hash ring, refer to configuring hash rings.
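To give an intuition for how a hash ring divides rule groups among replicas, here is a deliberately simplified Python sketch of consistent hashing. It is not Mimir's ring implementation: the hash function, the number of tokens per replica, and the tenant/namespace/group key format are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(value):
    # Map a string to a 32-bit position on the ring.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:4], "big")


class Ring:
    def __init__(self, replicas, tokens_per_replica=64):
        if not replicas:
            raise ValueError("the ring needs at least one replica")
        # Each replica registers several tokens so that work spreads more evenly.
        self._entries = sorted(
            (token(f"{replica}-{i}"), replica)
            for replica in replicas
            for i in range(tokens_per_replica)
        )
        self._tokens = [t for t, _ in self._entries]

    def owner(self, tenant, namespace, group):
        # The owner is the replica holding the first token at or after the group's
        # hash, wrapping around to the start of the ring when necessary.
        position = token(f"{tenant}/{namespace}/{group}")
        index = bisect.bisect_left(self._tokens, position) % len(self._entries)
        return self._entries[index][1]
```

With this scheme every replica can compute the owner of a rule group independently, and removing one replica only reassigns the groups that replica owned.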

Manage alerting and recording rules

There is more than one way to manage alerting and recording rules.

Via the `mimirtool` CLI tool

The mimirtool rules command offers utility subcommands for linting, formatting, and uploading rules to Grafana Mimir. For more information, refer to mimirtool rules.

Via the `grafana/mimir/operations/mimir-rules-action` GitHub Action

The GitHub Action mimir-rules-action wraps some of the functionality of mimirtool rules. For more information, refer to the documentation of the action.

Via the HTTP configuration API

The ruler HTTP configuration API enables tenants to create, update, and delete rule groups. For a complete list of endpoints and example requests, refer to ruler.

State

The ruler uses the backend configured via -ruler-storage.backend. The ruler supports the following backends:

Amazon S3: -ruler-storage.backend=s3
Google Cloud Storage: -ruler-storage.backend=gcs
Microsoft Azure Storage: -ruler-storage.backend=azure
OpenStack Swift: -ruler-storage.backend=swift
Local storage: -ruler-storage.backend=local

Local storage

The local storage backend reads Prometheus recording rules from the local filesystem.

Note
Local storage is a read-only backend that doesn’t support the creation and deletion of rules through the Configuration API.

When all rulers have the same rule files, local storage supports ruler sharding. To facilitate sharding in Kubernetes, mount a Kubernetes ConfigMap into every ruler pod.

The following example shows a local storage definition:

```
-ruler-storage.backend=local
-ruler-storage.local.directory=/tmp/rules
```

The ruler looks for tenant rules in the /tmp/rules/<TENANT ID> directory. The ruler requires rule files to be in the Prometheus format.
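The per-tenant directory layout described above can be sketched in Python. This is a minimal example that writes one rule file in the Prometheus format into a tenant directory; the `write_tenant_rules` helper, the tenant ID, and the file name are hypothetical, and the example uses a temporary directory in place of /tmp/rules.

```python
import tempfile
from pathlib import Path

# A rule file in the Prometheus format, reusing the recording rule shown earlier.
RULES = """\
groups:
  - name: MyGroupName
    rules:
      - record: sum:metric
        expr: sum(metric)
"""


def write_tenant_rules(base_dir, tenant_id, filename, content):
    # Write one rule file under <base_dir>/<tenant_id>/ and return its path.
    tenant_dir = Path(base_dir) / tenant_id
    tenant_dir.mkdir(parents=True, exist_ok=True)
    path = tenant_dir / filename
    path.write_text(content, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(write_tenant_rules(base, "tenant-a", "rules.yaml", RULES))
```

In a real deployment, the base directory is whatever -ruler-storage.local.directory points to, and every ruler needs the same files for sharding to work.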