Run Grafana Mimir in production on Grafana Labs

Planning Grafana Mimir capacity

Wed, 03 Jun 2026 09:01:40 +0200

The following information is an overview of the CPU, memory, and disk space that Grafana Mimir requires at scale. Use it to get a rough idea of the required resources, not as a prescriptive recommendation about the exact amount of CPU, memory, and disk space.

Resource utilization is estimated based on a general production workload, assuming that Grafana Mimir runs with one tenant and the default configuration. Your actual resource utilization likely differs, because it depends on your actual data, configuration settings, and traffic patterns. For example, it might differ based on the actual number or length of series labels, or the percentage of queries that reach the store-gateway.

These resource utilization estimates are minimum requirements. To gracefully handle traffic peaks, run Grafana Mimir with 50% extra capacity for memory and disk.

Monolithic mode

When Grafana Mimir is running in monolithic mode, you can estimate the required resources by summing the requirements of all Grafana Mimir components. For more information about per-component requirements, refer to Microservices mode.

Microservices mode

When Grafana Mimir is running in microservices mode, you can estimate the required resources of each component individually.

Distributor

The distributor component’s resource utilization is determined by the number of received samples per second.

Estimated required CPU and memory:

CPU: 1 core for every 25,000 samples per second.
Memory: 1GB for every 25,000 samples per second.

How to estimate the rate of samples per second:

Query the number of active series across all of your Prometheus servers:
```
sum(prometheus_tsdb_head_series)
```
Check the scrape_interval that you configured in Prometheus.

Estimate the rate of samples per second by using the following formula:

```
estimated rate = (<active series> * (60 / <scrape interval in seconds>)) / 60
```

This simplifies to <active series> / <scrape interval in seconds>.
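For example, the following Python sketch applies the formula and the distributor ratios above. The helper name and the example numbers are hypothetical, and the result is the minimum before any extra headroom.

```python
import math


def distributor_resources(active_series: int, scrape_interval_s: float) -> dict:
    """Rough distributor sizing from the ratios above (hypothetical helper)."""
    # (<active series> * (60 / <scrape interval>)) / 60 == <active series> / <scrape interval>
    samples_per_s = active_series / scrape_interval_s
    # 1 CPU core and 1GB of memory for every 25,000 samples per second.
    units = samples_per_s / 25_000
    return {
        "samples_per_s": samples_per_s,
        "cpu_cores": math.ceil(units),
        "memory_gb": math.ceil(units),
    }


# 3 million active series scraped every 15 seconds:
# 200,000 samples/s -> 8 CPU cores and 8GB of memory.
print(distributor_resources(3_000_000, 15))
```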

Ingester

The ingester component’s resource utilization is determined by the number of series that are in memory.

Estimated required CPU, memory, and disk space:

CPU: 1 core for every 300,000 series in memory
Memory: 2.5GB for every 300,000 series in memory
Disk space: 5GB for every 300,000 series in memory

How to estimate the number of series in memory:

Query the number of active series across all your Prometheus servers:
```
sum(prometheus_tsdb_head_series)
```
Check the configured -ingester.ring.replication-factor (default: 3).
Estimate the total number of series in memory across all ingesters using the following formula:
```
total number of in-memory series = <active series> * <replication factor>
```
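The following Python sketch turns the formula into cluster-wide totals, which you then spread across your ingester replicas. The helper is hypothetical and the numbers are examples only.

```python
import math


def ingester_resources(active_series: int, replication_factor: int = 3) -> dict:
    """Rough cluster-wide ingester totals from the ratios above (hypothetical helper)."""
    # Each series is held in memory by <replication factor> ingesters.
    in_memory_series = active_series * replication_factor
    # The ratios above are expressed per 300,000 in-memory series.
    units = in_memory_series / 300_000
    return {
        "in_memory_series": in_memory_series,
        "cpu_cores": math.ceil(units),        # 1 core per 300,000 series
        "memory_gb": math.ceil(units * 2.5),  # 2.5GB per 300,000 series
        "disk_gb": math.ceil(units * 5),      # 5GB per 300,000 series
    }


# 1 million active series with the default replication factor of 3:
# 3 million in-memory series -> 10 cores, 25GB of memory, 50GB of disk in total.
print(ingester_resources(1_000_000))
```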

Query-frontend

The query-frontend component’s resource utilization is determined by the number of queries per second.

Estimated required CPU and memory:

CPU: 1 core for every 250 queries per second
Memory: 1GB for every 250 queries per second

(Optional) Query-scheduler

The query-scheduler component’s resource utilization is determined by the number of queries per second.

Estimated required CPU and memory:

CPU: 1 core for every 500 queries per second
Memory: 100MB for every 500 queries per second

Querier

The querier component’s resource utilization is determined by the number of queries per second.

Estimated required CPU and memory:

CPU: 1 core for every 10 queries per second
Memory: 1GB for every 10 queries per second

Note
The estimate assumes 1 CPU core and 1GB of memory per query, with an average query latency of 100ms.
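The note follows from Little’s law: the average number of in-flight queries equals the query rate multiplied by the average latency, so 10 queries per second at 100ms keep about one query, and therefore one core and 1GB, busy at any time. A minimal Python sketch of that reasoning, with a hypothetical helper name:

```python
import math


def querier_resources(queries_per_s: float, avg_latency_ms: float = 100) -> dict:
    """Querier sizing from the note above (hypothetical helper)."""
    # Little's law: in-flight queries = arrival rate * average time per query.
    in_flight = queries_per_s * avg_latency_ms / 1000
    # Each in-flight query is assumed to use 1 CPU core and 1GB of memory.
    return {
        "in_flight_queries": in_flight,
        "cpu_cores": math.ceil(in_flight),
        "memory_gb": math.ceil(in_flight),
    }


print(querier_resources(50))  # 50 qps at 100ms -> 5 in flight -> 5 cores, 5GB
```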

Store-gateway

The store-gateway component’s resource utilization is determined by the number of queries per second and the number of active series before ingester replication.

Estimated required CPU, memory, and disk space:

CPU: 1 core for every 10 queries per second
Memory: 1GB for every 10 queries per second
Disk: 13GB for every 1 million active series

Note
The CPU and memory requirements assume 1 CPU core and 1GB of memory per query, an average query latency of 1s when a query reaches the store-gateway, and that only 10% of queries reach the store-gateway.
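The same in-flight reasoning used for the querier applies here, restricted to the share of queries that reach the store-gateway. A minimal Python sketch, with a hypothetical helper name and example numbers:

```python
import math


def store_gateway_compute(queries_per_s: float,
                          pct_reaching_store_gateway: float = 10,
                          avg_latency_ms: float = 1000) -> dict:
    """Store-gateway CPU and memory from the note above (hypothetical helper)."""
    # qps * (pct / 100) * (latency_ms / 1000), combined into one division.
    in_flight = queries_per_s * pct_reaching_store_gateway * avg_latency_ms / 100_000
    return {
        "in_flight_queries": in_flight,
        "cpu_cores": math.ceil(in_flight),  # 1 core per in-flight query
        "memory_gb": math.ceil(in_flight),  # 1GB per in-flight query
    }


# 100 qps -> 10 qps reach the store-gateway -> 10 in flight at 1s each.
print(store_gateway_compute(100))
```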

Note
The disk requirement assumes 2 bytes per sample for compacted blocks (both index and chunks), an index-header size of 0.10% of the block size, a scrape interval of 15 seconds, a retention of 1 year, and a store-gateway replication factor of 3. The resulting estimated store-gateway disk space for one series is 13KB.
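The 13KB figure can be reproduced from the stated assumptions, as in the following Python sketch. It treats the store-gateway disk as holding only index-headers, consistent with the 0.10% ratio above; the helper name is hypothetical. The exact result is about 12.6KB per series, which the note rounds up to 13KB (and 13GB per million series).

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def store_gateway_disk_per_series(scrape_interval_s: float = 15,
                                  retention_s: float = SECONDS_PER_YEAR,
                                  bytes_per_sample: float = 2,
                                  index_header_ratio: float = 0.001,
                                  replication_factor: int = 3) -> float:
    """Bytes of store-gateway disk per series (hypothetical helper)."""
    samples = retention_s / scrape_interval_s   # samples kept per series
    block_bytes = samples * bytes_per_sample    # compacted block size (index + chunks)
    index_header_bytes = block_bytes * index_header_ratio  # 0.10% of the block
    # Each block is loaded by <replication factor> store-gateways.
    return index_header_bytes * replication_factor


per_series = store_gateway_disk_per_series()
print(f"{per_series / 1e3:.1f}KB per series")                      # 12.6KB per series
print(f"{per_series * 1_000_000 / 1e9:.1f}GB per million series")  # 12.6GB per million series
```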

How to estimate the number of active series before ingester replication:

Query the number of active series across all your Prometheus servers:
```
sum(prometheus_tsdb_head_series)
```

(Optional) Ruler

The ruler component’s resource utilization is determined by the number of rules evaluated per second.

When the ruler uses internal operational mode (the default), rule evaluation is computationally equivalent to query execution, so the querier resource recommendations also apply to the ruler.

When the ruler uses remote operational mode, most of the computational load shifts to the query-frontend and querier components. Scale those components to handle both the query workload and the rule evaluation workload.

Compactor

The compactor component’s resource utilization is determined by the number of active series.

The compactor can scale horizontally in Grafana Mimir clusters with one tenant and with multiple tenants. We recommend running at least one compactor instance for every 20 million active series ingested in total in the Grafana Mimir cluster, calculated before ingester replication.

Assuming you run one compactor instance for every 20 million active series, the estimated required CPU, memory, and disk for each compactor instance are:

CPU: 1 core
Memory: 4GB
Disk: 300GB
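As a worked example, the following Python sketch derives the compactor instance count and the resulting totals. The helper name and the 50 million series input are hypothetical.

```python
import math


def compactor_plan(active_series_before_replication: int) -> dict:
    """Compactor instance count and totals from the guidance above (hypothetical helper)."""
    # At least one compactor instance for every 20 million active series.
    instances = max(1, math.ceil(active_series_before_replication / 20_000_000))
    return {
        "instances": instances,
        "cpu_cores": instances * 1,   # 1 core per instance
        "memory_gb": instances * 4,   # 4GB per instance
        "disk_gb": instances * 300,   # 300GB per instance
    }


# 50 million active series -> 3 compactors, 3 cores, 12GB of memory, 900GB of disk.
print(compactor_plan(50_000_000))
```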

For more information about disk requirements, refer to Compactor disk utilization.

For more information about how to scale the compactor for large tenants, refer to Manage capacity for large tenants.

To estimate the number of active series before ingester replication, query the number of active series across all Prometheus servers:

```
sum(prometheus_tsdb_head_series)
```

(Optional) Alertmanager

The Alertmanager component’s resource utilization is determined by the number of alerts firing at the same time.

Estimated required CPU and memory:

CPU: 1 CPU core for every 100 firing alert notifications per second
Memory: 1GB for every 5,000 firing alerts

To estimate the peak of firing alert notifications per second in the last 24 hours, run the following query across all Prometheus servers:

```
sum(max_over_time(rate(alertmanager_alerts_received_total[5m])[24h:5m]))
```

To estimate the maximum number of firing alerts in the last 24 hours, run the following query across all Prometheus servers:

```
sum(max_over_time(alertmanager_alerts[24h]))
```
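The two query results map onto the Alertmanager ratios as in this Python sketch. The helper name and the example inputs (250 notifications per second, 12,000 firing alerts) are hypothetical.

```python
import math


def alertmanager_resources(peak_notifications_per_s: float, max_firing_alerts: int) -> dict:
    """Alertmanager totals from the ratios above (hypothetical helper)."""
    return {
        # 1 CPU core for every 100 firing alert notifications per second.
        "cpu_cores": math.ceil(peak_notifications_per_s / 100),
        # 1GB of memory for every 5,000 firing alerts.
        "memory_gb": math.ceil(max_firing_alerts / 5_000),
    }


print(alertmanager_resources(250, 12_000))  # 3 cores, 3GB
```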

(Optional) Caches

Grafana Mimir supports caching in various stages of the read path:

results cache to cache partial query results
chunks cache to cache timeseries chunks from the object store
index cache to accelerate series and label lookups
metadata cache to accelerate looking up individual timeseries blocks

A rule of thumb for scaling the memcached deployments for these caches is to look at the rate of evictions. If it is 0 during steady load, with only occasional spikes, memcached is sufficiently scaled. If it is above 0 all the time, scale out memcached.

You can execute the following query to find out the rate of evictions:

```
sum by(instance) (rate(memcached_items_evicted_total{}[5m]))
```
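If you export the query results for one instance over a period of steady load, the rule of thumb can be applied mechanically, as in this Python sketch. The input format and helper name are hypothetical.

```python
from typing import Sequence


def memcached_needs_scale_out(eviction_rates: Sequence[float]) -> bool:
    """Apply the eviction rule of thumb to one instance's eviction-rate samples.

    `eviction_rates` is assumed to be the per-instance values of the query above,
    sampled over a period of steady load (hypothetical input format).
    """
    # Occasional spikes are fine; evictions that never drop to 0 are not.
    return len(eviction_rates) > 0 and all(rate > 0 for rate in eviction_rates)


print(memcached_needs_scale_out([0, 0, 3.5, 0, 0]))     # False: occasional spike
print(memcached_needs_scale_out([1.2, 0.8, 2.0, 1.1]))  # True: always evicting
```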

Perform a rolling update to Grafana Mimir

You can use a rolling update strategy to apply configuration changes to Grafana Mimir, and to upgrade Grafana Mimir to a newer version. A rolling update results in no downtime to Grafana Mimir.

Monolithic mode

When you run Grafana Mimir in monolithic mode, roll out changes to one instance at a time. After you apply changes to an instance and it restarts, wait until its /ready endpoint returns HTTP status code 200; then you can proceed with rolling out changes to the next instance.

Note
When you run Grafana Mimir on Kubernetes, to roll out changes to one instance at a time, configure the Deployment or StatefulSet update strategy to RollingUpdate and maxUnavailable to 1.
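If you script a monolithic rollout yourself, the readiness gate could look like the following Python sketch, which polls /ready until it returns 200. The base URL in the usage comment (host mimir-0, port 8080) and the timeout values are assumptions; adapt them to your deployment.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 600, interval_s: float = 5) -> bool:
    """Poll <base_url>/ready until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/ready", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts, and non-2xx responses (urllib's HTTPError
            # is an OSError subclass) all mean "not ready yet".
            pass
        time.sleep(interval_s)
    return False


# Hypothetical usage: restart one instance, then gate on its readiness.
# if not wait_until_ready("http://mimir-0:8080"):
#     raise SystemExit("instance did not become ready; stop the rollout")
```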

Microservices mode

When you run Grafana Mimir in microservices mode, roll out changes to multiple instances of each stateless component at the same time. You can also roll out multiple stateless components in parallel. Stateful components have the following restrictions:

Alertmanagers: Roll out changes to a maximum of two Alertmanagers at a time.
Ingesters: Roll out changes to one ingester at a time.
Store-gateways: Roll out changes to a maximum of two store-gateways at a time.

Note
If you enabled zone-aware replication for a component, you can roll out changes to all component instances in the same zone at the same time.
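The per-component limits and the zone-aware exception can be expressed as a batching rule, as in this Python sketch. The (name, zone) input format, the instance names, and the helper are hypothetical.

```python
from collections import defaultdict

# Maximum instances per batch for the stateful components listed above.
MAX_PARALLEL = {"alertmanager": 2, "ingester": 1, "store-gateway": 2}


def rollout_batches(component: str, instances: list, zone_aware: bool = False) -> list:
    """Split (name, zone) pairs into batches that can be restarted together.

    Only the stateful components in MAX_PARALLEL are covered; the input format
    and this helper are hypothetical.
    """
    if zone_aware:
        # With zone-aware replication, a whole zone can be rolled out at once.
        by_zone = defaultdict(list)
        for name, zone in instances:
            by_zone[zone].append(name)
        return [by_zone[zone] for zone in sorted(by_zone)]
    size = MAX_PARALLEL[component]  # KeyError for components not listed above
    names = [name for name, _zone in instances]
    return [names[i:i + size] for i in range(0, len(names), size)]


pods = [("sg-0", "a"), ("sg-1", "b"), ("sg-2", "c"), ("sg-3", "a"), ("sg-4", "b")]
print(rollout_batches("store-gateway", pods))
# [['sg-0', 'sg-1'], ['sg-2', 'sg-3'], ['sg-4']]
print(rollout_batches("store-gateway", pods, zone_aware=True))
# [['sg-0', 'sg-3'], ['sg-1', 'sg-4'], ['sg-2']]
```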

Alertmanagers

Alertmanagers store alert state in memory. When an Alertmanager restarts, the alerts stored on it are not available until the Alertmanager is running again.

By default, Alertmanagers replicate each tenant’s alerts to three Alertmanagers. Alert notification and visualization succeed when each tenant has at least one healthy Alertmanager in its shard.

To ensure that no alert notification, reception, or visualization fails during a rolling update, roll out changes to a maximum of two Alertmanagers at a time.

Note
If you enabled zone-aware replication for Alertmanager, you can roll out changes to all Alertmanagers in one zone at the same time.

Ingesters

Ingesters store recently received samples in memory. When an ingester restarts, the samples stored in the restarting ingester are not available for querying until the ingester is running again.

To ensure no query fails during a rolling update, roll out changes to one ingester at a time. This ensures at least one ingester per partition remains available when using the ingest storage architecture, and a majority of ingesters remains available when using the classic architecture.

Note
If you enabled zone-aware replication for ingesters, you can roll out changes to all ingesters in one zone at the same time with either the ingest storage or classic architectures.

Store-gateways

Store-gateways shard blocks among running instances. By default, each block is replicated to three store-gateways. Queries succeed when each required block is loaded by at least one store-gateway.

To ensure no query fails during a rolling update, roll out changes to a maximum of two store-gateways at a time.

Note
If you enabled zone-aware replication for store-gateways, you can roll out changes to all store-gateways in one zone at the same time.

Scaling out Grafana Mimir

Grafana Mimir can scale every component horizontally. Scaling out Grafana Mimir means increasing the number of replicas of each Grafana Mimir component to respond to increased load.

We have designed Grafana Mimir to scale up quickly, safely, and with no manual intervention. However, be careful when scaling down some of the stateful components, because these actions can result in write and read failures, or partial query results.

Monolithic mode

When running Grafana Mimir in monolithic mode, you can safely scale up to any number of instances. To scale down the Grafana Mimir cluster, see Scaling down ingesters.

Microservices mode

When running Grafana Mimir in microservices mode, you can safely scale up any component. You can also safely scale down any stateless component.

The following stateful components have limitations when scaling down:

Alertmanagers
Ingesters
Store-gateways

Scaling down Alertmanagers

Scaling down Alertmanagers can result in downtime.

Consider the following guidelines when you scale down Alertmanagers:

Scale down no more than two Alertmanagers at the same time.
Ensure at least -alertmanager.sharding-ring.replication-factor Alertmanager instances are running (three when running Grafana Mimir with the default configuration).

Note
If you enabled zone-aware replication for Alertmanagers, you can, in parallel, scale down any number of Alertmanager instances within one zone at a time.

Scaling down ingesters in ingest storage architecture

Note
This guidance applies to ingest storage architecture. For more information about the supported architectures in Grafana Mimir, refer to Grafana Mimir architecture.

When running Grafana Mimir with ingest storage architecture, scaling down ingesters triggers the reassignment of ingestion partitions instead of transferring in-memory series ownership between ingesters.

The ingestion layer durably stores each partition and can reassign it to a new ingester without data loss. When you terminate or scale down an ingester, it stops writing to its assigned partitions. Other ingesters continue consuming active partitions as normal according to the partition lifecycle.

For details about how partitions are created, reassigned, and transitioned between states, refer to Grafana Mimir hash rings.

Because the system writes ingestion data to Kafka and persists it in object storage, scaling down ingesters in the ingest storage architecture doesn’t require draining in-memory series or handoff operations.

In production environments, scaling down typically happens automatically through the rollout-operator, which coordinates ingesters across zones. The rollout-operator prepares ingesters for shutdown by moving their partitions from ACTIVE to INACTIVE and removes the ingesters after a defined period.

The rollout-operator is deployed by the Grafana Mimir Helm chart and is the recommended way to manage partitioned ingesters and scaling operations for ingest storage.

If you’re managing ingesters manually, you can use GET, POST, or DELETE on the HTTP API endpoint /ingester/prepare-partition-downscale to prepare ingesters for downscaling instead of relying on the rollout-operator.

Scaling down ingesters in classic architecture

Note
This guidance applies to classic architecture. For more information about the supported architectures in Grafana Mimir, refer to Grafana Mimir architecture.

Ingesters store recently received samples in memory. To guarantee no data loss when you scale down an ingester, do not discard the samples stored in it.

You might experience the following challenges when you scale down ingesters:

By default, when an ingester shuts down, it does not upload the samples to long-term storage, which causes data loss.

Ingesters expose an API endpoint, /ingester/shutdown, that flushes in-memory time series data from the ingester to long-term storage and unregisters the ingester from the ring.

After the /ingester/shutdown API endpoint successfully returns, the ingester doesn’t receive write or read requests, but the process doesn’t exit.

You can terminate the process by sending a SIGINT or SIGTERM signal after the shutdown endpoint returns.

To mitigate this challenge, upload the ingester blocks to long-term storage before shutting down.

When you scale down ingesters, the querier might temporarily return partial results.

The blocks an ingester uploads to the long-term storage are not immediately available for querying. It takes the queriers and store-gateways some time before a newly uploaded block is available for querying. If you scale down two or more ingesters in a short period of time, queries might return partial results.

Complete the following steps to scale down ingesters in any zone.

Send a POST request to the /ingester/prepare-instance-ring-downscale API endpoint on each ingester that you want to scale down to place it into read-only mode.
Wait until the blocks uploaded by read-only ingesters are available for querying before proceeding. The required amount of time to wait depends on your configuration and is the maximum value for the following settings:
- The configured -querier.query-store-after setting
- Two times the configured -blocks-storage.bucket-store.sync-interval setting
- Two times the configured -compactor.cleanup-interval setting
Scale down each ingester:
1. Send a POST request to the /ingester/shutdown API endpoint on the ingester to flush its data to long-term storage and unregister it from the ring.
2. Wait until the API endpoint call has successfully returned and the ingester has logged “finished flushing and shipping TSDB blocks”.
3. Send a SIGINT or SIGTERM signal to the ingester process to terminate it.
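The wait in the second step is the maximum of three configured values. A minimal Python sketch, with a hypothetical helper name; the durations in the example are example values only, so read the real ones from your configuration.

```python
from datetime import timedelta


def read_only_wait(query_store_after: timedelta,
                   bucket_store_sync_interval: timedelta,
                   compactor_cleanup_interval: timedelta) -> timedelta:
    """How long to wait after placing ingesters in read-only mode (hypothetical helper)."""
    return max(query_store_after,
               2 * bucket_store_sync_interval,
               2 * compactor_cleanup_interval)


# Example values; read the real ones from your configuration.
print(read_only_wait(timedelta(hours=12), timedelta(minutes=15), timedelta(minutes=15)))  # 12:00:00
```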

Scaling down store-gateways

To guarantee no downtime when scaling down store-gateways, complete the following steps:

Ensure at least -store-gateway.sharding-ring.replication-factor store-gateway instances are running (three when running Grafana Mimir with the default configuration).
Scale down no more than two store-gateways at the same time. If you enabled zone-aware replication for store-gateways, you can in parallel scale down any number of store-gateway instances in one zone at a time. Zone-aware replication is enabled by default in the mimir-distributed Helm chart.
Stop the store-gateway instances you want to scale down.
If you have set the value of -store-gateway.sharding-ring.unregister-on-shutdown to false, then remove the stopped instances from the store-gateway ring:
1. In a browser, go to the GET /store-gateway/ring page that store-gateways expose on their HTTP port.
2. Click Forget on the instances that you scaled down. Alternatively, wait for the duration of the value of -store-gateway.sharding-ring.heartbeat-timeout times 10. The default value of -store-gateway.sharding-ring.heartbeat-timeout is one minute.
Proceed with the next two store-gateway replicas. If you are using zone-aware replication, then proceed with the next zone.
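Without zone-aware replication, the procedure boils down to removing at most two instances per batch while keeping at least the replication factor running. The following Python sketch plans the batch sizes and the alternative wait instead of clicking Forget; both helpers are hypothetical, and the one-minute heartbeat timeout is the default stated above.

```python
from datetime import timedelta


def scale_down_batches(current: int, target: int,
                       replication_factor: int = 3, max_per_batch: int = 2) -> list:
    """Batch sizes for scaling down store-gateways without zone-aware replication.

    Hypothetical helper: keeps at least <replication factor> instances running and
    removes no more than two instances per batch, as described above.
    """
    if target < replication_factor:
        raise ValueError("keep at least replication-factor store-gateways running")
    remaining = current - target
    batches = []
    while remaining > 0:
        step = min(max_per_batch, remaining)
        batches.append(step)
        remaining -= step
    return batches


def forget_wait(heartbeat_timeout: timedelta = timedelta(minutes=1)) -> timedelta:
    # Alternative to clicking Forget: wait 10 times the ring heartbeat timeout.
    return heartbeat_timeout * 10


print(scale_down_batches(9, 4))  # [2, 2, 1]
print(forget_wait())             # 0:10:00
```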

Grafana Mimir production tips

This topic provides tips and techniques for you to consider when setting up a production Grafana Mimir cluster.

Ingester

Ensure a high maximum number of open file descriptors

The ingester receives samples from the distributor and appends them to the specific per-tenant TSDB that is stored on the ingester’s local disk. The per-tenant TSDB is composed of several files, and the ingester keeps a file descriptor open for each TSDB file. The total number of file descriptors used to load TSDB files increases linearly with the number of tenants in the Grafana Mimir cluster and the configured -blocks-storage.tsdb.retention-period.

We recommend fine-tuning the following settings to avoid reaching the maximum number of open file descriptors:

Configure the system’s file-max ulimit to at least 65536. Increase the limit to 1048576 when running a Grafana Mimir cluster with more than a thousand tenants.
Enable ingester shuffle sharding to reduce the number of tenants per ingester.
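One way to check the limit a process actually gets is sketched below in Python. It reads the per-process soft open-files limit (RLIMIT_NOFILE, what ulimit -n reports), which is an assumption about the limit you want to check rather than the system-wide fs.file-max value; run it under the same limits as the ingester. The helper names are hypothetical, and the resource module is Unix-only.

```python
import resource  # Unix-only standard library module


def recommended_open_files(tenants: int) -> int:
    """Recommended open-files limit from the guidance above."""
    # 65536 in general; 1048576 for clusters with more than a thousand tenants.
    return 1_048_576 if tenants > 1_000 else 65_536


def open_files_limit_ok(tenants: int) -> bool:
    """Check this process's soft RLIMIT_NOFILE against the recommendation."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= recommended_open_files(tenants)


print(recommended_open_files(200), recommended_open_files(5_000))  # 65536 1048576
print(open_files_limit_ok(200))
```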

Ingester disk space requirements

The ingester writes received samples to a write-ahead log (WAL) and by default, compacts them into a new block every two hours. Both the WAL and blocks are temporarily stored on the local disk. The required disk space depends on the number of time series stored in the ingester and the configured -blocks-storage.tsdb.retention-period.

For more information about estimating ingester disk space requirements, refer to Planning capacity.

Ingester disk IOPS

The IOPS (input/output operations per second) and latency of the ingester disks can affect both write and read requests. On the write path, the ingester writes to the write-ahead log (WAL) on disk. On the read path, the ingester reads from the series whose chunks have already been written to disk.

For these reasons, run the ingesters on fast disks, such as SSDs.

Resource utilization based ingester read path limiting

The ingester supports limiting read requests based on resource (CPU/memory) utilization to protect the write path. In production, the ingester write path is generally considered more important than the read path, so it’s often better to limit read requests when ingesters are under pressure than to fail writes (or even crash).

We recommend enabling resource utilization based ingester read path limiting, to protect ingesters from potentially getting overwhelmed by expensive queries. For more information on its configuration, refer to ingester.

Querier

Ensure caching is enabled

The querier supports caching to reduce the number of API requests to the long-term storage.

We recommend enabling caching in the querier. For more information about configuring the cache, refer to querier.

Avoid querying non-compacted blocks

When running Grafana Mimir at scale, querying non-compacted blocks might be inefficient for the following reasons:

Non-compacted blocks contain duplicated samples, as a result of ingester replication.
Querying many small TSDB indexes is slower than querying a few compacted TSDB indexes.

The default values for -querier.query-store-after, -querier.query-ingesters-within, and -blocks-storage.bucket-store.ignore-blocks-within are set such that only compacted blocks are queried. In most cases, no additional configuration is required.

Configure Grafana Mimir so large tenants are parallelized by the compactor:

Configure compactor’s -compactor.split-and-merge-shards and -compactor.split-groups for every tenant with more than 20 million time series. For more information about configuring the compactor’s split and merge shards, refer to compactor.

How to estimate `-querier.query-store-after`

If you are not using the defaults, set -querier.query-store-after to a duration that is large enough to give the compactor time to compact newly uploaded blocks, and to give queriers and store-gateways time to discover and synchronize newly compacted blocks.

The following diagram shows all of the timings involved in the estimation. Use this diagram only as a template, and adjust the assumptions based on real measurements in your Mimir cluster. The example makes the following assumptions:

An ingester takes up to 30 minutes to upload a block to the storage
The compactor takes up to three hours to compact two-hour blocks shipped from all ingesters
Querier and store-gateways take up to 15 minutes to discover and load a new compacted block

Based on these assumptions, in the worst-case scenario, it takes up to six hours and 45 minutes from when a sample is ingested until that sample has been appended to a block flushed to the storage and the block is vertically compacted with all other overlapping two-hour blocks shipped from ingesters.
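The three listed assumptions add up to three hours and 45 minutes. The remaining three hours in the worst case is, as an assumption in this sketch, the time a sample can spend in the ingester’s in-memory head before the two-hour block containing it is cut: TSDB typically compacts the head only once it spans 1.5 times the block range. The following Python sketch sums the steps; replace each value with your own measurements.

```python
from datetime import timedelta

# Worst-case delay from ingestion to a queryable compacted block.
# Replace each value with measurements from your own cluster.
steps = {
    # Assumption: a sample can wait in the ingester head for up to 1.5x the
    # two-hour block range before the block that contains it is cut.
    "sample waits in ingester head": timedelta(hours=3),
    "ingester uploads the block": timedelta(minutes=30),
    "compactor compacts two-hour blocks": timedelta(hours=3),
    "queriers and store-gateways load the block": timedelta(minutes=15),
}
worst_case = sum(steps.values(), timedelta())
print(worst_case)  # 6:45:00
```

Under these assumptions, -querier.query-store-after should be at least this worst-case value.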

Store-gateway

Ensure caching is enabled

The store-gateway supports caching that reduces the number of API calls to the long-term storage and improves query performance.

We recommend enabling caching in the store-gateway. For more information about configuring the cache, refer to store-gateway.

Ensure a high number of maximum open file descriptors

The store-gateway stores each block’s index-header on the local disk and loads it via memory mapping. The store-gateway keeps a file descriptor open for each index-header loaded at a given time. The total number of file descriptors used to load index-headers linearly increases with the number of blocks owned by the store-gateway instance.

We recommend configuring the system’s file-max ulimit to at least 65536 to avoid reaching the maximum number of open file descriptors.

Store-gateway disk IOPS

The IOPS and latency of the store-gateway disk can affect queries. The store-gateway downloads the blocks’ index-headers onto local disk, and reads them for each query that needs to fetch data from the long-term storage.

For these reasons, run the store-gateways on fast disks, such as SSDs.

Compactor

Ensure the compactor has enough disk space

The compactor requires a lot of disk space to download source blocks from the long-term storage and temporarily store the compacted block before uploading it to the storage. For more information about required disk space, refer to Compactor disk utilization.

Manage capacity for large tenants

While working with large tenants, there are two compactor-specific settings to consider for planning or adjusting capacity:

-compactor.split-groups
-compactor.split-and-merge-shards

As a best practice, use one shard for every 8 million series in a tenant, rounded to the nearest even number. For example, for a tenant with 100 million series, use approximately 12 shards.

Additionally, as a best practice, set the number of split-groups to be the same as the shard count.

Alternatively, if you’re using query sharding on the query frontend, use the next power of 2 to avoid extra load on the read path. For example, use 16 shards for a tenant with 100 million series.
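The two rounding rules can be checked against the 12 and 16 shard examples above with this Python sketch. The helper names are hypothetical, and rounding exact ties upward is an assumption the text does not specify.

```python
import math


def split_and_merge_shards(series: int) -> int:
    """One shard for every 8 million series, rounded to the nearest even number."""
    if series <= 20_000_000:
        raise ValueError("the guidance targets tenants with more than 20 million series")
    shards_exact = series / 8_000_000
    # Round to the nearest even number; exact ties round up (an assumption).
    return math.floor(shards_exact / 2 + 0.5) * 2


def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n, for use with query sharding."""
    return 1 << (n - 1).bit_length()


shards = split_and_merge_shards(100_000_000)
split_groups = shards  # same as the shard count
print(shards, split_groups, next_power_of_two(shards))  # 12 12 16
```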

For more information about how these settings work, refer to Compaction algorithm.

Caching

Ensure Memcached is properly scaled

We recommend ensuring Memcached evictions happen infrequently. Grafana Mimir query performance might be negatively affected if your Memcached cluster evicts items frequently. We recommend increasing your Memcached cluster replicas to add more memory to the cluster and reduce evictions.

We also recommend running a dedicated Memcached cluster for each type of cache, because Mimir uses each cache differently and each one scales differently. Separation also isolates each cache from the others so that one type of cache entry doesn’t crowd out other entries and degrade performance.

The metadata cache stores information about files in object storage, contents of auxiliary files such as bucket indexes, and discovered information about object storage such as lists of tenants. This results in relatively low CPU and bandwidth usage.

The query results cache stores query responses. Entries in this cache tend to be small, and Mimir only fetches a few at a time. This results in relatively low CPU and bandwidth usage.

The index caches store portions of the TSDB index fetched from object storage. Entries in this cache vary in size from a few hundred bytes to several megabytes. Mimir fetches entries both individually and in batches. A single query may fetch many entries from the cache. This results in higher CPU usage compared to other caches.

The chunks caches store portions of time series samples fetched from object storage. Entries in this cache tend to be large (several kilobytes) and are fetched in batches by the store-gateway components. This results in higher bandwidth usage compared to other caches.

Cache size

The Memcached extstore feature lets you extend Memcached’s memory space onto flash (or similar) storage.

Refer to how we scaled Grafana Cloud Logs’ Memcached cluster to 50TB and improved reliability.

Security

We recommend securing the Grafana Mimir cluster. For more information about securing a Mimir cluster, refer to Secure Grafana Mimir.

Network

Most of the communication between Mimir components occurs over gRPC. The gRPC connection does not use any compression by default.

If network throughput is a concern or a significant cost, you can enable compression on the gRPC connections between components. This reduces network throughput at the cost of increased CPU usage. You can choose between gzip and snappy. Gzip provides better compression than snappy, at the cost of more CPU usage.

You can use the Squash Compression Benchmark to choose between snappy and gzip. For protobuf data, snappy achieves a compression ratio of 5 with compression speeds of around 400MiB/s. For the same data, gzip achieves a ratio between 6 and 8 with speeds between 50MiB/s and 135MiB/s.

To configure gRPC compression, use the following CLI flags or their YAML equivalents. The accepted values are snappy and gzip. If you set the flag to an empty string (''), it explicitly disables compression.

CLI flag	YAML option
`-query-frontend.grpc-client-config.grpc-compression`	`frontend.grpc_client_config.grpc_compression`
`-query-scheduler.grpc-client-config.grpc-compression`	`query_scheduler.grpc_client_config.grpc_compression`
`-querier.frontend-client.grpc-compression`	`frontend_worker.grpc_client_config.grpc_compression`
`-ruler.client.grpc-compression`	`ruler.ruler_client.grpc_compression`
`-ruler.query-frontend.grpc-client-config.grpc-compression`	`ruler.query_frontend.grpc_client_config.grpc_compression`
`-alertmanager.alertmanager-client.grpc-compression`	`alertmanager.alertmanager_client.grpc_compression`
`-ingester.client.grpc-compression`	`ingester_client.grpc_client_config.grpc_compression`

Note
-ruler.query-frontend.grpc-client-config.grpc-compression is only applicable when the ruler uses gRPC to communicate with the query-frontend. Refer to Remote ruler mode.

Heavy multi-tenancy

For each tenant, Mimir opens and maintains a TSDB in memory. If you have a significant number of tenants, the memory overhead might become prohibitive. To reduce the associated overhead, consider the following:

Reduce -blocks-storage.tsdb.head-chunks-write-buffer-size-bytes, default 4MB. For example, try 1MB or 128KB.
Reduce -blocks-storage.tsdb.stripe-size, default 16384. For example, try 256, or even 64.
Configure shuffle sharding

Periodic latency spikes when cutting blocks

Depending on the workload, you might witness latency spikes when Mimir cuts blocks. To reduce the impact of this behavior, consider the following:

Upgrade to Grafana Mimir 2.15 or later. Refer to https://github.com/grafana/mimir/commit/03f2f06e1247e997a0246d72f5c2c1fd9bd386df.
