<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Run Grafana Mimir in production on Grafana Labs</title><link>https://grafana.com/docs/mimir/v3.1.x/manage/run-production-environment/</link><description>Recent content in Run Grafana Mimir in production on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/mimir/v3.1.x/manage/run-production-environment/index.xml" rel="self" type="application/rss+xml"/><item><title>Planning Grafana Mimir capacity</title><link>https://grafana.com/docs/mimir/v3.1.x/manage/run-production-environment/planning-capacity/</link><pubDate>Wed, 03 Jun 2026 09:01:40 +0200</pubDate><guid>https://grafana.com/docs/mimir/v3.1.x/manage/run-production-environment/planning-capacity/</guid><content><![CDATA[&lt;h1 id=&#34;planning-grafana-mimir-capacity&#34;&gt;Planning Grafana Mimir capacity&lt;/h1&gt;
&lt;p&gt;The information that follows is an overview of the CPU, memory, and disk space that Grafana Mimir requires at scale.
Use it to get a rough idea of the required resources, rather than as a prescriptive recommendation for the exact amount of CPU, memory, and disk space.&lt;/p&gt;
&lt;p&gt;Resource utilization is estimated based on a general production workload, assuming that
Grafana Mimir runs with one tenant and the default configuration.
Your real resource utilization likely differs, because it depends on your actual data, configuration settings, and traffic patterns.
For example, it might differ based on the actual number
or length of series&amp;rsquo; labels, or the percentage of queries that reach the store-gateway.&lt;/p&gt;
&lt;p&gt;The resource utilization estimates are minimum requirements.
To gracefully handle traffic peaks, run Grafana Mimir with 50% extra capacity for memory and disk.&lt;/p&gt;
&lt;h2 id=&#34;monolithic-mode&#34;&gt;Monolithic mode&lt;/h2&gt;
&lt;p&gt;When Grafana Mimir is running in monolithic mode, you can estimate the required resources by summing the requirements of every Grafana Mimir component.
For more information about per-component requirements, refer to &lt;a href=&#34;#microservices-mode&#34;&gt;Microservices mode&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;microservices-mode&#34;&gt;Microservices mode&lt;/h2&gt;
&lt;p&gt;When Grafana Mimir is running in microservices mode, you can estimate the required resources of each component individually.&lt;/p&gt;
&lt;h3 id=&#34;distributor&#34;&gt;Distributor&lt;/h3&gt;
&lt;p&gt;The resource utilization of the &lt;a href=&#34;../../../references/architecture/components/distributor/&#34;&gt;distributor&lt;/a&gt; component is determined by the number of samples received per second.&lt;/p&gt;
&lt;p&gt;Estimated required CPU and memory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 25,000 samples per second&lt;/li&gt;
&lt;li&gt;Memory: 1GB for every 25,000 samples per second&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to estimate the rate of samples per second:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Query the number of active series across all of your Prometheus servers:

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(prometheus_tsdb_head_series)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Check the &lt;a href=&#34;https://prometheus.io/docs/prometheus/latest/configuration/configuration/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;scrape_interval&lt;/a&gt; that you configured in Prometheus.&lt;/li&gt;
&lt;li&gt;Estimate the rate of samples per second by using the following formula:

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;estimated rate = (&amp;lt;active series&amp;gt; * (60 / &amp;lt;scrape interval in seconds&amp;gt;)) / 60&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
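&lt;p&gt;As a rough illustration, the steps above can be combined with the distributor ratios in a few lines of Python. This is a minimal sketch, not an official sizing tool: the function names and the example workload (3 million active series scraped every 15 seconds) are assumptions made for illustration.&lt;/p&gt;

```python
def estimated_samples_per_second(active_series, scrape_interval_seconds):
    """Apply the formula above: series * (60 / interval) / 60."""
    samples_per_minute = active_series * (60 / scrape_interval_seconds)
    return samples_per_minute / 60


def distributor_resources(samples_per_second):
    """Return (cpu_cores, memory_gb) using the 25,000 samples/s ratios."""
    units = samples_per_second / 25_000
    return units * 1.0, units * 1.0


# Hypothetical workload: 3 million active series scraped every 15 seconds.
rate = estimated_samples_per_second(3_000_000, 15)
cores, memory_gb = distributor_resources(rate)
print(f"{rate:,.0f} samples/s, {cores:.0f} cores, {memory_gb:.0f} GB")
```

&lt;p&gt;For this assumed workload, the estimate is 200,000 samples per second, or about 8 cores and 8GB across all distributors, before adding any extra capacity for traffic peaks.&lt;/p&gt;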
&lt;h3 id=&#34;ingester&#34;&gt;Ingester&lt;/h3&gt;
&lt;p&gt;The resource utilization of the &lt;a href=&#34;../../../references/architecture/components/ingester/&#34;&gt;ingester&lt;/a&gt; component is determined by the number of series that are in memory.&lt;/p&gt;
&lt;p&gt;Estimated required CPU, memory, and disk space:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 300,000 series in memory&lt;/li&gt;
&lt;li&gt;Memory: 2.5GB for every 300,000 series in memory&lt;/li&gt;
&lt;li&gt;Disk space: 5GB for every 300,000 series in memory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to estimate the number of series in memory:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Query the number of active series across all your Prometheus servers:

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(prometheus_tsdb_head_series)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Check the configured &lt;code&gt;-ingester.ring.replication-factor&lt;/code&gt; (defaults to &lt;code&gt;3&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Estimate the total number of series in memory across all ingesters using the following formula:

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;total number of in-memory series = &amp;lt;active series&amp;gt; * &amp;lt;replication factor&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
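&lt;p&gt;The same arithmetic can be sketched in Python for ingesters. The function name and the example of 3 million active series are assumptions for illustration. The ratios are the ones listed above, applied to the total number of in-memory series after replication.&lt;/p&gt;

```python
def ingester_resources(active_series, replication_factor=3):
    """Estimate total ingester resources from the per-300,000-series ratios."""
    in_memory_series = active_series * replication_factor
    units = in_memory_series / 300_000
    return {
        "in_memory_series": in_memory_series,
        "cpu_cores": units * 1,
        "memory_gb": units * 2.5,
        "disk_gb": units * 5,
    }


# Hypothetical workload: 3 million active series, default replication factor.
estimate = ingester_resources(3_000_000)
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

&lt;p&gt;With these assumed inputs, 9 million series are held in memory across all ingesters, which corresponds to roughly 30 cores, 75GB of memory, and 150GB of disk in total.&lt;/p&gt;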
&lt;h3 id=&#34;query-frontend&#34;&gt;Query-frontend&lt;/h3&gt;
&lt;p&gt;The resource utilization of the &lt;a href=&#34;../../../references/architecture/components/query-frontend/&#34;&gt;query-frontend&lt;/a&gt; component is determined by the number of queries per second.&lt;/p&gt;
&lt;p&gt;Estimated required CPU and memory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 250 queries per second&lt;/li&gt;
&lt;li&gt;Memory: 1GB for every 250 queries per second&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;optional-query-scheduler&#34;&gt;(Optional) Query-scheduler&lt;/h3&gt;
&lt;p&gt;The resource utilization of the &lt;a href=&#34;../../../references/architecture/components/query-scheduler/&#34;&gt;query-scheduler&lt;/a&gt; component is determined by the number of queries per second.&lt;/p&gt;
&lt;p&gt;Estimated required CPU and memory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 500 queries per second&lt;/li&gt;
&lt;li&gt;Memory: 100MB for every 500 queries per second&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;querier&#34;&gt;Querier&lt;/h3&gt;
&lt;p&gt;The resource utilization of the &lt;a href=&#34;../../../references/architecture/components/querier/&#34;&gt;querier&lt;/a&gt; component is determined by the number of queries per second.&lt;/p&gt;
&lt;p&gt;Estimated required CPU and memory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 10 queries per second&lt;/li&gt;
&lt;li&gt;Memory: 1GB for every 10 queries per second&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;The estimate is 1 CPU core and 1GB per query, with an average query latency of 100ms.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h3 id=&#34;store-gateway&#34;&gt;Store-gateway&lt;/h3&gt;
&lt;p&gt;The resource utilization of the &lt;a href=&#34;../../../references/architecture/components/store-gateway/&#34;&gt;store-gateway&lt;/a&gt; component is determined by the number of queries per second and the number of active series before ingester replication.&lt;/p&gt;
&lt;p&gt;Estimated required CPU, memory, and disk space:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 10 queries per second&lt;/li&gt;
&lt;li&gt;Memory: 1GB for every 10 queries per second&lt;/li&gt;
&lt;li&gt;Disk: 13GB for every 1 million active series&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;The CPU and memory requirements are computed by estimating 1 CPU core and 1GB per query, an average query latency of 1s when reaching the store-gateway, and only 10% of queries reaching the store-gateway.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;
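&lt;p&gt;The &amp;ldquo;10 queries per second&amp;rdquo; ratios for the querier and the store-gateway follow from the assumptions in the notes above. A minimal sketch of that reasoning, assuming that one core serves one query at a time (the function name is hypothetical):&lt;/p&gt;

```python
def queries_per_second_per_core(avg_latency_seconds, fraction_reaching_component=1.0):
    """One core serving one query at a time completes 1/latency queries
    per second. If only a fraction of all queries reach the component,
    the total query rate it can sustain is larger by that factor."""
    component_qps = 1 / avg_latency_seconds
    return component_qps / fraction_reaching_component


# Querier: every query reaches it, average latency 100ms.
querier_qps = queries_per_second_per_core(0.1)
# Store-gateway: 10% of queries reach it, average latency 1s.
store_gateway_qps = queries_per_second_per_core(1.0, 0.1)
print(round(querier_qps), round(store_gateway_qps))
```

&lt;p&gt;Both cases work out to about 10 queries per second for each core and each 1GB of memory. If your latencies or the share of queries that reach the store-gateway differ, the ratio changes accordingly.&lt;/p&gt;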



&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;The disk requirement is estimated assuming 2 bytes per sample for compacted blocks (both index and chunks), an index-header that is 0.10% of the block size, a scrape interval of 15 seconds, a retention of 1 year, and a store-gateway replication factor of 3. The resulting estimated store-gateway disk space for one series is 13KB.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;How to estimate the number of active series before ingester replication:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Query the number of active series across all your Prometheus servers:

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(prometheus_tsdb_head_series)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
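&lt;p&gt;The 13KB-per-series disk figure for the store-gateway can be reproduced from the assumptions stated in the disk note. The sketch below uses those assumptions as default arguments. The function and parameter names are illustrative, not Grafana Mimir configuration options.&lt;/p&gt;

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def store_gateway_disk_bytes_per_series(
    bytes_per_sample=2,
    scrape_interval_seconds=15,
    retention_seconds=SECONDS_PER_YEAR,
    index_header_fraction=0.001,  # 0.10% of the block size
    replication_factor=3,
):
    """Estimate the index-header bytes a single series costs on disk."""
    samples = retention_seconds / scrape_interval_seconds
    block_bytes = samples * bytes_per_sample
    return block_bytes * index_header_fraction * replication_factor


per_series = store_gateway_disk_bytes_per_series()
print(f"{per_series / 1000:.1f} KB per series")
```

&lt;p&gt;The result is about 12.6KB per series, which rounds up to the 13KB used above, or roughly 13GB for every 1 million active series. A shorter retention or a longer scrape interval reduces the estimate proportionally.&lt;/p&gt;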
&lt;h3 id=&#34;optional-ruler&#34;&gt;(Optional) Ruler&lt;/h3&gt;
&lt;p&gt;The resource utilization of the &lt;a href=&#34;../../../references/architecture/components/ruler/&#34;&gt;ruler&lt;/a&gt; component is determined by the number of rules evaluated per second.&lt;/p&gt;
&lt;p&gt;When &lt;a href=&#34;../../../references/architecture/components/ruler/#internal&#34;&gt;internal&lt;/a&gt; mode is used (default), rule evaluation is computationally equivalent to query execution, so the querier resource recommendations also apply to the ruler.&lt;/p&gt;
&lt;p&gt;When &lt;a href=&#34;../../../references/architecture/components/ruler/#internal&#34;&gt;remote&lt;/a&gt; operational mode is used, most of the computational load shifts to the query-frontend and querier components. Scale those components accordingly to handle both the query and the rule evaluation workloads.&lt;/p&gt;
&lt;h3 id=&#34;compactor&#34;&gt;Compactor&lt;/h3&gt;
&lt;p&gt;The resource utilization of the &lt;a href=&#34;../../../references/architecture/components/compactor/&#34;&gt;compactor&lt;/a&gt; component is determined by the number of active series.&lt;/p&gt;
&lt;p&gt;The compactor can scale horizontally in Grafana Mimir clusters with either one tenant or multiple tenants.
We recommend running at least one compactor instance for every 20 million active series ingested in total in the Grafana Mimir cluster, calculated before ingester replication.&lt;/p&gt;
&lt;p&gt;Assuming you run one compactor instance for every 20 million active series, the estimated required CPU, memory, and disk for each compactor instance are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core&lt;/li&gt;
&lt;li&gt;Memory: 4GB&lt;/li&gt;
&lt;li&gt;Disk: 300GB&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information about disk requirements, refer to &lt;a href=&#34;../../../references/architecture/components/compactor/#compactor-disk-utilization&#34;&gt;Compactor disk utilization&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For more information about how to scale the compactor for large tenants, refer to 
    &lt;a href=&#34;/docs/mimir/v3.1.x/manage/run-production-environment/production-tips/#manage-capacity-for-large-tenants&#34;&gt;Manage capacity for large tenants&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To estimate the number of active series before ingester replication, query the number of active series across all Prometheus servers:&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(prometheus_tsdb_head_series)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
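&lt;p&gt;As a hedged example, the compactor guidance can be turned into an instance count by rounding the active series up to the next multiple of 20 million. Rounding up is an interpretation of &amp;ldquo;at least one instance for every 20 million active series&amp;rdquo;, and the function name and the example workload are assumptions.&lt;/p&gt;

```python
import math


def compactor_fleet(active_series):
    """Estimate compactor instances and their combined resources."""
    instances = max(1, math.ceil(active_series / 20_000_000))
    return {
        "instances": instances,
        "cpu_cores": instances * 1,
        "memory_gb": instances * 4,
        "disk_gb": instances * 300,
    }


# Hypothetical cluster with 50 million active series before replication.
fleet = compactor_fleet(50_000_000)
print(fleet)
```

&lt;p&gt;For the assumed 50 million active series, this gives 3 compactor instances with a combined 3 cores, 12GB of memory, and 900GB of disk.&lt;/p&gt;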
&lt;h3 id=&#34;optional-alertmanager&#34;&gt;(Optional) Alertmanager&lt;/h3&gt;
&lt;p&gt;The resource utilization of the &lt;a href=&#34;../../../references/architecture/components/alertmanager/&#34;&gt;Alertmanager&lt;/a&gt; component is determined by the number of alerts firing at the same time.&lt;/p&gt;
&lt;p&gt;Estimated required CPU and memory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 CPU core for every 100 firing alert notifications per second&lt;/li&gt;
&lt;li&gt;Memory: 1GB for every 5,000 firing alerts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To estimate the peak of firing alert notifications per second in the last 24 hours, run the following query across all Prometheus servers:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(max_over_time(rate(alertmanager_alerts_received_total[5m])[24h:5m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To estimate the maximum number of firing alerts in the last 24 hours, run the following query across all Prometheus servers:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(max_over_time(alertmanager_alerts[24h]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
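&lt;p&gt;Once you have the two query results, the Alertmanager ratios can be applied as in the following sketch. The function name and the example inputs (250 notifications per second at peak, 20,000 firing alerts) are assumptions for illustration.&lt;/p&gt;

```python
def alertmanager_resources(peak_notifications_per_second, max_firing_alerts):
    """Apply the ratios above: 1 core per 100 notifications/s,
    1GB of memory per 5,000 firing alerts."""
    return {
        "cpu_cores": peak_notifications_per_second / 100,
        "memory_gb": max_firing_alerts / 5_000,
    }


# Hypothetical results of the two queries above.
am_estimate = alertmanager_resources(250, 20_000)
print(am_estimate)
```

&lt;p&gt;With these assumed inputs, the estimate is 2.5 cores and 4GB of memory across all Alertmanagers.&lt;/p&gt;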
&lt;h3 id=&#34;optional-caches&#34;&gt;(Optional) Caches&lt;/h3&gt;
&lt;p&gt;Grafana Mimir supports caching in various stages of the read path:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;results cache to cache partial query results&lt;/li&gt;
&lt;li&gt;chunks cache to cache timeseries chunks from the object store&lt;/li&gt;
&lt;li&gt;index cache to accelerate series and label lookups&lt;/li&gt;
&lt;li&gt;metadata cache to accelerate looking up individual timeseries blocks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A rule of thumb for scaling memcached deployments for these caches is to look at the rate of evictions. If it is 0 during
steady load, with only occasional spikes, then memcached is sufficiently scaled. If it is &amp;gt;0 all the time, then
memcached needs to be scaled out.&lt;/p&gt;
&lt;p&gt;You can execute the following query to find out the rate of evictions:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum by(instance) (rate(memcached_items_evicted_total{}[5m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="planning-grafana-mimir-capacity">Planning Grafana Mimir capacity&lt;/h1>
&lt;p>The information that follows is an overview of the CPU, memory, and disk space that Grafana Mimir requires at scale.
You can get a rough idea about the required resources, rather than a prescriptive recommendation about the exact amount of CPU, memory, and disk space.&lt;/p></description></item><item><title>Perform a rolling update to Grafana Mimir</title><link>https://grafana.com/docs/mimir/v3.1.x/manage/run-production-environment/perform-a-rolling-update/</link><pubDate>Wed, 03 Jun 2026 09:01:40 +0200</pubDate><guid>https://grafana.com/docs/mimir/v3.1.x/manage/run-production-environment/perform-a-rolling-update/</guid><content><![CDATA[&lt;h1 id=&#34;perform-a-rolling-update-to-grafana-mimir&#34;&gt;Perform a rolling update to Grafana Mimir&lt;/h1&gt;
&lt;p&gt;You can use a rolling update strategy to apply configuration changes to
Grafana Mimir, and to upgrade Grafana Mimir to a newer version. A
rolling update results in no downtime to Grafana Mimir.&lt;/p&gt;
&lt;h2 id=&#34;monolithic-mode&#34;&gt;Monolithic mode&lt;/h2&gt;
&lt;p&gt;When you run Grafana Mimir in monolithic mode, roll out changes to one instance at a time.
After you apply changes to an instance, and the instance restarts, its &lt;code&gt;/ready&lt;/code&gt; endpoint returns HTTP status code &lt;code&gt;200&lt;/code&gt;, which means that you can proceed with rolling out changes to another instance.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;When you run Grafana Mimir on Kubernetes, to roll out changes to one instance at a time, configure the &lt;code&gt;Deployment&lt;/code&gt; or &lt;code&gt;StatefulSet&lt;/code&gt; update strategy to &lt;code&gt;RollingUpdate&lt;/code&gt; and &lt;code&gt;maxUnavailable&lt;/code&gt; to &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;
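&lt;p&gt;If you roll out changes manually, the readiness check described above can be automated. The following is a minimal sketch that polls the &lt;code&gt;/ready&lt;/code&gt; endpoint until it returns HTTP status code &lt;code&gt;200&lt;/code&gt;. The helper names, the retry settings, and the instance URL in the comment are assumptions, not part of Grafana Mimir.&lt;/p&gt;

```python
import time
import urllib.request


def http_ready(url, timeout_seconds=2.0):
    """Return True if a GET to the readiness URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors and non-2xx answers (urllib's HTTPError
        # and URLError are both subclasses of OSError).
        return False


def wait_until_ready(check, max_attempts=60, interval_seconds=5.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(max_attempts):
        if check():
            return True
        if attempt + 1 != max_attempts:
            sleep(interval_seconds)
    return False


# Example with a hypothetical instance address:
# wait_until_ready(lambda: http_ready("http://mimir-1:8080/ready"))
```

&lt;p&gt;Proceed to the next instance only when &lt;code&gt;wait_until_ready&lt;/code&gt; returns &lt;code&gt;True&lt;/code&gt;. A &lt;code&gt;False&lt;/code&gt; result means the instance did not become ready within the allowed attempts and the rollout should pause.&lt;/p&gt;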

&lt;h2 id=&#34;microservices-mode&#34;&gt;Microservices mode&lt;/h2&gt;
&lt;p&gt;When you run Grafana Mimir in microservices mode, roll out changes to multiple instances of each stateless component at the same time.
You can also roll out multiple stateless components in parallel.
Stateful components have the following restrictions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Alertmanagers: Roll out changes to a maximum of two Alertmanagers at a time.&lt;/li&gt;
&lt;li&gt;Ingesters: Roll out changes to one ingester at a time.&lt;/li&gt;
&lt;li&gt;Store-gateways: Roll out changes to a maximum of two store-gateways at a time.&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt; for a component, you can roll out changes to all component instances in the same zone at the same time.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h3 id=&#34;alertmanagers&#34;&gt;Alertmanagers&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;../../../references/architecture/components/alertmanager/&#34;&gt;Alertmanagers&lt;/a&gt; store alert state in memory.
When an Alertmanager restarts, the alerts stored on that Alertmanager are not available until it is running again.&lt;/p&gt;
&lt;p&gt;By default, Alertmanagers replicate each tenant&amp;rsquo;s alerts to three Alertmanagers.
Alert notification and visualization succeed when each tenant has at least one healthy Alertmanager in their shard.&lt;/p&gt;
&lt;p&gt;To ensure that no alert notification, reception, or visualization fails during a rolling update, roll out changes to a maximum of two Alertmanagers at a time.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt; for Alertmanager, you can roll out changes to all Alertmanagers in one zone at the same time.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h3 id=&#34;ingesters&#34;&gt;Ingesters&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;../../../references/architecture/components/ingester/&#34;&gt;Ingesters&lt;/a&gt; store recently received samples in memory.
When an ingester restarts, the samples stored in the restarting ingester are not available for querying until the ingester is running again.&lt;/p&gt;
&lt;p&gt;To ensure no query fails during a rolling update, roll out changes to one ingester at a time. This ensures at least one ingester per partition remains available when using the ingest storage architecture, and a majority of ingesters remains available when using the classic architecture.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt; for ingesters, you can roll out changes to all ingesters in one zone at the same time with either the ingest storage or classic architectures.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h3 id=&#34;store-gateways&#34;&gt;Store-gateways&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;../../../references/architecture/components/store-gateway/&#34;&gt;Store-gateways&lt;/a&gt; shard blocks among running instances.
By default, each block is replicated to three store-gateways.
Queries succeed when each required block is loaded by at least one store-gateway.&lt;/p&gt;
&lt;p&gt;To ensure no query fails during a rolling update, roll out changes to a maximum of two store-gateways at a time.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt; for store-gateways, you can roll out changes to all store-gateways in one zone at the same time.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

]]></content><description>&lt;h1 id="perform-a-rolling-update-to-grafana-mimir">Perform a rolling update to Grafana Mimir&lt;/h1>
&lt;p>You can use a rolling update strategy to apply configuration changes to
Grafana Mimir, and to upgrade Grafana Mimir to a newer version. A
rolling update results in no downtime to Grafana Mimir.&lt;/p></description></item><item><title>Scaling out Grafana Mimir</title><link>https://grafana.com/docs/mimir/v3.1.x/manage/run-production-environment/scaling-out/</link><pubDate>Wed, 03 Jun 2026 09:01:40 +0200</pubDate><guid>https://grafana.com/docs/mimir/v3.1.x/manage/run-production-environment/scaling-out/</guid><content><![CDATA[&lt;h1 id=&#34;scaling-out-grafana-mimir&#34;&gt;Scaling out Grafana Mimir&lt;/h1&gt;
&lt;p&gt;Grafana Mimir can horizontally scale every component.
Scaling out Grafana Mimir means that to respond to increased load, you can increase the number of replicas of each Grafana Mimir component.&lt;/p&gt;
&lt;p&gt;We have designed Grafana Mimir to scale up quickly, safely, and with no manual intervention.
However, be careful when scaling down some of the stateful components, because these actions can result in write and read failures, or partial query results.&lt;/p&gt;
&lt;h2 id=&#34;monolithic-mode&#34;&gt;Monolithic mode&lt;/h2&gt;
&lt;p&gt;When running Grafana Mimir in monolithic mode, you can safely scale up to any number of instances.
To scale down the Grafana Mimir cluster, see &lt;a href=&#34;#scaling-down-ingesters&#34;&gt;Scaling down ingesters&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;microservices-mode&#34;&gt;Microservices mode&lt;/h2&gt;
&lt;p&gt;When running Grafana Mimir in microservices mode, you can safely scale up any component.
You can also safely scale down any stateless component.&lt;/p&gt;
&lt;p&gt;The following stateful components have limitations when scaling down:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Alertmanagers&lt;/li&gt;
&lt;li&gt;Ingesters&lt;/li&gt;
&lt;li&gt;Store-gateways&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;scaling-down-alertmanagers&#34;&gt;Scaling down Alertmanagers&lt;/h3&gt;
&lt;p&gt;Scaling down &lt;a href=&#34;../../../references/architecture/components/alertmanager/&#34;&gt;Alertmanagers&lt;/a&gt; can result in downtime.&lt;/p&gt;
&lt;p&gt;Consider the following guidelines when you scale down Alertmanagers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Scale down no more than two Alertmanagers at the same time.&lt;/li&gt;
&lt;li&gt;Ensure at least &lt;code&gt;-alertmanager.sharding-ring.replication-factor&lt;/code&gt; Alertmanager instances are running (three when running Grafana Mimir with the default configuration).&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt; for Alertmanagers, you can, in parallel, scale down any number of Alertmanager instances within one zone at a time.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h3 id=&#34;scaling-down-ingesters-in-ingest-storage-architecture&#34;&gt;Scaling down ingesters in ingest storage architecture&lt;/h3&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;This guidance applies to ingest storage architecture. For more information about the supported architectures in Grafana Mimir, refer to 
    &lt;a href=&#34;/docs/mimir/v3.1.x/get-started/about-grafana-mimir-architecture/&#34;&gt;Grafana Mimir architecture&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;p&gt;When running Grafana Mimir with ingest storage architecture, scaling down ingesters triggers the reassignment of ingestion partitions instead of transferring in-memory series ownership between ingesters.&lt;/p&gt;
&lt;p&gt;The ingestion layer durably stores each partition and can reassign it to a new ingester without data loss. When you terminate or scale down an ingester, it stops writing to its assigned partitions. Other ingesters continue consuming active partitions as normal according to the partition lifecycle.&lt;/p&gt;
&lt;p&gt;For details about how partitions are created, reassigned, and transitioned between states, refer to 
    &lt;a href=&#34;/docs/mimir/v3.1.x/references/architecture/hash-ring/#partitions-ring-lifecycle&#34;&gt;Grafana Mimir hash rings&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Because the system writes ingestion data to Kafka and persists it in object storage, scaling down ingesters in the ingest storage architecture doesn’t require draining in-memory series or handoff operations.&lt;/p&gt;
&lt;p&gt;In production environments, scaling down typically happens automatically through the rollout-operator, which coordinates ingesters across zones. The rollout-operator prepares ingesters for shutdown by moving their partitions from &lt;code&gt;ACTIVE&lt;/code&gt; to &lt;code&gt;INACTIVE&lt;/code&gt; and removes the ingesters after a defined period.&lt;/p&gt;
&lt;p&gt;The rollout-operator is deployed by the Grafana Mimir Helm chart and is the recommended way to manage partitioned ingesters and scaling operations for ingest storage.&lt;/p&gt;
&lt;p&gt;If you’re managing ingesters manually, you can use GET, POST, or DELETE on the 
    &lt;a href=&#34;/docs/mimir/v3.1.x/references/http-api/#prepare-partition-downscale&#34;&gt;HTTP API endpoint &lt;code&gt;/ingester/prepare-partition-downscale&lt;/code&gt;&lt;/a&gt; to prepare ingesters for downscaling instead of relying on the rollout-operator.&lt;/p&gt;
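&lt;p&gt;As an illustration only, the following sketch builds requests for the &lt;code&gt;/ingester/prepare-partition-downscale&lt;/code&gt; endpoint with Python&amp;rsquo;s standard library. The ingester address is hypothetical, the helper name is an assumption, and the sketch does not send anything. Refer to the HTTP API reference for what each method does before you call the endpoint.&lt;/p&gt;

```python
import urllib.request

ENDPOINT_PATH = "/ingester/prepare-partition-downscale"


def build_downscale_request(base_url, method):
    """Build (but do not send) a request for the downscale endpoint."""
    if method not in ("GET", "POST", "DELETE"):
        raise ValueError(f"unsupported method: {method}")
    url = base_url.rstrip("/") + ENDPOINT_PATH
    return urllib.request.Request(url, method=method)


# Hypothetical ingester address; replace it with your own.
request = build_downscale_request("http://ingester-zone-a-0:8080", "POST")
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen(request).
```

&lt;p&gt;Keeping the request construction separate from sending makes it easier to review exactly which ingesters and methods a script targets before it changes any partition state.&lt;/p&gt;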
&lt;h3 id=&#34;scaling-down-ingesters-in-classic-architecture&#34;&gt;Scaling down ingesters in classic architecture&lt;/h3&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;This guidance applies to classic architecture. For more information about the supported architectures in Grafana Mimir, refer to 
    &lt;a href=&#34;/docs/mimir/v3.1.x/get-started/about-grafana-mimir-architecture/&#34;&gt;Grafana Mimir architecture&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;p&gt;
    &lt;a href=&#34;/docs/mimir/v3.1.x/references/architecture/components/ingester/&#34;&gt;Ingesters&lt;/a&gt; store recently received samples in memory. To guarantee no data loss when you scale down an ingester, do not discard the samples stored in it.&lt;/p&gt;
&lt;p&gt;You might experience the following challenges when you scale down ingesters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;By default, when an ingester shuts down, it does not upload the samples to long-term storage, which causes data loss.&lt;/p&gt;
&lt;p&gt;Ingesters expose an API endpoint 
    &lt;a href=&#34;/docs/mimir/v3.1.x/references/http-api/#shutdown&#34;&gt;&lt;code&gt;/ingester/shutdown&lt;/code&gt;&lt;/a&gt; that flushes in-memory time series data from ingester to the long-term storage and unregisters the ingester from the ring.&lt;/p&gt;
&lt;p&gt;After the &lt;code&gt;/ingester/shutdown&lt;/code&gt; API endpoint successfully returns, the ingester doesn&amp;rsquo;t receive write or read requests, but the process doesn&amp;rsquo;t exit.&lt;/p&gt;
&lt;p&gt;You can terminate the process by sending a &lt;code&gt;SIGINT&lt;/code&gt; or &lt;code&gt;SIGTERM&lt;/code&gt; signal after the shutdown endpoint returns.&lt;/p&gt;
&lt;p&gt;To mitigate this challenge, upload the ingester blocks to long-term storage before shutting down.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When you scale down ingesters, the querier might temporarily return partial results.&lt;/p&gt;
&lt;p&gt;The blocks an ingester uploads to the long-term storage are not immediately available for querying.
It takes the 
    &lt;a href=&#34;/docs/mimir/v3.1.x/references/architecture/components/querier/&#34;&gt;queriers&lt;/a&gt; and 
    &lt;a href=&#34;/docs/mimir/v3.1.x/references/architecture/components/store-gateway/&#34;&gt;store-gateways&lt;/a&gt; some time before a newly uploaded block is available for querying.
If you scale down two or more ingesters in a short period of time, queries might return partial results.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Complete the following steps to scale down ingesters in any zone.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Send a POST request to the &lt;code&gt;/ingester/prepare-instance-ring-downscale&lt;/code&gt; API endpoint on each ingester to place it into read-only mode.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Wait until the blocks uploaded by read-only ingesters are available for querying before proceeding. The required amount of time to wait depends on your configuration and is the maximum value for the following settings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The configured &lt;code&gt;-querier.query-store-after&lt;/code&gt; setting&lt;/li&gt;
&lt;li&gt;Two times the configured &lt;code&gt;-blocks-storage.bucket-store.sync-interval&lt;/code&gt; setting&lt;/li&gt;
&lt;li&gt;Two times the configured &lt;code&gt;-compactor.cleanup-interval&lt;/code&gt; setting&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scale down each ingester:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Send a POST request to the &lt;code&gt;/ingester/shutdown&lt;/code&gt; API endpoint on the ingester to flush its in-memory data to long-term storage and unregister it from the ring.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Wait until the API endpoint call has successfully returned and the ingester has logged &amp;ldquo;finished flushing and shipping TSDB blocks&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Send a &lt;code&gt;SIGINT&lt;/code&gt; or &lt;code&gt;SIGTERM&lt;/code&gt; signal to the ingester process to terminate it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
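&lt;p&gt;The wait time in step 2 can be worked out with a small calculation. The following is a minimal sketch, not part of Grafana Mimir: the function name &lt;code&gt;downscale_wait_time&lt;/code&gt; is hypothetical, and the durations are example values only, so substitute the values from your own configuration.&lt;/p&gt;

```python
from datetime import timedelta


def downscale_wait_time(query_store_after, bucket_sync_interval, cleanup_interval):
    """Return how long to wait before shutting down read-only ingesters.

    The wait is the maximum of:
    - the -querier.query-store-after setting
    - two times the -blocks-storage.bucket-store.sync-interval setting
    - two times the -compactor.cleanup-interval setting
    """
    return max(query_store_after, 2 * bucket_sync_interval, 2 * cleanup_interval)


# Example values only (assumptions); read the real ones from your configuration.
wait = downscale_wait_time(
    query_store_after=timedelta(hours=12),
    bucket_sync_interval=timedelta(minutes=15),
    cleanup_interval=timedelta(minutes=15),
)
print(wait)
```

&lt;p&gt;With these example values, the &lt;code&gt;-querier.query-store-after&lt;/code&gt; setting dominates, so the wait is 12 hours.&lt;/p&gt;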
&lt;h3 id=&#34;scaling-down-store-gateways&#34;&gt;Scaling down store-gateways&lt;/h3&gt;
&lt;p&gt;To guarantee no downtime when scaling down &lt;a href=&#34;../../../references/architecture/components/store-gateway/&#34;&gt;store-gateways&lt;/a&gt;, complete the following steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Ensure at least &lt;code&gt;-store-gateway.sharding-ring.replication-factor&lt;/code&gt; store-gateway instances are running (three when running Grafana Mimir with the default configuration).&lt;/li&gt;
&lt;li&gt;Scale down no more than two store-gateways at the same time.
If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt;
for store-gateways, you can scale down any number of store-gateway instances in parallel, one zone at a time.
Zone-aware replication is enabled by default in the &lt;code&gt;mimir-distributed&lt;/code&gt; Helm chart.&lt;/li&gt;
&lt;li&gt;Stop the store-gateway instances you want to scale down.&lt;/li&gt;
&lt;li&gt;If you have set the value of &lt;code&gt;-store-gateway.sharding-ring.unregister-on-shutdown&lt;/code&gt; to &lt;code&gt;false&lt;/code&gt;, then remove the stopped instances from the store-gateway ring:
&lt;ol&gt;
&lt;li&gt;In a browser, go to the &lt;code&gt;GET /store-gateway/ring&lt;/code&gt; page that store-gateways expose on their HTTP port.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Forget&lt;/strong&gt; on the instances that you scaled down.
Alternatively, wait for 10 times the value of &lt;code&gt;-store-gateway.sharding-ring.heartbeat-timeout&lt;/code&gt;.
The default value of &lt;code&gt;-store-gateway.sharding-ring.heartbeat-timeout&lt;/code&gt; is one minute.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Proceed with the next two store-gateway replicas. If you are using zone-aware replication, then proceed with the next zone.&lt;/li&gt;
&lt;/ol&gt;
]]></content><description>&lt;h1 id="scaling-out-grafana-mimir">Scaling out Grafana Mimir&lt;/h1>
&lt;p>Grafana Mimir can horizontally scale every component.
Scaling out Grafana Mimir means that to respond to increased load, you can increase the number of replicas of each Grafana Mimir component.&lt;/p></description></item><item><title>Grafana Mimir production tips</title><link>https://grafana.com/docs/mimir/v3.1.x/manage/run-production-environment/production-tips/</link><pubDate>Wed, 03 Jun 2026 09:01:40 +0200</pubDate><guid>https://grafana.com/docs/mimir/v3.1.x/manage/run-production-environment/production-tips/</guid><content><![CDATA[&lt;h1 id=&#34;grafana-mimir-production-tips&#34;&gt;Grafana Mimir production tips&lt;/h1&gt;
&lt;p&gt;This topic provides tips and techniques for you to consider when setting up a production Grafana Mimir cluster.&lt;/p&gt;
&lt;h2 id=&#34;ingester&#34;&gt;Ingester&lt;/h2&gt;
&lt;h3 id=&#34;ensure-a-high-maximum-number-of-open-file-descriptors&#34;&gt;Ensure a high maximum number of open file descriptors&lt;/h3&gt;
&lt;p&gt;The ingester receives samples from the distributor and appends them to the per-tenant TSDB that is stored on the ingester local disk.
The per-tenant TSDB is composed of several files and the ingester keeps a file descriptor open for each TSDB file.
The total number of file descriptors, used to load TSDB files, linearly increases with the number of tenants in the Grafana Mimir cluster and the configured &lt;code&gt;-blocks-storage.tsdb.retention-period&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We recommend fine-tuning the following settings to avoid reaching the maximum number of open file descriptors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configure the system&amp;rsquo;s &lt;code&gt;file-max&lt;/code&gt; ulimit to at least &lt;code&gt;65536&lt;/code&gt;. Increase the limit to &lt;code&gt;1048576&lt;/code&gt; when running a Grafana Mimir cluster with more than a thousand tenants.&lt;/li&gt;
&lt;li&gt;Enable ingester &lt;a href=&#34;../../../configure/configure-shuffle-sharding/&#34;&gt;shuffle sharding&lt;/a&gt; to reduce the number of tenants per ingester.&lt;/li&gt;
&lt;/ol&gt;
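&lt;p&gt;To check the limit that a process actually runs with, you can query it from the process environment. The following is a minimal sketch for Unix-like systems, assuming that the relevant limit is the per-process open-files limit (&lt;code&gt;RLIMIT_NOFILE&lt;/code&gt;, the value that &lt;code&gt;ulimit -n&lt;/code&gt; reports). The helper name &lt;code&gt;nofile_limit_ok&lt;/code&gt; is hypothetical and not part of Grafana Mimir.&lt;/p&gt;

```python
import resource

# Minimum recommended above; use 1048576 for clusters with
# more than a thousand tenants.
RECOMMENDED_MIN_NOFILE = 65536


def nofile_limit_ok(soft_limit, minimum=RECOMMENDED_MIN_NOFILE):
    """Return True when the soft open-files limit meets the minimum."""
    if soft_limit == resource.RLIM_INFINITY:
        # An unlimited soft limit always satisfies the recommendation.
        return True
    return soft_limit >= minimum


# Limits of the current process. Run this in the same environment
# (user, container, service unit) as the ingester to get a meaningful result.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard, "ok:", nofile_limit_ok(soft))
```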
&lt;h3 id=&#34;ingester-disk-space-requirements&#34;&gt;Ingester disk space requirements&lt;/h3&gt;
&lt;p&gt;The ingester writes received samples to a write-ahead log (WAL) and by default, compacts them into a new block every two hours.
Both the WAL and blocks are temporarily stored on the local disk.
The required disk space depends on the number of time series stored in the ingester and the configured &lt;code&gt;-blocks-storage.tsdb.retention-period&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For more information about estimating ingester disk space requirements, refer to &lt;a href=&#34;../planning-capacity/#ingester&#34;&gt;Planning capacity&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;ingester-disk-iops&#34;&gt;Ingester disk IOPS&lt;/h3&gt;
&lt;p&gt;The IOPS (input/output operations per second) and latency of the ingester disks can affect both write and read requests.
On the write path, the ingester writes to the write-ahead log (WAL) on disk.
On the read path, the ingester reads from the series whose chunks have already been written to disk.&lt;/p&gt;
&lt;p&gt;For these reasons, run the ingesters on fast disks, such as SSDs.&lt;/p&gt;
&lt;h3 id=&#34;resource-utilization-based-ingester-read-path-limiting&#34;&gt;Resource utilization based ingester read path limiting&lt;/h3&gt;
&lt;p&gt;The ingester supports limiting read requests based on resource (CPU/memory) utilization, in order to protect the write path.
The ingester write path is generally considered more important than the read path in production, so it&amp;rsquo;s (often) better to
limit read requests when ingesters are under pressure than to fail writes (or even crash).&lt;/p&gt;
&lt;p&gt;We recommend enabling resource utilization based ingester read path limiting, to protect ingesters from potentially getting overwhelmed by expensive queries.
For more information on its configuration, refer to &lt;a href=&#34;../../../configure/configure-resource-utilization-based-ingester-read-path-limiting/&#34;&gt;ingester&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;querier&#34;&gt;Querier&lt;/h2&gt;
&lt;h3 id=&#34;ensure-caching-is-enabled&#34;&gt;Ensure caching is enabled&lt;/h3&gt;
&lt;p&gt;The querier supports caching to reduce the number of API requests to the long-term storage.&lt;/p&gt;
&lt;p&gt;We recommend enabling caching in the querier.
For more information about configuring the cache, refer to &lt;a href=&#34;../../../references/architecture/components/querier/&#34;&gt;querier&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;avoid-querying-non-compacted-blocks&#34;&gt;Avoid querying non-compacted blocks&lt;/h3&gt;
&lt;p&gt;When running Grafana Mimir at scale, querying non-compacted blocks might be inefficient for the following reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Non-compacted blocks contain duplicated samples as a result of ingester replication.&lt;/li&gt;
&lt;li&gt;Querying many small TSDB indexes is slower than querying a few compacted TSDB indexes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The default values for &lt;code&gt;-querier.query-store-after&lt;/code&gt;, &lt;code&gt;-querier.query-ingesters-within&lt;/code&gt;, and &lt;code&gt;-blocks-storage.bucket-store.ignore-blocks-within&lt;/code&gt; are set such that only compacted blocks are queried. In most cases, no additional configuration is required.&lt;/p&gt;
&lt;p&gt;Configure Grafana Mimir so large tenants are parallelized by the compactor:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configure compactor&amp;rsquo;s &lt;code&gt;-compactor.split-and-merge-shards&lt;/code&gt; and &lt;code&gt;-compactor.split-groups&lt;/code&gt; for every tenant with more than 20 million time series. For more information about configuring the compactor&amp;rsquo;s split and merge shards, refer to &lt;a href=&#34;../../../references/architecture/components/compactor/&#34;&gt;compactor&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id=&#34;how-to-estimate--querierquery-store-after&#34;&gt;How to estimate &lt;code&gt;-querier.query-store-after&lt;/code&gt;&lt;/h4&gt;
&lt;p&gt;If you are not using the defaults, set the &lt;code&gt;-querier.query-store-after&lt;/code&gt; to a duration that is large enough to give compactor enough time to compact newly uploaded blocks, and queriers and store-gateways to discover and synchronize newly compacted blocks.&lt;/p&gt;
&lt;p&gt;The following diagram shows all of the timings involved in the estimation. This diagram should be used only as a template and you can modify the assumptions based on real measurements in your Mimir cluster. The example makes the following assumptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An ingester takes up to 30 minutes to upload a block to the storage&lt;/li&gt;
&lt;li&gt;The compactor takes up to three hours to compact two-hour blocks shipped from all ingesters&lt;/li&gt;
&lt;li&gt;Querier and store-gateways take up to 15 minutes to discover and load a new compacted block&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Based on these assumptions, in the worst-case scenario, it takes up to six hours and 45 minutes from when a sample is ingested until that sample has been appended to a block flushed to the storage and the block is &lt;a href=&#34;../../../references/architecture/components/compactor/&#34;&gt;vertically compacted&lt;/a&gt; with all other overlapping two-hour blocks shipped from ingesters.&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;avoid-querying-non-compacted-blocks.png&#34;
  alt=&#34;Avoid querying non compacted blocks&#34;/&gt;&lt;/p&gt;
&lt;h2 id=&#34;store-gateway&#34;&gt;Store-gateway&lt;/h2&gt;
&lt;h3 id=&#34;ensure-caching-is-enabled-1&#34;&gt;Ensure caching is enabled&lt;/h3&gt;
&lt;p&gt;The store-gateway supports caching that reduces the number of API calls to the long-term storage and improves query performance.&lt;/p&gt;
&lt;p&gt;We recommend enabling caching in the store-gateway.
For more information about configuring the cache, refer to &lt;a href=&#34;../../../references/architecture/components/store-gateway/&#34;&gt;store-gateway&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;ensure-a-high-number-of-maximum-open-file-descriptors&#34;&gt;Ensure a high number of maximum open file descriptors&lt;/h3&gt;
&lt;p&gt;The store-gateway stores each block’s index-header on the local disk and loads it via memory mapping.
The store-gateway keeps a file descriptor open for each index-header loaded at a given time.
The total number of file descriptors used to load index-headers linearly increases with the number of blocks owned by the store-gateway instance.&lt;/p&gt;
&lt;p&gt;We recommend configuring the system&amp;rsquo;s &lt;code&gt;file-max&lt;/code&gt; ulimit to at least &lt;code&gt;65536&lt;/code&gt; to avoid reaching the maximum number of open file descriptors.&lt;/p&gt;
&lt;h3 id=&#34;store-gateway-disk-iops&#34;&gt;Store-gateway disk IOPS&lt;/h3&gt;
&lt;p&gt;The IOPS and latency of the store-gateway disk can affect queries.
The store-gateway downloads the block’s &lt;a href=&#34;../../../references/architecture/binary-index-header/&#34;&gt;index-headers&lt;/a&gt; onto local disk, and reads them for each query that needs to fetch data from the long-term storage.&lt;/p&gt;
&lt;p&gt;For these reasons, run the store-gateways on fast disks, such as SSDs.&lt;/p&gt;
&lt;h2 id=&#34;compactor&#34;&gt;Compactor&lt;/h2&gt;
&lt;h3 id=&#34;ensure-the-compactor-has-enough-disk-space&#34;&gt;Ensure the compactor has enough disk space&lt;/h3&gt;
&lt;p&gt;The compactor requires a lot of disk space to download source blocks from the long-term storage and temporarily store the compacted block before uploading it to the storage.
For more information about required disk space, refer to &lt;a href=&#34;../../../references/architecture/components/compactor/#compactor-disk-utilization&#34;&gt;Compactor disk utilization&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;manage-capacity-for-large-tenants&#34;&gt;Manage capacity for large tenants&lt;/h3&gt;
&lt;p&gt;While working with large tenants, there are two compactor-specific settings to consider for planning or adjusting capacity:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;-compactor.split-groups&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-compactor.split-and-merge-shards&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a best practice, use one shard for every 8 million series in a tenant, rounded to the nearest even number. For example, for a tenant with 100 million series, use 12 shards (100 / 8 = 12.5, rounded to 12).&lt;/p&gt;
&lt;p&gt;Additionally, as a best practice, set the number of split-groups to be the same as the shard count.&lt;/p&gt;
&lt;p&gt;Alternatively, if you&amp;rsquo;re using query sharding on the query frontend, use the next power of 2 to avoid extra load on the read path. For example, use 16 shards for a tenant with 100 million series.&lt;/p&gt;
&lt;p&gt;For more information about how these settings work, refer to 
    &lt;a href=&#34;/docs/mimir/v3.1.x/references/architecture/components/compactor/#compaction-algorithm&#34;&gt;Compaction algorithm&lt;/a&gt;.&lt;/p&gt;
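&lt;p&gt;The shard-count guidance above can be sketched as a short calculation. This is an illustration only: the function names are hypothetical, and the sketch assumes that the next power of 2 means the smallest power of 2 that is at least the unrounded shard count.&lt;/p&gt;

```python
import math

SERIES_PER_SHARD = 8_000_000


def shards_nearest_even(series):
    """One shard per 8 million series, rounded to the nearest even number."""
    return 2 * round(series / SERIES_PER_SHARD / 2)


def shards_next_power_of_two(series):
    """Smallest power of 2 that provides at least one shard per 8 million series."""
    shards = max(1, math.ceil(series / SERIES_PER_SHARD))
    return 2 ** (shards - 1).bit_length()


series = 100_000_000
print(shards_nearest_even(series))       # 12
print(shards_next_power_of_two(series))  # 16
```

&lt;p&gt;You can use the resulting value for both &lt;code&gt;-compactor.split-and-merge-shards&lt;/code&gt; and &lt;code&gt;-compactor.split-groups&lt;/code&gt;.&lt;/p&gt;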
&lt;h2 id=&#34;caching&#34;&gt;Caching&lt;/h2&gt;
&lt;h3 id=&#34;ensure-memcached-is-properly-scaled&#34;&gt;Ensure Memcached is properly scaled&lt;/h3&gt;
&lt;p&gt;We recommend ensuring Memcached evictions happen infrequently.
Grafana Mimir query performance might be negatively affected if your Memcached cluster evicts items frequently.
We recommend increasing your Memcached cluster replicas to add more memory to the cluster and reduce evictions.&lt;/p&gt;
&lt;p&gt;We also recommend running a dedicated Memcached cluster for each type of cache, because Mimir uses each cache differently and each cache scales differently.
Separation also isolates each cache from the others so that one type of cache entry doesn&amp;rsquo;t crowd out other entries and degrade performance.&lt;/p&gt;
&lt;p&gt;The metadata cache stores information about files in object storage, contents of auxiliary files such as bucket indexes, and discovered information about object storage such as lists of tenants.
This results in relatively low CPU and bandwidth usage.&lt;/p&gt;
&lt;p&gt;The query results cache stores query responses.
Entries in this cache tend to be small and Mimir only fetches a few at a time.
This results in relatively low CPU and bandwidth usage.&lt;/p&gt;
&lt;p&gt;The index caches store portions of the TSDB index fetched from object storage.
Entries in this cache vary in size from a few hundred bytes to several megabytes.
Mimir fetches entries both individually and in batches.
A single query may fetch many entries from the cache.
This results in higher CPU usage compared to other caches.&lt;/p&gt;
&lt;p&gt;The chunks caches store portions of time series samples fetched from object storage.
Entries in this cache tend to be large (several kilobytes) and are fetched in batches by the store-gateway components.
This results in higher bandwidth usage compared to other caches.&lt;/p&gt;
&lt;h3 id=&#34;cache-size&#34;&gt;Cache size&lt;/h3&gt;
&lt;p&gt;The Memcached &lt;a href=&#34;https://docs.memcached.org/features/flashstorage/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;extstore&lt;/a&gt; feature lets you extend Memcached’s memory space onto flash (or similar) storage.&lt;/p&gt;
&lt;p&gt;Refer to &lt;a href=&#34;/blog/2023/08/23/how-we-scaled-grafana-cloud-logs-memcached-cluster-to-50tb-and-improved-reliability/&#34;&gt;how we scaled Grafana Cloud Logs&amp;rsquo; Memcached cluster to 50TB and improved reliability&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;security&#34;&gt;Security&lt;/h2&gt;
&lt;p&gt;We recommend securing the Grafana Mimir cluster.
For more information about securing a Mimir cluster, refer to &lt;a href=&#34;../../secure/&#34;&gt;Secure Grafana Mimir&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;network&#34;&gt;Network&lt;/h2&gt;
&lt;p&gt;Most of the communication between Mimir components occurs over gRPC. The gRPC
connection does not use any compression by default.&lt;/p&gt;
&lt;p&gt;If network throughput is a concern or is costly, you can enable compression on the gRPC connections between
components. Compression reduces the amount of data sent over the network at the cost of increased CPU usage. You can choose between gzip and
snappy. Gzip provides better compression than snappy at the cost of more CPU usage.&lt;/p&gt;
&lt;p&gt;You can use the &lt;a href=&#34;http://quixdb.github.io/squash-benchmark/#results-table&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Squash Compression Benchmark&lt;/a&gt; to choose between snappy and gzip.
For protobuf data, snappy achieves a compression ratio of 5 with compression speeds of
around 400 MiB/s. For the same data, gzip achieves a ratio between 6 and 8 with speeds between 50 MiB/s and 135 MiB/s.&lt;/p&gt;
&lt;p&gt;To configure gRPC compression, use the following CLI flags or their YAML equivalents. The accepted values are
&lt;code&gt;snappy&lt;/code&gt; and &lt;code&gt;gzip&lt;/code&gt;. If you set the flag to an empty string (&lt;code&gt;&#39;&#39;&lt;/code&gt;), it explicitly disables compression.&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;button-div&#34;&gt;
      &lt;button class=&#34;expand-table-btn&#34;&gt;Expand table&lt;/button&gt;
    &lt;/div&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;CLI flag&lt;/th&gt;
              &lt;th&gt;YAML option&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-query-frontend.grpc-client-config.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;frontend.grpc_client_config.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-query-scheduler.grpc-client-config.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;query_scheduler.grpc_client_config.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-querier.frontend-client.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;frontend_worker.grpc_client_config.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-ruler.client.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;ruler.ruler_client.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-ruler.query-frontend.grpc-client-config.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;ruler.query_frontend.grpc_client_config.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-alertmanager.alertmanager-client.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;alertmanager.alertmanager_client.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-ingester.client.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;ingester_client.grpc_client_config.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;

&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;&lt;code&gt;-ruler.query-frontend.grpc-client-config.grpc-compression&lt;/code&gt; is only applicable when the ruler uses gRPC to communicate with the query-frontend. Refer to &lt;a href=&#34;../../../references/architecture/components/ruler/#remote-over-http-https&#34;&gt;Remote ruler mode&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h2 id=&#34;heavy-multi-tenancy&#34;&gt;Heavy multi-tenancy&lt;/h2&gt;
&lt;p&gt;For each tenant, Mimir opens and maintains a TSDB in memory. If you have a significant number of tenants, the memory overhead might become prohibitive.
To reduce the associated overhead, consider the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduce &lt;code&gt;-blocks-storage.tsdb.head-chunks-write-buffer-size-bytes&lt;/code&gt;, default &lt;code&gt;4MB&lt;/code&gt;. For example, try &lt;code&gt;1MB&lt;/code&gt; or &lt;code&gt;128KB&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Reduce &lt;code&gt;-blocks-storage.tsdb.stripe-size&lt;/code&gt;, default &lt;code&gt;16384&lt;/code&gt;. For example, try &lt;code&gt;256&lt;/code&gt;, or even &lt;code&gt;64&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Configure &lt;a href=&#34;/docs/mimir/latest/configure/configure-shuffle-sharding/&#34;&gt;shuffle sharding&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;periodic-latency-spikes-when-cutting-blocks&#34;&gt;Periodic latency spikes when cutting blocks&lt;/h2&gt;
&lt;p&gt;Depending on the workload, you might see latency spikes when Mimir cuts blocks.
To reduce the impact of this behavior, consider the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Upgrade to &lt;code&gt;2.15&#43;&lt;/code&gt;. Refer to &lt;a href=&#34;https://github.com/grafana/mimir/commit/03f2f06e1247e997a0246d72f5c2c1fd9bd386df&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;https://github.com/grafana/mimir/commit/03f2f06e1247e997a0246d72f5c2c1fd9bd386df&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="grafana-mimir-production-tips">Grafana Mimir production tips&lt;/h1>
&lt;p>This topic provides tips and techniques for you to consider when setting up a production Grafana Mimir cluster.&lt;/p>
&lt;h2 id="ingester">Ingester&lt;/h2>
&lt;h3 id="ensure-a-high-maximum-number-of-open-file-descriptors">Ensure a high maximum number of open file descriptors&lt;/h3>
&lt;p>The ingester receives samples from the distributor and appends them to the per-tenant TSDB that is stored on the ingester local disk.
The per-tenant TSDB is composed of several files and the ingester keeps a file descriptor open for each TSDB file.
The total number of file descriptors, used to load TSDB files, linearly increases with the number of tenants in the Grafana Mimir cluster and the configured &lt;code>-blocks-storage.tsdb.retention-period&lt;/code>.&lt;/p></description></item></channel></rss>