<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Troubleshoot Tempo on Grafana Labs</title><link>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/</link><description>Recent content in Troubleshoot Tempo on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/tempo/v2.2.x/troubleshooting/index.xml" rel="self" type="application/rss+xml"/><item><title>Distributor refusing spans</title><link>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/max-trace-limit-reached/</link><pubDate>Fri, 03 Apr 2026 12:35:46 -0500</pubDate><guid>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/max-trace-limit-reached/</guid><content><![CDATA[&lt;h1 id=&#34;distributor-refusing-spans&#34;&gt;Distributor refusing spans&lt;/h1&gt;
&lt;p&gt;The two most likely causes of refused spans are unhealthy ingesters or trace limits being exceeded.&lt;/p&gt;
&lt;h2 id=&#34;unhealthy-ingesters&#34;&gt;Unhealthy ingesters&lt;/h2&gt;
&lt;p&gt;Unhealthy ingesters can be caused by OOM kills or scale-down events.
If you have unhealthy ingesters, your log line will look something like this:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;msg=&amp;#34;pusher failed to consume trace data&amp;#34; err=&amp;#34;at least 2 live replicas required, could only find 1&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In this case you may need to visit the ingester &lt;a href=&#34;../../operations/consistent_hash_ring/&#34;&gt;ring page&lt;/a&gt; at &lt;code&gt;/ingester/ring&lt;/code&gt; on the Distributors
and &amp;ldquo;Forget&amp;rdquo; the unhealthy ingesters. This will work in the short term, but the long term fix is to stabilize your ingesters.&lt;/p&gt;
&lt;h2 id=&#34;trace-limits-reached&#34;&gt;Trace limits reached&lt;/h2&gt;
&lt;p&gt;In high-volume tracing environments the default trace limits are sometimes not sufficient. These limits exist to protect Tempo
from OOMing or crashing, and to prevent tenants from DoSing each other. If spans are being refused due to limits, you
will see logs like this at the distributor:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;msg=&amp;#34;pusher failed to consume trace data&amp;#34; err=&amp;#34;rpc error: code = FailedPrecondition desc = TRACE_TOO_LARGE: max size of trace (52428800) exceeded while adding 15632 bytes to trace a0fbd6f9ac5e2077d90a19551dd67b6f for tenant single-tenant&amp;#34;
msg=&amp;#34;pusher failed to consume trace data&amp;#34; err=&amp;#34;rpc error: code = FailedPrecondition desc = LIVE_TRACES_EXCEEDED: max live traces per tenant exceeded: per-user traces limit (local: 60000 global: 0 actual local: 60000) exceeded&amp;#34;
msg=&amp;#34;pusher failed to consume trace data&amp;#34; err=&amp;#34;rpc error: code = ResourceExhausted desc = RATE_LIMITED: ingestion rate limit (15000000 bytes) exceeded while adding 10 bytes&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You will also see the following metric incremented. The &lt;code&gt;reason&lt;/code&gt; label on this metric indicates why the spans were refused.&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;tempo_discarded_spans_total&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In this case use available configuration options to &lt;a href=&#34;../../configuration/#ingestion-limits&#34;&gt;increase limits&lt;/a&gt;.&lt;/p&gt;
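&lt;p&gt;As a sketch, these limits can be raised in the Tempo overrides configuration. The values below are illustrative, not recommendations; see the ingestion limits documentation for the full list of options:&lt;/p&gt;

```yaml
# Illustrative overrides block; tune the values for your workload.
overrides:
  # Maximum size of a single trace in bytes (TRACE_TOO_LARGE).
  max_bytes_per_trace: 52428800        # 50 MB
  # Maximum number of live traces per tenant (LIVE_TRACES_EXCEEDED).
  max_traces_per_user: 100000
  # Ingestion rate limit in bytes per second, and its burst size (RATE_LIMITED).
  ingestion_rate_limit_bytes: 30000000
  ingestion_burst_size_bytes: 40000000
```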
]]></content><description>&lt;h1 id="distributor-refusing-spans">Distributor refusing spans&lt;/h1>
&lt;p>The two most likely causes of refused spans are unhealthy ingesters or trace limits being exceeded.&lt;/p>
&lt;h2 id="unhealthy-ingesters">Unhealthy ingesters&lt;/h2>
&lt;p>Unhealthy ingesters can be caused by OOM kills or scale-down events.
If you have unhealthy ingesters, your log line will look something like this:&lt;/p></description></item><item><title>Troubleshoot Grafana Agent</title><link>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/agent/</link><pubDate>Fri, 03 Apr 2026 12:35:46 -0500</pubDate><guid>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/agent/</guid><content><![CDATA[&lt;h1 id=&#34;troubleshoot-grafana-agent&#34;&gt;Troubleshoot Grafana Agent&lt;/h1&gt;
&lt;p&gt;Sometimes it can be difficult to tell what, if anything, the Grafana Agent is sending along to the backend. This document focuses
on a few techniques to gain visibility on how many traces are being pushed to the Agent and if they&amp;rsquo;re making it to the
backend. The tracing pipeline is built on top of the &lt;a href=&#34;https://github.com/open-telemetry/opentelemetry-collector&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;OpenTelemetry Collector&lt;/a&gt; which
does a fantastic job of logging network and other issues.&lt;/p&gt;
&lt;p&gt;If your logs are showing no obvious errors try the following:&lt;/p&gt;
&lt;h2 id=&#34;metrics&#34;&gt;Metrics&lt;/h2&gt;
&lt;p&gt;The agent publishes a few Prometheus metrics that are useful to determine how much trace traffic it is receiving and successfully forwarding. These
are a good place to start when diagnosing tracing Agent issues.&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;traces_receiver_accepted_spans
traces_receiver_refused_spans
traces_exporter_sent_spans
traces_exporter_send_failed_spans&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
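&lt;p&gt;For example, the following Prometheus queries (illustrative, using the metric names listed above) compare the span rate entering and leaving the Agent. A sustained gap between accepted and sent spans suggests drops in the pipeline:&lt;/p&gt;

```promql
# Spans accepted by the Agent's receivers, per second
sum(rate(traces_receiver_accepted_spans[5m]))
# Spans successfully exported to the backend, per second
sum(rate(traces_exporter_sent_spans[5m]))
# Failure counters; these should stay at or near zero
sum(rate(traces_receiver_refused_spans[5m]))
sum(rate(traces_exporter_send_failed_spans[5m]))
```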
&lt;h2 id=&#34;automatic-logging&#34;&gt;Automatic logging&lt;/h2&gt;
&lt;p&gt;If metrics and logs are looking good, but you are still unable to find traces in Grafana Cloud, you can turn on &lt;a href=&#34;../../configuration/grafana-agent/automatic-logging/&#34;&gt;Automatic Logging&lt;/a&gt;. A recommended debug setup is:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;traces:
  configs:
  - name: default
    ...
    automatic_logging:
      backend: stdout
      roots: true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This will emit logs to stdout for every root span that the Agent forwards. This can be useful to see exactly which traces are being forwarded to Grafana
Cloud.&lt;/p&gt;
]]></content><description>&lt;h1 id="troubleshoot-grafana-agent">Troubleshoot Grafana Agent&lt;/h1>
&lt;p>Sometimes it can be difficult to tell what, if anything, the Grafana Agent is sending along to the backend. This document focuses
on a few techniques to gain visibility on how many traces are being pushed to the Agent and if they&amp;rsquo;re making it to the
backend. The tracing pipeline is built on top of the &lt;a href="https://github.com/open-telemetry/opentelemetry-collector" target="_blank" rel="noopener noreferrer">OpenTelemetry Collector&lt;/a> which
does a fantastic job of logging network and other issues.&lt;/p></description></item><item><title>Unable to find traces</title><link>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/unable-to-see-trace/</link><pubDate>Fri, 03 Apr 2026 12:35:46 -0500</pubDate><guid>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/unable-to-see-trace/</guid><content><![CDATA[&lt;h1 id=&#34;unable-to-find-traces&#34;&gt;Unable to find traces&lt;/h1&gt;
&lt;p&gt;The two main causes of missing traces are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Issues in ingestion of the data into Tempo. Spans are either not being sent correctly to Tempo or they are not getting sampled.&lt;/li&gt;
&lt;li&gt;Issues querying for traces that have been received by Tempo.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;section-1-diagnose-and-fix-ingestion-issues&#34;&gt;Section 1: Diagnose and fix ingestion issues&lt;/h2&gt;
&lt;p&gt;The first step is to check whether the application spans are actually reaching Tempo.&lt;/p&gt;
&lt;p&gt;Add the following flag to the distributor container - &lt;a href=&#34;https://github.com/grafana/tempo/blob/57da4f3fd5d2966e13a39d27dbed4342af6a857a/modules/distributor/config.go#L55&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;&lt;code&gt;distributor.log-received-traces&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This flag enables debug logging of all the traces received by the distributor. These logs can help check if Tempo is receiving any traces at all.&lt;/p&gt;
&lt;p&gt;You can also check the following metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tempo_distributor_spans_received_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tempo_ingester_traces_created_total&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The value of both metrics should be greater than &lt;code&gt;0&lt;/code&gt; within a few minutes of the application spinning up.
You can check both metrics using either&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The metrics page exposed from Tempo at &lt;code&gt;http://&amp;lt;tempo-address&amp;gt;:&amp;lt;tempo-http-port&amp;gt;/metrics&lt;/code&gt; or&lt;/li&gt;
&lt;li&gt;Prometheus, if it is being used to scrape metrics.&lt;/li&gt;
&lt;/ul&gt;
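&lt;p&gt;As a quick check, you can scrape the metrics endpoint directly and filter for the two counters. This is a sketch: the address and port are placeholders for your deployment (3200 is Tempo's default HTTP port):&lt;/p&gt;

```shell
# Fetch Tempo's metrics page and keep only the two relevant counters.
curl -s http://localhost:3200/metrics \
  | grep -E 'tempo_(distributor_spans_received|ingester_traces_created)_total'
```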
&lt;h3 id=&#34;case-1---tempo_distributor_spans_received_total-is-0&#34;&gt;Case 1 - tempo_distributor_spans_received_total is 0&lt;/h3&gt;
&lt;p&gt;If the value of &lt;code&gt;tempo_distributor_spans_received_total&lt;/code&gt; is 0, possible reasons are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use of an incorrect protocol/port combination while initializing the tracer in the application.&lt;/li&gt;
&lt;li&gt;The internal sampler not picking up tracing records to send to Tempo.&lt;/li&gt;
&lt;li&gt;The application running inside Docker and sending traces to an incorrect endpoint.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Receiver-specific traffic information can also be obtained using &lt;code&gt;tempo_receiver_accepted_spans&lt;/code&gt;, which has a label for the receiver (the protocol used for ingestion, for example &lt;code&gt;jaeger-thrift&lt;/code&gt;).&lt;/p&gt;
&lt;h3 id=&#34;solutions&#34;&gt;Solutions&lt;/h3&gt;
&lt;p&gt;The fix depends on the cause: protocol or port problems, sampling issues, or an incorrect endpoint.&lt;/p&gt;
&lt;p&gt;To fix protocol or port problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Find out which communication protocol is being used by the application to emit traces. This is unique to every client SDK. For instance: Jaeger Golang Client uses &lt;code&gt;Thrift Compact over UDP&lt;/code&gt; by default.&lt;/li&gt;
&lt;li&gt;Check the list of supported protocols and their ports and ensure that the correct combination is being used.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To fix sampling issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;These issues can be tricky to diagnose because most SDKs use a probabilistic sampler by default. This may lead to just one in 1,000 records being picked up.&lt;/li&gt;
&lt;li&gt;Check the sampling configuration of the tracer being initialized in the application and make sure it has a high sampling rate.&lt;/li&gt;
&lt;li&gt;Some clients also provide metrics on the number of spans reported from the application, for example &lt;code&gt;jaeger_tracer_reporter_spans_total&lt;/code&gt;. Check the value of that metric if available and make sure it is greater than zero.&lt;/li&gt;
&lt;li&gt;Another way to diagnose this problem is to generate a large volume of traces and see whether some records make their way to Tempo.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To fix an incorrect endpoint issue:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the application is also running inside docker, make sure the application is sending traces to the correct endpoint (&lt;code&gt;tempo:&amp;lt;receiver-port&amp;gt;&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
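&lt;p&gt;For example, in a Docker Compose setup the application must address Tempo by its service name on the shared network, not &lt;code&gt;localhost&lt;/code&gt;. The snippet below is a sketch: the service names, image names, and the OTLP/gRPC port (4317) are assumptions for an OTLP-based setup:&lt;/p&gt;

```yaml
# docker-compose.yml (sketch)
services:
  tempo:
    image: grafana/tempo:latest
    # ... tempo configuration ...
  app:
    image: my-app:latest            # hypothetical application image
    environment:
      # Reach Tempo via the Compose service name, not localhost.
      OTEL_EXPORTER_OTLP_ENDPOINT: http://tempo:4317
```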
&lt;h3 id=&#34;case-2---tempo_ingester_traces_created_total-is-0&#34;&gt;Case 2 - tempo_ingester_traces_created_total is 0&lt;/h3&gt;
&lt;p&gt;If the value of &lt;code&gt;tempo_ingester_traces_created_total&lt;/code&gt; is 0, this can indicate network issues between distributors and ingesters.&lt;/p&gt;
&lt;p&gt;Also check the metric &lt;code&gt;tempo_request_duration_seconds_count{route=&#39;/tempopb.Pusher/Push&#39;}&lt;/code&gt; exposed by the ingester, which indicates whether it is receiving ingestion requests from the distributor.&lt;/p&gt;
&lt;h3 id=&#34;solution&#34;&gt;Solution&lt;/h3&gt;
&lt;p&gt;Check the logs of the distributors for a message like &lt;code&gt;msg=&amp;quot;pusher failed to consume trace data&amp;quot; err=&amp;quot;DoBatch: IngesterCount &amp;lt;= 0&amp;quot;&lt;/code&gt;.
This usually means no ingester has joined the gossip ring. Make sure the same gossip ring address is supplied to both the distributors and the ingesters.&lt;/p&gt;
&lt;h2 id=&#34;diagnose-and-fix-sampling-and-limits-issues&#34;&gt;Section 2: Diagnose and fix sampling and limits issues&lt;/h2&gt;
&lt;p&gt;If you are able to query some traces in Tempo but not others, you have come to the right section!&lt;/p&gt;
&lt;p&gt;This could happen because of a number of reasons and some have been detailed in this blog post -
&lt;a href=&#34;/blog/2020/07/09/where-did-all-my-spans-go-a-guide-to-diagnosing-dropped-spans-in-jaeger-distributed-tracing/&#34;&gt;Where did all my spans go? A guide to diagnosing dropped spans in Jaeger distributed tracing&lt;/a&gt;.
This is useful if you are using the Jaeger Agent.&lt;/p&gt;
&lt;p&gt;If you are using the Grafana Agent, continue reading the following section for metrics to monitor.&lt;/p&gt;
&lt;h3 id=&#34;diagnose-the-issue&#34;&gt;Diagnose the issue&lt;/h3&gt;
&lt;p&gt;Check if the pipeline is dropping spans. The following metrics on Grafana Agent help determine this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tempo_exporter_send_failed_spans&lt;/code&gt;. The value of this metric should be 0.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tempo_receiver_refused_spans&lt;/code&gt;. The value of this metric should be 0.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tempo_processor_dropped_spans&lt;/code&gt;. The value of this metric should be 0.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the pipeline is not reporting any dropped spans, check whether application spans are being dropped by Tempo. The following metrics help determine this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;tempo_receiver_refused_spans&lt;/code&gt;. The value of &lt;code&gt;tempo_receiver_refused_spans&lt;/code&gt; should be 0.&lt;/p&gt;
&lt;p&gt;Grafana Agent and Tempo share the same metric. Make sure to check the value of the metric from both services.
If the value of &lt;code&gt;tempo_receiver_refused_spans&lt;/code&gt; is greater than 0, then the possible reason is the application spans are being dropped due to rate limiting.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;solution-1&#34;&gt;Solution&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;If the pipeline (Grafana Agent) is found to be dropping spans, the deployment may need to be scaled up. Look for a message like &lt;code&gt;too few agents compared to the ingestion rate&lt;/code&gt; in the agent logs.&lt;/li&gt;
&lt;li&gt;There might also be issues with connectivity to Tempo backend, check the agent for logs like &lt;code&gt;error sending batch, will retry&lt;/code&gt; and make sure the Tempo endpoint and credentials are correctly configured.&lt;/li&gt;
&lt;li&gt;If Tempo is found to be dropping spans, the likely reason is that application spans are being dropped due to rate limiting.
The rate limiting may be appropriate and need no fix; in that case the metric simply explains the cause of the missing spans, and there is nothing more to be done.&lt;/li&gt;
&lt;li&gt;If more ingestion volume is needed, increase the rate limiting configuration. The relevant limit is defined at &lt;a href=&#34;https://github.com/grafana/tempo/blob/78f3554ca30bd5a4dec01629b8b7b2b0b2b489be/modules/overrides/limits.go#L42&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;https://github.com/grafana/tempo/blob/78f3554ca30bd5a4dec01629b8b7b2b0b2b489be/modules/overrides/limits.go#L42&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;Check the &lt;a href=&#34;../../configuration/#ingestion-limits&#34;&gt;ingestion limits page&lt;/a&gt; for further information on limits.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h2 id=&#34;section-3-diagnose-and-fix-issues-with-querying-traces&#34;&gt;Section 3: Diagnose and fix issues with querying traces&lt;/h2&gt;
&lt;p&gt;If you have determined that data has been ingested correctly into Tempo, then it is time to investigate possible issues with querying the data.&lt;/p&gt;
&lt;p&gt;Check the logs of the Tempo Query Frontend. The Query Frontend pod runs with two containers (Query Frontend and Tempo Query), so use the following command to view the Query Frontend logs:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl logs -f pod/query-frontend-xxxxx -c query-frontend&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The presence of the following errors in the log may explain issues with querying traces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;level=info ts=XXXXXXX caller=frontend.go:63 method=GET traceID=XXXXXXXXX url=/api/traces/XXXXXXXXX duration=5m41.729449877s status=500&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;no org id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;could not dial 10.X.X.X:3200 connection refused&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tenant-id not found&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Possible reasons for the above errors are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tempo Querier is not connected to Tempo Query Frontend. Check the value of the metric &lt;code&gt;cortex_query_frontend_connected_clients&lt;/code&gt; exposed by the Query Frontend.
It should be &amp;gt; 0, which indicates that Queriers are connected to the Query Frontend.&lt;/li&gt;
&lt;li&gt;Grafana Tempo data source is not configured to pass &lt;code&gt;tenant-id&lt;/code&gt; in the Authorization header (only applicable to multi-tenant deployments).&lt;/li&gt;
&lt;li&gt;Grafana is not connected to the Tempo Querier correctly.&lt;/li&gt;
&lt;li&gt;Insufficient permissions.&lt;/li&gt;
&lt;/ul&gt;
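&lt;p&gt;To verify the querier-to-frontend connection, you can inspect the metric directly on the Query Frontend. This is a sketch: the host and port are placeholders (3200 is Tempo's default HTTP port):&lt;/p&gt;

```shell
# A healthy deployment reports a value greater than 0.
curl -s http://localhost:3200/metrics | grep cortex_query_frontend_connected_clients
```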
&lt;h3 id=&#34;solutions-1&#34;&gt;Solutions&lt;/h3&gt;
&lt;p&gt;To fix connection issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the queriers are not connected to the Query Frontend, check the following section of the querier configuration and make sure the address of the Query Frontend is correct:

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;querier:
  frontend_worker:
    frontend_address: query-frontend-discovery.default.svc.cluster.local:9095&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;!-- &gt;  - Verify the `backend.yaml` configuration file present on the Tempo Query container and make sure it is attempting to connect to the right port of the query frontend.
    **Note** this is only relevant for [Grafana 7.4.x and before](https://grafana.com/docs/tempo/latest/configuration/querying/#grafana-74x).
    --&gt;
&lt;ul&gt;
&lt;li&gt;Confirm that the Grafana data source is configured correctly and debug network issues between Grafana and Tempo.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To fix an insufficient permissions issue:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Verify that the Querier has the LIST and GET permissions on the bucket.&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="unable-to-find-traces">Unable to find traces&lt;/h1>
&lt;p>The two main causes of missing traces are:&lt;/p>
&lt;ul>
&lt;li>Issues in ingestion of the data into Tempo. Spans are either not being sent correctly to Tempo or they are not getting sampled.&lt;/li>
&lt;li>Issues querying for traces that have been received by Tempo.&lt;/li>
&lt;/ul>
&lt;h2 id="section-1-diagnose-and-fix-ingestion-issues">Section 1: Diagnose and fix ingestion issues&lt;/h2>
&lt;p>The first step is to check whether the application spans are actually reaching Tempo.&lt;/p></description></item><item><title>Too many jobs in the queue</title><link>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/too-many-jobs-in-queue/</link><pubDate>Fri, 03 Apr 2026 12:35:46 -0500</pubDate><guid>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/too-many-jobs-in-queue/</guid><content><![CDATA[&lt;h1 id=&#34;too-many-jobs-in-the-queue&#34;&gt;Too many jobs in the queue&lt;/h1&gt;
&lt;p&gt;The error message might also be one of the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;queue doesn&#39;t have room for 100 jobs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;failed to add a job to work queue&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You may see this error if the compactor isn’t running and the blocklist size has exploded.
Possible reasons why the compactor may not be running are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Insufficient permissions.&lt;/li&gt;
&lt;li&gt;Compactor sitting idle because no block is hashing to it.&lt;/li&gt;
&lt;li&gt;Incorrect configuration settings.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;diagnosing-the-issue&#34;&gt;Diagnosing the issue&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Check the metric &lt;code&gt;tempodb_compaction_bytes_written_total&lt;/code&gt;.
If it is greater than zero, the compactor is running and writing to the backend.&lt;/li&gt;
&lt;li&gt;Check the metric &lt;code&gt;tempodb_compaction_errors_total&lt;/code&gt;.
If it is greater than zero, check the logs of the compactor for an error message.&lt;/li&gt;
&lt;/ul&gt;
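&lt;p&gt;The two checks above can be expressed as Prometheus queries (illustrative):&lt;/p&gt;

```promql
# Should be > 0 if the compactor is running and writing to the backend
sum(rate(tempodb_compaction_bytes_written_total[5m]))
# Should be 0; if not, inspect the compactor logs
sum(rate(tempodb_compaction_errors_total[5m]))
```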
&lt;h2 id=&#34;solutions&#34;&gt;Solutions&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Verify that the Compactor has the LIST, GET, PUT, and DELETE permissions on the bucket objects.
&lt;ul&gt;
&lt;li&gt;If these permissions are missing, assign them to the compactor container.&lt;/li&gt;
&lt;li&gt;For detailed information, see &lt;a href=&#34;/docs/tempo/latest/configuration/s3/#permissions&#34;&gt;https://grafana.com/docs/tempo/latest/configuration/s3/#permissions&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;If there’s a compactor sitting idle while others are running, port-forward to the compactor’s HTTP endpoint. Then go to &lt;code&gt;/compactor/ring&lt;/code&gt; and click &lt;strong&gt;Forget&lt;/strong&gt; on the inactive compactor.&lt;/li&gt;
&lt;li&gt;Check the following configuration parameters to ensure that there are correct settings:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;max_block_bytes&lt;/code&gt; to determine when the ingester cuts blocks. A good number is anywhere from 100MB to 2GB depending on the workload.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_compaction_objects&lt;/code&gt; to determine the maximum number of objects in a compacted block. This should be relatively high, generally in the millions.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;retention_duration&lt;/code&gt; for how long traces should be retained in the backend.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Check the storage section of the config and increase &lt;code&gt;queue_depth&lt;/code&gt;. Do bear in mind that a deeper queue could mean longer
waiting times for query responses. Adjust &lt;code&gt;max_workers&lt;/code&gt; accordingly, which configures the number of parallel workers
that query backend blocks.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;storage:
  trace:
    pool:
      max_workers: 100                 # worker pool determines the number of parallel requests to the object store backend
      queue_depth: 10000&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="too-many-jobs-in-the-queue">Too many jobs in the queue&lt;/h1>
&lt;p>The error message might also be&lt;/p>
&lt;ul>
&lt;li>&lt;code>queue doesn't have room for 100 jobs&lt;/code>&lt;/li>
&lt;li>&lt;code>failed to add a job to work queue&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>You may see this error if the compactor isn’t running and the blocklist size has exploded.
Possible reasons why the compactor may not be running are:&lt;/p></description></item><item><title>Bad blocks</title><link>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/bad-blocks/</link><pubDate>Fri, 03 Apr 2026 12:35:46 -0500</pubDate><guid>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/bad-blocks/</guid><content><![CDATA[&lt;h1 id=&#34;bad-blocks&#34;&gt;Bad blocks&lt;/h1&gt;
&lt;p&gt;Queries fail with an error message containing:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;error querying store in Querier.FindTraceByID: error using pageFinder (1, 5927cbfb-aabe-48b2-9df5-f4c3302d915f): ...&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This might indicate that there is a bad (corrupted) block in the backend.&lt;/p&gt;
&lt;p&gt;A block can get corrupted if the ingester crashed while flushing the block to the backend.&lt;/p&gt;
&lt;h2 id=&#34;fixing-bad-blocks&#34;&gt;Fixing bad blocks&lt;/h2&gt;
&lt;p&gt;At the moment, a backend block can be fixed if either the index or bloom-filter is corrupt/deleted.&lt;/p&gt;
&lt;p&gt;To fix such a block, first download it onto a machine where you can run the &lt;code&gt;tempo-cli&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Next run the &lt;code&gt;tempo-cli&lt;/code&gt;&amp;rsquo;s &lt;code&gt;gen index&lt;/code&gt; / &lt;code&gt;gen bloom&lt;/code&gt; commands depending on which file is corrupt/deleted.
The command will create a fresh index/bloom-filter from the data file at the required location (in the block folder).
To view all of the options for this command, see the &lt;a href=&#34;../../operations/tempo_cli/&#34;&gt;cli docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Finally, upload the generated index or bloom-filter onto the object store backend under the folder for the block.&lt;/p&gt;
&lt;h2 id=&#34;removing-bad-blocks&#34;&gt;Removing bad blocks&lt;/h2&gt;
&lt;p&gt;If the above step on fixing bad blocks reveals that the data file is corrupt, the only remaining solution is to delete
the block, which can result in some loss of data.&lt;/p&gt;
&lt;p&gt;The mechanism to remove a block from the backend is backend-specific, but the block to remove will be at:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;&amp;lt;tenant ID&amp;gt;/&amp;lt;block ID&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
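As a small illustration of the path format above, the object-store prefix for a block can be assembled from the tenant ID and block ID. This is a sketch only; the tenant ID and block ID below are placeholder example values.

```python
def block_prefix(tenant_id: str, block_id: str) -> str:
    # All objects belonging to a block live under "<tenant ID>/<block ID>/"
    # in the object store; deleting everything beneath this prefix removes the block.
    return f"{tenant_id}/{block_id}/"

# "single-tenant" and the UUID are placeholder example values.
print(block_prefix("single-tenant", "5927cbfb-aabe-48b2-9df5-f4c3302d915f"))
```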
]]></content><description>&lt;h1 id="bad-blocks">Bad blocks&lt;/h1>
&lt;p>Queries fail with an error message containing:&lt;/p>
&lt;div class="code-snippet code-snippet__mini">&lt;div class="lang-toolbar__mini">
&lt;span class="code-clipboard">
&lt;button x-data="app_code_snippet()" x-init="init()" @click="copy()">
&lt;img class="code-clipboard__icon" src="/media/images/icons/icon-copy-small-2.svg" alt="Copy code to clipboard" width="14" height="13">
&lt;span>Copy&lt;/span>
&lt;/button>
&lt;/span>
&lt;/div>&lt;div class="code-snippet code-snippet__border">
&lt;pre data-expanded="false">&lt;code class="language-none">error querying store in Querier.FindTraceByID: error using pageFinder (1, 5927cbfb-aabe-48b2-9df5-f4c3302d915f): ...&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>This might indicate that there is a bad (corrupted) block in the backend.&lt;/p></description></item><item><title>Tag search</title><link>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/search-tag/</link><pubDate>Fri, 03 Apr 2026 12:35:46 -0500</pubDate><guid>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/search-tag/</guid><content><![CDATA[&lt;h1 id=&#34;tag-search&#34;&gt;Tag search&lt;/h1&gt;
&lt;p&gt;An issue occurs while searching for traces in Grafana Explore. The &lt;strong&gt;Service Name&lt;/strong&gt; and &lt;strong&gt;Span Name&lt;/strong&gt; drop-down lists are empty, and there is a &lt;code&gt;No options found&lt;/code&gt; message.&lt;/p&gt;
&lt;p&gt;HTTP requests to the Tempo query frontend endpoint at &lt;code&gt;/api/search/tag/service.name/values&lt;/code&gt; return an empty result set.&lt;/p&gt;
&lt;h2 id=&#34;root-cause&#34;&gt;Root cause&lt;/h2&gt;
&lt;p&gt;This issue is caused by a cap on the size of the tag values response.&lt;/p&gt;
&lt;p&gt;The configuration parameter &lt;code&gt;max_bytes_per_tag_values_query&lt;/code&gt; causes Tempo to return an empty result
when the response to a tag values query exceeds the configured size.&lt;/p&gt;
&lt;h2 id=&#34;solutions&#34;&gt;Solutions&lt;/h2&gt;
&lt;p&gt;There are two main solutions to this issue:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduce the cardinality of tags pushed to Tempo. Reducing the number of unique tag values will reduce the size returned by a tag search query.&lt;/li&gt;
&lt;li&gt;Increase the &lt;code&gt;max_bytes_per_tag_values_query&lt;/code&gt; parameter in the &lt;a href=&#34;../../configuration/#overrides&#34;&gt;overrides&lt;/a&gt; block of your Tempo configuration to a value as high as 50MB.&lt;/li&gt;
&lt;/ul&gt;
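The cap's behavior can be illustrated with a minimal sketch (hypothetical function name, not Tempo's actual implementation): when the serialized tag values exceed the configured byte limit, an empty set is returned rather than a truncated one.

```python
def tag_values_response(values, max_bytes_per_tag_values_query):
    # Sketch of the cap's behavior: if the combined size of the tag values
    # exceeds the limit, return an empty result instead of a partial one.
    total_bytes = sum(len(v.encode("utf-8")) for v in values)
    if total_bytes > max_bytes_per_tag_values_query:
        return []
    return values

print(tag_values_response(["api", "web", "worker"], 1_000))  # under the cap
print(tag_values_response(["x" * 2_000], 1_000))             # over the cap -> []
```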
]]></content><description>&lt;h1 id="tag-search">Tag search&lt;/h1>
&lt;p>An issue occurs while searching for traces in Grafana Explore. The &lt;strong>Service Name&lt;/strong> and &lt;strong>Span Name&lt;/strong> drop down lists are empty, and there is a &lt;code>No options found&lt;/code> message.&lt;/p></description></item><item><title>Response larger than the max</title><link>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/response-too-large/</link><pubDate>Fri, 03 Apr 2026 12:35:46 -0500</pubDate><guid>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/response-too-large/</guid><content><![CDATA[&lt;h1 id=&#34;response-larger-than-the-max&#34;&gt;Response larger than the max&lt;/h1&gt;
&lt;p&gt;The error message takes a form similar to the following:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;500 Internal Server Error Body: response larger than the max (&amp;lt;size&amp;gt; vs &amp;lt;limit&amp;gt;)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This error indicates that the response received or sent is too large.
This can happen in multiple places, but it&amp;rsquo;s most commonly seen in the query path,
with messages between the querier and the query frontend.&lt;/p&gt;
&lt;h2 id=&#34;solutions&#34;&gt;Solutions&lt;/h2&gt;
&lt;h3 id=&#34;tempo-server-general&#34;&gt;Tempo server (general)&lt;/h3&gt;
&lt;p&gt;Tempo components communicate with each other via gRPC requests.
To increase the maximum message size, you can increase the gRPC message size limit in the server block.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;server:
  grpc_server_max_recv_msg_size: &amp;lt;size&amp;gt;
  grpc_server_max_send_msg_size: &amp;lt;size&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The server config block is not synchronized across components.
Most likely you will need to increase the message size limit in multiple components.&lt;/p&gt;
&lt;h3 id=&#34;querier&#34;&gt;Querier&lt;/h3&gt;
&lt;p&gt;Additionally, querier workers can be configured to use a larger message size limit.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;querier:
    frontend_worker:
        grpc_client_config:
            max_send_msg_size: &amp;lt;size&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;ingestion&#34;&gt;Ingestion&lt;/h3&gt;
&lt;p&gt;Lastly, message size is also limited in ingestion and can be modified in the distributor block.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;distributor:
  receivers:
    otlp:
      grpc:
        max_recv_msg_size_mib: &amp;lt;size&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="response-larger-than-the-max">Response larger than the max&lt;/h1>
&lt;p>The error message takes a form similar to the following:&lt;/p>
&lt;div class="code-snippet code-snippet__mini">&lt;div class="lang-toolbar__mini">
&lt;span class="code-clipboard">
&lt;button x-data="app_code_snippet()" x-init="init()" @click="copy()">
&lt;img class="code-clipboard__icon" src="/media/images/icons/icon-copy-small-2.svg" alt="Copy code to clipboard" width="14" height="13">
&lt;span>Copy&lt;/span>
&lt;/button>
&lt;/span>
&lt;/div>&lt;div class="code-snippet code-snippet__border">
&lt;pre data-expanded="false">&lt;code class="language-none">500 Internal Server Error Body: response larger than the max (&amp;lt;size&amp;gt; vs &amp;lt;limit&amp;gt;)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>This error indicates that the response received or sent is too large.
This can happen in multiple places, but it&amp;rsquo;s most commonly seen in the query path,
with messages between the querier and the query frontend.&lt;/p></description></item><item><title>Troubleshoot metrics-generator</title><link>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/metrics-generator/</link><pubDate>Fri, 03 Apr 2026 12:35:46 -0500</pubDate><guid>https://grafana.com/docs/tempo/v2.2.x/troubleshooting/metrics-generator/</guid><content><![CDATA[&lt;h1 id=&#34;troubleshoot-metrics-generator&#34;&gt;Troubleshoot metrics-generator&lt;/h1&gt;
&lt;p&gt;If you are concerned with data quality issues in the metrics-generator, we&amp;rsquo;d first recommend:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reviewing your telemetry pipeline to determine the number of dropped spans. We are only looking for major issues here.&lt;/li&gt;
&lt;li&gt;Reviewing the &lt;a href=&#34;../../metrics-generator/service_graphs/&#34;&gt;service graph documentation&lt;/a&gt; to understand how they are built.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If everything seems ok from these two perspectives, consider the following topics to help resolve general issues with all metrics and span metrics specifically.&lt;/p&gt;
&lt;h2 id=&#34;all-metrics&#34;&gt;All metrics&lt;/h2&gt;
&lt;h3 id=&#34;dropped-spans-in-the-distributor&#34;&gt;Dropped spans in the distributor&lt;/h3&gt;
&lt;p&gt;The distributor has a queue of outgoing spans to the metrics-generators. If that queue is full then the distributor
will drop spans before they reach the generator. Use the following metric to determine if that is happening:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_distributor_queue_pushes_failures_total{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;failed-pushes-to-the-generator&#34;&gt;Failed pushes to the generator&lt;/h3&gt;
&lt;p&gt;For any number of reasons, the distributor can fail a push to the generators. Use the following metric to
determine if that is happening:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_distributor_metrics_generator_pushes_failures_total{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;discarded-spans-in-the-generator&#34;&gt;Discarded spans in the generator&lt;/h3&gt;
&lt;p&gt;The metrics-generator rejects spans that fall outside a configurable ingestion slack time, as well as spans excluded by
user-configurable filters. You can see the number of spans rejected, broken down by reason, using this metric:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_spans_discarded_total{}[1m])) by (reason)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If a lot of spans are dropped in the metrics-generator due to your filters, you will need to adjust them. If spans are dropped
due to the ingestion slack time, consider adjusting this setting:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;metrics_generator:
  metrics_ingestion_time_range_slack: 30s&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If spans regularly exceed this value, review your tracing pipeline for excessive buffering.
Note that increasing this value allows the generator to consume more spans, but reduces the accuracy of the metrics, because spans farther
away from &amp;ldquo;now&amp;rdquo; are included.&lt;/p&gt;
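The slack check can be sketched as follows (a hypothetical illustration, not Tempo's actual code): a span whose end time falls outside a window of `now` plus or minus the slack is discarded.

```python
# Hypothetical sketch of the ingestion slack check: a span whose end time
# falls outside [now - slack, now + slack] is discarded by the generator.
def within_ingestion_slack(span_end_unix: float, now_unix: float,
                           slack_seconds: float = 30.0) -> bool:
    return abs(now_unix - span_end_unix) <= slack_seconds

now = 1_700_000_000.0
print(within_ingestion_slack(now - 10, now))   # recent span: kept
print(within_ingestion_slack(now - 120, now))  # buffered too long: discarded
```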
&lt;h3 id=&#34;max-active-series&#34;&gt;Max active series&lt;/h3&gt;
&lt;p&gt;The generator protects itself and your remote-write target by enforcing a maximum number of active series it will produce.
Use the query below to determine if series are being dropped due to this limit:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_registry_series_limited_total{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Use the following setting to update the limit:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;overrides:
  metrics_generator_max_active_series: 0&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Note that this value is per metrics generator. The actual max series remote written will be &lt;code&gt;&amp;lt;# of metrics generators&amp;gt; * &amp;lt;metrics_generator_max_active_series&amp;gt;&lt;/code&gt;.&lt;/p&gt;
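The per-generator arithmetic above amounts to a simple product, sketched here with example values:

```python
def effective_max_series(num_generators: int, max_active_series: int) -> int:
    # metrics_generator_max_active_series applies per metrics-generator
    # instance, so the cluster-wide ceiling is the product of the two.
    return num_generators * max_active_series

# Example: 3 generators, each capped at 100,000 active series.
print(effective_max_series(3, 100_000))  # 300000
```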
&lt;h3 id=&#34;remote-write-failures&#34;&gt;Remote write failures&lt;/h3&gt;
&lt;p&gt;For any number of reasons, the generator may fail a write to the remote write target. Use the following metrics to
determine if that is happening:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(prometheus_remote_storage_samples_failed_total{}[1m]))
sum(rate(prometheus_remote_storage_samples_dropped_total{}[1m]))
sum(rate(prometheus_remote_storage_exemplars_failed_total{}[1m]))
sum(rate(prometheus_remote_storage_exemplars_dropped_total{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;service-graph-metrics&#34;&gt;Service graph metrics&lt;/h2&gt;
&lt;p&gt;Service graphs have additional configuration which can impact the quality of the output metrics.&lt;/p&gt;
&lt;h3 id=&#34;expired-edges&#34;&gt;Expired edges&lt;/h3&gt;
&lt;p&gt;The following metrics can be used to determine how many edges are failing to find a match.&lt;/p&gt;
&lt;p&gt;Rate of edges that have expired without a match:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_processor_service_graphs_expired_edges{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Rate of all edges:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_processor_service_graphs_edges{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you are seeing a large number of edges expire without a match, consider adjusting the &lt;code&gt;wait&lt;/code&gt; setting. This
controls how long the metrics generator waits to find a match before it gives up.&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;metrics_generator:
  processor:
    service_graphs:
      wait: 10s&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;service-graph-max-items&#34;&gt;Service graph max items&lt;/h3&gt;
&lt;p&gt;The service graph processor has a maximum number of edges it will track at once to limit the total amount of memory the processor uses.
To determine if edges are being dropped due to this limit, check:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_processor_service_graphs_dropped_spans{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Use &lt;code&gt;max_items&lt;/code&gt; to adjust the maximum number of edges tracked:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;metrics_generator:
  processor:
    service_graphs:
      max_items: 10000&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
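How `wait` and `max_items` interact can be sketched with a toy edge store (illustrative only, not Tempo's implementation; class and method names are invented): unmatched edges are kept until a matching span arrives, expire after `wait`, and new spans are dropped once `max_items` is reached.

```python
import collections

# Illustrative sketch only: the service-graphs processor keeps unmatched
# edges keyed by span identity, completes an edge when the matching
# client/server span arrives, expires edges older than `wait`, and drops
# spans once `max_items` pending edges are being tracked.
class EdgeStore:
    def __init__(self, wait_seconds: float, max_items: int):
        self.wait = wait_seconds
        self.max_items = max_items
        self.pending = collections.OrderedDict()  # key -> creation time
        self.expired = 0   # edges that never found a match
        self.dropped = 0   # spans rejected because the store was full

    def observe(self, key: str, now: float) -> bool:
        """Returns True if this span completed an edge."""
        if key in self.pending:
            del self.pending[key]  # matched the other half of the edge
            return True
        if len(self.pending) >= self.max_items:
            self.dropped += 1
            return False
        self.pending[key] = now
        return False

    def expire(self, now: float) -> None:
        for key, created in list(self.pending.items()):
            if now - created > self.wait:
                del self.pending[key]
                self.expired += 1

store = EdgeStore(wait_seconds=10.0, max_items=2)
store.observe("trace1/spanA", now=0.0)          # client half arrives
print(store.observe("trace1/spanA", now=1.0))   # server half completes the edge
store.observe("trace2/spanB", now=2.0)
store.expire(now=20.0)                          # spanB never found a match
print(store.expired)
```

A large `expired` count relative to completed edges corresponds to the expired-edges metric above and suggests raising `wait`; a growing `dropped` count suggests raising `max_items`.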
]]></content><description>&lt;h1 id="troubleshoot-metrics-generator">Troubleshoot metrics-generator&lt;/h1>
&lt;p>If you are concerned with data quality issues in the metrics-generator, we&amp;rsquo;d first recommend:&lt;/p>
&lt;ul>
&lt;li>Reviewing your telemetry pipeline to determine the number of dropped spans. We are only looking for major issues here.&lt;/li>
&lt;li>Reviewing the &lt;a href="../../metrics-generator/service_graphs/">service graph documentation&lt;/a> to understand how they are built.&lt;/li>
&lt;/ul>
&lt;p>If everything seems ok from these two perspectives, consider the following topics to help resolve general issues with all metrics and span metrics specifically.&lt;/p></description></item></channel></rss>