PDC agent metrics
The PDC agent exposes Prometheus-compatible metrics for monitoring and alerting. By default, you can access these metrics at http://<agent-host>:8090/metrics. You can change the metrics port using the -metrics-addr flag. You can also disable metric parsing from SSH logs with the -parse-metrics flag.
Note
For Grafana Cloud-side PDC metrics such as connected agent count and request duration, refer to View PDC activity in the grafanacloud-usage data source.
Scrape the metrics
To collect PDC agent metrics, configure a Prometheus-compatible scraper to target the agent’s metrics endpoint. The following example shows a Grafana Alloy configuration:
prometheus.scrape "pdc_agent" {
  targets = [
    {"__address__" = "<agent-host>:8090"},
  ]
  forward_to = [prometheus.remote_write.default.receiver]
}
Replace <agent-host> with the hostname or IP address of the machine running the PDC agent. If you changed the metrics port with -metrics-addr, update the port accordingly.
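Before configuring a scraper, it can help to confirm that the endpoint responds and that pdc_agent_ series are present. The following Python sketch fetches a metrics endpoint and parses simple sample lines from the Prometheus text exposition format. The helper names fetch_metrics and parse_samples are illustrative assumptions and not part of the PDC agent, and the parser only handles plain `name{labels} value` lines:

```python
import re
from urllib.request import urlopen

# Matches `name{labels} value` or `name value`; a trailing timestamp is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one `key="value"` pair, allowing escaped characters in the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url, timeout=5.0):
    """Return the raw text served by a metrics endpoint (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text, prefix="pdc_agent_"):
    """Parse sample lines into (name, labels, value) tuples.

    Only samples whose metric name starts with `prefix` are kept.
    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        if not name.startswith(prefix):
            continue
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Example usage (replace <agent-host> as described above):
#   text = fetch_metrics("http://<agent-host>:8090/metrics")
#   for name, labels, value in parse_samples(text):
#       print(name, labels, value)
```

An empty result from parse_samples suggests the URL points at something other than the PDC agent metrics endpoint, or that the port was changed with -metrics-addr.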
Available metrics
The metrics include counters, gauges, and native histograms that provide insight into the agent’s behavior, including:
- SSH connection count and duration
- TCP connection counts
- Signing request latency
- Restart counts with exit codes
- Agent version information
The PDC agent exposes the following metrics:
| Metric name | Type | Description | Labels |
|---|---|---|---|
| pdc_agent_agent_info | Gauge | Set to 1 with labels identifying the agent build | version, ssh_version, stack_id |
| pdc_agent_signing_requests_duration_seconds | Native histogram | Duration of signing requests in seconds | status |
| pdc_agent_ssh_connections | Gauge | Number of open SSH connections | none |
| pdc_agent_ssh_restarts_total | Counter | Total number of SSH restarts | connection, exit_code |
| pdc_agent_ssh_open_channels | Gauge | Number of open SSH channels | connection |
| pdc_agent_tcp_connections_total | Counter | Number of opened TCP connections | connection, target, status |
| pdc_agent_ssh_time_to_connect_seconds | Native histogram | Time spent establishing the SSH connection, in seconds | connection |
The connection label
The connection label appears on several metrics and identifies which parallel SSH connection emitted the metric. It corresponds to the -connections flag. When running with the default single connection, the label value is 0. If you increase -connections to 3, you see values 0, 1, and 2.
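As a small illustration of the numbering described above, the following Python sketch derives the label values expected for a given -connections setting and reports which ones are absent from a set of observed values. Both function names are hypothetical. Whether a series appears before a connection has reported anything is not stated here, so treat a missing value as a hint rather than proof of a problem:

```python
def expected_connection_labels(connections=1):
    """Label values expected for a -connections setting: "0" through str(n - 1)."""
    if connections < 1:
        raise ValueError("connections must be at least 1")
    return [str(index) for index in range(connections)]


def missing_connections(observed, connections=1):
    """Return the expected label values that do not appear in `observed`."""
    seen = set(observed)
    return [label for label in expected_connection_labels(connections) if label not in seen]
```

For example, with -connections set to 3 and observed values "0" and "2", missing_connections returns a list containing only "1".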
Use cases and example queries
The following examples show how to use PDC agent metrics for monitoring, alerting, and troubleshooting.
Confirm agent availability
Use pdc_agent_agent_info to verify that a PDC agent is running and to identify its version:
pdc_agent_agent_info
A result of 1 for each agent instance confirms that the agent is up. The version and ssh_version labels help you verify that agents are running the expected software versions.
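If you want to run this check from a script, one option is the instant-query endpoint of the Prometheus HTTP API (/api/v1/query). The following Python sketch builds the request URL and extracts the version label from the response. The base URL is a placeholder, authentication is omitted, and the instance label is assumed to be the one your scraper attaches to each target; adjust all three for your environment:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, query):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": query})


def run_query(base_url, query, timeout=10.0):
    """Run an instant query and return the decoded JSON response (no authentication)."""
    with urlopen(instant_query_url(base_url, query), timeout=timeout) as resp:
        return json.load(resp)


def agent_versions(response):
    """Map each series' `instance` label to its `version` label for samples equal to 1."""
    versions = {}
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # instant vectors carry [timestamp, "value"]
        if float(value) == 1.0:
            labels = series["metric"]
            versions[labels.get("instance", "")] = labels.get("version", "")
    return versions


# Example usage against a hypothetical Prometheus-compatible server:
#   response = run_query("http://prometheus.example:9090", "pdc_agent_agent_info")
#   print(agent_versions(response))
```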
Track SSH restart rate
A high rate of SSH restarts can indicate network instability or server-side issues. Use the exit_code label to break down restarts by cause:
sum by (exit_code) (rate(pdc_agent_ssh_restarts_total[5m]))
Refer to PDC agent exit codes for details on what each exit code means.
Monitor open SSH channels
A rising number of open channels can indicate that an agent is becoming overloaded. The troubleshooting guide recommends monitoring CPU usage as the primary indicator, and the open channel count provides a complementary signal:
pdc_agent_ssh_open_channels
Track TCP connection success and failure rates
Use the status label to compare successful and failed TCP connections to your data source targets:
sum by (target, status) (rate(pdc_agent_tcp_connections_total[5m]))
A high failure rate for a specific target suggests the data source is unreachable from the agent. Refer to the troubleshooting guide for common causes.
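To turn the per-status rates returned by this query into a single failure ratio per target, you could post-process the results as in the following Python sketch. The function name is hypothetical, and the status value "failure" is a placeholder assumption because the actual label values are not listed here; check the values your agent emits before relying on it:

```python
from collections import defaultdict


def failure_ratio_by_target(rates, failure_status="failure"):
    """Compute failed / total connection rate for each target.

    `rates` is an iterable of (target, status, per_second_rate) tuples,
    for example taken from the query grouped by target and status.
    Targets with a total rate of zero are omitted from the result.
    """
    total = defaultdict(float)
    failed = defaultdict(float)
    for target, status, rate in rates:
        total[target] += rate
        if status == failure_status:  # placeholder value, see the note above
            failed[target] += rate
    return {target: failed[target] / total[target] for target in total if total[target] > 0}
```

A ratio close to 1 for one target while other targets stay near 0 points at that data source rather than at the agent itself.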
Measure signing request latency
Track the p99 latency of certificate signing requests to detect API performance issues:
histogram_quantile(0.99, rate(pdc_agent_signing_requests_duration_seconds[5m]))
Monitor SSH connection establishment time
Track how long it takes the agent to establish an SSH connection to the PDC server. Elevated connection times can indicate network latency or congestion:
histogram_quantile(0.99, rate(pdc_agent_ssh_time_to_connect_seconds[5m]))