Prometheus Configuration
To enable Prometheus metrics, add the following to your configuration file:
server:
  prometheusPort: 9090
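To check that the endpoint is serving data, the following minimal sketch fetches the metrics page and parses it into a dictionary. It assumes the server is reachable on localhost and exposes the standard Prometheus text format under a `/metrics` path on the configured port; the host, the path, and the helper names `parse_metrics` and `fetch_metrics` are illustrative assumptions, not part of this configuration.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text-format lines into {sample key: float value}.

    The sample key is the metric name plus its label set, exactly as written.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # The key ends at the closing brace of the label set.
            end = line.rfind("}") + 1
            key, rest = line[:end], line[end:]
        else:
            key, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue  # malformed line without a value
        samples[key] = float(fields[0])
    return samples


def fetch_metrics(url="http://localhost:9090/metrics", timeout=5.0):
    """Fetch and parse the metrics page (URL is an assumed default)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running server returns the current samples; `parse_metrics` can also be used on its own with saved scrape output.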
Available Prometheus Metrics
remote_invocations
Counter for the number of invocations of the capabilities service
expired_key
Counter for key expirations
execution_success
Counter for the number of successful executions
pre_queue_size
Gauge of the number of items in the prequeue
cas_miss
Counter for the number of worker-to-worker CAS misses
queue_failure
Counter for the number of operations that failed to queue
requeue_failure
Counter for the number of operations that failed to requeue
dispatched_operations_size
Gauge of the number of dispatched operations
worker_pool_size
Gauge of the number of workers available
storage_worker_pool_size
Gauge of the number of storage workers available
execute_worker_pool_size
Gauge of the number of execute workers available
queue_size
Gauge of the size of the queue (using a queue_name label for each individual queue)
actions
Counter for the number of actions processed
operations_stage_load
Counter for the number of operations in each stage (using a stage_name label for each individual stage)
operation_status
Counter for the status of completed operations (using a code label for each individual gRPC status code)
operation_exit_code
Counter for the exit code of completed operations (using an exit_code label for each individual execution exit code)
operation_worker
Counter for the number of operations executed on each worker (using a worker_name label for each individual worker)
action_results
Counter for the number of action results
missing_blobs
Histogram for the number of missing blobs
execution_slot_usage
Gauge for the number of execution slots used on each worker
execution_time_ms
Histogram for the execution time on a worker (in milliseconds)
execution_stall_time_ms
Histogram for the execution stall time on a worker (in milliseconds)
input_fetch_slot_usage
Gauge for the number of input fetch slots used on each worker
input_fetch_time_ms
Histogram for the input fetch time on a worker (in milliseconds)
input_fetch_stall_time_ms
Histogram for the input fetch stall time on a worker (in milliseconds)
queued_time_ms
Histogram for the operation queued time (in milliseconds)
output_upload_time_ms
Histogram for the output upload time (in milliseconds)
completed_operations
Counter for the number of completed operations
operation_poller
Counter for the number of operations being polled
io_bytes
Histogram for the bytes read and written, used to track system I/O
health_check
Counter showing service restarts
cas_size
Gauge of total size of the worker’s CAS in bytes
cas_ttl_s
Histogram for the amount of time CAS entries live on L1 storage before expiration (in seconds)
cas_entry_count
Gauge of the total number of entries in the worker’s CAS
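The counters and histograms listed above are usually more useful once turned into rates and averages. In the Prometheus text format a histogram is exposed as `_bucket`, `_sum`, and `_count` series, so a mean can be derived from a single scrape, and a counter rate from two scrapes. The sketch below assumes the samples have already been parsed into a plain dictionary of unlabeled series; the helper names `histogram_mean` and `counter_rate` are hypothetical.

```python
def histogram_mean(samples, name):
    """Return the mean observation for histogram `name`, or None if no data.

    `samples` maps series names to floats, e.g. "execution_time_ms_sum".
    """
    total = samples.get(name + "_sum")
    count = samples.get(name + "_count")
    if total is None or not count:
        return None
    return total / count


def counter_rate(previous, current, interval_seconds):
    """Per-second increase of a counter between two scrapes.

    A decrease is treated as a counter reset (e.g. a service restart),
    in which case the current value is taken as the increase.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous if current >= previous else current
    return delta / interval_seconds
```

For example, with an `execution_time_ms_sum` of 1200 and an `execution_time_ms_count` of 4, `histogram_mean(samples, "execution_time_ms")` gives the average execution time per operation in milliseconds.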
Java interceptors can be used to monitor gRPC services using Prometheus. To enable these metrics, add the following to your server configuration:
server:
  grpcMetrics:
    enabled: true
    provideLatencyHistograms: false