Prometheus Monitoring for Training Job

Available Metrics

The training operator currently exposes the following metrics for monitoring.

Job Creation

training_operator_jobs_created_total

Job Deletion

training_operator_jobs_deleted_total

Successful Job Completions

training_operator_jobs_successful_total

Failed Jobs

training_operator_jobs_failed_total

Restarted Jobs

training_operator_jobs_restarted_total
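
As a rough illustration of how these metrics might be consumed, the sketch below parses text in the Prometheus exposition format and sums each job metric across its label sets. It uses only the Python standard library. The label names (`job_namespace`, `framework`), the help text, and the sample values are made up for the example and are not taken from this README; only the metric names come from the list above.

```python
import re
from collections import defaultdict

# Metric names taken from the list above.
JOB_METRICS = (
    "training_operator_jobs_created_total",
    "training_operator_jobs_deleted_total",
    "training_operator_jobs_successful_total",
    "training_operator_jobs_failed_total",
    "training_operator_jobs_restarted_total",
)

# Matches one sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)")


def sum_job_counters(exposition_text):
    """Sum each known job metric across all of its label sets."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match and match.group(1) in JOB_METRICS:
            totals[match.group(1)] += float(match.group(3))
    return dict(totals)


# Hypothetical scrape output; labels and values are assumptions for illustration.
SAMPLE_SCRAPE = """\
# HELP training_operator_jobs_created_total Hypothetical help text
# TYPE training_operator_jobs_created_total counter
training_operator_jobs_created_total{job_namespace="default",framework="pytorch"} 3
training_operator_jobs_created_total{job_namespace="team-a",framework="tensorflow"} 2
training_operator_jobs_failed_total{job_namespace="default",framework="pytorch"} 1
"""

if __name__ == "__main__":
    for name, value in sorted(sum_job_counters(SAMPLE_SCRAPE).items()):
        print(f"{name} {value}")
```

Metrics that never appear in the scrape are simply absent from the result, so callers may want to treat a missing key as zero. In a real setup the text would come from the operator's metrics endpoint or, more commonly, the values would be queried from Prometheus directly rather than parsed by hand.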