Services dashboard

Time settings, including the overall time range and the granularity of aggregations, can be changed using the UTC time button in the top left.

Dashboard tiles:

Availability
- (# Availability test failures) / (# Availability test runs)
- Availability tests are run every 5 minutes by the health-monitor-timer-func in web-workers. The test verifies that we can:
  1. Ping the health endpoint to check that the service is available
  2. Submit a scan request
  3. Get the status of the submitted scan request
  4. Get the scan report once the scan completes
- If any of these steps fails (either an HTTP error code or an internal scan failure), an availability failure is sent. Otherwise, once all steps complete, an availability success is sent.
- If an availability test exceeds the amount of time between tests, the next availability test will start running in parallel when the timer trigger is fired.
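The test flow and the tile ratio above can be sketched as follows. This is only an illustration, not the code of health-monitor-timer-func: the names `run_availability_test` and `availability_failure_ratio` are hypothetical, and each step is assumed to be a callable that returns `False` or raises on failure.

```python
from typing import Callable, Sequence


def run_availability_test(steps: Sequence[Callable[[], bool]]) -> bool:
    """Run the steps in order and stop at the first failure.

    Returns True (availability success) only if every step succeeds.
    """
    for step in steps:
        try:
            if not step():
                return False  # e.g. an internal scan failure
        except Exception:
            return False  # e.g. an HTTP error surfaced as an exception
    return True


def availability_failure_ratio(results: Sequence[bool]) -> float:
    """(# availability test failures) / (# availability test runs)."""
    if not results:
        return 0.0  # assumption: no runs is reported as 0
    failures = sum(1 for succeeded in results if not succeeded)
    return failures / len(results)
```

In this sketch a failing step short-circuits the run, so later steps (for example, fetching the scan report) are not attempted once an earlier one has failed.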
Reliability
- (# Requests failed with 5xx status code) / (# Total requests)
- This is calculated using all requests, including those generated by our availability tests and user requests
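As a rough illustration of the ratio above, assuming the input is simply a list of HTTP status codes for all requests in the time range (the function name `reliability_failure_ratio` is hypothetical):

```python
from typing import Sequence


def reliability_failure_ratio(status_codes: Sequence[int]) -> float:
    """(# requests failed with a 5xx status code) / (# total requests)."""
    if not status_codes:
        return 0.0  # assumption: no requests is reported as 0
    failed = sum(1 for code in status_codes if 500 <= code <= 599)
    return failed / len(status_codes)
```

Note that 4xx responses count toward the total but not toward the failures in this sketch.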
Performance
- Percentage of scans whose scanExecutionTime (see notes on the Scan duration graph) did not exceed the target
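A minimal sketch of this percentage, assuming execution times and the target are both given in seconds. The function name and the `target_seconds` parameter are hypothetical; the actual target value is not stated here.

```python
from typing import Optional, Sequence


def performance_percentage(
    execution_times: Sequence[float], target_seconds: float
) -> Optional[float]:
    """Percentage of scans whose scanExecutionTime did not exceed the target."""
    if not execution_times:
        return None  # assumption: nothing to report when there were no scans
    within_target = sum(1 for t in execution_times if t <= target_seconds)
    return 100.0 * within_target / len(execution_times)
```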
Scan duration (seconds)
- scanExecutionTime: The amount of time it took for the scan to run (starting when the batch worker begins the scan)
- scanWaitTime: The time from when the scan was submitted to when the batch worker started the scan
- scanTotalTime: The time from when the scan was submitted to when the scan completed and the report became available (scanExecutionTime + scanWaitTime)
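The relationship between the three durations can be sketched from three timestamps. The `ScanTimestamps` type and its field names are assumptions made for illustration, not names taken from the service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class ScanTimestamps:
    submitted: datetime  # scan request submitted
    started: datetime    # batch worker began the scan
    completed: datetime  # scan completed and report became available


def scan_durations(ts: ScanTimestamps) -> Dict[str, float]:
    """Return the three scan durations in seconds."""
    wait = (ts.started - ts.submitted).total_seconds()
    execution = (ts.completed - ts.started).total_seconds()
    return {
        "scanWaitTime": wait,
        "scanExecutionTime": execution,
        "scanTotalTime": wait + execution,
    }
```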
API Response Time (milliseconds)
- Average response time for each Azure function (including web-workers) during the time range
API Request Count
- Total number of requests for each Azure function (including web-workers) during the time range
API Failed Request Count
- Sum of failures by all Azure functions (including web-workers) during the time range
- Failure counts are aggregated over the set time granularity
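To illustrate what aggregating over the time granularity means, here is a sketch that groups failure timestamps into fixed-size buckets. It assumes timezone-aware UTC timestamps and buckets aligned to the Unix epoch; the dashboard's own aggregation may align buckets differently, and `count_by_bucket` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable


def count_by_bucket(
    timestamps: Iterable[datetime], granularity: timedelta
) -> Dict[datetime, int]:
    """Count events per bucket, keyed by the bucket's start time (UTC)."""
    size = granularity.total_seconds()
    counts: Counter = Counter()
    for ts in timestamps:
        # Round the timestamp down to the start of its bucket.
        bucket_start = ts.timestamp() // size * size
        counts[datetime.fromtimestamp(bucket_start, tz=timezone.utc)] += 1
    return dict(sorted(counts.items()))
```

With a 5-minute granularity, failures at 12:01 and 12:04 fall into the 12:00 bucket, and a failure at 12:07 falls into the 12:05 bucket.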
API Response Status Code Count
- Total number of API HTTP response status codes by category during the time range
Scan Request Count by pipeline stages
- Total number of scan requests by service pipeline stage during the time range
Scan Request Count Accepted vs Rejected
- The total number of submitted scan requests, as well as how many were accepted or rejected (for invalid URLs)
- Counts are aggregated over the set time granularity
Batch Scan Task States
- The total number of Batch scan tasks completed and succeeded
- Note: If there are no failures for the given time range, ScanTaskFailure may not appear in the legend
Batch Scan Task Duration (sec)
- The Batch scan task duration (seconds). Includes task waiting, execution, and total time
Batch Account Node Count
- The node count of the Batch account pools, by state, during the time range. Can be used to check Batch pool availability
Batch Account Task States
- Task events for the Batch account pools. Includes task start, complete, and fail events.
E2E Test Results
- E2E test results for each service pipeline stage
E2E Test Results Over Time
- E2E test results during the time range
