Services dashboard

Time settings, including the overall time range and the granularity of aggregations, can be changed using the UTC time button in the top left.

Dashboard tiles:

  • Availability

    • (# Availability test failures) / (# Availability test runs)
    • Availability tests are run every 5 minutes by the health-monitor-timer-func in web-workers. The test verifies that we can:
      1. Ping the health endpoint to check that the service is available
      2. Submit a scan request
      3. Get the status of the submitted scan request
      4. Get the scan report once the scan completes
    • If any of these steps fails (either an HTTP error code or an internal scan failure), an availability failure is sent. Otherwise, once all steps complete, an availability success is sent.
    • If an availability test takes longer than the interval between tests, the next availability test starts running in parallel when the timer trigger fires.
  • Reliability

    • (# Requests failed with 5xx status code) / (# Total requests)
    • This is calculated using all requests, including both requests generated by our availability tests and user requests
  • Performance

    • Scan performance, as the percentage of scans whose scanExecutionTime (see notes on the Scan duration tile) did not exceed the target
  • Scan duration (seconds)

    • scanExecutionTime: The amount of time it took for the scan to run (starting when the batch worker begins the scan)
    • scanWaitTime: The time from when the scan was submitted to when the batch worker started the scan
    • scanTotalTime: The time from when the scan was submitted to when the scan completed and the report became available (scanExecutionTime + scanWaitTime)
  • API Response Time (milliseconds)

    • Average response time for each Azure function (including web-workers) during the time range
  • API Request Count

    • Total number of requests for each Azure function (including web-workers) during the time range
  • API Failed Request Count

    • Sum of failures by all Azure functions (including web-workers) during the time range
    • Failure counts are aggregated over the set time granularity
  • API Response Status Code Count

    • Total number of API HTTP responses by status code category during the time range
  • Scan Request Count by pipeline stages

    • Total number of scan requests by service pipeline stage during the time range
  • Scan Request Count Accepted vs Rejected

    • The total number of submitted scan requests, as well as how many were accepted or rejected (for invalid URLs)
    • Counts are aggregated over the set time granularity
  • Batch Scan Task States

    • The total number of Batch scan tasks that completed and succeeded
    • Note: If there are no failures for the given time range, ScanTaskFailure may not appear in the legend
  • Batch Scan Task Duration (sec)

    • The Batch scan task duration in seconds, including task wait, execution, and total time
  • Batch Account Node Count

    • The node count of the Batch account pools, by state, during the time range. This can be used to check Batch pool availability
  • Batch Account Task States

    • Task events for the Batch account pools, including task start, complete, and fail events
  • E2E Test Results

    • E2E test results for each service pipeline stage
  • E2E Test Results Over Time

    • E2E test results during the time range
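The tile formulas and the availability-test flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the service's code: every function name, the `ScanTiming` record with its `submitted`/`started`/`completed` timestamps, and the `target_seconds` parameter are hypothetical, and the actual performance target is not given in this document. The ratios follow the tile definitions as written and return 0.0 when nothing was counted.

```python
# Hypothetical sketch: names and data shapes are illustrative only,
# not the service's actual implementation.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Iterable, Sequence


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def availability_tile(test_failures: int, test_runs: int) -> float:
    # Availability tile: (# availability test failures) / (# availability test runs)
    return ratio(test_failures, test_runs)


def reliability_tile(status_codes: Iterable[int]) -> float:
    # Reliability tile: (# requests failed with a 5xx status code) / (# total requests),
    # counting availability-test traffic and user requests alike.
    codes = list(status_codes)
    return ratio(sum(1 for code in codes if 500 <= code <= 599), len(codes))


@dataclass
class ScanTiming:
    submitted: datetime  # scan request submitted
    started: datetime    # batch worker begins the scan
    completed: datetime  # scan completed and report available (assumed to coincide)

    @property
    def scan_wait_time(self) -> float:
        return (self.started - self.submitted).total_seconds()

    @property
    def scan_execution_time(self) -> float:
        return (self.completed - self.started).total_seconds()

    @property
    def scan_total_time(self) -> float:
        # scanTotalTime = scanExecutionTime + scanWaitTime
        return self.scan_execution_time + self.scan_wait_time


def performance_tile(scans: Sequence[ScanTiming], target_seconds: float) -> float:
    # Percentage of scans whose scanExecutionTime did not exceed the (hypothetical) target.
    within_target = sum(1 for s in scans if s.scan_execution_time <= target_seconds)
    return 100.0 * ratio(within_target, len(scans))


def run_availability_test(steps: Sequence[Callable[[], bool]]) -> bool:
    # Steps run in order (ping health endpoint, submit scan, get status, get report).
    # all() stops at the first step returning False, so any failing step
    # makes the whole run an availability failure.
    return all(step() for step in steps)


def bucket_counts(timestamps: Iterable[datetime], granularity: timedelta) -> Dict[datetime, int]:
    # Count events per time bucket of the selected granularity,
    # with bucket boundaries aligned to the Unix epoch in UTC.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    counts: Dict[datetime, int] = {}
    for ts in timestamps:
        bucket_start = epoch + ((ts - epoch) // granularity) * granularity
        counts[bucket_start] = counts.get(bucket_start, 0) + 1
    return counts
```

Modeling each scan with three timestamps makes scanTotalTime equal scanWaitTime plus scanExecutionTime by construction. `bucket_counts` shows one possible way a count could be aggregated over the selected time granularity; the dashboard's own bucketing may differ.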