
Metric Accuracy #68

Open
sergeychernyshev opened this issue Nov 14, 2022 · 5 comments

Comments

@sergeychernyshev

I am wondering how accurate this metric is.

My initial tests on a very simplistic prototype were OK, but now I am trying to test a real website using WebPageTest and am seeing significant discrepancies between the TTVC measured using this library and the Visually Complete metric recorded by WebPageTest (using screen captures).

In the few tests that I ran manually, the TTVC metric is usually larger, ranging from ~30% to ~300% higher than the timestamp recorded using screenshots.

Is there an existing methodology that I can use to assess the accuracy of this technique?

Additionally, are there any known aspects of the page implementation that are particularly prone to throwing this metric off?

@kbuffington

How do you define "accurate"?

This TTVC library will look for any visible change on the page, and wait for all network requests to finish (plus a 2s idle window) before marking TTVC as complete (at the point the last visible change occurred). Unless there's an error somewhere in the logic, you could say it's completely accurate to that defined metric.

Using a Visually Complete metric from WPT or SpeedCurve that compares screenshots to determine when a visible change last occurred will often return completely different results than this one. If a single pixel on the page changes, TTVC will pick it up (whether or not you consider that a noteworthy visible change), where a screenshot comparison may not. It all comes down to how each metric is defined and what it's looking for. Since the two are measuring different things (and presumably have different thresholds), don't be surprised if they return different values.

The big advantage of this library is that you can run it on every page load for every user and get highly accurate (to the metric) logging of what your TTVC is. That's obviously not possible using a screenshot comparison tool.
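
As a rough sketch of what that wiring could look like (this is not the library's documented reporting API - the mark-name filter and the /rum/ttvc endpoint are assumptions for illustration), you could forward the performance mark it records to your own analytics endpoint:

import {init} from '@dropbox/ttvc';

init();

// Assumption: the exact mark name isn't stated in this thread, so we match
// loosely here; check the library's README for its real reporting API.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (/ttvc|visually/i.test(entry.name)) {
      navigator.sendBeacon('/rum/ttvc', JSON.stringify({value: entry.startTime}));
    }
  }
}).observe({type: 'mark', buffered: true});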

@sergeychernyshev
Author

Can you elaborate on how waiting 2s for all network requests to finish is related to visual completeness?

My understanding was that it only waits for the download of images that are visible on the screen; am I wrong about that?

@kbuffington

Because it's impossible to tell whether those network requests will result in a visible change. Say a JS file was requested, or a JSON file was returned as the response to an API call. The code on the page could draw something in the viewable window in response.

So what the TTVC library does is record any time anything changes in the viewport (stuff drawn below the fold doesn't count) and save that as the last visible change, then wait for network and CPU idle. If two seconds elapse without any new files being requested or API calls being made, the last visible change it recorded earlier is marked as visually complete via performance.mark. If network requests continue to be made, it keeps waiting for that idle period. As long as nothing changes in the viewport, that original visually complete value will eventually be marked.

If something does change, then the last visible change is bumped to the current time, and it goes back to waiting for network/CPU idle for 2 more seconds.

What this means is that if your website has a really dumb waterfall like the one below, the TTVC would be calculated like this:

  • 0.5s - HTML returned
  • 1.0s - JS finished executing
  • 1.5s - All content and images drawn to viewport
  • 2.0s - Start POSTing to 20 different logging API endpoints, 1 per second, but nothing is drawn to the screen
  • 3.0s-22.0s - logging API calls completed
  • 24s - No new network activity for 2s, so TTVC is marked as 1.5s, the last time anything was drawn in the viewport.
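
To make that concrete, here is a rough sketch (in plain JavaScript) of that "last visible change + network idle" logic. This is not the library's actual implementation - the helper isInViewport, the constant NETWORK_IDLE_MS, and the mark name are made up for illustration, and CPU idle is ignored for brevity - but it shows why the logging calls above delay the measurement without inflating it.

// Illustrative sketch only, not @dropbox/ttvc source code.
const NETWORK_IDLE_MS = 2000;   // the 2s idle window described above
let lastVisibleChange = 0;      // timestamp of the most recent in-viewport change
let idleTimer = null;

// Hypothetical helper: does this node intersect the viewport?
function isInViewport(node) {
  if (!(node instanceof Element)) return false;
  const r = node.getBoundingClientRect();
  return r.bottom > 0 && r.right > 0 && r.top < window.innerHeight && r.left < window.innerWidth;
}

function armIdleTimer() {
  clearTimeout(idleTimer);
  idleTimer = setTimeout(() => {
    // The network stayed quiet for the full idle window: the last visible
    // change is the TTVC value. (The mark name here is an assumption.)
    performance.mark('visually_complete', {startTime: lastVisibleChange});
  }, NETWORK_IDLE_MS);
}

// Any DOM mutation that lands in the viewport bumps the candidate TTVC
// and restarts the idle countdown.
new MutationObserver((records) => {
  if (records.some((record) => isInViewport(record.target))) {
    lastVisibleChange = performance.now();
    armIdleTimer();
  }
}).observe(document.documentElement, {childList: true, attributes: true, subtree: true});

// Any new network request restarts the idle countdown, but does NOT move
// lastVisibleChange, so invisible activity (like the logging POSTs above)
// only delays when the mark is emitted, not the value it carries.
new PerformanceObserver(() => armIdleTimer()).observe({type: 'resource', buffered: true});

armIdleTimer();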

@sergeychernyshev
Author

Ahh. Looks like I misunderstood you - waiting for the network only counts toward waiting for potential changes to the viewport, not toward the metric itself.

That makes sense - determining when to stop waiting is definitely a challenge. I often see this being a delicate balance for sending beacons: on one hand you want to collect as much data as possible, on the other you don't want to lose too many beacons.

This does not, however, explain the numbers that I see where TTVC is marked much higher than the last screenshot times. Is there an explanation for that? E.g., why would anything be marked as updated in the TTVC world if no pixel changes were registered by WPT?

@ajhyndman
Member

Hey, thanks for your interest @sergeychernyshev!

I think you're right to trust a screenshot-based analysis over this implementation. We are always going to be working within the confines of the APIs that browser developers expose to us. (I think it would be awesome if browser vendors decide to support this metric natively.)

If I had to guess, the most likely reason that @dropbox/ttvc would report a later time than a screenshot analysis is that the page in question is being mutated in ways that don't impact pixels on the screen. We do a pretty good job at filtering out mutations that affect elements outside of the viewport. But if, for example, a developer attaches an extra classname to the body tag, it's very hard to rule out that something visual changed.
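
For instance (the class and attribute names below are hypothetical), mutations like these touch in-viewport elements without necessarily changing any pixels, and it's hard to prove that they didn't:

// Hypothetical examples: neither of these necessarily changes a pixel,
// but a mutation-based approach can't cheaply rule that out, so they may
// be counted as potential visual changes.
document.body.classList.add('experiment-bucket-b');
document.querySelector('main')?.setAttribute('data-hydrated', 'true');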

You can enable debug logging to verify exactly what visual changes we are picking up, using this init option!

import {init} from '@dropbox/ttvc';

// Enables debug logging of the visual changes the library detects.
init({ debug: true });

To try to get as accurate as possible, we have built out a set of test suites here:

https://github.com/dropbox/ttvc/tree/main/test/e2e

Looking through this might give you a sense of what scenarios we cover. If you can identify any new problem scenarios and want to sketch out a new test or two here for us, that would be most welcome!
