
Performance testing next steps #1093

Open
swissspidy opened this issue Mar 26, 2024 · 6 comments
Labels
[Focus] Measurement (Issues related to the Measurement focus area) · Needs Discussion (Anything that needs a discussion/agreement)

Comments

@swissspidy
Member

Last year we made several big improvements in the performance testing area. To name a few:

As a reminder, these were our two main objectives:

  1. Improve core/GB performance testing
  2. Get other projects to adopt performance testing by making the tooling easier and more reusable

While progress slowed down due to other priorities, there is still a lot of untapped potential this year. I am therefore opening this issue to kickstart a discussion on next steps.

Some enhancements/features that are already on our radar:

Some loose ideas:

  • Adoption
    • Run performance tests at scale using Tide?
    • Blog post on developer.wordpress.org/news
    • Directly reach out to a few bigger plugins to help them set up performance testing, setting a precedent for others

Curious to hear everyone's thoughts :)

@swissspidy added the [Focus] Measurement and Needs Discussion labels Mar 26, 2024
@joemcgill
Member

Thanks for kicking this discussion off, @swissspidy.

I appreciate the distinction between two main objectives. For now, I'm going to limit my thoughts to the first objective, "Improve Core/GB performance testing".

One of the things that I observed during the 6.5 release is that finding the source of a server timing regression was challenging when that regression was committed to the Core repo as part of a larger Gutenberg sync. I think we can improve this somewhat by updating the performance tests in the Gutenberg repo to include the same server timing metrics that we record for each commit to the WP Core repo (i.e., wpTotal, wpBeforeTemplate, and wpTemplate). While TTFB is a close proxy, the additional noise from network requests and from calculating the metric in a headless browser makes pinpointing potential regressions more difficult.
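For illustration, extracting those metrics from a response is straightforward, since Server-Timing is a standard header with `name;dur=value` entries. A minimal sketch (the metric names match those mentioned above; the parsing logic follows the generic header syntax, not any specific WP tooling):

```javascript
// Hypothetical sketch: extract named metrics from a Server-Timing header
// value into a { name: duration } map. Only the `dur` parameter is read;
// other params (e.g., `desc`) are ignored.
function parseServerTiming(headerValue) {
  const metrics = {};
  for (const entry of headerValue.split(',')) {
    const [name, ...params] = entry.trim().split(';');
    for (const param of params) {
      const [key, value] = param.trim().split('=');
      if (key === 'dur') {
        metrics[name] = parseFloat(value);
      }
    }
  }
  return metrics;
}

// parseServerTiming('wpBeforeTemplate;dur=120.5, wpTemplate;dur=80.2, wpTotal;dur=200.7')
// → { wpBeforeTemplate: 120.5, wpTemplate: 80.2, wpTotal: 200.7 }
```

Reading these header values directly would sidestep the headless-browser noise that makes TTFB a weaker signal.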

I also strongly agree with your suggestion to improve our demo content, and I think this applies as much to the Core tests as to the Gutenberg tests. Currently in Core, we only test the default homepage for the Twenty Twenty-One and Twenty Twenty-Three themes after importing the Theme Test Data from this commit. While this keeps test content consistent over time, there are a number of limitations to this approach, including the fact that some specific use cases we care about (e.g., measuring the effect of image optimizations on LCP) are not covered by our current test content.

From a visualization point of view, our current dashboards at codevitals.run have become harder to use over time as we've added more metrics. I'd love to investigate improving or replacing these dashboards with a system that lets us filter results more granularly by metric, theme, template, object cache, etc. In doing so, we should also evaluate how we normalize and store the raw data: currently, data is normalized before it's saved to the dashboard's database, which means we can't build new reports from the original unfiltered data.

One last idea for now: when we introduced these tests, we used Twenty Twenty-One and Twenty Twenty-Three as representative classic and block themes, but that has proven overly simplistic, as some performance regressions only become visible depending on a theme's characteristics (e.g., how many template part variations it registers). At minimum, we should add Twenty Twenty-Four to our test matrix in both repos.

TL;DR:

  • Add server timing metrics to the GB performance workflow
  • Improve demo content for the performance workflow in both repos
  • Add specific use cases (e.g., homepage, page with a large LCP image, etc.) to our workflows for both repos
  • Improve the dashboard for tracking performance over time so it's more user friendly
  • Improve data collection process to make raw data queryable in the future
  • Add tests for Twenty Twenty-Four

@swissspidy
Member Author

That all makes sense to me. I think most of those suggestions are already mentioned in some place or another. I also previously explored a Grafana-based dashboard that would be more user friendly and could be fed raw data.

Some more thoughts on adoption:

Let's start by directly reaching out to some top plugins that we think could benefit from performance testing but don't yet leverage it. We can find out why, figure out what's missing, and help them get started.
That way, we can iterate on the tooling before publishing another blog post or improving the GitHub Action.

@joemcgill
Member

@swissspidy, I think it would be great to add a summary page to Performance Metrics in the GB repo, similar to what we have in WP-dev, so the potential performance impact of a PR can be spot-checked. As an example, here are PRs for basically the same change in both the GB repo and the WP-dev repo:

Do you already have any plans to try to apply updates to the GB Performance tests, or should I open independent issues over there?

@swissspidy
Member Author

Sure, that can be done together with the Server-Timing enhancements etc., bringing the learnings from Core over to GB. A dedicated issue there sounds good.

@joemcgill
Member

@swissspidy WordPress/gutenberg#61411

@swissspidy
Member Author

WordPress/gutenberg#61450
