Performance best practices #258

bph · 2024-05-02T15:05:08Z

Discussed in #253

Originally posted by ironnysh April 25, 2024
Topic: Website performance
Audience: extenders
Topics covered:

Tips
Best practices
How to run live tests using the Performance Team's custom Google Colab

Basically, nudging people to think about performance when they create themes and plugins.

Background:

Members of the WordPress Performance Team have been working on an introduction to gathering WordPress performance data in the field. This is provided in this Colab, a format that fits this kind of content well: BigQuery queries can be run directly inside it, and the results are presented alongside the queries. The Colab covers potential core use cases as well as plugin and theme use cases.

See this Make post (Conducting WordPress performance research in the field) and this Slack discussion on the Training Team channel for more information.

thelovekesh · 2024-05-09T07:23:08Z

cc @WordPress/performance.

swissspidy · 2024-05-09T10:09:41Z

Could you perhaps elaborate a bit more on the goal of this proposed article? It seems to mix 3 different topics:

Performance best practices
Think about performance during plugin/theme development
Field analysis using the Colab

ironnysh · 2024-05-09T11:55:55Z

Hi @swissspidy, there's a follow-up to the original suggestion that didn't make it here 🙂

Update: I’ve been chatting about this with @felixarntz, who suggested breaking it into a series:
Regarding the topics, I think it would be good to cover individual performance aspects in separate articles. What you're describing would be very broad for one article. What might work well is a series, where one intro article paints a holistic overview, and then the other articles go into depth.

joemcgill · 2024-05-09T15:48:23Z

I think it would be great to add some articles about performance best practices to the dev blog. I agree that it would be better to write individual articles targeted at specific topics, rather than a general article about "best practices" broadly. Some examples of what I think could be useful (just brainstorming here)...

Using async/defer to improve JavaScript loading
Using autoload and wp_prime_option_caches to improve option loading time
Serving images using modern formats
Debugging long-running JS to improve INP
Tips for using the Cache API

Are these the kind of ideas that you all have in mind?

dmsnell · 2024-05-09T19:21:05Z

Any time people discuss performance I like to encourage the discussion of tradeoffs and the role of real-world measurement. I'm aware how attentively many people will read a small suggestion online and draw the conclusion that they must follow these steps to the letter or else they are some kind of "bad developer." There's an incredible power of suggestion implicit in official posts.

And that relates to performance because performance work can be incredibly complicated. Our own WordPress coding standards demand at least one unfortunate code pattern in the name of performance based on a misunderstanding of how modern CPUs work. There are frequent poor decisions (with respect to maintainability, legibility, and even performance) made in the name of performance, coupled with an over-eager desire to improve things without sufficiently measuring the impact of the changes.

That is to say, it can be helpful to avoid the term "best practices." Where it is used, it helps to include, alongside examples of things that work well, examples of attempted optimizations that failed to do what they were supposed to, with an explanation of why. Caching is a great topic to discuss because it's very easy to impede performance by adding caching, and even easier to corrupt or break software by accidentally including defects in the caching code or the invalidation code.

To that end I generally love reading about having a mindset of performance and learning very technical details about a hardware platform so as to build a strong intuition of what matters with regards to memory overhead, runtime performance, and system complexity. This is possibly a slower and less practical path, but can help avoid the impression that "performance" is a checklist of things people ought to do when coding.

And most importantly, it cannot be stressed enough how important comprehensive and realistic measurement and testing is when thinking about performance. Micro- and synthetic benchmarks are a scourge on the software world by giving the false impression that systems are simpler than they are. Probably as often as they confirm the improvement of a given optimization they will hide that the optimization made the software slower. Measurements should lead us to ask new questions, and those new questions should lead to more refined experiments, and those experiments will likely raise more questions. Performance must be pondered and measurements must be interpreted.

As a basic example of this, I have recently been conducting performance measurements of the HTML API against a list of 300,000 HTML pages extracted from a list of the top-ranked domains. At one point it became clear to me that these pages possibly represent a large selection bias, in that the top-level path of a given domain likely has characteristically different content than links inside of that site. Because of this I have been extracting same-domain links and will run my analysis on that other category of pages and compare the results between the groups. It could be that my initial measurements, extensive as they were, do not show the most relevant part of the picture.

Similarly, during the 6.4 release cycle I uncovered a number of failed optimization attempts, including some I had proposed. Measuring the changed lines of code in isolation gave the impression that they were effective, but when examining the system as a whole there was no statistically significant evidence to suggest the same. The optimizations shuffled around some runtime, but ultimately didn't impact it, and the complexity they introduced was not worthwhile. Modern computers are amazing and complicated machines, which makes performance analysis even more difficult than it already was.
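
One way to check whether a whole-system difference is more than noise is to compare repeated timings from both builds. The following is only a sketch, assuming the timings have already been collected as arrays of milliseconds. The helper names and the sample numbers are made up for illustration, and this is not the analysis that was used for 6.4.

```typescript
// Hypothetical helpers: Welch's t statistic for two independent samples.
// A larger |t| means the difference in means is large relative to the noise.
function meanOf(samples: number[]): number {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Unbiased sample variance (divides by n - 1).
function sampleVariance(samples: number[]): number {
  const m = meanOf(samples);
  return samples.reduce((sum, x) => sum + (x - m) ** 2, 0) / (samples.length - 1);
}

function welchT(a: number[], b: number[]): number {
  const standardError = Math.sqrt(
    sampleVariance(a) / a.length + sampleVariance(b) / b.length
  );
  return (meanOf(a) - meanOf(b)) / standardError;
}

// Made-up whole-request timings in milliseconds, before and after a change.
const timingsBefore: number[] = [101, 99, 103, 98, 102, 100];
const timingsAfter: number[] = [100, 102, 97, 101, 99, 103];

const tStatistic = welchT(timingsBefore, timingsAfter);
console.log(`t = ${tStatistic.toFixed(3)}`);
```

As a rough rule of thumb, a |t| below about 2 suggests the measured difference is within the noise. With the made-up numbers above the means differ slightly, yet the statistic stays far below that threshold.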

Performance is very hard. Thank you for writing about it and for wanting to help educate us all on how to be effective when building our software. There is a lot of low-hanging fruit to teach, but even the low-hanging fruit comes with a nuanced story that's probably more important to understand than what the fruit itself is. Not "when does this improve performance," but "when does this optimization fail us and how?"

aristath · 2024-05-10T07:59:58Z

@dmsnell

Our own WordPress coding standards demand at least one unfortunate code pattern in the name of performance based on a misunderstanding of how modern CPUs work.

Could you please elaborate and expand a bit on that? If we're doing something sub-optimal in the coding standards, we could fix it...

OllieJones · 2024-05-10T13:23:44Z

@dmsnell That's a great writeup about the whys and whynots of performance optimization work.

May I suggest you work it up into an article for the dev blog, per @joemcgill's suggestion? When people ask me performance questions I would looooove to be able to suggest, "hey read this before you spend time and money on trying to improve performance".

(My pet concern: WordPress site owners throw money at overprovisioning their servers and then get annoyed because it doesn't help.)

I'll be happy to help get this published any way I can.

dmsnell · 2024-05-10T15:50:37Z

Thank you both for the responses.

@aristath I was specifically referring to the pre-increment rule, which lists performance as a reason. it surprised me to read this, and the only reference I could find for the rationale was a comment in the PHP docs from years ago. except even a trivial benchmark can demonstrate this slowing things down just as others show it speeding up when compared to the post-increment. where it matters is that pre-increment is a far-less-common code pattern across the software world I believe, and it really sticks out in WordPress. while I don't care so much to debate this one rule here, I use it as an example because we don't have an evidential reason to believe the claim we make.

modern CPUs perform a lot of work when running a program in order to provide optimizations not possible on a compiler level. I'm guessing that any speedup in the pre-increment/post-increment debate is entirely lost in any real code. if we run a loop that does nothing but increment a value ten million times, we may discover a measured difference, but the moment we introduce any other code we're entering a complicated runtime where the hot-path for that code has shifted. we're trying to optimize the 0.0000000001% - no not even that - we're trying to optimize the part that executes in parallel with the hot path and finishes in 0.0000000001% of the time, meaning that taking its runtime to zero would have no impact on the system.
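
The kind of isolated loop described above looks roughly like the sketch below. It is written in TypeScript rather than PHP, the function names are hypothetical, and the timings it prints will vary by machine and runtime. A JIT compiler may also optimize parts of such a loop away, which is one more reason the numbers say little about real code.

```typescript
// Hypothetical micro-benchmark: a loop that does nothing but increment.
function countWithPreIncrement(iterations: number): number {
  let counter = 0;
  for (let i = 0; i < iterations; ++i) {
    ++counter;
  }
  return counter;
}

function countWithPostIncrement(iterations: number): number {
  let counter = 0;
  for (let i = 0; i < iterations; i++) {
    counter++;
  }
  return counter;
}

// Returns elapsed wall-clock milliseconds. Date.now() is coarse but portable.
function timeIt(run: () => number): number {
  const start = Date.now();
  run();
  return Date.now() - start;
}

const ITERATIONS = 10000000;
console.log("pre-increment ms:", timeIt(() => countWithPreIncrement(ITERATIONS)));
console.log("post-increment ms:", timeIt(() => countWithPostIncrement(ITERATIONS)));
```

Both functions return the same count; only the measured time can differ, and any difference seen here would still need to be confirmed in a realistic workload.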

the HTML API takes advantage of CPU characteristics and abstraction leaks into the PHP code for some significant speedups. these include opening up the data flow for parallel and out-of-order execution (removing data dependencies), optimizing CPU cache locality (by replacing indirect memory accesses with localized ones), and considering the impact of speculative execution on the control flow. the HTML API is, of course, not emblematic of most of WordPress' performance needs because it's a very low-level and foundational API whose performance absolutely matters at the micro level.

@OllieJones not too long ago I wrote about some of the discoveries that arose during the 6.4 performance assessments I was performing.

https://fluffyandflakey.blog/2023/10/20/profiling-wordpress-v1/

here's a couple of posts discussing failed optimizations

https://fluffyandflakey.blog/2018/06/30/collapsing-regexp-alternations/
https://fluffyandflakey.blog/2015/07/09/panderings-in-median-calculations/

and much further back I shared a story about over-eager optimizations on a 12s WordPress site

https://fluffyandflakey.blog/2015/03/22/wordpress-init/

Sometimes I worry that all the caching we're adding into WordPress is making our benchmarks perform faster on the high-end isolated machines with which we're developing and testing, while making the more typical experience on low-end shared servers worse. Caching can alleviate slow computation costs or file I/O, but it also introduces latency into the mix, often adding database calls where there were none before, and often resulting in cache misses or invalidations where we end up doing the same file I/O anyway, but now only after having introduced sequential dependencies in the loop. I haven't measured this yet because, despite trying, it's hard to find a low-end WordPress host that lets me instrument and test things (I'm hoping and working towards using the Playground to simulate hosts of different characteristics). So until there are measurements, my idea is just another unsubstantiated opinion.
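
A first step towards measuring this is to have a cache count how often it actually helps. The sketch below assumes a simple in-memory map and a hypothetical `CountingCache` class; it is not the WordPress Cache API, and the option name and loader are invented for the example.

```typescript
// Hypothetical in-memory cache that records hits and misses, so the benefit
// of caching can be measured instead of assumed.
class CountingCache<V> {
  private store = new Map<string, V>();
  hits = 0;
  misses = 0;

  // Returns the cached value, or computes and stores it on a miss.
  getOrCompute(key: string, compute: () => V): V {
    if (this.store.has(key)) {
      this.hits++;
      return this.store.get(key) as V;
    }
    this.misses++;
    const value = compute();
    this.store.set(key, value);
    return value;
  }

  // Invalidation forces the next lookup for this key to recompute.
  invalidate(key: string): void {
    this.store.delete(key);
  }

  // Fraction of lookups served from the cache (0 when nothing was looked up).
  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

const optionCache = new CountingCache<string>();
const loadOption = (): string => "value-from-slow-source";

optionCache.getOrCompute("site_title", loadOption); // miss
optionCache.getOrCompute("site_title", loadOption); // hit
optionCache.invalidate("site_title");
optionCache.getOrCompute("site_title", loadOption); // miss again
console.log(optionCache.hits, optionCache.misses, optionCache.hitRate());
```

A low hit rate under realistic traffic would suggest that the cache is adding lookups without removing much work, which is the scenario described above.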

This all said, there's one thing we can't harp on enough: go faster by doing less. Additive optimizations are always complicated and risk unintended consequences. If, however, we can identify code patterns where we're performing needless work and throwing it away, we can remove the duplicated or wasted work without worrying as much about whether our changes are effective or not. Even when the measurements don't support the improvements, as long as the code maintainability isn't impacted these things can be winners without confirmation. Oftentimes it's these little things that don't show up in the metrics that form the "death of a thousand cuts" on performance. Examples include:

needless memory allocations. this not only bloats the memory requirements, but in some environments can hit harder by increasing the pressure on the garbage collector or by increasing memory fragmentation and making the allocator work harder.

eagerly performing computations that aren't used before returning from a function or before checking some bail-out condition after a loop has completed.

processing entire documents when only part of them needs to be processed. a quintessential example of this is using something like preg_replace_callback(), preg_split(), or string.replace(/pattern/g) when we could build a loop to find each successive match in a string and bail once we've found what we're looking for. we currently do some silly things with the block parser because we don't have a lazy parser available (though that's changing).

needless sequential computation and data dependencies. whether making API calls sequentially when they could be done in parallel, making multiple database calls when one would do, or iterating twice over an array to perform two transformation steps when both could be combined into a single pass, we often repeat ourselves out of a first-order convenience. sadly both PHP and JS have a poor performance story for their functions like array_map/Array.map and array_filter/Array.filter.
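
Two of the patterns above, bailing out of a search early and folding two array passes into one, can be sketched as follows. The function names and the sample input are hypothetical, and whether the single-pass version is measurably faster depends on the runtime and the data, so it still needs to be measured.

```typescript
// Hypothetical helper: walk successive matches and stop at the first one
// that is accepted, instead of processing every match in the text.
function findFirstMatch(
  text: string,
  pattern: RegExp,
  accept: (matched: string) => boolean
): string | null {
  // Copy the pattern with the global flag so exec() advances via lastIndex.
  const flags = pattern.flags.indexOf("g") === -1 ? pattern.flags + "g" : pattern.flags;
  const scanner = new RegExp(pattern.source, flags);
  let match: RegExpExecArray | null;
  while ((match = scanner.exec(text)) !== null) {
    if (accept(match[0])) {
      return match[0]; // bail out: the rest of the text is never scanned
    }
    // Guard against zero-length matches, which would otherwise loop forever.
    if (match[0].length === 0) {
      scanner.lastIndex++;
    }
  }
  return null;
}

// Two passes: filter() builds an intermediate array, then map() builds another.
function evenSquaresTwoPasses(values: number[]): number[] {
  return values.filter((v) => v % 2 === 0).map((v) => v * v);
}

// One pass: same result with a single loop and a single output array.
function evenSquaresOnePass(values: number[]): number[] {
  const result: number[] = [];
  for (const v of values) {
    if (v % 2 === 0) {
      result.push(v * v);
    }
  }
  return result;
}

console.log(findFirstMatch("id=1 id=22 id=333", /id=\d+/, (m) => m.length > 4));
console.log(evenSquaresOnePass([1, 2, 3, 4]));
```

In the first call the scan stops at `id=22` and never looks at `id=333`; the second call returns the same array as the two-pass version without the intermediate allocation.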

in all of this work I'm constantly fascinated by how significant the performance bottlenecks are that never show up in a trace or a profile. the biggest performance gains come from changes outside the slow parts of the system. profiling is a helpful tool, but it can only show us what is there; we want to see what's not there, where massive improvements can occur by avoiding computation. in other words, a profile will never show wasted computation - it can only show where the computation spends its time.

I'd be happy to try and collaborate on any dev posts. I'm afraid it'd be hard for me to commit to too much unless there are very clear expectations and limits on how much should be in a given post. like performance work, it's also time consuming to write about performance 😉

the performance team has been doing some good work and I think it could be instructive to examine some of their changes and discuss them. I'll think about my role in this; thanks for the suggestion.

abhansnuk · 2024-06-06T13:56:23Z

Thanks to everyone working on this important topic for the blog. Just something to throw in: when you get to this phase, a focused headline and a clear excerpt on what this post will be focusing on will really help.

Looking forward to the ongoing progress on this post and it being available. Thanks.

bph added the labels "flow: approved can move forward", "Backend", and "Performance" · May 2, 2024
