Address MDN performance issue #6606
Here are the deployments that happened around the time that this issue cropped up:
The issue definitely was not happening prior to February 6th, and it definitely was happening by February 13th. If the regression was caused by a change to our code, it must have been a change in this range: f48f3fd...7778679
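For reference on how to read that range: GitHub's three-dot compare notation maps onto git's two-dot log syntax when the history is linear, as it is for our deploys. A quick sketch for listing the candidate commits locally (assuming a Kuma checkout with git on PATH; purely illustrative):

```typescript
import { execSync } from "child_process";

// GitHub's compare range "f48f3fd...7778679" corresponds to
// `git log f48f3fd..7778679`: commits reachable from 7778679
// that are not in f48f3fd, i.e. everything deployed in between.
const log = execSync("git log --oneline f48f3fd..7778679", {
  encoding: "utf8",
});
console.log(log.trimEnd());
```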
That range covers the following individual commits. I've struck through ones that absolutely could not have contributed to this regression.
Of particular note, we know that 669ec0a (Switch to async GA tag) cannot be at issue, because we reverted it and re-deployed last Wednesday, and the performance regression is still present in our data.
That leaves us with the much more manageable list of the following 9 commits that we can't trivially exclude:
Confirming that yes, #6601 didn't have a performance impact. Thanks Dan for diving into this and making this issue much more actionable. What's the suggestion for next steps here? Do we pick the things that are easily revertible and try them first? We need at least 1 full day of data to verify the impact.
There's a pretty good chance we could exclude half of the commits above, too, but they just require a slightly closer read than that first pass :) I think the next step is to have two devs hop on Zoom, synchronously read those commits, and come up with a prioritized list of ones we should try reverting. Unfortunately, this is unlikely to happen this week: Django 2 has been merged and we need to get it deployed ASAP so we're ready for the security patch that's landing Wednesday morning. And then we'll probably need a few days to see if Django 2 itself changes our baseline performance before we can start testing those commits 🙁
That schedule works for me. It's important that we find the issue and fix it so we don't end up compounding performance regressions, but it's not time-sensitive.
Gregor and I spent half an hour reading each commit line by line, and there's nothing we can conceive of that would impact frontend performance in any way. The Stripe (d477a40) and jQuery UI (ec74256) commits were the only ones that really touched the frontend in any meaningful way, but both are constrained to the Wiki. At this point, we think the regression is unlikely to have been caused by code changes in Kuma.
I don't know where else to look, but perhaps existing Waffle flags or settings were changed in the Django Admin, or maybe there was a new deploy of the Interactive Examples service? Or something else entirely?
Hmm, we've had something in the soapbox for roughly the last 2 weeks. Maybe that?
Could be! Is that scheduled to end any time soon? |
NOTE: This is measuring Time to Interactive Examples, a custom metric that we track, not the more generic "TTI" metric in, say, Lighthouse. |
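For anyone unfamiliar with how a custom metric like this gets captured: typically a performance mark is recorded in the browser at the moment the examples become usable, then beaconed to analytics. A minimal sketch; the message name and endpoint below are hypothetical, not Kuma's actual implementation:

```typescript
// Hypothetical sketch of recording "time to interactive examples".
// Assumes the interactive-examples iframe posts a ready message once
// a reader can actually use it; all names here are made up.
const EXAMPLES_READY = "interactive-examples:ready";

window.addEventListener("message", (event: MessageEvent) => {
  if (event.data === EXAMPLES_READY) {
    performance.mark("tti-examples");
    const [entry] = performance.getEntriesByName("tti-examples");
    // startTime is milliseconds since navigation started; sendBeacon
    // delivers even if the reader navigates away immediately after.
    navigator.sendBeacon(
      "/analytics/timing", // hypothetical endpoint
      JSON.stringify({ metric: "tti-examples", value: entry.startTime })
    );
  }
});
```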
@atopal will check on the soapbox and turn it off Wed 3/11.
okay, we have an alternative explanation: Corona |
Summary
Time to interactive examples for the 90th percentile has regressed by 0.3s some time after February 7.
Steps To Reproduce (STR)
See the 90th percentile data (MDN time to interactive examples for the 90th percentile, regressed by 0.3s).
Actual behavior
The 90th percentile is at 3.7s.
Expected behavior
The 90th percentile should be at 3.4s.
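For clarity on what these numbers mean: a p90 of 3.7s says that 90% of measured page loads saw the examples become interactive in 3.7s or less. A minimal nearest-rank percentile computation, just to pin down the definition (not MDN's actual analytics pipeline):

```typescript
// Nearest-rank percentile: sort ascending, take the value at
// position ceil(p/100 * n). Inputs and output in milliseconds.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Before the regression we'd expect percentile(samples, 90) ≈ 3400;
// the data now shows ≈ 3700.
```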
Additional context
Is there anything else we should know?