[EXPERIMENT] Branch mispredictions in twitter.json #2061
This is an accounting of what causes our various branch misses in stage 2. I took twitter.json and transformed it, through a series of steps, into a single file containing a single array, with an empty string wherever the original had a scalar. At every step I kept the file size identical and the values in roughly the same position, and (with the exception of array nesting removal) kept the number of structurals and the size of the stage 2 output identical.
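To make one of those steps concrete, here is a minimal sketch of a size-preserving "blank out the scalars" pass. It is not the script used for this experiment: the function name `blank_scalars` is hypothetical, it assumes padding with spaces is an acceptable way to keep the byte count fixed, it leaves object keys alone, and it leaves 1-byte scalars (such as a single digit) untouched because `""` cannot fit in one byte.

```python
STRUCTURAL = b"{}[]:,"
WHITESPACE = b" \t\r\n"


def blank_scalars(doc: bytes) -> bytes:
    """Replace each scalar value with "" padded by spaces to the same length."""
    out = bytearray()
    i, n = 0, len(doc)
    while i < n:
        c = doc[i]
        if c in STRUCTURAL or c in WHITESPACE:
            out.append(c)  # structurals and whitespace are copied verbatim
            i += 1
            continue
        start = i
        if c == ord('"'):
            # String token: skip to the closing quote, stepping over escapes.
            i += 1
            while i < n and doc[i] != ord('"'):
                i += 2 if doc[i] == ord("\\") else 1
            i += 1
        else:
            # Number, true, false or null: runs until a structural or whitespace.
            while i < n and doc[i] not in STRUCTURAL and doc[i] not in WHITESPACE:
                i += 1
        token = doc[start:i]
        # A string followed by ':' is an object key; keep it as-is (assumption).
        j = i
        while j < n and doc[j] in WHITESPACE:
            j += 1
        is_key = c == ord('"') and j < n and doc[j] == ord(":")
        if is_key or len(token) < 2:
            out += token
        else:
            out += b'""' + b" " * (len(token) - 2)
    return bytes(out)
```

Because every byte that is not part of a scalar is copied through unchanged, the structural characters stay at their original offsets, which is the property the comparison depends on.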
The raw results from icelake are here, but here's the rundown. These numbers assume that the branch misses from each of these sources are completely independent, which probably isn't the case, but probably isn't completely wrong, either. What fascinates me is that this does seem to be a complete list: removing all of these sources of misprediction brings branch misses down from 773 to 3.
(*) Of note: a file where all numbers are replaced with 18-digit integers (or, likewise, with 8-digit integers) seems to be just about as unpredictable as a file where the numbers have all sorts of different sizes. 1-digit numbers do not have this issue. I'm not sure why this is.
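For anyone who wants to poke at the digit-count effect in isolation, the following is a rough sketch of how comparable inputs could be generated. It is an assumption-laden stand-in, not what was done here: it builds standalone arrays rather than rewriting the numbers inside twitter.json, and the function names, seed, and 18-digit upper bound for the mixed case are all hypothetical choices.

```python
import random


def _random_int_with_digits(rng: random.Random, digits: int) -> int:
    """Return a non-negative integer with exactly `digits` decimal digits."""
    lo = 10 ** (digits - 1) if digits > 1 else 0
    return rng.randint(lo, 10 ** digits - 1)


def fixed_width_numbers(count: int, digits: int, seed: int = 0) -> str:
    """JSON array in which every number has the same digit count."""
    rng = random.Random(seed)
    return "[" + ",".join(
        str(_random_int_with_digits(rng, digits)) for _ in range(count)
    ) + "]"


def mixed_width_numbers(count: int, max_digits: int = 18, seed: int = 0) -> str:
    """JSON array in which each number has a random digit count."""
    rng = random.Random(seed)
    return "[" + ",".join(
        str(_random_int_with_digits(rng, rng.randint(1, max_digits)))
        for _ in range(count)
    ) + "]"
```

Writing `fixed_width_numbers(n, 1)`, `fixed_width_numbers(n, 8)`, `fixed_width_numbers(n, 18)` and `mixed_width_numbers(n)` to separate files and comparing their branch-miss counts would be one way to check whether the effect reproduces outside of twitter.json.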