New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve branch prediction #2140
base: master
Are you sure you want to change the base?
Conversation
10bc5bb
to
9f45a68
Compare
Sounds promising. I will review shortly. |
Maybe you'll have an idea why with adding these |
@saleyn My interpretation is that the compiler fails to convince itself that the variable is initialized. I would happily ignore this static analysis, but then we will get endless "bug reports" about it, so we better silence these warnings ourselves. |
Silencing a warning in GCC can be done without changing the code, although that can be more trouble than it is worth. |
@saleyn We can manually silence the |
I am travelling till mid March, and will unlikely be able to look into this. I think the issue with failing actions is that somewhere else I missed to initialize the argument in the |
On an x64 processor (Ice Lake) and GCC 12, I find that this PR is maybe slightly worse on Your results appear to indicate better performance for this PR in the DOM tests, but your clock speed is not the same. You do report fewer instructions, which is a good thing. Let me run DOM tests on this system... (x64/Ice Lake) and GCC 12. This PR (repeated 3 times)
Main branch (repeated 3 times)
Analysis (DOM)Just look at stage 2 (DOM), I get that the main branch speed is in the interval 5.11GB/s to 5.23 GB/s. The PR is in the interval 5.16 GB/s to 5.20 GB/s... using three trials. So this PR is anywhere between 2% better or 1% worse. When in doubt, I tend to go with the instruction count and the PR seems to reduce it by 0.8%. So I tend to believe that this PR is a win, although a small win. kostya ondemand (this PR)
kostya ondemand (main branch)
Analysis (ondemand)Focusing solely on one ondemand benchmark (kostya), this PR seems an overall negative. Recommendation (tentative)I think that if this PR focused solely on the DOM code, it would easier to consider it. As things stand, I don't have the data to convince myself that there are gains with On Demand benchmarks. It is possible that this PR creates a slight performance regression on On Demand. Of course, maybe there is a methodological issue. Risks : I should stress that results may vary depending the exact compiler and processor. We cannot rule out that this PR could be a net negative when using different systems, even with the DOM parser. So some caution is required. |
This PR adds some
unlikely
compiler hints to improve branch predition.For some reason in cases when SIMDJSON_TRY is used the compiler complatined about uninitialized variables, so explicit initialization was added.
The performance results:
master
branch: