memchr vs stringzilla performance comparison #718

Open
RoloEdits opened this issue Feb 24, 2024 · 2 comments
Labels
enhancement help wanted optimization Issues related to reducing time needed to parse XML or to memory consumption

Comments

@RoloEdits

Came across a benchmarking comparison between the two.

Notably the results:

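| Platform | Crate | ASCII ⏩ | ASCII ⏪ | UTF8 ⏩ | UTF8 ⏪ |
| --- | --- | --- | --- | --- | --- |
| Intel | memchr | 5.89 GB/s | 1.08 GB/s | 8.73 GB/s | 3.35 GB/s |
| Intel | stringzilla | 8.37 GB/s | 8.21 GB/s | 11.21 GB/s | 11.20 GB/s |
| Arm | memchr | 6.38 GB/s | 1.12 GB/s | 13.20 GB/s | 3.56 GB/s |
| Arm | stringzilla | 6.56 GB/s | 5.56 GB/s | 9.41 GB/s | 8.17 GB/s |
| Average | | 1.2x faster | 6.2x faster | - | 2.8x faster |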

It's noted that the Rust crate doesn't cover the full C++ API yet, but that this is planned eventually. In the interest of performance, I thought I would share the benchmark results so that informed exploration can be done if desired, in case the potential gains turn out to be a win for one crate or the other.
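For readers unfamiliar with the column headers: ⏩ and ⏪ presumably denote forward and reverse search. Below is a minimal sketch of what the two directions mean, using only the standard library. It is not the benchmark's code, and `count_forward` and `count_reverse` are hypothetical helper names chosen for illustration.

```rust
// Minimal sketch, std only. Not taken from the linked benchmark;
// `count_forward` and `count_reverse` are hypothetical names.

/// Counts non-overlapping matches of `needle`, scanning left to right.
fn count_forward(haystack: &str, needle: &str) -> usize {
    if needle.is_empty() {
        return 0; // an empty needle would match forever
    }
    let mut count = 0;
    let mut rest = haystack;
    while let Some(pos) = rest.find(needle) {
        count += 1;
        // Continue after the end of the match just found.
        rest = &rest[pos + needle.len()..];
    }
    count
}

/// Counts non-overlapping matches of `needle`, scanning right to left.
fn count_reverse(haystack: &str, needle: &str) -> usize {
    if needle.is_empty() {
        return 0;
    }
    let mut count = 0;
    let mut rest = haystack;
    while let Some(pos) = rest.rfind(needle) {
        count += 1;
        // Continue before the start of the match just found.
        rest = &rest[..pos];
    }
    count
}
```

Both functions visit the same matches, only in opposite order, so a large gap between the ⏩ and ⏪ columns would reflect how well each direction is optimized rather than a difference in the work requested.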

@Mingun Mingun added enhancement help wanted optimization Issues related to reducing time needed to parse XML or to memory consumption labels Feb 24, 2024
@Mingun
Collaborator

Mingun commented Feb 24, 2024

Very interesting! If I understand correctly, these are the results of benchmarks of the crates themselves; you didn't integrate stringzilla into quick-xml, right? I'm always open to performance improvements, so if you want, you can create a PR with the replacement so everyone can experiment with the change. Note, however, that these results probably come from searching for small patterns in long strings. XML usually has a different access pattern: many searches for small patterns in small strings. As you can see from quick-xml's own benchmarks, maybe_xml is even faster than quick-xml in most cases, although it does not use any SIMD libraries. quick-xml wins only on very long XML documents (several megabytes), which I think is usually a rare case.
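To make the difference in access pattern concrete, here is a rough sketch of a harness that runs the same byte search over one long haystack or over many short ones. It is not quick-xml's benchmark code. `FindByte`, `naive_find`, `count_matches`, and `time_chunks` are hypothetical names, and `naive_find` is a std-only stand-in so the sketch has no dependencies. `memchr::memchr` has the same `(u8, &[u8]) -> Option<usize>` shape and could be passed in instead if that crate is available.

```rust
use std::time::Instant;

/// Shape shared by the byte-search functions under comparison.
/// `memchr::memchr` matches this signature (assuming the crate is a dependency).
type FindByte = fn(u8, &[u8]) -> Option<usize>;

/// Std-only stand-in search, used here only to avoid external crates.
fn naive_find(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Counts occurrences of `needle` by repeatedly searching the remaining slice.
fn count_matches(find: FindByte, needle: u8, haystack: &[u8]) -> usize {
    let mut count = 0;
    let mut rest = haystack;
    while let Some(pos) = find(needle, rest) {
        count += 1;
        rest = &rest[pos + 1..];
    }
    count
}

/// Searches every chunk and returns (total matches, elapsed nanoseconds).
/// One big chunk models "small pattern in a long string";
/// many tiny chunks model the XML-like "small pattern in small strings".
fn time_chunks(find: FindByte, needle: u8, chunks: &[&[u8]]) -> (usize, u128) {
    let start = Instant::now();
    let total: usize = chunks
        .iter()
        .map(|chunk| count_matches(find, needle, chunk))
        .sum();
    (total, start.elapsed().as_nanos())
}
```

One way to use it is to build a document such as `"<a>text</a>".repeat(1000)`, pass it once as a single slice and once split with `.chunks(11)`, and compare the timings for the same match count. Per-call setup cost matters far more in the second case, which is the case that resembles XML parsing. A real measurement would need a proper benchmarking harness, not a single `Instant` reading.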

@dralley
Collaborator

dralley commented Feb 25, 2024

BurntSushi provided a response on Reddit. It seems the benchmarks are a bit misleading: there are some circumstances in which StringZilla is faster, but on average it seems to be slower.

https://www.reddit.com/r/rust/comments/1ayngf6/memchr_vs_stringzilla_benchmarks_up_to_7x/
