Faster escape routines #408

dralley · 2022-06-29T19:48:09Z

No description provided.

dralley · 2022-06-29T20:20:47Z

Figuring out how much of an improvement this is (or even whether it is one, on average) might be challenging, because I get different results on my AMD-based desktop and my Intel-based laptop (although the laptop benchmarks are inconsistent due to power management).

Escaping long inputs is much faster (50-75% faster) on both systems, regardless of whether any replacements are made. Escaping short inputs is slower on both systems if there are replacements. My desktop is also slower on short inputs without any replacements, whereas my laptop is still faster there (just not by as much as with the long inputs).

Since most attributes are short and don't contain escapable chars, a regression there concerns me a bit.
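
To make the "short input, no replacements" case concrete, here is a minimal scalar sketch of an escape routine with a borrowing fast path. It is not the code in this PR and uses no SIMD; the names `replacement` and `escape_sketch` are hypothetical, and the set of five escaped characters is an assumption based on the usual XML predefined entities.

```rust
use std::borrow::Cow;

/// Hypothetical helper: the entity for a byte that must be escaped, if any.
/// Assumes the five predefined XML entities are the ones being escaped.
fn replacement(b: u8) -> Option<&'static str> {
    match b {
        b'<' => Some("&lt;"),
        b'>' => Some("&gt;"),
        b'&' => Some("&amp;"),
        b'\'' => Some("&apos;"),
        b'"' => Some("&quot;"),
        _ => None,
    }
}

/// Hypothetical scalar escape routine (illustration only, not this PR's code).
fn escape_sketch(raw: &str) -> Cow<'_, str> {
    let bytes = raw.as_bytes();

    // Fast path: if nothing needs escaping, return the input without allocating.
    let first = match bytes.iter().position(|&b| replacement(b).is_some()) {
        Some(i) => i,
        None => return Cow::Borrowed(raw),
    };

    // Slow path: copy the clean prefix, then alternate between clean runs
    // and entity replacements.
    let mut out = String::with_capacity(raw.len() + 8);
    out.push_str(&raw[..first]);
    let mut start = first;
    for (i, &b) in bytes.iter().enumerate().skip(first) {
        if let Some(rep) = replacement(b) {
            // All escapable bytes are ASCII, so `i` and `i + 1` are always
            // valid UTF-8 char boundaries for slicing.
            out.push_str(&raw[start..i]);
            out.push_str(rep);
            start = i + 1;
        }
    }
    out.push_str(&raw[start..]);
    Cow::Owned(out)
}
```

With this shape, `escape_sketch("plain")` returns `Cow::Borrowed` and never allocates, while `escape_sketch("a<b")` returns the owned string `a&lt;b`. Any vectorized search only replaces the `position` scan, so its setup cost has to pay for itself even on inputs of a few bytes.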

Ironically, these instructions (the SSE4.2 string comparison instructions discussed in the links below) seem to have been added explicitly for the benefit of XML parsing, but they were never used widely enough for hardware manufacturers to prioritize their performance, so now it's harder to benefit from them. It also seems like more effort went into the variant that works on null-terminated C strings than into the one that takes strings of an explicit length, which is the variant Rust uses. The result is that the explicit-length variant is (apparently) measurably much slower than the one designed for C strings. Frustrating.

Supposedly you can get better results with AVX2. Haven't looked into it yet.

https://news.ycombinator.com/item?id=14422098

http://lists.xml.org/archives/xml-dev/202108/msg00000.html

https://web.archive.org/web/20180617042918/https://software.intel.com/en-us/articles/xml-parsing-accelerator-with-intel-streaming-simd-extensions-4-intel-sse4/

https://stackoverflow.com/questions/58901232/why-is-sse4-2-cmpstr-slower-than-regular-code

https://stackoverflow.com/questions/20935769/sse42-sttni-pcmpestrm-is-twice-slower-than-pcmpistrm-is-it-true

codecov-commenter · 2022-06-29T20:43:28Z

Codecov Report

Merging #408 (c1d7a06) into master (5bed370) will increase coverage by 0.19%.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##           master     #408      +/-   ##
==========================================
+ Coverage   64.43%   64.62%   +0.19%     
==========================================
  Files          37       37              
  Lines       17487    17529      +42     
==========================================
+ Hits        11267    11328      +61     
+ Misses       6220     6201      -19

Flag	Coverage Δ
unittests	`64.62% <80.00%> (+0.19%)`	⬆️

Files	Coverage Δ
src/escapei.rs	`14.51% <80.00%> (+1.26%)`	⬆️

... and 1 file with indirect coverage changes

dralley force-pushed the escape_text branch 2 times, most recently from a881c1c to af30446 Compare June 29, 2022 20:07

dralley force-pushed the escape_text branch from af30446 to c9ff68c Compare June 29, 2022 20:31

dralley mentioned this pull request Jul 7, 2022

Add benchmarks to source code shepmaster/jetscii#54

Open

Mingun added enhancement optimization Issues related to reducing time needed to parse XML or to memory consumption labels Aug 27, 2022

dralley mentioned this pull request Oct 12, 2023

Use memchr to search for characters to escape #664

Open

Faster escape routines

c1d7a06

dralley force-pushed the escape_text branch from c9ff68c to c1d7a06 Compare October 15, 2023 02:22
