
Improve performance, especially on non-ASCII codepoints #28

Merged
Merged 2 commits on Jun 23, 2022

Conversation

mjguynn
Contributor

@mjguynn mjguynn commented Jun 10, 2022

This PR introduces two performance-oriented changes:

  1. Use if instead of match in the ASCII fast path. This yields a ~15% performance improvement on ASCII text. This was previously suggested in UnicodeWidthChar should inline an ascii fast path #1; it's been 7 years since that issue was created and match is still not optimized as well as if, so at this point I think the replacement is justified. I hope this can be reverted after future compiler improvements, since the match expression was much more readable.
  2. Use a three-level lookup table instead of a binary search. This yields a >70% reduction in running time (more than 3x throughput) on non-ASCII text. The lookup tables use a total of 6528 bytes; I believe the current version's binary-search table takes up 7752 bytes, so this change may also slightly reduce binary size.

I created two additional benchmarks to test these changes' performance. enwik8 runs UnicodeWidthStr::width on ~97MB of English Wikipedia XML data; jawiki runs UnicodeWidthStr::width on ~120MB of Japanese Wikipedia page metadata. I haven't included these benchmarks in the commit because I'm unsure about copyright/licensing on the data and I can't remember where I obtained the jawiki data from.

Here are the benchmarks for the current version of unicode-width:

test tests::cargo        ... bench:       3,249 ns/iter (+/- 189)
test tests::enwik8       ... bench:  84,306,590 ns/iter (+/- 1,503,814)
test tests::jawiki       ... bench: 425,568,810 ns/iter (+/- 1,598,293)

And here are the benchmarks with this PR applied:

test tests::cargo        ... bench:       2,774 ns/iter (+/- 260)
test tests::enwik8       ... bench:  63,884,720 ns/iter (+/- 1,754,871)
test tests::jawiki       ... bench: 129,152,290 ns/iter (+/- 643,606)

jawiki still includes a lot of ASCII characters, so the performance improvement for non-ASCII characters is likely larger than 70%.

I also tested the new code against wcswidth. This wasn't a perfect comparison:

  • FFI inhibits inlining, so wcswidth was at a disadvantage
  • wcswidth operates directly on wchar_ts, while UnicodeWidthStr must pass the input string through .chars(). I accounted for this by collecting .chars() into a Vec and then benchmarking vec.iter().map(|&c| cw::width(c, false).unwrap_or(0)).fold(0, Add::add).

With these disclaimers, the new code is ~30% faster than the system implementation of wcswidth and ~60% faster than http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c when run on jawiki:

test tests::jawiki          ... bench: 111,284,640 ns/iter (+/- 949,861)
test tests::jawiki_wcsw     ... bench: 164,722,600 ns/iter (+/- 2,486,271)
test tests::jawiki_mkwcsw   ... bench: 282,653,890 ns/iter (+/- 4,269,217)

I've run the following code to ensure that I didn't introduce any behavioral regressions:

for i in 0..=u32::from(char::MAX) {
	if let Ok(c) = char::try_from(i) {
		let old = unicode_width::UnicodeWidthChar::width(c);
		let new = unicode_width_new::UnicodeWidthChar::width(c);
		assert!(old == new);
		let old_cjk = unicode_width::UnicodeWidthChar::width_cjk(c);
		let new_cjk = unicode_width_new::UnicodeWidthChar::width_cjk(c);
		assert!(old_cjk == new_cjk);
	}
}

Apologies in advance to the reviewer(s): I experimented with a few different optimization methods (binary search variants, B+ trees, etc) and by the time I was finished most of the code in unicode.py had been rewritten, so there's a lot to look through.

@Manishearth
Member

I may not have time to review this soon but it looks promising!

Would it be possible to include the code for the benchmark, perhaps disabled in Cargo.toml, with instructions for downloading the file?

@mjguynn
Contributor Author

mjguynn commented Jun 10, 2022

I've just added the benchmark code and instructions. Currently it falls back to benchmarking an empty string if it can't find the data files; let me know if you'd prefer configuration in Cargo.toml instead.

@Manishearth
Member

Thanks!

@Manishearth Manishearth merged commit 458f6ac into unicode-rs:master Jun 23, 2022