
Improve performance, especially on non-ASCII codepoints #28

Merged
Merged 2 commits on Jun 23, 2022

Conversation

mjguynn
Contributor

@mjguynn mjguynn commented Jun 10, 2022

This PR introduces two performance-oriented changes:

  1. Use if instead of match in the ASCII fast path. This yields a ~15% performance improvement on ASCII text. This was previously suggested in UnicodeWidthChar should inline an ascii fast path #1; it's been 7 years since that issue was created and match is still not optimized as well as if, so at this point I think the replacement is justified. I hope this can be reverted after future compiler improvements, since the match expression was much more readable.
  2. Use a three-level lookup table instead of a binary search. This yields a >70% reduction in running time (more than 3x throughput) on non-ASCII text. The lookup tables use a total of 6528 bytes; I believe the current version's binary-search table takes up 7752 bytes, so this change may also slightly reduce binary size.

I created two additional benchmarks to test these changes' performance. enwik8 runs UnicodeWidthStr::width on ~97MB of English Wikipedia XML data; jawiki runs UnicodeWidthStr::width on ~120MB of Japanese Wikipedia page metadata. I haven't included these benchmarks in the commit because I'm unsure about copyright/licensing on the data and I can't remember where I obtained the jawiki data from.

Here are the benchmarks for the current version of unicode-width:

test tests::cargo        ... bench:       3,249 ns/iter (+/- 189)
test tests::enwik8       ... bench:  84,306,590 ns/iter (+/- 1,503,814)
test tests::jawiki       ... bench: 425,568,810 ns/iter (+/- 1,598,293)

And here are the benchmarks with this PR applied:

test tests::cargo        ... bench:       2,774 ns/iter (+/- 260)
test tests::enwik8       ... bench:  63,884,720 ns/iter (+/- 1,754,871)
test tests::jawiki       ... bench: 129,152,290 ns/iter (+/- 643,606)

jawiki still includes a lot of ASCII characters, so the performance improvement for non-ASCII characters is likely larger than 70%.

I also tested the new code against wcswidth. This wasn't a perfect comparison:

  • FFI inhibits inlining, so wcswidth was at a disadvantage
  • wcswidth operates directly on wchar_ts, while UnicodeWidthStr must pass the input string through .chars(). I accounted for this by collecting .chars() into a Vec and then benchmarking vec.iter().map(|&c| cw::width(c, false).unwrap_or(0)).fold(0, Add::add).

With these disclaimers, the new code is ~30% faster than the system implementation of wcswidth and ~60% faster than http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c when run on jawiki:

test tests::jawiki          ... bench: 111,284,640 ns/iter (+/- 949,861)
test tests::jawiki_wcsw     ... bench: 164,722,600 ns/iter (+/- 2,486,271)
test tests::jawiki_mkwcsw   ... bench: 282,653,890 ns/iter (+/- 4,269,217)

I've run the following code to ensure that I didn't introduce any behavioral regressions:

for i in 0..=u32::from(char::MAX) {
	if let Ok(c) = char::try_from(i) {
		let old = unicode_width::UnicodeWidthChar::width(c);
		let new = unicode_width_new::UnicodeWidthChar::width(c);
		assert!(old == new);
		let old_cjk = unicode_width::UnicodeWidthChar::width_cjk(c);
		let new_cjk = unicode_width_new::UnicodeWidthChar::width_cjk(c);
		assert!(old_cjk == new_cjk);
	}
}

Apologies in advance to the reviewer(s): I experimented with a few different optimization methods (binary search variants, B+ trees, etc) and by the time I was finished most of the code in unicode.py had been rewritten, so there's a lot to look through.

@Manishearth
Member

I may not have time to review this soon but it looks promising!

Would it be possible to include the code for the benchmark, perhaps disabled in Cargo.toml, with instructions for downloading the file?

@mjguynn
Contributor Author

mjguynn commented Jun 10, 2022

I've just added the benchmark code and instructions. Currently it falls back to benchmarking an empty string if it can't find the data files; let me know if you'd prefer configuration in Cargo.toml instead.

@Manishearth
Member

Thanks!

@Manishearth Manishearth merged commit 458f6ac into unicode-rs:master Jun 23, 2022