Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider an alternative method for UnicodeSegmentation::graphemes #22

Open
nox opened this issue Mar 5, 2017 · 9 comments
Open

Consider an alternative method for UnicodeSegmentation::graphemes #22

nox opened this issue Mar 5, 2017 · 9 comments

Comments

@nox
Copy link

nox commented Mar 5, 2017

The boolean argument is quite useless when reading the code, because foo.graphemes(true) doesn't state what that true thing is for.

@tapeinosyne
Copy link

It would be reasonable to make this a breaking change while the number of direct dependants is acceptably low. (Presently, there are 22 publicly listed on crates.io). I feel that Boolean blindness would be unflattering in a stable API.

@mbrubeck
Copy link
Contributor

mbrubeck commented Mar 14, 2017

What should the new API look like? A few options:

  1. Replace the boolean with an enum, so the API becomes graphemes(Extended) or graphemes(Legacy).

  2. Deprecate graphemes, and introduce new methods extended_graphemes() and legacy_graphemes().

  3. Change graphemes() to always return extended grapheme clusters, and add a new method legacy_graphemes().

I don't like (1) because the calls become very long, it requires importing more names, and like the current API it doesn't offer much guidance. I don't like (2) for most of the same reasons.

I like (3) because it makes the common case shorter, guiding people to the method that is most often correct. The downside is that it does not allow a deprecation period, but I don't think this is a big problem in this case.

@tapeinosyne
Copy link

tapeinosyne commented Mar 14, 2017

  1. Change graphemes() to always return extended grapheme clusters, and add a new method legacy_graphemes().

This would be my choice as well. I don't personally mind importing enums, but extended grapheme clustering is clearly recommended[1] by the relevant Unicode annex and it seems sensible to offer it as an implicit default.

[1] Per the Unicode Standard Annex #29:

the extended grapheme cluster boundaries are recommended for general processing, while the legacy grapheme cluster boundaries are maintained primarily for backwards compatibility with earlier versions of this specification

@nox
Copy link
Author

nox commented Mar 15, 2017

+1 for 3/

@tapeinosyne
Copy link

tapeinosyne commented Mar 21, 2017

(Note that, as of #23, this issue also applies to the newly added cursor API.)

@raindev
Copy link

raindev commented Jun 15, 2018

An alternative option to introduce a more ergonomic API without breaking backwards compatibility is to use functions not grouped into a trait but accessible from the module directly. UnicodeSegmentation trait can then be deprecated and removed eventually. The trait is implemented for str so it won't be possible to call the methods directly on a string (i.e. s.graphemes() vs graphemes(s)) but that's pretty minor annoyance. A small additional benefit is that import will be less stuttering: use unicode_segmentation::graphemes instead of use unicode_segmentation::UnicodeSegmentation.

@raindev
Copy link

raindev commented Jun 15, 2018

I'll be happy to submit a patch too. Or maybe @xfix is interested in updating #31. Would be good to move it forward.

@upsuper
Copy link

upsuper commented Jun 6, 2019

I would also support option 3. It would be a breaking change but I guess that's fine. We can probably release a 1.3.1 which translates the new API into the old, so that we can move forward gradually.

@timClicks
Copy link
Contributor

Another +1 for option 3 here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants