RFC: new GraphemeCursor API #21

raphlinus · 2017-03-03T23:59:08Z

The existing API is unsuitable for cursor movement in xi-editor, because (a) xi's string representation is a rope, not a contiguous &str, and (b) I need to be able to start at an arbitrary offset in the string and find the previous or next boundary.

I propose implementing a new cursor-flavored API. The GraphemeCursor struct would be little more than the state machine state; it would not store a reference to the string. Queries would pass in a &str chunk and an offset within that chunk. It is the caller's responsibility to ensure that the offset is consistent with the cursor location. The return value of a query would be one of three things: a new offset, an indication that the boundary is beyond the extent of the provided chunk, or a request for pre-context. For a pre-context request, the caller would supply a string chunk preceding the chunk containing the cursor (the return value would probably include a negative offset), then retry the original query. When the boundary is beyond the provided chunk, the caller would advance to the previous or next chunk, then retry.
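To make the query/retry protocol concrete, here is a minimal sketch of the pre-context case. Everything in it is an assumption for illustration: the names `ToyCursor`, `Incomplete`, `provide_context`, and `is_boundary_in_chunks` are hypothetical, offsets are absolute rather than chunk-relative, and the boundary rule is a toy stand-in (break at every char boundary except inside "\r\n") rather than the real grapheme rules.

```rust
/// Hypothetical "cannot answer yet" signal; the name is illustrative only.
#[derive(Debug, PartialEq)]
enum Incomplete {
    /// The caller must supply the chunk that ends at this absolute offset.
    PreContext(usize),
}

/// Toy cursor: holds only state, never a reference to the string.
struct ToyCursor {
    offset: usize,            // absolute byte offset of the cursor
    len: usize,               // total length of the logical string
    prev_is_cr: Option<bool>, // filled in once pre-context has been supplied
}

impl ToyCursor {
    fn new(offset: usize, len: usize) -> Self {
        ToyCursor { offset, len, prev_is_cr: None }
    }

    /// Supplies the chunk that ends exactly at the cursor offset.
    fn provide_context(&mut self, chunk: &str, chunk_start: usize) {
        assert_eq!(chunk_start + chunk.len(), self.offset);
        self.prev_is_cr = Some(chunk.ends_with('\r'));
    }

    /// `chunk` must start at or before the cursor and extend past it
    /// (unless the cursor sits at the very end of the string).
    fn is_boundary(&self, chunk: &str, chunk_start: usize) -> Result<bool, Incomplete> {
        if self.offset == 0 || self.offset == self.len {
            return Ok(true);
        }
        let rel = self.offset - chunk_start;
        if !chunk.is_char_boundary(rel) {
            return Ok(false);
        }
        if !chunk[rel..].starts_with('\n') {
            return Ok(true);
        }
        // The next char is '\n': the answer depends on the char before the cursor.
        let prev_is_cr = if rel > 0 {
            chunk[..rel].ends_with('\r')
        } else {
            match self.prev_is_cr {
                Some(known) => known,
                None => return Err(Incomplete::PreContext(self.offset)),
            }
        };
        Ok(!prev_is_cr)
    }
}

/// Caller side: locate the chunk holding `offset`, query, and retry after
/// supplying pre-context when asked. Assumes a non-empty list of non-empty chunks.
fn is_boundary_in_chunks(chunks: &[&str], offset: usize) -> bool {
    let len: usize = chunks.iter().map(|c| c.len()).sum();
    let mut cursor = ToyCursor::new(offset, len);
    let (mut idx, mut start) = (0, 0);
    while idx + 1 < chunks.len() && start + chunks[idx].len() <= offset {
        start += chunks[idx].len();
        idx += 1;
    }
    loop {
        match cursor.is_boundary(chunks[idx], start) {
            Ok(answer) => return answer,
            Err(Incomplete::PreContext(end)) => {
                // The chunk ending at `end` is the one just before the current chunk.
                let prev = chunks[idx - 1];
                cursor.provide_context(prev, end - prev.len());
            }
        }
    }
}
```

With the chunks `["a\r", "\nb"]`, a query at offset 2 first asks for pre-context, because the cursor sits at the start of the second chunk, and after the retry reports that offset 2 is not a boundary.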

Queries supported would include is_boundary, next_boundary, and prev_boundary. The latter two queries are stateful, in that the cursor is moved to the result of the query.
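A rough sketch of what a stateful next_boundary could look like, under the same caveats: `ToyCursor`, `Incomplete::NextChunk`, and `boundaries` are hypothetical names, offsets are absolute, and the rule is again the toy "no break inside \r\n" stand-in. prev_boundary would mirror it, scanning backward and asking for the previous chunk instead.

```rust
/// Hypothetical "retry with another chunk" signal; the name is illustrative only.
#[derive(Debug, PartialEq)]
enum Incomplete {
    /// The boundary lies beyond the end of the supplied chunk.
    NextChunk,
}

/// Toy cursor: breaks at every char boundary except inside "\r\n".
struct ToyCursor {
    offset: usize,    // absolute byte offset; moved by next_boundary
    len: usize,       // total length of the logical string
    pending_cr: bool, // a '\r' was consumed and the following char is not yet known
}

impl ToyCursor {
    fn new(offset: usize, len: usize) -> Self {
        ToyCursor { offset, len, pending_cr: false }
    }

    /// Moves the cursor to the next boundary and returns it, or `Ok(None)` at
    /// the end of the string. `chunk` must contain the cursor position;
    /// `chunk_start` is the chunk's absolute start offset.
    fn next_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, Incomplete> {
        if self.offset == self.len {
            return Ok(None);
        }
        let mut rest = chunk[self.offset - chunk_start..].chars();
        if !self.pending_cr {
            match rest.next() {
                None => return Err(Incomplete::NextChunk),
                Some(c) => {
                    self.offset += c.len_utf8();
                    if c != '\r' {
                        return Ok(Some(self.offset));
                    }
                    self.pending_cr = true;
                }
            }
        }
        // A '\r' has been consumed: look one char ahead for a '\n' to absorb.
        if self.offset < self.len {
            match rest.next() {
                None => return Err(Incomplete::NextChunk),
                Some('\n') => self.offset += 1,
                Some(_) => {}
            }
        }
        self.pending_cr = false;
        Ok(Some(self.offset))
    }
}

/// Caller side: collect every boundary after offset 0, advancing to the next
/// chunk whenever the cursor asks for it. Assumes non-empty chunks.
fn boundaries(chunks: &[&str]) -> Vec<usize> {
    let len: usize = chunks.iter().map(|c| c.len()).sum();
    let mut cursor = ToyCursor::new(0, len);
    let (mut idx, mut start) = (0, 0);
    let mut out = Vec::new();
    while idx < chunks.len() {
        match cursor.next_boundary(chunks[idx], start) {
            Ok(Some(boundary)) => out.push(boundary),
            Ok(None) => break,
            Err(Incomplete::NextChunk) => {
                start += chunks[idx].len();
                idx += 1;
            }
        }
    }
    out
}
```

Because the cursor keeps its own position and the pending state across calls, splitting "a\r\nb" as `["a\r", "\nb"]` gives the same boundaries as passing it whole.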

As an implementation detail, the existing iterator would be implemented as a relatively thin layer on top of the cursor. The implementations for supplying pre-context, and for advancing to previous and next chunks, would be trivial.

Since it is very easy to get details wrong, I would plan to do automated testing with randomly generated input strings, verifying that the next and prev boundaries are consistent with each other (which would hopefully automate detection of bugs such as #19), and that results with chunked input are consistent with whole-string.
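The forward/backward consistency check could be sketched roughly as follows. This is only an illustration of the testing idea: the functions operate on a whole string, the boundary rule is a toy stand-in (no break inside "\r\n") rather than the real grapheme rules, the random generator is a small hand-rolled LCG so that no external crate is needed, and all names are hypothetical. The chunked-versus-whole-string comparison would follow the same pattern, comparing the boundary lists from two ways of feeding the same text.

```rust
/// Toy forward step: one char, or "\r\n" as a unit. Returns None at the end.
fn next_boundary(s: &str, offset: usize) -> Option<usize> {
    let mut chars = s[offset..].chars();
    let first = chars.next()?;
    let mut end = offset + first.len_utf8();
    if first == '\r' && chars.next() == Some('\n') {
        end += 1;
    }
    Some(end)
}

/// Toy backward step: one char, or "\r\n" as a unit. Returns None at the start.
fn prev_boundary(s: &str, offset: usize) -> Option<usize> {
    let mut chars = s[..offset].chars().rev();
    let last = chars.next()?;
    let mut start = offset - last.len_utf8();
    if last == '\n' && chars.next() == Some('\r') {
        start -= 1;
    }
    Some(start)
}

/// All boundaries found by walking forward from offset 0.
fn forward(s: &str) -> Vec<usize> {
    let mut out = vec![0];
    let mut pos = 0;
    while let Some(p) = next_boundary(s, pos) {
        out.push(p);
        pos = p;
    }
    out
}

/// All boundaries found by walking backward from the end, in ascending order.
fn backward(s: &str) -> Vec<usize> {
    let mut out = vec![s.len()];
    let mut pos = s.len();
    while let Some(p) = prev_boundary(s, pos) {
        out.push(p);
        pos = p;
    }
    out.reverse();
    out
}

/// Deterministic pseudo-random string over a small alphabet that mixes
/// ASCII, CR, LF, and one multi-byte char.
fn random_string(seed: &mut u64, max_chars: usize) -> String {
    const ALPHABET: [char; 4] = ['a', '\r', '\n', 'é'];
    let mut next = || {
        *seed = seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (*seed >> 33) as usize
    };
    let n = next() % (max_chars + 1);
    (0..n).map(|_| ALPHABET[next() % ALPHABET.len()]).collect()
}

/// Returns the first generated string on which the two directions disagree.
fn find_mismatch(seed: u64, cases: usize) -> Option<String> {
    let mut seed = seed;
    (0..cases)
        .map(|_| random_string(&mut seed, 12))
        .find(|s| forward(s) != backward(s))
}
```

Returning the offending string rather than a bare pass/fail makes a failure directly usable as a regression test case.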

Having a cursor-based implementation would help implement features such as #7.

I'd love feedback on whether this general direction makes sense before diving into implementation. If this is successful, I'd probably want to do something similar for word boundaries as well.

SimonSapin · 2017-03-04T07:47:08Z

Sounds good to me.

@alexcrichton, @kwantam, any opinion?

raphlinus · 2017-03-04T07:48:05Z

I'm prototyping this in a fork: https://github.com/unicode-rs/unicode-segmentation . There are some small differences from what I wrote above, but the bones should be in place.

alexcrichton · 2017-03-06T15:20:12Z

@SimonSapin no opinion

tapeinosyne · 2017-03-13T21:42:36Z

The purpose of a cursor-oriented API is rather clear, and this proposal feels nice. (I have occasionally wished that even simpler string views supported similar methods.) If cursors make for a serviceable basis to the current iterators, all the better.

HadrienG2 · 2019-01-11T07:17:59Z

Should be fixed by #23.

raphlinus · 2019-01-11T07:20:31Z

Yes, I think this issue can be closed.

raphlinus mentioned this issue Mar 4, 2017

'New Tab' menu item xi-editor/xi-editor#172

Merged

raphlinus added a commit to raphlinus/unicode-segmentation that referenced this issue Mar 4, 2017

Starting to implement rope-capable API

f0df6be

Very much work in progress. See unicode-rs#21

raphlinus mentioned this issue Mar 6, 2017

New cursor-based implementation of grapheme clusters #23

Merged

bredov mentioned this issue Apr 19, 2017

Panic with 'byte index is not a char boundary' rust-lang/rustfmt#1464

Closed

tapeinosyne mentioned this issue Feb 12, 2018

UWordBoundIndices doesn't expose the indices #35

Open

HadrienG2 mentioned this issue Jan 11, 2019

Forward-reverse grapheme mismatch on "\u{1F938}\u{1F3FE}\u{1F3FE}" #19

Closed

raphlinus closed this as completed Jan 11, 2019
