Searching functions for byte ranges? #18

Open
thomcc opened this issue Jul 23, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@thomcc
Contributor

thomcc commented Jul 23, 2019

I don't have a concrete use for this, but while writing the byteset code it occurred to me that people might want to use it to search for, e.g., the next ASCII digit or the next lowercase ASCII letter. (It seems to me that this is more generally useful for ranges of u8 than for ranges of char, but I could be wrong.)

These can be accomplished with the byteset functions, but less efficiently than dedicated functions for finding bytes in a range.

One of the flags to PCMPESTRI does allow for range checks, and implementing such a check with earlier SSE versions is easier for ranges than for arbitrary byte sets.
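To illustrate why ranges are the easier case, here is a scalar sketch of the usual single-comparison range test. The name `in_range` is made up for this example and is not part of the crate; a vectorized version would have to apply the same idea per lane, which is not shown here.

```rust
/// Returns true if `byte` lies in the inclusive range `lo..=hi`.
/// Hypothetical helper for illustration only; assumes `lo <= hi`.
fn in_range(byte: u8, lo: u8, hi: u8) -> bool {
    debug_assert!(lo <= hi);
    // If `byte < lo`, the subtraction wraps around to a large value,
    // so one unsigned comparison covers both the lower and upper bound.
    byte.wrapping_sub(lo) <= hi - lo
}
```

An arbitrary byte set has no comparable shortcut, since each member generally needs its own comparison or a table lookup.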

The byteset functions could also possibly autodetect this. In many cases it would be cheap (e.g. b"0123456789"), but for some it would take an extra pass over the byte table afterwards to look for the consecutive runs (e.g. b"0918273645"), which seems unfortunate.
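A rough sketch of what that detection could look like, assuming a plain 256-entry membership table. The function name `contiguous_range` and the table layout are assumptions for illustration, not the crate's actual internals.

```rust
/// Returns `Some((lo, hi))` if the bytes in `byteset` form exactly one
/// contiguous run `lo..=hi`, regardless of the order they were given in.
/// Hypothetical helper; the real byteset code may be organized differently.
fn contiguous_range(byteset: &[u8]) -> Option<(u8, u8)> {
    // Mark each member of the set in a membership table.
    let mut table = [false; 256];
    for &b in byteset {
        table[b as usize] = true;
    }
    // This is the extra pass over the table: find the smallest and largest
    // members, then check that every byte between them is also a member.
    let lo = table.iter().position(|&m| m)?;
    let hi = table.iter().rposition(|&m| m)?;
    if table[lo..=hi].iter().all(|&m| m) {
        Some((lo as u8, hi as u8))
    } else {
        None
    }
}
```

With this approach, both b"0123456789" and b"0918273645" are recognized as the range b'0'..=b'9', but only after the table has been built and scanned.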

Additionally, having to type out the members of the set is less syntactically convenient than b'0'..=b'9' (or whatever).
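For the sake of discussion, a range-based search could look roughly like this. `find_byte_in_range` is a hypothetical free function written against the standard library only; it is a naive scalar version, not a proposal for the final name or implementation.

```rust
use std::ops::RangeInclusive;

/// Returns the index of the first byte in `haystack` that falls within
/// `range`, or `None` if there is no such byte.
/// Hypothetical API sketch; not an existing function in the crate.
fn find_byte_in_range(haystack: &[u8], range: RangeInclusive<u8>) -> Option<usize> {
    haystack.iter().position(|b| range.contains(b))
}
```

A call would then read `find_byte_in_range(b"abc 42", b'0'..=b'9')` instead of spelling out every digit in a byte set.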

Not sure if these are worth adding. Again, I don't really have a use, so maybe it's worth waiting for someone who does. And it's unclear how extensive you'd like the searching capabilities to be on ByteSlice anyway. Thoughts?

@BurntSushi BurntSushi added the enhancement New feature or request label Jul 23, 2019
@BurntSushi
Owner

It's a good idea. One of the possible use cases here is in a regex engine, although using routines like this effectively in that context isn't straightforward.

I'd probably prefer to wait until we have a concrete use case for this.

If initialization time becomes a problem then we can add something like a Finder to permit callers to amortize that cost if they need to.
