Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-UTF-8 filenames #69

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Arcterus
Copy link

@Arcterus Arcterus commented Jul 4, 2018

This should resolve #23. The implementation works on both Linux and Windows for me (with the available tests). It's still very possible something broke though, so I'd appreciate it if another set of eyes could carefully check. I'm also very open to design changes if desired.

As part of the changes, I removed the methods matches_path() and matches_path_with() as they are now redundant, but I can add them back for the sake of backwards-compatibility.

@Arcterus
Copy link
Author

Arcterus commented Jul 4, 2018

I also have not modified the documentation, so there's probably some stuff that needs to change.

@BurntSushi
Copy link
Member

Could you please describe the high level solution to the problem that you implemented?

@Arcterus
Copy link
Author

Arcterus commented Jul 4, 2018

Alright, no problem. I tried to keep the logic used in the original code the same, so most all of the changes were around getting the same code to work without relying on String/&str.

Because the only characters we need to match against are the few special characters (like * and ?) and other characters that come from the input itself, I abstracted the pattern creation and matching functions to use platform-dependent iterators (EncodeWide on Windows and an iterator over a byte slice on other platforms). Additionally, anything that stored char now stores the platform-dependent type used by the iterators (u16 on Windows and u8 on other platforms), and the special characters are provided by a platform-dependent trait/struct (PlatformCharset/PatternCharset).

By doing so, we just check need to check whatever value is retrieved from the iterator against the special characters retrieved from the platform-dependent trait/struct. If the value is not a special character, we can just store it in PatternToken like we stored chars in the previous implementation.

@Arcterus
Copy link
Author

Any feedback?

@JohnTitor
Copy link
Member

Seems force-pushing broke the diff, the branch now only has eeb8a73.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

glob("*") does not support matching non-utf8 filenames
4 participants