Switch slugify regex to support more Unicode character groups #8167
Considering the fact that this is a bug fix, I'm inclined to include it in the next release, even though site urls will change because of it.
@jekyll/core I'd appreciate your thoughts on this approach.
@mattr- How about adding a configuration option that enables the new regexes? Opt-in for the change?
👋 I paired with @swanson as we put together the fix. I wonder if making this an opt-in feature might accidentally work against the people most likely to benefit from it. The original bug report came from the I suspect that most users of this tool are not impacted by this error, which is why it didn't get reported until recently, in @deepestblue's detailed bug report.

This is certainly an edge-case fix, but I'm afraid that if we make it opt-in, the populations best positioned to benefit would be implicitly required to dig deep through the source code to figure out how to enable the fix for their use case. If the bug fix aids in slugifying non-standard characters, then everyone who uses Jekyll for Tamil-language websites, or any other non-English website, would be expected to read source code in English to accommodate their non-English use case.

Could it be possible to package this into some sort of beta build, or get insights from individuals who are close to the target use case, to see if this fix introduces breaking changes?

This is also a function of me not understanding how to make this an opt-in feature. @ashmaroli, could you point me in the right direction on how this would be made opt-in, and where one would go to see that this feature is available?
I think there is a clear path to making this opt-in in the context of being used as a It is less clear how to resolve the use-case for generating post/page URLs (which was the problem outlined in the initial issue). It appears that there is not an existing codepath for passing

jekyll/lib/jekyll/drops/url_drop.rb, lines 21 to 24 in b146257
I'm not against adding such a path (if that is deemed appropriate), but it is fairly far-reaching, so I wanted to check in before starting a bigger effort.
I don't want yet another configuration option for this.
For the record, I only suggested using a config option because this has a chance of automatically altering existing URLs (albeit in edge cases).
@jekyllbot: merge +fix
Thank you @swanson and @josh-works for your contribution.
This is a 🙋 feature or enhancement.
Summary
Modifies the Regex to support a wider range of Unicode character groups. The original issue has more context on why the existing Regex was not working as expected for Tamil (or other Unicode characters), including external references to a possible fix.
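To make the kind of change concrete, here is a minimal Ruby sketch. The two patterns and the helper `sketch_slugify` are assumptions for illustration only; they are not copied from this PR's diff or from Jekyll's actual constants. The idea is that a class built from Unicode general categories (`\p{M}` marks, `\p{L}` letters, `\p{Nd}` decimal digits) keeps combining marks such as the Tamil pulli (U+0BCD), whereas a POSIX-style `[:alnum:]` class may not, depending on how the regex engine defines it.

```ruby
# Hypothetical stand-ins for "before" and "after" patterns (assumptions,
# not Jekyll's real constants).
OLD_STYLE_PATTERN = /[^[:alnum:]]+/
NEW_STYLE_PATTERN = /[^\p{M}\p{L}\p{Nd}]+/

# Hypothetical helper: replace every run of non-matching characters with a
# hyphen, trim hyphens at either end, then downcase.
def sketch_slugify(text, pattern)
  text.gsub(pattern, "-").gsub(/\A-|-\z/, "").downcase
end

# The Tamil word "Tamil": three letters, a vowel sign, and a final pulli
# (U+0BCD, general category Mn).
tamil = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"

puts sketch_slugify(tamil, OLD_STYLE_PATTERN)  # may drop the pulli
puts sketch_slugify(tamil, NEW_STYLE_PATTERN)  # keeps the word intact
puts sketch_slugify("Hello, World!", NEW_STYLE_PATTERN)
```

For plain ASCII titles both patterns should produce the same slug, which is why any URL changes would be limited to edge cases.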
This was deemed to be a breaking change in the original issue. There are no failing tests to help guide us in confirming that it would break existing sites, but it does seem possible.
The original issue mentioned making this change opt-in. We're happy to support this but we were not sure how to best implement that.
Possible proposal: add a new slug `mode` option (something like `pretty_unicode` -- open to ideas on the name) that would apply this more advanced regex instead.

Any ideas on how to proceed? We don't expect this to get merged as-is, happy to continue a discussion here.
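One way the opt-in mode idea could look, sketched in Ruby under stated assumptions: the mode table, the method name `sketch_slugify_with_mode`, and both patterns are hypothetical, and `pretty_unicode` is only the placeholder name floated above. This is not how Jekyll implements slug modes.

```ruby
# Hypothetical mode table (an assumption for illustration). Existing
# behavior stays under "default"; the wider Unicode pattern is opt-in.
SKETCH_MODES = {
  "default"        => /[^[:alnum:]]+/,
  "pretty_unicode" => /[^\p{M}\p{L}\p{Nd}]+/,
}.freeze

# Hypothetical helper: look up the pattern for the requested mode and
# fail loudly on an unknown mode instead of silently falling back.
def sketch_slugify_with_mode(text, mode: "default")
  pattern = SKETCH_MODES.fetch(mode) do
    raise ArgumentError, "unknown slug mode: #{mode}"
  end
  text.gsub(pattern, "-").gsub(/\A-|-\z/, "").downcase
end

puts sketch_slugify_with_mode("Hello, World!")
puts sketch_slugify_with_mode("\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD", mode: "pretty_unicode")
```

The trade-off raised in the discussion still applies: an opt-in mode avoids changing existing URLs, but users who need the fix have to discover the option first.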
Context
See #7973
Paired with @josh-works so CCing him :)