IDN polyfill for the UTS46 variant is incomplete #159
It also seems not to throw an error when a label has a leading dash, which is not allowed. See guzzle/guzzle#2550 (comment).
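For reference, the hyphen restrictions that UTS#46 applies when its CheckHyphens option is on can be sketched roughly as below. This is a minimal, language-neutral illustration in Python, not the polyfill's PHP code. The function names `hyphen_errors` and `domain_hyphen_errors` are hypothetical, and the check runs on raw labels, whereas a real implementation would run it as part of full UTS#46 processing.

```python
def hyphen_errors(label: str) -> list[str]:
    """Return hyphen-related problems for one domain label (empty if none)."""
    errors = []
    if label.startswith("-"):
        errors.append("leading hyphen")
    if label.endswith("-"):
        errors.append("trailing hyphen")
    # A hyphen in both the third and fourth positions is also rejected.
    if label[2:4] == "--":
        errors.append("hyphens in positions 3 and 4")
    return errors


def domain_hyphen_errors(domain: str) -> dict[str, list[str]]:
    """Map each offending label of a dot-separated domain to its problems."""
    report = {}
    for label in domain.split("."):
        problems = hyphen_errors(label)
        if problems:
            report[label] = problems
    return report
```

With this sketch, `domain_hyphen_errors("-foo.example.com")` reports the leading hyphen on the first label, which is the case the polyfill is said to miss.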
Anyone willing to improve this?
At the risk of shooting myself in the foot, I'd be willing to donate the library I just finished to improve the situation here.

Overview

I wrote this for my own use after having tried other polyfills and being disappointed with the results. You can find the repository here for testing, though I have not added it to Packagist yet contrary to the install instructions in the readme. The library is fully UTS#46 compliant and passes the test suites for Unicode versions 12.1.0 and 13.0.0 (more details below). Code coverage is roughly 95%.

I understand that swapping the underlying library here isn't really the most palatable solution to the problem, but still I wanted to offer. IDNA is a rather complex topic and doesn't really have a simple solution. I'll be happy to answer any questions you may have and I'll try to provide you with enough information to make an informed decision.

Right now, the Normalizer implementation used is the wildcard, and as far as I can tell there are only 2 existing polyfills available;

Tests

The table below contains preliminary test results. This was run against the IDNA test suite for Unicode version 13.0.0. There are a total of 6,225 test cases. The tests are run 3 times; once for
[1] From the source comment I assumed this was version 6.3.0. I will file a new issue with the error details from running

While the results with a Normalizer implementation that uses an appropriate version of Unicode are very promising, and great for validating conformance, they are essentially irrelevant since the use case for installing this is that the user does not have the

I understand the desire to have a test suite that 100% passes and that is what is expected from a quick look at the docs, but unfortunately that can't be guaranteed here with the available Normalizer polyfills. So, I don't know how you feel about having a test suite that may or may not pass depending on the Normalizer's Unicode version.

Performance

Could be better, could be worse. Sorry, the details in this department are a bit light and unscientific. OPCache and PCRE JIT were enabled, but the autoloader was not optimized. The fastest run (

The Unicode data is an array with ~8,700 entries that represents the entire repertoire of Unicode code points. A binary search is done on this array to look up the information for each code point in the input, but nothing is cached. So, it could possibly be improved by caching lookup results, but determining the best heuristic for cache eviction and all that is challenging. Preloading the Unicode data in PHP 7.4+ could be a potential win. I haven't actually checked performance in PHP 8 yet, but the JIT may help with all the binary searches.

Implications

My library ships with Unicode data for a specific version of Unicode (in this case it is using 13.0.0). While this is great for helping to ensure the stability of the resulting output, it does come with some drawbacks:
Additionally,
Differences

I've tried very hard for parity between my library and the native ICU implementation; however, there are some differences.
Conclusion

First, let me apologize for the headache you now have. This is probably a lot to take in and is far from a perfect solution. In a perfect world, everyone would have access to the

EDIT: Fixed the number of assertions for the last row in the tests table.
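The lookup described under Performance (a binary search over a sorted table of code point ranges, optionally with cached results) can be sketched as follows. This is a minimal illustration in Python, not the library's PHP code. The three-entry `RANGES` table is a tiny stand-in for the real ~8,700-entry data, the names `lookup` and `cached_lookup` are hypothetical, and an LRU cache is only one assumed eviction policy among those the comment leaves open.

```python
from bisect import bisect_right
from functools import lru_cache

# Illustrative stand-in for the real table: sorted, non-overlapping
# (first_code_point, last_code_point, status) ranges.
RANGES = [
    (0x0041, 0x005A, "mapped"),   # A-Z are mapped to lowercase
    (0x0061, 0x007A, "valid"),    # a-z
    (0x00AD, 0x00AD, "ignored"),  # soft hyphen
]

# Range start points, kept separately so bisect can search them directly.
STARTS = [first for first, _, _ in RANGES]


def lookup(code_point: int):
    """Return the status of a code point, or None if no range covers it."""
    # Index of the last range whose start is <= code_point.
    i = bisect_right(STARTS, code_point) - 1
    if i >= 0:
        _, last, status = RANGES[i]
        if code_point <= last:
            return status
    return None


# Assumed caching policy: keep the most recently used results.
cached_lookup = lru_cache(maxsize=1024)(lookup)
```

Since domain names tend to reuse a small set of code points, even a small cache like this may avoid most repeated searches, though that would need measuring.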
Wow, nice :) Would you be up to sending your code in a PR on this repository, replacing the current one?
We considered this as bug fixes in the past. For sure a major version bump would not be desired.
That's fine, we already make this assumption in other polyfills and it has proved to be a non-issue.
Can't we really? Did you spot any difference? If yes, at which version of PCRE/PHP? I'd be tempted to consider this as an issue we don't need to work around. Let's use
Can this be relaxed? Which 7.1 features are required?
Don't you want to rename your class to
#266 should fix at least some (maybe all?) failures with the polyfill-intl-normalizer, can you please give it a try?
Yes, I can do that.
Unfortunately, there is a difference. Using

So, this is an issue even on recent versions of PHP and gets progressively worse the older the PHP version is. Looking at the PCRE changelog, the highest supported version of Unicode is 7.0.0. So, we can't expect any version of PHP using PCRE1 to have better Unicode support in regular expressions. My PHP 7.1/7.2 install has a PCRE version of 8.43. My PHP 7.3/7.4 install reports

I initially wrote this library using the
Depends on how relaxed you want to get. I don't require anything super fancy. You could go back to at least PHP 5.5 if you rip out the type hints, const and function imports, and require the appropriate
I'm fine with renaming to Idn, I don't feel strongly either way.
Yes, I will absolutely give this a try. Thanks!
UTS#46 defines a series of options that can be toggled by the caller to customize its behavior. Two of those options,

To support configuring those 2 options would require adding 2 new constants even when the polyfill doesn't get used, but they would be no-ops when using the
I tried #266 and it didn't change the number of errors. Looking more closely, the errors with
Depending on how you interpret the following statement from Section 8.2. Testing Conformance the easiest thing to do here would be to change the tests to not check that there are no errors when it expects no errors.
This is more of a false negative than being more strict. In the meantime I'll go check out the normalization spec and think about it a bit.
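The relaxed test policy floated above (require an error when the test data expects one, but do not require the absence of errors when it expects none) could look roughly like this. It is a hypothetical sketch in Python, not the actual test suite. The function name `case_passes` and its parameters are assumptions, and whether this reading of Section 8.2 is acceptable remains the open question.

```python
def case_passes(expected_output, expected_errors, actual_output, actual_errors):
    """Decide whether one conformance test case passes under the relaxed policy."""
    if expected_errors:
        # The test data expects a failure, so the implementation must report one.
        return bool(actual_errors)
    # The test data expects success: compare the output only and deliberately
    # skip asserting that actual_errors is empty.
    return actual_output == expected_output
```

Under this policy a case that produces the right output but also reports an error still passes, which is exactly the trade-off being debated.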
So, I ended up just rolling my own

I couldn't figure out how to disable the

On a side note, the nightly build seems to be complaining about a nonexistent constant on the
Actually, PHP has a bundled version of PCRE which is used by default. The versions are:
See #149 (comment) and following comments