Commit
bug #280 Don't force labels containing URL delimiters to stay in their Unicode form when using idn_to_ascii (TRowbotham)

This PR was squashed before being merged into the 1.18-dev branch.

Discussion
----------

Don't force labels containing URL delimiters to stay in their Unicode form when using idn_to_ascii

Fixes #279.

The problem is this line: https://github.com/symfony/polyfill/blob/master/src/Intl/Idn/Idn.php#L162. It only affects `idn_to_ascii`. Strictly speaking, this conditional statement isn't part of the spec, and the fix is to remove it.

The conditional exists because of an assumption I made about a discrepancy between what the spec produces and what the test cases expect. The tests expect a label that contains a "?" and also has an error to remain in its Unicode form, whereas the spec says to always convert the label to its ASCII form. There are roughly 200 test cases where this is the case. Because it seemed odd that "?" would be singled out, I extrapolated that any label that contains URL delimiters and has an error should stay in its Unicode form. That assumption was clearly incorrect, as shown by the simple test provided in #279.

Example problematic test case:

```diff
213) Rowbot\Idna\Test\IdnaV2Test::testToAsciiTransitional with data set #6224 ('憡?Ⴔ.XN--1UG73GL146A', '憡?Ⴔ.𐋮≠', '[C2, P1, V6]', '憡?Ⴔ.xn--1ug73gl146a', '[C2, P1, V6, A3]', '', '')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-'憡?Ⴔ.xn--1ug73gl146a'
+'xn--?-c1g3623d.xn--1ug73gl146a'
```

These errors do not appear in the Symfony tests because we opt not to check the transformed domain when it contains errors, which is what the official [ICU test suite](https://github.com/unicode-org/icu/blob/master/icu4j/main/tests/core/src/com/ibm/icu/dev/test/normalizer/UTS46Test.java#L749) does. This is likely the reason why the discrepancy exists in the first place.

I have notified the Unicode Consortium about the test case discrepancy and have been told that it will be discussed at the next Unicode Technical Committee meeting.

Commits
-------

43fbe88 Don't force labels containing URL delimiters to stay in their Unicode form when using idn_to_ascii
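
The behavioral change described in the discussion can be illustrated with a small sketch. This is not the polyfill's PHP code: it is a simplified Python model, and the names `encode_label`, `encode_label_legacy`, and `URL_DELIMITERS` (including the exact set of delimiter characters) are assumptions made for illustration. It skips the UTS #46 mapping and validation steps entirely, so its output for a given label will not necessarily match what `idn_to_ascii` returns.

```python
# Simplified model of the per-label ASCII conversion discussed above.
# Assumption: this delimiter set is hypothetical and chosen for illustration.
URL_DELIMITERS = set("/?#@:")


def encode_label(label: str) -> str:
    """Behavior after the fix: a non-ASCII label is always Punycode-encoded."""
    if label.isascii():
        return label
    # Python's built-in "punycode" codec copies ASCII characters through
    # literally, so a "?" in the label survives inside the encoded form.
    return "xn--" + label.encode("punycode").decode("ascii")


def encode_label_legacy(label: str, has_error: bool) -> str:
    """Hypothetical model of the removed conditional: a label that has an
    error and contains a URL delimiter is left in its Unicode form."""
    if has_error and any(ch in URL_DELIMITERS for ch in label):
        return label
    return encode_label(label)


if __name__ == "__main__":
    label = "憡?Ⴔ"
    print("legacy:", encode_label_legacy(label, has_error=True))
    print("fixed: ", encode_label(label))
```

In this model the legacy variant returns the label unchanged, while the fixed variant returns an `xn--` label that still contains the "?", which is the shape of the "Actual" value in the failing test above.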