Create tests and integrate #1403 #1407
Comments
I'll take a look at this.
OK, so I started creating unit tests for this and checking in the UI, and there are more Unicode things broken before even getting to this point (in Python 2; things seem to work fine in Python 3). Some things aren't even fixable in Python 2. For instance, doing:
Will not return a full character as would be expected (even though it's unicode); it only returns part of that character, which is actually a completely different character. Also, I've found some cases where the debugger breaks when getting truly binary data, so I'll use this issue to fix that too.
The above sample with emoji should work correctly on Linux, but not on Mac or Windows, because UTF-16 is used on those, so we can end up slicing in the middle of a surrogate pair. But it should be fairly easy to detect: just check for a trailing high surrogate (0xD800–0xDBFF), and trim it if it's there.
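A minimal sketch of that check, assuming Python 3 string semantics. The helper name `trim_trailing_high_surrogate` is made up for illustration and is not a function from the debugger's codebase:

```python
def trim_trailing_high_surrogate(text):
    """Drop a dangling high surrogate left at the end of a sliced string.

    Hypothetical helper: a high surrogate (U+D800..U+DBFF) is only
    meaningful when followed by a low surrogate, so one sitting at the
    very end means the slice cut a surrogate pair in half.
    """
    if text and 0xD800 <= ord(text[-1]) <= 0xDBFF:
        return text[:-1]
    return text


# Simulate a slice that ended in the middle of U+1F600, which is
# encoded in UTF-16 as the pair D83D DE00.
sliced = "ab\ud83d"
trimmed = trim_trailing_high_surrogate(sliced)
```

Strings that end in a complete character, including the empty string, pass through unchanged, so the helper could be applied to any truncated value.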
@int19h It seems I have the same behavior on Windows and Linux... Researching a bit more, it seems that it may depend on how CPython itself is compiled (https://stackoverflow.com/questions/1446347/how-to-find-out-if-python-is-compiled-with-ucs-2-or-ucs-4). By coincidence, it seems that pytest was not working with Jython at all because of the way they implemented removing high surrogate pairs (I reported pytest-dev/pytest#5256 for them a few days ago). Anyway, I'm working on this task right now and will provide a pull request after I test and work through those corner cases. I'll probably just remove the high surrogate in that case as you suggested... the major issue is that the string length can be much different from what's expected as
Create tests and integrate #1403