This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Create tests and integrate #1403 #1407

Closed
fabioz opened this issue May 8, 2019 · 4 comments

Comments

@fabioz
Contributor

fabioz commented May 8, 2019

Create tests and integrate #1403

@karthiknadig karthiknadig changed the title Create tests and integrate https://github.com/microsoft/ptvsd/pull/1403 Create tests and integrate #1403 May 14, 2019
@fabioz fabioz self-assigned this May 15, 2019
@fabioz
Contributor Author

fabioz commented May 15, 2019

I'll take a look at this.

@fabioz
Contributor Author

fabioz commented May 15, 2019

OK, so I started writing unit tests for this and checking the UI, and more Unicode things are broken before even getting to this point (in Python 2; things seem to work fine in Python 3).

Some things aren't even fixable in Python 2. For instance, doing:

print(u"😄😄😄😄😄😄😄😄😄"[3:4])

Will not print a full character, as would be expected (even though the string is unicode). It prints only part of that character, which is actually a completely different character.
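To make the slicing problem concrete, here is a minimal sketch that simulates indexing by UTF-16 code unit, which is what a narrow Python 2 build does. The helper name `utf16_units` is made up for illustration and is not part of ptvsd or the debugger.

```python
import struct

# Nine U+1F604 emoji, the same string as in the example above.
text = u"\U0001F604" * 9


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


units = utf16_units(text)
sliced = units[3:4]  # what [3:4] selects when indexing by code unit

print(len(units))                # 18 code units for 9 characters
print([hex(u) for u in sliced])  # ['0xde04'], a lone low surrogate
```

Each emoji takes two code units (0xD83D followed by 0xDE04), so a one-unit slice can never contain a whole character.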

Also, I've found some cases where the debugger breaks when getting truly binary data, so I'll use this issue to fix that too.

@int19h
Contributor

int19h commented May 16, 2019

The above emoji sample should work correctly on Linux, but not on Mac or Windows, because UTF-16 is used on those, so a slice can land in the middle of a surrogate pair. But it should be fairly easy to detect: we just need to check for a trailing high surrogate (0xD800–0xDBFF) and trim it if it's there.
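A rough sketch of that check, assuming the truncated value is available as a unicode string. The function name `trim_trailing_high_surrogate` is hypothetical and is not necessarily how the actual fix was written.

```python
def trim_trailing_high_surrogate(s):
    """Drop the last code unit of s if it is a high surrogate (0xD800-0xDBFF)."""
    if s and 0xD800 <= ord(s[-1]) <= 0xDBFF:
        return s[:-1]
    return s


# A string cut in the middle of a surrogate pair ends in a high surrogate.
truncated = u"abc\ud83d"
print(repr(trim_trailing_high_surrogate(truncated)))
```

Strings that end in a complete character, and empty strings, are returned unchanged.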

@fabioz
Contributor Author

fabioz commented May 16, 2019

@int19h It seems I have the same behavior on Windows and Linux...

Researching a bit more, it seems that it may depend on how CPython itself is compiled (https://stackoverflow.com/questions/1446347/how-to-find-out-if-python-is-compiled-with-ucs-2-or-ucs-4).
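For reference, one way to tell the two kinds of build apart at runtime is `sys.maxunicode`. The wrapper `is_narrow_build` below is only an illustrative name.

```python
import sys


def is_narrow_build():
    """Return True on a narrow (UCS-2 / UTF-16) CPython build."""
    # Narrow builds report 0xFFFF; wide (UCS-4) builds report 0x10FFFF.
    # From Python 3.3 on the value is always 0x10FFFF.
    return sys.maxunicode == 0xFFFF


print(is_narrow_build())
```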

By coincidence, it seems that pytest was not working with Jython at all because of the way they implemented removing high surrogate pairs (I reported pytest-dev/pytest#5256 for them a few days ago).

Anyway, I'm working on this task right now and will provide a pull request after I test and handle those corner cases.

I'll probably just remove the high surrogate in that case, as you suggested. The major issue is that the string length can differ a lot from what's expected, since len(u"😄") == 2 on such builds (but as that's a corner case, it may be OK).

fabioz added a commit to fabioz/ptvsd that referenced this issue May 17, 2019
@fabioz fabioz closed this as completed in f115036 May 17, 2019