Support for certain unicode literals depends on sys.maxunicode #350

Closed
anishathalye opened this issue Nov 20, 2019 · 7 comments

@anishathalye
Contributor

It seems that support for Unicode literals above code point 0xFFFF, introduced in cf1c86c, depends on sys.maxunicode. For example, it works on standard Python 3 builds, but not on narrow Python 2 builds (built with UCS-2), where sys.maxunicode is 2**16-1.
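
For readers who want to check which kind of build they are on, here is a minimal sketch. The helper name `is_wide_build` is illustrative and not part of PyYAML. On Python 3.3 and later, `sys.maxunicode` is always 0x10FFFF, so the narrow case only arises on some Python 2 builds.

```python
import sys

def is_wide_build():
    """Return True when the interpreter stores code points above 0xFFFF directly."""
    # Narrow (UCS-2) builds report 0xFFFF; wide (UCS-4) builds report 0x10FFFF.
    return sys.maxunicode > 0xFFFF

if __name__ == "__main__":
    kind = "wide (UCS-4)" if is_wide_build() else "narrow (UCS-2)"
    print("sys.maxunicode = %#x, %s build" % (sys.maxunicode, kind))
```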

Here's a simple test program that works on Python 3, but not on a standard Python 2 build:

# coding=utf8
import yaml

s = '''
"😢"
'''

assert yaml.safe_load(s) == '😢'

It produces the following error on Python 2:

Traceback (most recent call last):
  File "pyyaml_unicode.py", line 8, in <module>
    assert yaml.safe_load(s) == '😢'
  File "/usr/local/lib/python2.7/site-packages/yaml/__init__.py", line 162, in safe_load
    return load(stream, SafeLoader)
  File "/usr/local/lib/python2.7/site-packages/yaml/__init__.py", line 112, in load
    loader = Loader(stream)
  File "/usr/local/lib/python2.7/site-packages/yaml/loader.py", line 34, in __init__
    Reader.__init__(self, stream)
  File "/usr/local/lib/python2.7/site-packages/yaml/reader.py", line 81, in __init__
    self.determine_encoding()
  File "/usr/local/lib/python2.7/site-packages/yaml/reader.py", line 137, in determine_encoding
    self.update(1)
  File "/usr/local/lib/python2.7/site-packages/yaml/reader.py", line 174, in update
    self.check_printable(data)
  File "/usr/local/lib/python2.7/site-packages/yaml/reader.py", line 149, in check_printable
    'unicode', "special characters are not allowed")
yaml.reader.ReaderError: unacceptable character #xd83d: special characters are not allowed
  in "<string>", position 2
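
The `#xd83d` in the error is consistent with a narrow build seeing the emoji as a UTF-16 surrogate pair rather than as a single character. A small sketch of the standard UTF-16 arithmetic, with `to_surrogate_pair` as an illustrative helper name (not a PyYAML function):

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the 20-bit offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the 20-bit offset
    return high, low

# U+1F622 is the emoji used in the test program above.
high, low = to_surrogate_pair(0x1F622)
print(hex(high), hex(low))  # prints: 0xd83d 0xde22
```

The high surrogate, 0xD83D, is the value reported in the ReaderError.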

Is there any reasonable way to support this functionality on Python 2 as well?
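
One possible direction, sketched under assumptions and not taken from PyYAML's code or from any actual fix: on a narrow build, a printable-character check could accept a high surrogate that is immediately followed by a low surrogate, and reject only unpaired ones. The sketch works on a list of 16-bit code units so that it runs on any Python; `find_unpaired_surrogate` is a hypothetical name.

```python
def find_unpaired_surrogate(units):
    """Return the index of the first unpaired surrogate in a sequence of
    UTF-16 code units, or None if every surrogate is part of a valid pair."""
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate is only acceptable when a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate that was not consumed by a preceding high one.
            return i
        i += 1
    return None

# The quoted emoji from the test program, as a narrow build would store it.
quoted_emoji = [0x22, 0xD83D, 0xDE22, 0x22]
print(find_unpaired_surrogate(quoted_emoji))          # prints: None
print(find_unpaired_surrogate([0x22, 0xD83D, 0x22]))  # prints: 1
```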

@perlpunk
Member

Just a note: in Python 2, I believe the assert should look like this:

assert yaml.safe_load(s) == u'😢'

With that change it works for me on a standard Python 2.7.14 on Linux (openSUSE).
But I know it doesn't work on other Python 2 builds (Windows, macOS).
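
A likely reason for the `u` prefix: with a UTF-8 source encoding, a plain `'...'` literal in Python 2 is a byte string, while `yaml.safe_load` returns a unicode string for non-ASCII scalars. A rough Python 3 analogue of that mismatch, using `bytes` to stand in for the Python 2 `str` type (an illustration only, not Python 2 code):

```python
text = "\U0001F622"         # one code point, comparable to u'...' in Python 2
raw = text.encode("utf-8")  # four bytes, comparable to a plain '...' literal
                            # in a UTF-8 encoded Python 2 source file

print(len(text), len(raw))          # prints: 1 4
print(raw == text)                  # prints: False
print(raw.decode("utf-8") == text)  # prints: True
```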

I don't have such a machine available, so I'm not able to check if this would be possible.
Maybe you can try to fix it?

By the way, https://pythonclock.org/ says Python 2 will retire in 1 month 11 days...

@anishathalye
Contributor Author

Yep, agree about the assert. It's crashing before it gets to that point, though.

If you want to test on your own machine, you can get such a Python by compiling with --enable-unicode=ucs2.

I know that Python 2 will be retired soon, but I think many of my users still use Python 2, so I need to keep supporting it for some of my applications.

@anishathalye
Contributor Author

I think #351 fixes it.

@perlpunk
Member

perlpunk commented Dec 2, 2019

We released 5.2: https://pypi.org/project/PyYAML/5.2/

edit: oops, wrong comment, as the fix was not in 5.2

@perlpunk perlpunk added this to Backlog in 5.3 Release Dec 2, 2019
@anishathalye
Contributor Author

Is the plan to fix this in 5.3 (or earlier, if there's a 5.2.1)?

@perlpunk
Member

perlpunk commented Dec 2, 2019

Yes. I hope it won't be too long until 5.3

@perlpunk perlpunk moved this from Backlog to Done in 5.3 Release Dec 8, 2019
@perlpunk
Member

perlpunk commented Jan 6, 2020

Released 5.3: https://pypi.org/project/PyYAML/5.3/

@perlpunk perlpunk closed this as completed Jan 6, 2020