Adding support to Unicode characters over codepoint 0xffff #63

Merged

sigmavirus24 merged 5 commits into yaml:master on Aug 8, 2017

peterkmurphy
Contributor

This patch aims to solve issue #25. As a side effect, the testing code has been trimmed to accommodate the fixes. Why have I done that? I might as well repeat what I said by email:

The problem I am having is with the testing code, which will need a lot of untangling. Some tests make frankly ... "bizarre" assumptions about which inputs will fail to parse, when some of those inputs turn out to be parseable by accident. For example:


```python
import codecs

import yaml


def test_unicode_input_errors(unicode_filename, verbose=False):
    # Python 2 test code: str is a byte string here, so "+ '!'" appends a byte.
    data = open(unicode_filename, 'rb').read().decode('utf-8')
    for input in [data.encode('latin1', 'ignore'),  # <--- Look at this!
                  data.encode('utf-16-be'), data.encode('utf-16-le'),
                  codecs.BOM_UTF8+data.encode('utf-16-be'),
                  codecs.BOM_UTF16_BE+data.encode('utf-16-le'),
                  codecs.BOM_UTF16_LE+data.encode('utf-8')+'!']:
        try:
            yaml.load(input)
        except yaml.YAMLError as exc:
            if verbose:
                print(exc)
        else:
            raise AssertionError("expected an exception")
```

The idea: produce some bizarre combinations of byte sequences, try to parse each one, and raise an AssertionError if parsing doesn't throw a YAMLError. But running data.encode('latin1', 'ignore') on this data leaves nothing but ten line breaks, which is perfectly parseable YAML. So no YAMLError is raised, and the test fails with an AssertionError.
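A minimal stdlib-only sketch of that failure mode. The sample text is a hypothetical stand-in for the test's Unicode fixture, assumed to hold only non-Latin-1 characters on each line: encoding it to Latin-1 with the `'ignore'` error handler silently drops every character Latin-1 cannot represent, so only the line breaks survive. The helper names are illustrative, not part of PyYAML's test suite.

```python
# Hypothetical stand-in for the test's Unicode fixture: ten lines whose
# characters all lie outside Latin-1 (two CJK ideographs and U+1F600).
data = "\u4e2d\u6587\U0001F600\n" * 10


def latin1_ignore(text: str) -> bytes:
    """Encode to Latin-1, silently dropping characters it cannot represent."""
    return text.encode("latin-1", "ignore")


def is_blank_stream(raw: bytes) -> bool:
    """True if the bytes hold only ASCII whitespace such as line breaks."""
    return raw.strip() == b""


encoded = latin1_ignore(data)
print(encoded == b"\n" * 10)     # True: only the ten line breaks survive
print(is_blank_stream(encoded))  # True: nothing is left that could be malformed
```

A YAML loader treats such input as an empty stream with no documents (PyYAML's `yaml.safe_load` simply returns `None` for it), so a test that expects a YAMLError here cannot pass. A guard like the hypothetical `is_blank_stream` is one way a test could skip candidates that degenerate into valid input.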

What should I do in this case: remove test_unicode_input_errors from the PyYAML testing code? Yes, the number of tests will go down, which is generally not a good thing, but if the tests are based on dodgy assumptions...

In some cases I have altered the testing code; in others I have removed the tests.

```diff
@@ -32,4 +32,3 @@ Submit bug reports and feature requests to the PyYAML bug tracker:
 
 PyYAML is written by Kirill Simonov <xi@resolvent.net>. It is released
 under the MIT license. See the file LICENSE for more details.
-
```

Removing trailing newlines is bad form, as it's diff noise. I'd undo it in every file where you've done it, which seems to be every file you've touched. Also, files should end with a trailing newline to be valid POSIX text files.
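The POSIX point can be checked mechanically. POSIX defines a line as a sequence of non-newline characters plus a terminating newline, and a text file as zero or more lines, so a non-empty text file whose last byte is not a newline ends with an incomplete line. A minimal sketch, assuming files are read as raw bytes; both helper names are hypothetical.

```python
from pathlib import Path


def ends_with_newline(content: bytes) -> bool:
    """True if content is empty or its final byte is a line feed.

    An empty file is zero lines and therefore valid; any non-empty
    text file should end with a newline byte.
    """
    return content == b"" or content.endswith(b"\n")


def file_ends_with_newline(path) -> bool:
    """Hypothetical helper: apply the check to a file on disk."""
    return ends_with_newline(Path(path).read_bytes())


print(ends_with_newline(b"See the file LICENSE for more details.\n"))  # True
print(ends_with_newline(b"See the file LICENSE for more details."))    # False
```

Checking bytes rather than decoded text keeps the test independent of the file's encoding.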

```diff
@@ -674,7 +678,7 @@ def analyze_scalar(self, scalar):
             # Check for indicators.
             if index == 0:
                 # Leading indicators are special characters.
-                if ch in u'#,[]{}&*!|>\'\"%@`': 
+                if ch in u'#,[]{}&*!|>\'\"%@`':
```

Trimming trailing whitespace is bad form, as it's diff noise.
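A whitespace-only edit like the one in this hunk shows up as a changed line that looks identical to the original. A minimal sketch, using only the standard `difflib` module, of flagging such changes before submitting a patch; the function name and the sample lines are hypothetical.

```python
import difflib


def whitespace_only_changes(old_lines, new_lines):
    """Return (old, new) line pairs that differ only in trailing whitespace."""
    noise = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        # Pair replaced lines by position; extras in unequal blocks are skipped.
        for old, new in zip(old_lines[i1:i2], new_lines[j1:j2]):
            if old != new and old.rstrip() == new.rstrip():
                noise.append((old, new))
    return noise


old = ["if index == 0:", "    x = 1   "]
new = ["if index == 0:", "    x = 1"]
print(whitespace_only_changes(old, new))
```

Lines whose visible content also changed are not reported, so only pure diff noise is flagged.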

@peterkmurphy
Contributor Author

peterkmurphy commented May 10, 2017 via email

@adamchainz

@peterkmurphy I'm not an admin on this project, I can't do anything to your PR.

@samdmarshall

You may need to ping @sigmavirus24 or another committer on this project to get this merged and a new release made.

adamchainz pushed a commit to adamchainz/pyyaml that referenced this pull request May 16, 2017
adamchainz pushed a commit to adamchainz/pyyaml that referenced this pull request May 16, 2017
@adamchainz

I copied this and tidied it up in #65

@peterkmurphy
Contributor Author

peterkmurphy commented May 16, 2017 via email

@sigmavirus24 sigmavirus24 merged commit 94c3f07 into yaml:master Aug 8, 2017
@sigmavirus24
Contributor

Thanks @peterkmurphy! 🎉 ✨

@jborean93

@ingydotnet, is there any chance this fix could be backported to the 3.x branch and a new release made? I can install the pre-release 4.x builds, but I haven't seen any recent activity indicating that a full release with those changes will be made.

cc @nitzmahone

@nitzmahone
Member

I'd be +1 for that

@ingydotnet
Member

@perlpunk and I are meeting up in a week. We might be able to discuss it then.

mtremer pushed a commit to ipfire/ipfire-2.x that referenced this pull request Feb 14, 2022
- Update from 3.13 to 6.0
- Update of rootfile
- Changelog
6.0 (2021-10-13)
* yaml/pyyaml#327 -- Change README format to Markdown
* yaml/pyyaml#483 -- Add a test for YAML 1.1 types
* yaml/pyyaml#497 -- fix float resolver to ignore `.` and `._`
* yaml/pyyaml#550 -- drop Python 2.7
* yaml/pyyaml#553 -- Fix spelling of “hexadecimal”
* yaml/pyyaml#556 -- fix representation of Enum subclasses
* yaml/pyyaml#557 -- fix libyaml extension compiler warnings
* yaml/pyyaml#560 -- fix ResourceWarning on leaked file descriptors
* yaml/pyyaml#561 -- always require `Loader` arg to `yaml.load()`
* yaml/pyyaml#564 -- remove remaining direct distutils usage
5.4.1 (2021-01-20)
* yaml/pyyaml#480 -- Fix stub compat with older pyyaml versions that may unwittingly load it
5.4 (2021-01-19)
* yaml/pyyaml#407 -- Build modernization, remove distutils, fix metadata, build wheels, CI to GHA
* yaml/pyyaml#472 -- Fix for CVE-2020-14343, moves arbitrary python tags to UnsafeLoader
* yaml/pyyaml#441 -- Fix memory leak in implicit resolver setup
* yaml/pyyaml#392 -- Fix py2 copy support for timezone objects
* yaml/pyyaml#378 -- Fix compatibility with Jython
5.3.1 (2020-03-18)
* yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor
5.3 (2020-01-06)
* yaml/pyyaml#290 -- Use `is` instead of equality for comparing with `None`
* yaml/pyyaml#270 -- Fix typos and stylistic nit
* yaml/pyyaml#309 -- Fix up small typo
* yaml/pyyaml#161 -- Fix handling of __slots__
* yaml/pyyaml#358 -- Allow calling add_multi_constructor with None
* yaml/pyyaml#285 -- Add use of safe_load() function in README
* yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF
* yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#359 -- Use full_load in yaml-highlight example
* yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython
* yaml/pyyaml#329 -- Fix for Python 3.10
* yaml/pyyaml#310 -- Increase size of index, line, and column fields
* yaml/pyyaml#260 -- Remove some unused imports
* yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such
* yaml/pyyaml#363 -- Add tests for timezone
5.2 (2019-12-02)
* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
5.1.2 (2019-07-30)
* Re-release of 5.1 with regenerated Cython sources to build properly for Python 3.8b2+
5.1.1 (2019-06-05)
* Re-release of 5.1 with regenerated Cython sources to build properly for Python 3.8b1
5.1 (2019-03-13)
* yaml/pyyaml#35 -- Some modernization of the test running
* yaml/pyyaml#42 -- Install tox in a virtualenv
* yaml/pyyaml#45 -- Allow colon in a plain scalar in a flow context
* yaml/pyyaml#48 -- Fix typos
* yaml/pyyaml#55 -- Improve RepresenterError creation
* yaml/pyyaml#59 -- Resolves #57, update readme issues link
* yaml/pyyaml#60 -- Document and test Python 3.6 support
* yaml/pyyaml#61 -- Use Travis CI built in pip cache support
* yaml/pyyaml#62 -- Remove tox workaround for Travis CI
* yaml/pyyaml#63 -- Adding support to Unicode characters over codepoint 0xffff
* yaml/pyyaml#75 -- add 3.12 changelog
* yaml/pyyaml#76 -- Fallback to Pure Python if Compilation fails
* yaml/pyyaml#84 -- Drop unsupported Python 3.3
* yaml/pyyaml#102 -- Include license file in the generated wheel package
* yaml/pyyaml#105 -- Removed Python 2.6 & 3.3 support
* yaml/pyyaml#111 -- Remove commented out Psyco code
* yaml/pyyaml#129 -- Remove call to `ord` in lib3 emitter code
* yaml/pyyaml#149 -- Test on Python 3.7-dev
* yaml/pyyaml#158 -- Support escaped slash in double quotes "\/"
* yaml/pyyaml#175 -- Updated link to pypi in release announcement
* yaml/pyyaml#181 -- Import Hashable from collections.abc
* yaml/pyyaml#194 -- Reverting yaml/pyyaml#74
* yaml/pyyaml#195 -- Build libyaml on travis
* yaml/pyyaml#196 -- Force cython when building sdist
* yaml/pyyaml#254 -- Allow to turn off sorting keys in Dumper (2)
* yaml/pyyaml#256 -- Make default_flow_style=False
* yaml/pyyaml#257 -- Deprecate yaml.load and add FullLoader and UnsafeLoader classes
* yaml/pyyaml#261 -- Skip certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#263 -- Windows Appveyor build

Signed-off-by: Adolf Belka <adolf.belka@ipfire.org>

Reviewed-by: Peter Müller <peter.mueller@ipfire.org>
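Several changelog entries above (#261, #276, #351, #360) concern interpreters where `sys.maxunicode` is 0xFFFF. On such "narrow" builds (an option for Python 2 and for Python 3 before 3.3), a code point above 0xFFFF is stored as two UTF-16 surrogate code units, so code that inspects one character at a time sees two halves instead of one character. The arithmetic below is the standard UTF-16 surrogate-pair encoding, written as a stdlib-only sketch; it is not PyYAML's reader code.

```python
import sys
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (above 0xFFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = 0x1F600  # a character outside the Basic Multilingual Plane
high, low = to_surrogate_pair(cp)
print(hex(high), hex(low))                   # 0xd83d 0xde00
print(from_surrogate_pair(high, low) == cp)  # True
# Cross-check against the UTF-16 codec: the same two 16-bit units.
print(chr(cp).encode("utf-16-be").hex())     # d83dde00
print(sys.maxunicode == 0x10FFFF)            # True on Python 3.3+ (PEP 393)
```

A reader that must work on narrow builds has to recognize a high surrogate followed by a low surrogate as one character, which is the kind of special case those maxunicode-dependent tests exercise.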
7 participants