Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

project_loader: handle invalid unicode chars #1941

Merged
merged 6 commits into from Feb 27, 2018

Conversation

kalikiana
Copy link
Contributor

@kalikiana kalikiana commented Feb 19, 2018

  • Have you followed the guidelines for contributing?
  • Have you signed the CLA?
  • If this is a bugfix. Have you checked that there is a bug report open for the issue you are trying to fix on bug reports?
  • If this is a new feature. Have you discussed the design on the forum?
  • Have you successfully run ./runtests.sh static?
  • Have you successfully run ./runtests.sh unit?

Fixes: LP: #1737571

Note that the exception is currently raised with valid and invalid unicode chbaracters due to the upstream bug pyyaml#25. But we'll want to handle the error cleanly even if the upstream issue is fixed.

Steps to reproduce:

$ snap install snapcraft --classic --edge
$ snapcraft init
$ # Modify "summary" or "version" to include the 馃挬 emoji
$ snapcraft pull
$ # Build fails to validate snapcraft.yaml

This branch adds a patch to the PyYAML used by the Snapcraft snap to handle the unicode code points erroneously flagged as invalid, such as the 馃挬 emoji. A Snapcraft snap built from this PR will successfully validate a summary or description making use of the 馃挬 emoji.

New test cases:

  • tests.unit.project_loader.test_config.test_invalid_yaml_invalid_unicode_chars
  • tests.integration.general.test_global_properties

Note: The integration level test for the PyYAML work-around is skipped unless testing with a snap or Debian package (ie. SNAPCRAFT_FROM_SNAP=1 is set or SNAPCRAFT_FROM_DEB=1 is set), both of which are patched. I verified with a snap from the branch, and running the tests in a virtual environment respectively.

sergiusens
sergiusens previously approved these changes Feb 19, 2018
@chipaca
Copy link
Contributor

chipaca commented Feb 19, 2018

Why is that invalid in a summary/title/description? What is a valid summary/title/description?

(I don't think summaries, titles nor descriptions can nor should be ascii-only)

@kalikiana
Copy link
Contributor Author

@chipaca See my note in the description. There's a bug in pyyaml's handling of the unicode spec.

@chipaca
Copy link
Contributor

chipaca commented Feb 19, 2018

yes, that's what's brought me here: you're upgrading a "the yaml parser we chose to use does not handle this" to "this is not allowed". it's fine to say "not supported (use a \u escape sequence instead)", or even do the conversion ourselves ahead of time, but to say "not allowed" is a lie (you can work around it by using a \u escape), and not what we want either (these fields are for human-readable text, ie unicode).

There is validation to be done on these fields: I don't think they should include any character from the private use area nor any control character. But that's separate.

@chipaca
Copy link
Contributor

chipaca commented Feb 19, 2018

FWIW,

>>> sys.stdout.buffer.write("馃挬\n".encode("ascii", errors='backslashreplace'))
\U0001f4a9

@kalikiana
Copy link
Contributor Author

Seems like the Debian package was patched, so the version in pip behaves differently, causing the failure in CI. I changed the original test case now to use a code point that's actually invalid vs. parsed as invalid upstream, and also added a work-around so that emoji like in the bug report are considered valid regardless of version.

Copy link
Contributor

@chipaca chipaca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, and sorry for the noise. 馃憤

@sergiusens sergiusens dismissed their stale review February 20, 2018 09:29

the patch should be brought into the snap

@sergiusens sergiusens added this to the 2.40 milestone Feb 27, 2018
@sergiusens sergiusens added the Bug Actual bad behavior that don't fall into maintenance or documentation label Feb 27, 2018
@@ -47,6 +48,7 @@ parts:
TRIPLET_PATH="$SNAPCRAFT_PART_INSTALL/usr/lib/$(gcc -print-multiarch)"
LIBSODIUM=$(readlink -n $TRIPLET_PATH/libsodium.so.18)
ln -s $LIBSODIUM $TRIPLET_PATH/libsodium.so
patch -d $SNAPCRAFT_PART_INSTALL/lib/python3.6/site-packages -p1 < $SNAPCRAFT_STAGE/pyyaml-support-high-codepoints.diff
after: [python, apt]
Copy link
Contributor

@kyrofa kyrofa Feb 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to add patches to after here. It has python which has patches in ITS after... but that will break if it ever changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the after for python includes this, so the chain is somewhat complete; I will sort of ignore this as we are going back to stage-packages as soon as our SRUed deb gets in with the fixes for classic

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch otherwise ;-)

Copy link
Contributor

@kyrofa kyrofa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit in the YAML, but I'll leave my +1 here anyway.

@sergiusens
Copy link
Collaborator

tested the snap, also works.

@sergiusens sergiusens merged commit e9775f2 into canonical:master Feb 27, 2018
@sergiusens sergiusens added this to Bugs & Tech debt in 18.04 Mar 26, 2018
xnox pushed a commit to xnox/snapcraft that referenced this pull request Jul 3, 2020
Note that the exception is currently raised with valid and invalid unicode chbaracters due to the upstream bug pyyaml#25. But we'll want to handle the error cleanly even if the upstream issue is fixed.

This branch adds a patch to the PyYAML used by the Snapcraft snap to handle the unicode code points erroneously flagged as invalid, such as the hankey emoji. A Snapcraft snap built from this PR will successfully validate a summary or description making use of the hankey emoji.

New test cases:

    tests.unit.project_loader.test_config.test_invalid_yaml_invalid_unicode_chars
    tests.integration.general.test_global_properties

Note: The integration level test for the PyYAML work-around is skipped unless testing with a snap or Debian package (ie. SNAPCRAFT_FROM_SNAP=1 is set or SNAPCRAFT_FROM_DEB=1 is set), both of which are patched. I verified with a snap from the branch, and running the tests in a virtual environment respectively.

LP: #1737571
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Actual bad behavior that don't fall into maintenance or documentation
Projects
18.04
Bugs & Tech debt
Development

Successfully merging this pull request may close these issues.

None yet

4 participants