WIP Support for the YAML 1.2 Core and JSON schemas #512

Closed: perlpunk wanted to merge 2 commits

@perlpunk (Member) commented Mar 26, 2021

Superseded by #555

This is a draft and subject to discussion.
See also #486

Thanks to @SUSE for another hackweek! I had four days of work time dedicated to an open source project of my choice. https://hackweek.suse.com/20/projects/yaml-1-dot-2-schema-support-for-pyyaml

This PR depends on #483

Introduction

For a quick overview of the schema changes between YAML 1.1 and 1.2, look here: https://perlpunk.github.io/yaml-test-schema/schemas.html

While the syntax also changed in YAML 1.2, this pull request is only about the schema changes.
As an example, in 1.1, Y, yes, NO, on etc. are resolved as booleans.
This sounds convenient, but it also means that all 22 of these strings must be quoted if they are not meant as booleans. A very common stumbling block is the country code for Norway, NO (the "Norway Problem").
In YAML 1.2 this was improved by reducing the list of boolean representations.
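
To make the difference concrete, here is a small sketch that lists the boolean spellings of the YAML 1.1 bool type next to those of the 1.2 Core Schema and checks which country codes would need quoting. It does not use PyYAML; the names `YAML11_BOOLS`, `YAML12_CORE_BOOLS` and `needs_quoting` are made up for this illustration.

```python
# Plain scalars that the YAML 1.1 bool type resolves as booleans (22 strings).
YAML11_BOOLS = {
    "y", "Y", "yes", "Yes", "YES", "n", "N", "no", "No", "NO",
    "true", "True", "TRUE", "false", "False", "FALSE",
    "on", "On", "ON", "off", "Off", "OFF",
}

# Plain scalars that the YAML 1.2 Core Schema resolves as booleans.
YAML12_CORE_BOOLS = {"true", "True", "TRUE", "false", "False", "FALSE"}


def needs_quoting(strings, bool_spellings):
    """Return the strings that would be read as booleans if left unquoted."""
    return [s for s in strings if s in bool_spellings]


country_codes = ["DE", "NO", "SE"]
quoted_in_11 = needs_quoting(country_codes, YAML11_BOOLS)
quoted_in_12 = needs_quoting(country_codes, YAML12_CORE_BOOLS)
print(quoted_in_11)  # ['NO']
print(quoted_in_12)  # []
```

Note that PyYAML's own 1.1 resolver uses a slightly shorter list (see "Current State" below), so this reflects the 1.1 type definition rather than PyYAML's exact behaviour.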

Other types have been improved as well. The 1.1 regular expression for float allows . and ._ as floats, although there isn't a single digit in these strings.

While the 1.2 Core Schema, the recommended default for 1.2, still allows a few variations (true, True and TRUE, etc.), the 1.2 JSON Schema is there to match JSON behaviour regarding types, so it allows only true and false.
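
As a rough illustration of how the two 1.2 schemas differ, the following sketch classifies plain scalars with regular expressions modelled on the tag resolution rules in the YAML 1.2 specification. It is independent of PyYAML and of the code in this PR; `CORE_RULES`, `JSON_RULES` and `resolve` are hypothetical names, and falling back to `str` for unmatched scalars under the JSON rules is an assumption of this sketch (the specification treats such scalars as errors).

```python
import re

# Core Schema: a few spelling variants are allowed per type.
CORE_RULES = [
    ("null", re.compile(r"null|Null|NULL|~|")),
    ("bool", re.compile(r"true|True|TRUE|false|False|FALSE")),
    ("int", re.compile(r"[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+")),
    ("float", re.compile(
        r"[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?"
        r"|[-+]?\.(?:inf|Inf|INF)|\.(?:nan|NaN|NAN)")),
]

# JSON Schema: only the spellings JSON itself accepts.
JSON_RULES = [
    ("null", re.compile(r"null")),
    ("bool", re.compile(r"true|false")),
    ("int", re.compile(r"-?(?:0|[1-9][0-9]*)")),
    ("float", re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]*)?(?:[eE][-+]?[0-9]+)?")),
]


def resolve(scalar, rules):
    """Return the first type whose pattern matches the whole scalar."""
    for type_name, pattern in rules:
        if pattern.fullmatch(scalar):
            return type_name
    return "str"  # assumption: unmatched plain scalars stay strings


for scalar in ["TRUE", "yes", "~", "true", ".inf", "010", "0x20", "-0x20", "1_000"]:
    print(f"{scalar!r:10} core={resolve(scalar, CORE_RULES):6} json={resolve(scalar, JSON_RULES)}")
```

Integers are checked before floats because the JSON float pattern also matches plain integers.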

Current State

PyYAML implements the 1.1 types (with a few changes, like leaving out the single-character booleans y, Y etc.), and it was never updated to support any of the 1.2 schemas.

Problem

Besides the above-mentioned problems with the 1.1 types, more and more libraries are being created or updated for YAML 1.2, probably thanks in part to the relatively new YAML Test Suite. PyYAML should be able to read and write YAML files used or produced by those libraries.

This PR

The PyYAML SafeLoader, which is currently the recommended Loader if you don't need special behaviour, implements YAML 1.1 types.
I added CoreLoader, CoreResolver, CoreConstructor, CoreRepresenter, CoreDumper, and the same for JSONLoader etc.

The suggestion is to recommend CoreLoader and JSONLoader for users to try out, while the other classes mentioned may be changed or removed in a later release. This gives us time to figure out a better API, while users can already use the new top-level CoreLoader.

One problem is that PyYAML's callbacks are class-based, and while I was able to make the code a bit more compact via a dictionary of types/callbacks, there are still method calls which must be in a certain class.
The !!merge << key, for example, needs special handling.

This makes it tedious to add custom Loaders. Turning the class-based approach into an instance-based one is on our wishlist.

One example use case we have in mind is that you want to use the 1.2 CoreLoader, but on top of that you want it to recognize timestamps and merge keys.
Or you want a very basic loader that should treat everything as a string except booleans and null.
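
For comparison, this is roughly what the second use case looks like with the existing class-based API in released PyYAML, without any of the classes from this PR. It is only a sketch: `MinimalLoader` is a hypothetical name, and the choice of accepted spellings (`true`/`false`, and `null`, `~` or an empty value) is an assumption made for the example.

```python
import re

import yaml


class MinimalLoader(yaml.SafeLoader):
    # Start from an empty table so that none of the inherited YAML 1.1
    # implicit resolvers (int, float, timestamp, merge, ...) apply.
    yaml_implicit_resolvers = {}


# Resolve only booleans and null; every other plain scalar stays a string.
MinimalLoader.add_implicit_resolver(
    "tag:yaml.org,2002:bool", re.compile(r"^(?:true|false)$"), list("tf"))
MinimalLoader.add_implicit_resolver(
    "tag:yaml.org,2002:null", re.compile(r"^(?:null|~|)$"), ["n", "~", ""])

document = "- true\n- 23\n- ~\n- yes\n- null\n- 1.5\n"
result = yaml.load(document, Loader=MinimalLoader)
print(result)  # [True, '23', None, 'yes', None, '1.5']
```

The resolver table is a class attribute, which is why each variation needs its own subclass; this is the kind of overhead an instance-based API would avoid.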

Example

import yaml

# CoreLoader, JSONLoader, CoreDumper and JSONDumper are introduced by this PR.

# Plain scalars whose resolved type depends on the schema in use.
yaml_input = """
- TRUE
- yes
- ~
- true
- .inf
- 23
- 010
- 0o10
- 0b100
- 0x20
- -0x20
- 1_000
- 3:14
- 0011
- +0
- 0001.23
"""

# Load the same document with each Loader and compare the results.
print('--------------------------------------------- BaseLoader')
data = yaml.load(yaml_input, Loader=yaml.BaseLoader)
print(data)
print('--------------------------------------------- SafeLoader')
data = yaml.load(yaml_input, Loader=yaml.SafeLoader)
print(data)
print('--------------------------------------------- CoreLoader')
data = yaml.load(yaml_input, Loader=yaml.CoreLoader)
print(data)
print('--------------------------------------------- JSONLoader')
data = yaml.load(yaml_input, Loader=yaml.JSONLoader)
print(data)

# `data` now holds the JSONLoader result; dump it with each Dumper.
print('--------------------------------------------- SafeDumper')
out = yaml.dump(data, Dumper=yaml.SafeDumper, default_flow_style=True)
print(out)
print('--------------------------------------------- CoreDumper')
out = yaml.dump(data, Dumper=yaml.CoreDumper, default_flow_style=True)
print(out)
print('--------------------------------------------- JSONDumper')
out = yaml.dump(data, Dumper=yaml.JSONDumper, default_flow_style=True)
print(out)

Edit: some of the tests are failing, but this is unrelated and caused by an issue with GitHub Actions caching.

@dmazin

dmazin commented Apr 19, 2021

@perlpunk Thank you for making this happen! I salute you!

@shelper

shelper commented Jun 3, 2021

any progress? couldn't wait to use

@perlpunk
Member Author

Superseded by #555

@ingydotnet ingydotnet removed this from Assigned in PyYAML 6.0 Release Planning Sep 22, 2021