Handle top level yaml property on #696

Open
wsams opened this issue Jan 31, 2023 · 4 comments


wsams commented Jan 31, 2023

A document containing a top-level key named on is rendered as a boolean value instead of the literal string on when loading and dumping a document.

If a top-level YAML key matches the regex of the implicit bool resolver,

Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:bool',
        re.compile(r'''^(?:yes|Yes|YES|no|No|NO
                    |true|True|TRUE|false|False|FALSE
                    |on|On|ON|off|Off|OFF)$''', re.X),
        list('yYnNtTfFoO'))

it is loaded as a boolean, and that boolean is then written literally to the YAML output. This affects not just on, but also words like off and no.
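
To see which plain scalars the pattern above captures, here is a minimal sketch that applies the same regex using only the standard library. The name BOOL_11 and the sample word list are made up for illustration and are not part of PyYAML.

```python
import re

# Same pattern as the implicit bool resolver quoted above (YAML 1.1 style).
BOOL_11 = re.compile(r'''^(?:yes|Yes|YES|no|No|NO
            |true|True|TRUE|false|False|FALSE
            |on|On|ON|off|Off|OFF)$''', re.X)

# Hypothetical sample of keys one might find in a config file.
words = ["on", "off", "no", "yes", "push", "branches", "y", "n"]

# Keep only the words the resolver would tag as booleans.
matched = [w for w in words if BOOL_11.match(w)]
print(matched)
```

Ordinary keys such as push are left alone, and so are the single letters y and n, which this particular pattern does not list.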

For example,

import yaml

tmpl = """on:
  push:
    branches: [ main ]
"""

print(yaml.dump(yaml.load(tmpl, Loader=yaml.Loader)))

will output,

true:
  push:
    branches:
    - main

I would expect,

on:
  push:
    branches:
    - main

but the word on is replaced with a boolean value true.

I'm using pyyaml==6.0 with Python 3.10.9.

Update: It appears any key that is a boolean value will be parsed as a boolean. For example,

no:
  push:
    branches:
    - main

also renders as,

false:
  push:
    branches:
    - main
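
One way to confirm this is to inspect the Python type of the key after loading. The sketch below assumes PyYAML 6.0 with its default resolvers; the helper name loaded_key is hypothetical.

```python
import yaml


def loaded_key(word):
    """Load a one-entry mapping and return its key as Python sees it."""
    return next(iter(yaml.safe_load(f"{word}: 1")))


# Boolean-looking words come back as bool, ordinary words as str.
for word in ("on", "off", "yes", "no", "push"):
    key = loaded_key(word)
    print(f"{word!r} -> {key!r} ({type(key).__name__})")
```

Since the key is already a Python bool once loading finishes, the dumper has no way to know it was originally spelled on or no.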

wsams commented Feb 1, 2023

I tracked the issue down to the regex check in the BaseResolver.resolve function. Below is the resolve function with a hack that solves this particular case (just to point out the area of interest). It makes sense now why on, off, and no are parsed as booleans: they match that regex. I added a comment in the function below to show how I "patched" the issue.

    def resolve(self, kind, value, implicit):
        if kind is ScalarNode and implicit[0]:
            if value == '':
                resolvers = self.yaml_implicit_resolvers.get('', [])
            else:
                resolvers = self.yaml_implicit_resolvers.get(value[0], [])
            wildcard_resolvers = self.yaml_implicit_resolvers.get(None, [])
            for tag, regexp in resolvers + wildcard_resolvers:
                # RIGHT HERE I added the additional condition to exclude "on"
                if regexp.match(value) and value != "on":
                    return tag
            implicit = implicit[1]
        if self.yaml_path_resolvers:
            exact_paths = self.resolver_exact_paths[-1]
            if kind in exact_paths:
                return exact_paths[kind]
            if None in exact_paths:
                return exact_paths[None]
        if kind is ScalarNode:
            return self.DEFAULT_SCALAR_TAG
        elif kind is SequenceNode:
            return self.DEFAULT_SEQUENCE_TAG
        elif kind is MappingNode:
            return self.DEFAULT_MAPPING_TAG

This is probably not a viable solution, so if anyone has any thoughts on how to fix it I would look into submitting a PR.


nitzmahone commented Feb 1, 2023

Yeah, sadly, with the way the implicit resolvers are set up today, registered against the default resolver class at import time:

pyyaml/lib/yaml/resolver.py, lines 170 to 175 in 957ae4d:

Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:bool',
        re.compile(r'''^(?:yes|Yes|YES|no|No|NO
                    |true|True|TRUE|false|False|FALSE
                    |on|On|ON|off|Off|OFF)$''', re.X),
        list('yYnNtTfFoO'))

... it's tricky to robustly work around this particular 1.1ism at runtime without monkeypatching.

If you just want a local patch to make it only implicitly recognize 1.2 booleans, that's pretty easy. Swap out the existing bool resolver at the above location for this one:

Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:bool',
        re.compile(r'^(?:true|false)$', re.X),
        list('tf'))
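
For a rough idea of what that narrower pattern accepts, here is a small standard-library check. BOOL_STRICT and the candidate list are illustrative names, not anything PyYAML defines.

```python
import re

# The narrowed pattern from the replacement resolver above.
BOOL_STRICT = re.compile(r'^(?:true|false)$', re.X)

candidates = ["true", "false", "True", "FALSE", "on", "off", "yes", "no"]

# Only words the narrowed resolver would still tag as booleans.
accepted = [w for w in candidates if BOOL_STRICT.match(w)]
print(accepted)
```

Note that capitalized spellings such as True and FALSE are excluded as well, so with this pattern they would load as plain strings, which may or may not be what you want.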

Monkeypatching the default resolver at runtime is a little harder, but not terrible (and much cheaper than what you're doing now). Just clear out the implicit resolver dispatch table and repopulate it with only the resolvers you want, e.g.:

import re
import yaml

from yaml.resolver import Resolver

# zap the Resolver class' internal dispatch table
Resolver.yaml_implicit_resolvers = {}

# note the 1.2 bool impl here
Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:bool',
        re.compile(r'^(?:true|false)$', re.X),
        list('tf'))

# and now the rest of the default implicit resolvers
Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:float',
        re.compile(r'''^(?:[-+]?(?:[0-9][0-9_]*)\.[0-9_]*(?:[eE][-+][0-9]+)?
                    |\.[0-9][0-9_]*(?:[eE][-+][0-9]+)?
                    |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\.[0-9_]*
                    |[-+]?\.(?:inf|Inf|INF)
                    |\.(?:nan|NaN|NAN))$''', re.X),
        list('-+0123456789.'))

Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:int',
        re.compile(r'''^(?:[-+]?0b[0-1_]+
                    |[-+]?0[0-7_]+
                    |[-+]?(?:0|[1-9][0-9_]*)
                    |[-+]?0x[0-9a-fA-F_]+
                    |[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+)$''', re.X),
        list('-+0123456789'))

Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:merge',
        re.compile(r'^(?:<<)$'),
        ['<'])

Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:null',
        re.compile(r'''^(?: ~
                    |null|Null|NULL
                    | )$''', re.X),
        ['~', 'n', 'N', ''])

Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:timestamp',
        re.compile(r'''^(?:[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]
                    |[0-9][0-9][0-9][0-9] -[0-9][0-9]? -[0-9][0-9]?
                     (?:[Tt]|[ \t]+)[0-9][0-9]?
                     :[0-9][0-9] :[0-9][0-9] (?:\.[0-9]*)?
                     (?:[ \t]*(?:Z|[-+][0-9][0-9]?(?::[0-9][0-9])?))?)$''', re.X),
        list('0123456789'))

Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:value',
        re.compile(r'^(?:=)$'),
        ['='])

# The following resolver is only for documentation purposes. It cannot work
# because plain scalars cannot start with '!', '&', or '*'.
Resolver.add_implicit_resolver(
        'tag:yaml.org,2002:yaml',
        re.compile(r'^(?:!|&|\*)$'),
        list('!&*'))

print(yaml.safe_load('hi_mom: on'))

We have grand plans to make this kind of thing way easier with the 1.2 schema config support, but life keeps getting in the way... :(

Feel free to grab this hack, but there are no guarantees that it'll work forever. After all, it is reaching deep into the guts and current implementation details of the default resolver 😉
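
A possible variation that avoids changing the global Resolver is to give a loader and dumper subclass their own filtered copy of the dispatch table. This is only a sketch: it leans on the same internal yaml_implicit_resolvers attribute as the hack above, so it carries the same caveats, and the names strict_bool_resolvers, StrictBoolLoader, and StrictBoolDumper are hypothetical.

```python
import re
import yaml

BOOL_TAG = 'tag:yaml.org,2002:bool'


def strict_bool_resolvers(base):
    """Copy base's implicit resolver table, dropping its bool entries."""
    return {
        first: [(tag, rx) for tag, rx in entries if tag != BOOL_TAG]
        for first, entries in base.yaml_implicit_resolvers.items()
    }


class StrictBoolLoader(yaml.SafeLoader):
    # Defined on the subclass, so yaml.SafeLoader itself stays untouched.
    yaml_implicit_resolvers = strict_bool_resolvers(yaml.SafeLoader)


class StrictBoolDumper(yaml.SafeDumper):
    yaml_implicit_resolvers = strict_bool_resolvers(yaml.SafeDumper)


# Re-register a bool resolver that only knows true/false on both classes.
for cls in (StrictBoolLoader, StrictBoolDumper):
    cls.add_implicit_resolver(
        BOOL_TAG, re.compile(r'^(?:true|false)$'), list('tf'))

tmpl = """on:
  push:
    branches: [ main ]
"""

data = yaml.load(tmpl, Loader=StrictBoolLoader)
text = yaml.dump(data, Dumper=StrictBoolDumper)
print(text)
```

The dumper subclass matters as much as the loader: with the stock dumper, the string key on would be written with quotes, because the default resolvers still consider the bare word a boolean.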

@mheguy-flo

I ran into this problem too. My quick fix was to quote the key, changing on in my YAML to 'on'.
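
A minimal sketch of that workaround, assuming PyYAML with its default resolvers (the variable names are illustrative):

```python
import yaml

plain = yaml.safe_load("on: 1")      # bare key, resolved implicitly
quoted = yaml.safe_load("'on': 1")   # quoted key, always a string

print(plain)   # the key is the boolean True
print(quoted)  # the key is the string 'on'

# Dumping a string key that looks like a boolean keeps it quoted,
# so the document round-trips without changing type.
print(yaml.safe_dump(quoted))
```

Quoting works because implicit resolution is only applied to plain (unquoted) scalars.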


perlpunk commented May 8, 2024

See #486 for all related issues regarding YAML 1.2 support.

You can use the following project on top of PyYAML for YAML 1.2 support: https://pypi.org/project/yamlcore/

>>> import yaml
>>> import yamlcore
>>> # tmpl is the document string from the original report above
>>> print(yaml.dump(yaml.load(tmpl, Loader=yamlcore.CoreLoader), Dumper=yamlcore.CoreDumper))
on:
  push:
    branches:
    - main

Also see https://perlpunk.github.io/yaml-test-schema/schemas.html
