Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing numbers according to YAML 1.2 #627

Open
david-mohr opened this issue Jul 6, 2021 · 6 comments
Open

Parsing numbers according to YAML 1.2 #627

david-mohr opened this issue Jul 6, 2021 · 6 comments

Comments

@david-mohr
Copy link

Based on the spec for YAML 1.2, underscores shouldn't be allowed in numbers:

https://yaml.org/spec/1.2/2009-07-21/spec.html#id2604185

However the latest js-yaml still allows them:

package.json

{
  "dependencies": {
    "js-yaml": "^4.1.0"
  }
}

test.js

const yaml = require('js-yaml');
const data = yaml.load(`
string: '1_2_3'
also_string: 1_2_3
`);
console.log(typeof data.string);
console.log(typeof data.also_string);
$ node test.js
string
number

Is this intended behaviour? Thanks in advance.

@rlidwka
Copy link
Member

rlidwka commented Jul 15, 2021

Does it break anything? Are there any strings with numbers and underscores that should stay strings?

found this issue, kinda similar symfony/symfony#27120

Based on the spec for YAML 1.2, underscores shouldn't be allowed in numbers

YAML 1.2 doesn't define how numbers are parsed. As far as YAML is concerned, there're just 3 types: maps, collections and scalars. How they are represented in memory is up to parser/schema. Reference parser wouldn't tell you if it's a number or not.

What you linked to is a recommended default schema. I would be very hesitant to make a subset of it, but superset might be fine for historical/interoperability reasons.

We used to strictly implement https://yaml.org/type/int.html schema (the one 1.2 spec refers to in section 10.4.). Unfortunately, it has problems (people confused sexagesimals with port numbers, and octals didn't align well with ecmascript changes), so it was reduced a bit. But underscores remained from there.

So the questions are:

  • how other 1.2 parsers do it?
  • is it worth breaking existing documents?

I'm looking at ruamel.yaml (that was one of the first yaml 1.2 parsers I remember), it casts digits with underscores to numbers as well:

$ python3
>>> from ruamel.yaml import YAML
>>> YAML().load("foo: 1_2_3_4")
ordereddict([('foo', 1234)])

@perlpunk
Copy link

  1. I think the behaviour should reflect either the one from YAML 1.1, or one of the YAML 1.2 schemas. A mixture is unfortunate, and it will never be in sync to what other libraries are doing
  2. You can use this repo https://perlpunk.github.io/yaml-test-schema/ to test behaviour when you implement the YAML 1.2 Core Schema. (Note that the JSON schema is defined a bit ambiguously)

how other 1.2 parsers do it?

Other YAML 1.2 libraries use the YAML 1.2 Core schema as the default, e.g. https://github.com/eemeli/yaml, https://github.com/perlpunk/YAML-PP-p5

Note that a "parser" doesn't do anything here, it's the constructor. A YAML parser only cares about the syntax.

is it worth breaking existing documents?

Well, it depends on what you do by default.
That should stay the same, and people can opt in into a new schema. Or you make an explicit breaking change and default to the Core schema.

@rlidwka
Copy link
Member

rlidwka commented Jul 15, 2021

I've checked, there are 2 differences with YAML 1.2 core schema left:

  • support for underscores in integers/floats (we do, they don't)
  • support for binary ints like 0b10101 (we do, they don't)

change for next major version maybe, not sure

@perlpunk do you have any idea why YAML 1.2 spec removed those?

@perlpunk
Copy link

perlpunk commented Jul 15, 2021

I don't know why binary was dropped.
But for underscores, it turned out that it didn't make sense the way it was defined for 1.1 (and the regular expression would get too complicated if the goal was to fix it), e.g. should .1_ or . or ._ or even .________ be parsed as a number? The regex says yes.

@david-mohr
Copy link
Author

We use yaml-cpp which is where we first ran into the conflict: jbeder/yaml-cpp#781 (they don't support underscore)

I'm happy to update the PR to make underscores "opt out". I would just need some guidance on what you want to call the property or how to make a new schema.

@pimterry
Copy link

I have a concrete case where this breaks things. This OpenAPI spec uses 18_24 as a map key. When parsed with js-yaml, that key becomes 1824, breaking internal references and making that spec completely unparseable.

The yaml package from npm parses this correctly (as a string key) but js-yaml does not:

> yaml = require('yaml')
> jsYaml = require('js-yaml')

> yaml.parse('18_24: test')
{ '18_24': 'test' }

> jsYaml.load('18_24: test')
{ '1824': 'test' }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants
@perlpunk @rlidwka @pimterry @david-mohr and others