Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colon can be in a plain scalar #28

Closed
wants to merge 1 commit into from
Closed

Conversation

cloudwu
Copy link

@cloudwu cloudwu commented Dec 3, 2016

No description provided.

@dbeer1
Copy link

dbeer1 commented Dec 6, 2016

👍 This exact issue is preventing us from adopting libyaml, we have lots of documents with urls in them. I see this was discussed in #21, any ETA for merging? Thanks!

@ingydotnet
Copy link
Member

@dbeer1 are you saying you have lots of docs like:

url: http://example.com

or

- {url: http://example.com}

libyaml is perfectly fine with the former. I haven't seen much of the latter (urls in flow collections) in the wild.

@dbeer1
Copy link

dbeer1 commented Dec 7, 2016

Yeah, we have a lot of lines like this (in basically all our files because most of them were created from a template):

Urls:  [ http://127.0.0.1:1234, http://127.0.0.1:2345 ]

We are currently using yaml-cpp to parse, which can correctly read this document. Looking to switch to libyaml as it has been significantly faster in our tests and works on all of our platforms, but it chokes on documents like that.

@dbeer1
Copy link

dbeer1 commented Dec 7, 2016

I've read the various discussions on this:

It seems like this scanner error was introduced as a placeholder while the syntax of mappings was being discussed. Is this reason still valid? This error is somewhat problematic as other YAML parser and emitter implementations follow the spec and will create documents that cannot interoperate with libyaml. Searching the internet, I've seen many complaints about this behavior and talk of people patching it locally.

@ingydotnet
Copy link
Member

Reviewing the spec, it seems to be clear that a space must follow the colon, if the colon is to be considered a mapping separator.

I completely concur with that logic. Therefore I think we can move forward with this.

But...

I now have a working test suite and libyaml test harness here: https://github.com/ingydotnet/yaml-dev-kit/
If you clone that and run make test it will fetch and build the latest libyaml and run 160 tests. Someone should give me a PR for some tests for this. (Or identify if one exists). Or I'll do it after my bike ride.

One edge case is this: {foo:}. It fails but {foo} and {foo } work. I think they should all parse the same.

@dbeer1
Copy link

dbeer1 commented Dec 8, 2016

I agree that {foo:} should be equivalent to { foo : } and not { "foo:" : null }, but that does conflict with the spec as written right? On the other hand, its easy enough to code and does match the current pyyaml behavior (but not libyaml behavior!), which allows one of ',[]{}' to follow a colon in a flow context.

Compare https://github.com/yaml/pyyaml/blob/master/lib/yaml/scanner.py#L1301 vs https://github.com/yaml/libyaml/blob/master/src/scanner.c#L3438

@ingydotnet
Copy link
Member

@dbeer, right, these should all become valid:

- {foo:}
- {foo:{}}
- {foo:[]}
- {foo:,bar}
- [foo:]

I'll add tests for them to the suite. BTW, I just moved the test suite to
https://github.com/yaml/yaml-test-suite

@ingydotnet
Copy link
Member

Here's an existing test that is affected by this:

https://github.com/yaml/yaml-test-suite/blob/master/test/4ABK.tml

As you can see it is marked to be skipped by libyaml testing.

@dbeer
Copy link

dbeer commented Dec 8, 2016

I think you probably meant to address @dbeer1

@dbeer1
Copy link

dbeer1 commented Dec 8, 2016

I restored the empty value behavior in pyyaml in yaml/pyyaml#45. I guess we should do the same here with libyaml?

Now it processes your test as follows:

>>> import yaml
>>> print yaml.load("""
... - {foo:}
... - {foo:{}}
... - {foo:[]}
... - {foo:,bar}
... - [foo:]
... """)
[{'foo': None}, {'foo': {}}, {'foo': []}, {'foo': None, 'bar': None}, [{'foo': None}]]
>>> 

The case with omitted keys seems to be significantly more complicated as the grammers in the parser need to be reworked.

@dbeer1
Copy link

dbeer1 commented Feb 8, 2017

The pyyaml pull request has been merged and I think the logic here should be updated to match. That is, a ':' should terminate a scalar in the flow context if it is followed not just by a space but also by ',[]{}'. For example: {a:} -> { "a" : null } instead of {a:} -> { "a:" : null }

@ingydotnet
Copy link
Member

Thanks @dbeer1. I'll add a test to the YAML Test Suite for this.

sigmavirus24 pushed a commit to sigmavirus24/libyaml that referenced this pull request Apr 6, 2017
@jstoiko
Copy link

jstoiko commented Dec 22, 2017

any plans to merge this?

@perlpunk
Copy link
Member

I added a test for a plain url in a flow mapping: yaml/yaml-test-suite@1eacbab

I also updated tests/run-test-suite/test/libyaml-parser.list (not yet in master)
The PR looks good to me, all tests are still passing with the updated test list, and the new test is passing.
@ingydotnet @sigmavirus24 should I merge this?

Copy link
Contributor

@sigmavirus24 sigmavirus24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead and merge @perlpunk

Copy link
Member

@ingydotnet ingydotnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok to me.

@perlpunk
Copy link
Member

perlpunk commented Dec 28, 2017

There's one thing that should be noted, though.
Previously this was invalid:

- {foo:}
- {foo:,bar}
- [foo:]

Now it is parsed valid, but the result is wrong, as the colon becomes part of the content:

- {"foo:": }
- {"foo:": ,bar}
- ["foo:": ]

Given that this is the wrong behaviour (as mentioned previously in this thread), the question is, does it make sense to add this fix, making something become valid, but wrong?

Cc @sigmavirus24 @ingydotnet

@perlpunk
Copy link
Member

I also updated tests/run-test-suite/test/libyaml-parser.list (not yet in master)
The PR looks good to me, all tests are still passing with the updated test list, and the new test is passing.

I also wanted to add a script for testing error cases (#81) before merging this, however, @ingydotnet requested changes for which I don't have time now, so this has to go without the additional tests.

@perlpunk
Copy link
Member

So it seems we are not sure what the correct behaviour is.
Here's a gist of what popular processors do:
https://gist.github.com/anonymous/c3e98f57c0cc9cdc71fd28b7681e280b

@ingydotnet
Copy link
Member

I think that {foo:} is probably the same semantic as {"foo:": }. I also find it an unfortunate edge case. But also it's a rare edge case.

I think we can shore this up in 1.3, and that applying this patch is correct for now.

I'd like to hold off on applying this until we hear from @flyx.

@ingydotnet ingydotnet mentioned this pull request Jan 7, 2018
@ingydotnet ingydotnet added this to Ideas in Release 0.1.8 Jan 8, 2018
@jstoiko
Copy link

jstoiko commented Jan 12, 2018

I couldn't find tests for those edge cases in yaml-test-suite.

@perlpunk: where does that gist come from? I couldn't find it anywhere in the test suite.

@perlpunk
Copy link
Member

@jstoiko yeah, these cases are not yet in the test suite because we still have to figure out the correct results :-/
I created that gist with https://github.com/yaml/yaml-editor

@flyx
Copy link

flyx commented Jan 29, 2018

{foo:} is perfectly valid as per YAML 1.2. However, in YAML 1.1, the colon indeed belongs to the scalar following the productions in the spec. 1.1 requires a s-separate(n,c) (coming from here) after the colon and } is no separation character.

Allowing this basically makes libyaml implement neither 1.1 nor 1.2 (since there are more 1.2 changes it is not aware of). However, it takes its liberties anyway. I have no strong opinion about whether it is a good idea to allow it.

@asomov
Copy link

asomov commented Jan 31, 2018

If this is accepted is makes other parsers inconsistent.
For instance SnakeYAML implements 1.1. If this change is accepted users of SnakeYAML will complain that the very same YAML document is parsed with this parser, but cannot be parsed with SnakeYAML.

perlpunk added a commit to perlpunk/libyaml that referenced this pull request Feb 2, 2018
This is a followup to yaml#28

See http://yaml.org/spec/1.1/#nb-plain-char(c) and the following
productions.

This commit will allow `[http://example]`, but still fail for:
- `[:foo]`
- `[foo:]`
@perlpunk
Copy link
Member

perlpunk commented Feb 2, 2018

I created PR #104
It will only support things like [http://example], but not the other cases mentioned in #28 (comment)

perlpunk added a commit to perlpunk/libyaml that referenced this pull request Feb 2, 2018
This is a followup to yaml#28

See http://yaml.org/spec/1.1/#nb-plain-char(c) and the following
productions.

This commit will allow `[http://example]`, but still fail for:
- `[:foo]`
- `[foo:]`
@perlpunk
Copy link
Member

perlpunk commented Feb 2, 2018

@asomov Allowing [a:b] conforms to the 1.1 spec (and IMHO makes sense), so supporting this should be ok, and SnakeYAML should probably allow it, too. See also my comments in #104

@perlpunk
Copy link
Member

perlpunk commented Feb 2, 2018

@asomov and, btw, YAML parsers are already pretty inconsistent: http://matrix.yaml.io/ ;-)

@jstoiko
Copy link

jstoiko commented Feb 2, 2018

FYI, go-yaml has already addressed the issue: go-yaml/yaml#295

@asomov
Copy link

asomov commented Feb 3, 2018

@perlpunk : the page you show is inconsistent, not the parsers. Why they try to test 1.2 spec against the parsers which explicitly do not support it !
PyYAML, libyaml, SnakeYAML are consistent - they succeed or fail together. If libyaml decides to accepts this (inconsistent) patch then it fall out of the group.

@asomov
Copy link

asomov commented Feb 3, 2018

@jstoiko : why do you propose this change to all the parsers WITHOUT proposal for the corner cases ?

@perlpunk
Copy link
Member

perlpunk commented Feb 3, 2018

@asomov

PyYAML, libyaml, SnakeYAML are consistent - they succeed or fail together.

That's not true. PyYAML accepts [ http://example.org ]. Have a closer look at the test cases to see that there are some differences.

Why they try to test 1.2 spec against the parsers which explicitly do not support it !

Well, sure, they should be tested against 1.1 test cases. But none of us has time to write a test suite for 1.1. We already have enough to do with 1.2.
The only solution would be to remove a lot of parsers from the matrix, but I don't want that. I think it's useful to see how they behave against 1.2 test cases, even if they implement 1.1 only. To me, YAML 1.2 is the successor of 1.1 which fixes a number of things, and ideally parsers should have adopted it, but I can understand why that didn't really happen for existing 1.1 parsers. That's a reason why @ingydotnet started the https://github.com/yaml/yaml-test-suite. There should have been a test suite from the beginning.

I could try to make it more clear on the matrix page which parsers implement 1.1/1.0. What do you think, would that help? Maybe create an issue for the https://github.com/perlpunk/yaml-test-matrix, and then we can discuss it there, since it's probably off topic here.

@asomov
Copy link

asomov commented Feb 3, 2018

@perlpunk : PyYAML 3.12 does exactly what it should - rejects to accept inconsistent colon

`
python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

import yaml
from yaml import load, dump
from yaml import Loader, Dumper
data = load(' [ http://example.org ]', Loader=Loader)
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.5/dist-packages/yaml/init.py", line 72, in load
return loader.get_single_data()
File "/usr/local/lib/python3.5/dist-packages/yaml/constructor.py", line 35, in get_single_data
node = self.get_single_node()
File "/usr/local/lib/python3.5/dist-packages/yaml/composer.py", line 35, in get_single_node
if not self.check_event(StreamEndEvent):
File "/usr/local/lib/python3.5/dist-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/usr/local/lib/python3.5/dist-packages/yaml/parser.py", line 143, in parse_implicit_document_start
StreamEndToken):
File "/usr/local/lib/python3.5/dist-packages/yaml/scanner.py", line 116, in check_token
self.fetch_more_tokens()
File "/usr/local/lib/python3.5/dist-packages/yaml/scanner.py", line 252, in fetch_more_tokens
return self.fetch_plain()
File "/usr/local/lib/python3.5/dist-packages/yaml/scanner.py", line 676, in fetch_plain
self.tokens.append(self.scan_plain())
File "/usr/local/lib/python3.5/dist-packages/yaml/scanner.py", line 1299, in scan_plain
"Please check http://pyyaml.org/wiki/YAMLColonInFlowContext for details.")
yaml.scanner.ScannerError: while scanning a plain scalar
in "", line 1, column 4:
[ http://example.org ]
^
found unexpected ':'
in "", line 1, column 8:
[ http://example.org ]
^
Please check http://pyyaml.org/wiki/YAMLColonInFlowContext for details.

`

@perlpunk
Copy link
Member

perlpunk commented Feb 3, 2018

@asomov ok, I can confirm that when testing that locally. The PyYAML in https://github.com/yaml/yaml-editor, however, accepts it. That's because the PyYAML in yaml-editor uses PyYAML master branch and this already has the fix from yaml/pyyaml#45.
But looking at the Spec and example "Example 9.12. Plain Characters", it's clear to me that the intent was to allow colons (even at the beginning):

# Outside flow collection:
- ::std::vector
- Up, up and away!
- -123
# Inside flow collection:
- [ ::std::vector,
  "Up, up and away!",
  -123 ]

Only that production number 154/155 seem to have a mistake, allowing also {}[], after the colon, which doesn't make sense (to me), because:

Inside flow collections, plain scalars are further restricted to avoid
containing the “[”, “]”, “{”, “}” and “,” characters [...]

http://yaml.org/spec/1.1/#id907281 "9.1.3. Plain"
http://yaml.org/spec/1.1/#nb-plain-char-in
What do you think?

@asomov
Copy link

asomov commented Feb 3, 2018

@perlpunk (as I already said many times)

  1. YAML spec 1.1 is inconsistent.
  2. Any attempt to "fix" 1.1 parsers is inconsistent (see p 1)
  3. Instead of waisting time to mess with parsers come with a proposal for YAML 1.3 or YAML 2.
  4. Spend your time implementing the new spec
  5. Enjoy.

@flyx
Copy link

flyx commented Feb 3, 2018

I would say that it makes perfect sense to allow [ http://example.org ] since this is valid YAML 1.1. Is there any reason to challenge that? There is nothing in the spec's productions that forbid that, so a conforming parser has to parse it.

However, {foo:} is invalid as discussed above and in this case, I agree with @asomov that „fixing“ this might not be desirable because it leads to further inconsistencies between implementations. People wanting that to be fixed should go with a YAML 1.2 parser, since 1.2 is the revision that fixes this in the spec.

@perlpunk
Copy link
Member

perlpunk commented Feb 3, 2018

@asomov

Instead of waisting time to mess with parsers come with a proposal for YAML 1.3 or YAML 2.

We are working on YAML 1.3: https://github.com/yaml/summit.yaml.io/wiki/YAML-RFC-Index

Spend your time implementing the new spec

I'm implementing a Processor in Perl for 1.2, and I'm trying to design it in a way that it can also support 1.3 in the future.

I'm contributing to yaml-test-suite and yaml-editor, and I created yaml-test-matrix to avoid a mess in the future.

@perlpunk
Copy link
Member

perlpunk commented Feb 3, 2018

@flyx so you would support #104?

@perlpunk
Copy link
Member

perlpunk commented Feb 3, 2018

to make it clear: I'd vote for closing this and instead use #104

perlpunk added a commit that referenced this pull request Jun 25, 2018
This is a followup to #28

See http://yaml.org/spec/1.1/#nb-plain-char(c) and the following
productions.

This commit will allow `[http://example]`, but still fail for:
- `[:foo]`
- `[foo:]`
perlpunk added a commit to perlpunk/libyaml that referenced this pull request Jun 25, 2018
This is a followup to yaml#28

See http://yaml.org/spec/1.1/#nb-plain-char(c) and the following
productions.

This commit will allow `[http://example]`, but still fail for:
- `[:foo]`
- `[foo:]`
ingydotnet pushed a commit that referenced this pull request Jun 30, 2018
This is a followup to #28

See http://yaml.org/spec/1.1/#nb-plain-char(c) and the following
productions.

This commit will allow `[http://example]`, but still fail for:
- `[:foo]`
- `[foo:]`

See https://gist.github.com/perlpunk/de2c631a7b0e9002de351a680eadb573
for all the relevant cases where this change applies.
@perlpunk
Copy link
Member

#104 was merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Release 0.1.8
  
Possible Release Changes
Development

Successfully merging this pull request may close these issues.

None yet

9 participants