Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from pyyaml to ruamel.yaml #3170

Closed
FichteFoll opened this issue Feb 27, 2019 · 7 comments
Closed

Switch from pyyaml to ruamel.yaml #3170

FichteFoll opened this issue Feb 27, 2019 · 7 comments
Labels
duplicate feature features we would like to implement

Comments

@FichteFoll
Copy link
Contributor

Problem

The config parser does not accept the !!omap tag for an ordered mapping.

YAML mappings are, by default unordered, so in order to ensure that a mapping is to be parsd and represented in order, one need to use the ordered map format, indicated by !!omap and a list of one-elemental mappings. However, this is not supported.

~ Ξ beet --version
configuration error: file /home/fichte/.config/beets/config.yaml could not be read: expected a mapping node, but found sequence
  in "/home/fichte/.config/beets/config.yaml", line 11, column 8

Config file:

directory: /data/audio/music/sorted
library: /data/audio/music/sorted/musiclibrary.db

per_disc_numbering: true

import:
  copy: true
  log: beetslog.txt
paths: !!omap
  - sourcetype:Anime: '#soundtrack/%ifdef{source,$source/}$album%aunique{}%ifdef{anime, [$anime]}/$disc_and_track - $title'
  - doujin:true: "#%ifdef{touhou,touhou,doujin}\
      /$albumartist_or_label\
      /$long_date - $album%aunique{}%ifdef{event, [$event]}\
      /$disc_and_track - $title"
  - albumtype:soundtrack: '#soundtrack/%if{source,$source/}$year - $album%aunique{}/$disc_and_track - $title'
  - comp: $albumartist_or_label/$year - $album%aunique{}/$disc_and_track - $artist - $title
  - singleton: $albumartist/$title
  - default: $albumartist/$year - $album%aunique{}/$disc_and_track - $title

album_fields:
  long_date: '"-".join(f"{n:02}" for n in [year, month, day] if n)'
  soundtrack_source: 'f"{source}/" if locals().get("source", None) and source != album else ""'

item_fields:
  disc_and_track:
    f"{disc}.{track:0{max(2, len(str(tracktotal)))}}"
    if disctotal > 1
    else f"{track:0{max(2, len(str(tracktotal)))}}"
  albumartist_or_label: label if comp and label else albumartist
plugins: fetchart lastgenre missing duplicates inline replaygain

lastgenre:
  count: 1

replaygain:
  backend: gstreamer

Via setup.py, beets uses pyyaml. I'd like to suggest using ruamel.yaml instead, which is the only YAML 1.2 parser in Python that I know of and has been maintained for a long time. pyyaml has recently switched maintainers, but there are still many inherent problems with the distribution, most notably yaml.load being unsafe by default, and the fact it still only parses YAML1.1.
This is only a minor inconvenience as the current setup works, although it technically shouldn't be relied on by the YAML spec.

Setup

@sampsyo sampsyo added the needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." label Feb 27, 2019
@sampsyo
Copy link
Member

sampsyo commented Feb 27, 2019

Hello! For what it's worth, !!omap is actually not necessary for your paths configuration. We have specifically customized pyyaml to parse all mappings as Python OrderedDicts. We like this approach because it makes configurations easier to write by hand.

If you're curious, here's where the magic happens:

beets/beets/util/confit.py

Lines 616 to 664 in f6ad98a

class Loader(yaml.SafeLoader):
"""A customized YAML loader. This loader deviates from the official
YAML spec in a few convenient ways:
- All strings as are Unicode objects.
- All maps are OrderedDicts.
- Strings can begin with % without quotation.
"""
# All strings should be Unicode objects, regardless of contents.
def _construct_unicode(self, node):
return self.construct_scalar(node)
# Use ordered dictionaries for every YAML map.
# From https://gist.github.com/844388
def construct_yaml_map(self, node):
data = OrderedDict()
yield data
value = self.construct_mapping(node)
data.update(value)
def construct_mapping(self, node, deep=False):
if isinstance(node, yaml.MappingNode):
self.flatten_mapping(node)
else:
raise yaml.constructor.ConstructorError(
None, None,
u'expected a mapping node, but found %s' % node.id,
node.start_mark
)
mapping = OrderedDict()
for key_node, value_node in node.value:
key = self.construct_object(key_node, deep=deep)
try:
hash(key)
except TypeError as exc:
raise yaml.constructor.ConstructorError(
u'while constructing a mapping',
node.start_mark, 'found unacceptable key (%s)' % exc,
key_node.start_mark
)
value = self.construct_object(value_node, deep=deep)
mapping[key] = value
return mapping
# Allow bare strings to begin with %. Directives are still detected.
def check_plain(self):
plain = super(Loader, self).check_plain()
return plain or self.peek() == '%'

Thanks for pointing out ruamel.yaml, which does indeed look like it might be a better bet in the future. It's worth exploring!

@FichteFoll
Copy link
Contributor Author

FichteFoll commented Feb 28, 2019

For what it's worth, !!omap is actually not necessary for your paths configuration.

Thanks for the heads-up. This is mentioned in the documentation (although I had to look for quite a while earlier today until I found it) and I have been using this feature ever since I started using beets. I've also implemented preservation of dict order by myself in some pyyaml-based parser before.

By the way, another problem with pyyaml is that it encourages adding parsers or representers for certain tags or Python objects to the global default loaders. ruamel.yaml additionally has a different API where you only need to modify your instance of a loader/dumper instead. (It also still supports the old pyyaml version for compatibility, however.)

@sampsyo sampsyo added feature features we would like to implement and removed needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." labels Feb 28, 2019
@sampsyo sampsyo changed the title !!omap in config file not respected Switch from pyyaml to ruamel.yaml Feb 28, 2019
@sampsyo
Copy link
Member

sampsyo commented Feb 28, 2019

Cool! Thanks for the extra details. That (avoiding global modifications to the loaders) does sound very attractive—it would be worth investigating a switch. I'm changing the title of this issue to make it about that.

@FichteFoll
Copy link
Contributor Author

Actually, it looks like pyyaml is finally getting some traction: yaml/pyyaml#257 (also related yaml/pyyaml#193, CHANGES). Of course, if you were using safe_load already, this particular issue doesn't affect you.
The global loader modifications are still there, however.

Initially the issue wasn't about prompting a switch, although I find it weird that pyyaml by default doesn't load !!omap. Because for me it does, although not into a dict (?).

In [1]: import yaml

In [3]: yaml.safe_load("""
   ...: !!omap
   ...: - this: is a
   ...: - map: but
   ...: - with: all keys
   ...: - in: order
   ...: """)
Out[3]: [('this', 'is a'), ('map', 'but'), ('with', 'all keys'), ('in', 'order')]

In [4]: from ruamel.yaml import YAML

In [5]: yaml2 = YAML()

In [7]: yaml2.load("""
   ...: !!omap
   ...: - this: is a
   ...: - map: but
   ...: - with: all keys
   ...: - in: order
   ...: """)
Out[7]: CommentedOrderedMap([('this', 'is a'), ('map', 'but'), ('with', 'all keys'), ('in', 'order')])

Also note that roundtrip loading (which preserves formatting and comments) can be disabled if you don't intend on writing the data again.

@sampsyo
Copy link
Member

sampsyo commented Feb 28, 2019

Cool; good to know. The lack of global state was what convinced me this was worth looking into, but I'm also intrigued to learn about the round-trip capability. We currently apply a few hacks to fake round-trippability; it would be nice to rely on a real implementation in the library...

@arcresu
Copy link
Member

arcresu commented Mar 26, 2019

With the current title this looks like a duplicate of #1702

@sampsyo
Copy link
Member

sampsyo commented Mar 26, 2019

Thanks! It does indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate feature features we would like to implement
Projects
None yet
Development

No branches or pull requests

3 participants