Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make yamlfix configurable while keeping it opinionated #117

Closed
muripic opened this issue Aug 25, 2021 · 21 comments · Fixed by #189
Closed

Make yamlfix configurable while keeping it opinionated #117

muripic opened this issue Aug 25, 2021 · 21 comments · Fixed by #189
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@muripic
Copy link
Collaborator

muripic commented Aug 25, 2021

Description

yamlfix can only be run with the default settings, which are hard-coded.

In some cases it might be necessary to override a rule. I think black and yamllint could be good models for this: sensible defaults out-of-the-box, but if you really need or want to, you can override or even ignore some of them, such as maximum line length.

However, since yamlfix is meant to be opinionated, we need to keep sensible defaults working without any need for configuration.

Possible Solution

@Jasha10 has implemented this functionality already on autoimport in this PR. It should be easier now to implement it for yamlfix.

As for the spec, it could be a good idea to use the same config options and keys as yamllint, whenever possible.

It may make sense to add a configuration option for each of the changes that we make so that users can choose what transformations to apply potentially speeding up the program run.

Related Issue

After #4 and #116 are taken care of (the default needs to be working first), we could make line-length configurable.

@lyz-code lyz-code added the enhancement New feature or request label Aug 25, 2021
@wwuck
Copy link
Contributor

wwuck commented Sep 20, 2021

It would be nice to have a configuration option to put each list member on a single line when the [] list syntax is used.

eg. currently a large list of additional dependencies in .pre-commit-config.yaml:

additional_dependencies:
  [
    cohesion==1.0.0,
    darglint==1.8.0,
    dlint==0.11.0,
    flake8==3.9.2,
    flake8-aaa==0.12.0,
    flake8-annotations-complexity==0.0.6,
    flake8-annotations-coverage==0.0.5,
    flake8-bandit==2.1.2,
    flake8-black==0.2.3,
    flake8-breakpoint==1.1.0,
    flake8-broken-line==0.3.0,
    flake8-bugbear==21.9.1,
    flake8-builtins==1.5.3,
    flake8-commas==2.0.0,
    flake8-comprehensions==3.6.1,
    flake8-copyright==0.2.2,
    flake8-datetimez==20.10.0,
    flake8-debugger==4.0.0,
    flake8-docstrings==1.6.0,
    flake8-executable==2.1.1,
    flake8-expression-complexity==0.0.9,
    flake8-fine-pytest==1.0.2,
    flake8-isort==4.0.0,
    flake8-libfaketime==1.1,
    flake8-literal==1.1.1,
    flake8-logging-format==0.6.0,
    flake8-markdown==0.2.0,
    flake8-no-implicit-concat==0.1.4,
    flake8-noqa==1.1.0,
    flake8-pep3101==1.3.0,
    flake8-print==4.0.0,
    flake8-printf-formatting==1.1.2,
    flake8-pyi==20.10.0,
    flake8-pytest-style==1.5.0,
    flake8-quotes==3.3.0,
    flake8-requirements==1.5.1,
    flake8-return==1.1.3,
    flake8-rst==0.8.0,
    flake8-rst-docstrings==0.2.3,
    flake8-simplify==0.14.2,
    flake8-sql==0.4.1,
    flake8-strftime==0.3.2,
    flake8-string-format==0.3.0,
    flake8-super==0.1.3,
    flake8-tuple==0.4.1,
    flake8-type-checking==1.0.3,
    flake8-typing-imports==1.10.1,
    flake8-unused-arguments==0.0.6,
    flake8-use-fstring==1.1,
    flake8-use-pathlib==0.2.0,
    flake8-variables-names==0.0.4,
    flake8-2020==1.6.0,
    pandas-vet==0.2.2,
    pep8-naming==0.12.1,
  ]

is squashed into the less readable form:

additional_dependencies: [cohesion==1.0.0, darglint==1.8.0, dlint==0.11.0,
  flake8==3.9.2, flake8-aaa==0.12.0, flake8-annotations-complexity==0.0.6,
  flake8-annotations-coverage==0.0.5, flake8-bandit==2.1.2, flake8-black==0.2.3,
  flake8-breakpoint==1.1.0, flake8-broken-line==0.3.0, flake8-bugbear==21.9.1,
  flake8-builtins==1.5.3, flake8-commas==2.0.0, flake8-comprehensions==3.6.1,
  flake8-copyright==0.2.2, flake8-datetimez==20.10.0, flake8-debugger==4.0.0,
  flake8-docstrings==1.6.0, flake8-executable==2.1.1, flake8-expression-complexity==0.0.9,
  flake8-fine-pytest==1.0.2, flake8-isort==4.0.0, flake8-libfaketime==1.1,
  flake8-literal==1.1.1, flake8-logging-format==0.6.0, flake8-markdown==0.2.0,
  flake8-no-implicit-concat==0.1.4, flake8-noqa==1.1.0, flake8-pep3101==1.3.0,
  flake8-print==4.0.0, flake8-printf-formatting==1.1.2, flake8-pyi==20.10.0,
  flake8-pytest-style==1.5.0, flake8-quotes==3.3.0, flake8-requirements==1.5.1,
  flake8-return==1.1.3, flake8-rst==0.8.0, flake8-rst-docstrings==0.2.3, flake8-simplify==0.14.2,
  flake8-sql==0.4.1, flake8-strftime==0.3.2, flake8-string-format==0.3.0,
  flake8-super==0.1.3, flake8-tuple==0.4.1, flake8-type-checking==1.0.3, flake8-typing-imports==1.10.1,
  flake8-unused-arguments==0.0.6, flake8-use-fstring==1.1, flake8-use-pathlib==0.2.0,
  flake8-variables-names==0.0.4, flake8-2020==1.6.0, pandas-vet==0.2.2, pep8-naming==0.12.1]

@muripic
Copy link
Collaborator Author

muripic commented Sep 20, 2021

Hi @wwuck ! Thank you for your comment, I plan to start working on this soon and any input is very welcome 😄
I think that if you want an item per line, the alternative list syntax would be the way to go:

additional_dependencies:
  - cohesion==1.0.0
  - darglint==1.8.0
  - dlint==0.11.0
  - flake8==3.9.2
  - flake8-aaa==0.12.0
 ...

I do think it would be great if we could tell yamlfix which list format to use. Adding multiple formatting options to the list with brackets format, though, in my opinion would be falling into a bit of a rabbit hole, since we could also add two per line, three and so on.
So, to sum up, I think that having an option for list_format, with two values (full or abbreviated, though these are just tentative names), and a line_length option to limit the line length if you go for the abbreviated/brackets format, would provide enough flexibility while keeping things simple. For example, in this case, if you specify list_format: full, it would convert your list to the format above. What do you think? Does it make sense to you?

I'm sure @lyz-code also has something interesting to say about this 😃

@wwuck
Copy link
Contributor

wwuck commented Sep 20, 2021

Yes that would be nice to have. Thanks.

@lyz-code
Copy link
Owner

I think that by default, yamlfix should try to make any list fit into one line such as:

additional_dependencies: [cohesion==1.0.0, flake8==3.9.2]

If suddenly it gets over the character line limit, it should use the muripic's proposed format:

additional_dependencies:
  - cohesion==1.0.0
  - darglint==1.8.0
  - ...

If we delete enough lines that it fits again in one line, it should switch back to one line.

I would vote against supporting @wwuck's list format as it requires more characters and doesn't follow the current way of defining lists. I may be missing some benefit of that format, please @wwuck tell us why you'd prefer that one over the dash. If ruyaml supports both formats, and we just need to pass an external argument to their formatter, I could agree on implementing it, but I think that adding more code to maintain for no good reason may not be a good idea in the long run.

That being said, if you want to support both formats with the list_format and it doesn't mean many more lines of code, I'm ok with it.

@wwuck
Copy link
Contributor

wwuck commented Sep 20, 2021

I’m happy either way, as long as I can get a list format where all members are on individual lines as this is a huge benefit for git diff when adding or removing a single list element.

Either syntax is fine to me; I had just forgotten temporarily that the dash format was a list. Yaml format is still quite confusing to remember sometimes.

@lyz-code
Copy link
Owner

lyz-code commented Oct 21, 2021

@dbatten5 started implementing the same functionality in autoimport. We can reuse that code here as they are very similar projects

@dbatten5
Copy link

@dbatten5 started implementing the same functionality in autoimport. We can reuse that code here as they are very similar projects

i wonder if that functionality should be extracted into a dedicated package...

@lyz-code
Copy link
Owner

Maybe it makes sense, yes

@dbatten5
Copy link

dbatten5 commented Oct 25, 2021

Maybe it makes sense, yes

i've made a python package called maison (using lots of bits from your lovely pacakges) which encapsulates that behaviour, just made the first release. i plan to add more features to it eg. config validation/defaults with pydantic, multiple config file sources etc. when i get the chance

@lyz-code
Copy link
Owner

Awesome! thank you so much :)

@lyz-code
Copy link
Owner

@Jasha10 has implemented this functionality already on autoimport in this PR. It should be easier now to implement it for yamlfix

@lyz-code lyz-code added good first issue Good for newcomers help wanted Extra attention is needed labels Nov 23, 2021
@wdobbels
Copy link

Two settings on my wishlist:

  • string quotation for values (to avoid inconsistencies between '123' and 123B). With the setting you could choose if you always use quotes (potentially choose between single and double quotes), or if you use no quotes where possible (current behaviour). Keys would never use quotes (unless you prefer to have another setting for that).
  • null value: null, ~, empty field

I can help contribute if these are things you want in this project.

@lyz-code
Copy link
Owner

Hi @wdobbels thanks for taking the time to write the comment. Two questions on your wishlisted settings:

I don't understand the first setting, I've just tested and the next snippet is not changed by yamlfix:

---
integer_value: 123
string_value: '123'

So it looks like it's intelligent enough to deduce that '123' is a string of an integer and it should not be changed, if instead you had string_value: 'another_string' it will transform it to string_value: another_string.

Can you please share a reproducible example to help me understand better what you mean?

On your second feature, I see that:

# A document may be null.
---
---
# This mapping has four keys,
# one has a value.
empty:
canonical: ~
english: null
~: null key
---
# This sequence has five
# entries, two have values.
sparse:
  - ~
  - 2nd entry
  -
  - 4th entry
  - Null

Get's converted to:

---
...
---
# This mapping has four keys,
# one has a value.
empty:
canonical:
english:
null: null key
---
# This sequence has five
# entries, two have values.
sparse:
  - 
  - 2nd entry
  - 
  - 4th entry
  -

Which raises quite a lot of edits that do not follow what looks to be the specification (2 also follows that specification.

So if it really is the standard, I'd say that it's more of a bug than something to be selected by a setting configuration. To better understand your reasons, why would you want to use null or ~ over the empty field (different from an empty string)? I find the third more clear and I'm not sure if it's worth the work to code it and maintain it afterwards. Furthermore it can slow (although probably irrelevantly) the run on yamlfix on files. But if you have a strong opinion on it, the code change is not big and it's easy to maintain, I'll be happy to accept your PR :). Regarding this issue I think so as not to high-jack the issue, we can discuss it in another one.

Thinking about what you said, it may make sense to add a configuration option for each of the changes that we make so that users can choose what transformations to apply potentially speeding up the program run.

@wdobbels
Copy link

Hello @lyz-code !

What I mean by the first setting is a way for string values to always have quotes. So instead of converting the following:

key1: "12"
key2: "12B"
key3: "ABC"

to:

key1: "12"
key2: 12B
key3: ABC

It would keep it as the former. That makes files like those more uniform; it's just a personal preference.

The second option is also just a personal preference: instead of using empty fields I like to have explicit null (an empty field to me looks like someone forgot to fill in a value).

I've started adding the possibility for configs using maison, but I've had some trouble with outdated imports (especially regarding flakeheaven).

@lyz-code
Copy link
Owner

Hi @wdobbels thanks for starting to work on it :).

I understand your point of view, and I'm fine with those configurations as long as they don't add too much complexity on the code. Please disable by default the strings one.

If you make a pull request in WIP state, I may be able to help you solve the import issues. Are you adding them with pdm?

@marcules
Copy link
Contributor

@lyz-code I'd like to implement the features mentioned by @wdobbels - in particular the "choose your own none representation" and "chose consistent quotation marks for simple scalar values" and "quote everything consistently" (keys+values).

Looking at the code I did not see an easy way to pass options into services.py / fix_files.
I was thinking of creating a new "RuamelOptions" class to keep the related config options for those things together and extending fix_files with an additional optional parameter.

I wanted your input on this, as classes seem to be a stylistic difference on how the project is set up currently.
I got a working prototype locally and can create a PR without any tests for now if you want, so you can see how it feels as an extension to the project, wdyt?

@lyz-code
Copy link
Owner

Hey @marcules thanks for taking the time to work on this issue, I really like your approach. We can continue the discussion on the PR if you like

@marcules
Copy link
Contributor

@lyz-code it might be hard to use the same config options as in yamllint as the yamllint config format does not seem to be compatible with maison -> yaml vs toml / ini
https://github.com/adrienverge/yamllint/blob/master/yamllint/conf/default.yaml
and it even has it's own "extends" syntax where multiple files need to be read to have a merged config.

Maybe if maison is extended to read yaml files as well, we could use the same configs - but for now it would look like you'd have to either specify an additional yamlfix.ini / pyproject.toml with a tools.yamlfix key in it

https://dbatten5.github.io/maison/usage/ says it's supporting only ini and toml files for now.

I can proceed with this, but the yamllint files cannot be used "as is" for yamlfix, even if the keys themselves have the same names.

@lyz-code
Copy link
Owner

lyz-code commented Dec 14, 2022

Haven't thought about reusing yamllint configuration file, this could be a good idea as long as we can inject our own configuration on that file without breaking yamllint (imagine we add new keys) and that we reuse the same keys.

It can be confusing though for the users to configure yamlfix with yamlint. Maybe a cleaner approach could be to read both yamllint conf and yamlfix and if there is a key set in yamllint that is used by yamlfix and that configuration is not set on yamlfix conf (to avoid duplication) then we take it from there. It will add complexity though :/ . Not sure what path to follow in this case, what do you think?

For the configuration, if we don't use maison I'd be inclined to use goodconf as it uses pydantic data validation behind the scenes (and I use it for my projects that read the configuration from a yaml :P)

@lyz-code
Copy link
Owner

Although eventually yamllint may support toml files when that time comes, it will be interesting for the python projects to read yamlfix configuration from there too.

@lyz-code
Copy link
Owner

Available since 1.2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants