Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix block scalar mangling bug #231 #232

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

wrouesnel
Copy link

The regex based parsing for fixing comments was breaking block scalars. By using the ruyaml round trip handler, instead the comment formatting now can correctly identify block-scalars and avoid mangling them.

Resolves #231

Checklist

  • Add test cases to all the changes you introduce
  • Update the documentation for the changes

@wrouesnel wrouesnel force-pushed the fix_block_whitespace_handling branch from 803493d to 751a7e6 Compare April 6, 2023 11:55
@coveralls
Copy link

Pull Request Test Coverage Report for Build 4628587345

  • 65 of 66 (98.48%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.2%) to 99.634%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/yamlfix/adapters.py 56 57 98.25%
Totals Coverage Status
Change from base Build 4595856345: -0.2%
Covered Lines: 545
Relevant Lines: 547

💛 - Coveralls

Copy link
Owner

@lyz-code lyz-code left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the quick PR @wrouesnel, I'm a bit concerned with the duplication of work though which will make the code slower.

If you find a way to fix this without that duplication I'll be happy to merge this

#Comment with some whitespace below
""" # noqa: W293
)
fixed_source = dedent(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fixed_source == source I think it will be cleaner to use result == source. If they are different I was not able to tell them apart :S

@@ -481,6 +481,14 @@ def test_fix_code_functions_emit_debug_logs(
"Restoring jinja2 variables...",
"Restoring double exclamations...",
"Fixing comments...",
# Fixing comments causes a re-run of fixers, so we get duplicates from here
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the re-running of previous fixers, can't you fix comments before so that we don't have to rerun everything?

I think it is really bad from the point of view of the performance of the tool to do this.

@lyz-code
Copy link
Owner

lyz-code commented Apr 7, 2023

Looking at the code, weren't you able to tweak the existing _fix_comments function to correctly handle the case instead of rewriting it?

@lyz-code
Copy link
Owner

Check my comment on this issue in case it might help simplify the pull request

@wrouesnel wrouesnel force-pushed the fix_block_whitespace_handling branch from 751a7e6 to 0b4c13b Compare June 22, 2023 00:07
The regex based parsing for fixing comments was breaking block scalars.
By using the ruyaml round trip handler, instead the comment formatting
now can correctly identify block-scalars and avoid mangling them.
@wrouesnel wrouesnel force-pushed the fix_block_whitespace_handling branch from 0b4c13b to 0ed70b0 Compare June 22, 2023 01:15
@wrouesnel
Copy link
Author

The problem with solving this issue is that the fixer in question takes a line-oriented approach to processing the files, and can't handle multiple lines.

The specific issue is here -

# Comment in the middle of the line, but it's not part of a string

            # Comment in the middle of the line, but it's not part of a string
            if (
                config.comments_min_spaces_from_content > 1
                and " #" in line
                and line[-1] not in ["'", '"']
            ):
                line = re.sub(r"(.+\S)(\s+?)#", rf"\1{comment_start}", line)

This code has no way to know if it's part of a string or not, because it can't know if it's part of a block, which has a number of different possible presentations.

I did try several ideas for detecting block-scalar membership, but when you get right down to it that's just re-implementing a half-baked YAML parser when there's a much better YAML parser already out there.

IMO it's more important for the fixer to be correct (i.e. not change key values) rather then fast, but I can't see a way to handle this otherwise (which isn't just trying to half-bake a YAML parser.

@lyz-code
Copy link
Owner

I'm sorry but I still feel that this PR is adding a lot of complexity to the code only to fix #231. I don't see myself maintaining this code. Why don't you take a simpler approach to have another fixer run after the comments are run that iterates over the source code and:

  • If the line contains a comment that starts and ends with #:
    • Checks if the following lines start and end with #
    • Get the length of the longest line
    • Pad the rest of lines to match the desired length

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

yamlfix squashes spaces in block strings
3 participants