Fix block scalar mangling bug #231 #232

wrouesnel · 2023-04-06T11:14:02Z

The regex based parsing for fixing comments was breaking block scalars. By using the ruyaml round trip handler, instead the comment formatting now can correctly identify block-scalars and avoid mangling them.

Resolves #231

Checklist

Add test cases to all the changes you introduce
Update the documentation for the changes

coveralls · 2023-04-06T11:58:03Z

Pull Request Test Coverage Report for Build 4628587345

65 of 66 (98.48%) changed or added relevant lines in 2 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.2%) to 99.634%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/yamlfix/adapters.py	56	57	98.25%

Totals
Change from base Build 4595856345:	-0.2%
Covered Lines:	545
Relevant Lines:	547

💛 - Coveralls

lyz-code

Thank you for the quick PR @wrouesnel, I'm a bit concerned with the duplication of work though which will make the code slower.

If you find a way to fix this without that duplication I'll be happy to merge this

lyz-code · 2023-04-07T10:09:10Z

tests/unit/test_adapter_yaml.py

+            #Comment with some whitespace below
+            """  # noqa: W293
+        )
+        fixed_source = dedent(


If fixed_source == source I think it will be cleaner to use result == source. If they are different I was not able to tell them apart :S

lyz-code · 2023-04-07T10:11:21Z

tests/unit/test_services.py

@@ -481,6 +481,14 @@ def test_fix_code_functions_emit_debug_logs(
            "Restoring jinja2 variables...",
            "Restoring double exclamations...",
            "Fixing comments...",
+            # Fixing comments causes a re-run of fixers, so we get duplicates from here


I don't like the re-running of previous fixers, can't you fix comments before so that we don't have to rerun everything?

I think it is really bad from the point of view of the performance of the tool to do this.

lyz-code · 2023-04-07T10:15:52Z

Looking at the code, weren't you able to tweak the existing _fix_comments function to correctly handle the case instead of rewriting it?

lyz-code · 2023-05-24T15:47:43Z

Check my comment on this issue in case it might help simplify the pull request

The regex based parsing for fixing comments was breaking block scalars. By using the ruyaml round trip handler, instead the comment formatting now can correctly identify block-scalars and avoid mangling them.

wrouesnel · 2023-06-22T01:21:22Z

The problem with solving this issue is that the fixer in question takes a line-oriented approach to processing the files, and can't handle multiple lines.

The specific issue is here -

yamlfix/src/yamlfix/adapters.py

Line 584 in 51410d6

# Comment in the middle of the line, but it's not part of a string

            # Comment in the middle of the line, but it's not part of a string
            if (
                config.comments_min_spaces_from_content > 1
                and " #" in line
                and line[-1] not in ["'", '"']
            ):
                line = re.sub(r"(.+\S)(\s+?)#", rf"\1{comment_start}", line)

This code has no way to know if it's part of a string or not, because it can't know if it's part of a block, which has a number of different possible presentations.

I did try several ideas for detecting block-scalar membership, but when you get right down to it that's just re-implementing a half-baked YAML parser when there's a much better YAML parser already out there.

IMO it's more important for the fixer to be correct (i.e. not change key values) rather then fast, but I can't see a way to handle this otherwise (which isn't just trying to half-bake a YAML parser.

lyz-code · 2023-06-22T08:43:32Z

I'm sorry but I still feel that this PR is adding a lot of complexity to the code only to fix #231. I don't see myself maintaining this code. Why don't you take a simpler approach to have another fixer run after the comments are run that iterates over the source code and:

If the line contains a comment that starts and ends with #:
- Checks if the following lines start and end with #
- Get the length of the longest line
- Pad the rest of lines to match the desired length

wrouesnel force-pushed the fix_block_whitespace_handling branch from 803493d to 751a7e6 Compare April 6, 2023 11:55

lyz-code requested changes Apr 7, 2023

View reviewed changes

lyz-code mentioned this pull request May 2, 2023

yamlfix squashes | notation #237

Closed

wrouesnel force-pushed the fix_block_whitespace_handling branch from 751a7e6 to 0b4c13b Compare June 22, 2023 00:07

Fix block scalar mangling bug lyz-code#231

0ed70b0

The regex based parsing for fixing comments was breaking block scalars. By using the ruyaml round trip handler, instead the comment formatting now can correctly identify block-scalars and avoid mangling them.

wrouesnel force-pushed the fix_block_whitespace_handling branch from 0b4c13b to 0ed70b0 Compare June 22, 2023 01:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix block scalar mangling bug #231 #232

Fix block scalar mangling bug #231 #232

wrouesnel commented Apr 6, 2023

coveralls commented Apr 6, 2023

lyz-code left a comment

lyz-code Apr 7, 2023

lyz-code Apr 7, 2023

lyz-code commented Apr 7, 2023

lyz-code commented May 24, 2023

wrouesnel commented Jun 22, 2023

lyz-code commented Jun 22, 2023

Fix block scalar mangling bug #231 #232

Are you sure you want to change the base?

Fix block scalar mangling bug #231 #232

Conversation

wrouesnel commented Apr 6, 2023

Checklist

coveralls commented Apr 6, 2023

Pull Request Test Coverage Report for Build 4628587345

💛 - Coveralls

lyz-code left a comment

Choose a reason for hiding this comment

lyz-code Apr 7, 2023

Choose a reason for hiding this comment

lyz-code Apr 7, 2023

Choose a reason for hiding this comment

lyz-code commented Apr 7, 2023

lyz-code commented May 24, 2023

wrouesnel commented Jun 22, 2023

lyz-code commented Jun 22, 2023