Darker changes lines when the previous lines got changed #388

wkentaro · 2022-08-29T16:00:07Z

Describe the bug

Darker changes lines when the previous lines got changed.
It happens when there are trailing closing parenthesis of the function call.

To Reproduce

unzip darker_tests.zip
cd darker_tests
git --no-pager diff
darker -l 79 spam.py --diff

Expected behavior

Only the line that was changed by editing gets updated. (in the below screenshot, only the first call of func1)

Screenshots

Environment (please complete the following information):

OS: macOS
Python version: 3.9.10
Git version [e.g. 2.36.0]

% git --version
git version 2.32.1 (Apple Git-133)
hub version refs/heads/master

Darker version [e.g. 1.5.0]

% darker --version
1.5.0

Black version [e.g. 22.3.0]

% black --version
black, 22.1.0 (compiled: yes)

The text was updated successfully, but these errors were encountered:

wkentaro · 2022-08-29T16:20:39Z

This is due to difflib not splitting these 2 as different chunks.

In [3]: list(opcodes_to_chunks(opcodes, src, dst))
Out[3]:
[(1,
  ('def func1(a_long_long, b_long_long, c_long_long, d_long_long):',
   '    return a_long_long + b_long_long + c_long_long + d_long_long',
   '',
   '',
   'def main():'),
  ('def func1(a_long_long, b_long_long, c_long_long, d_long_long):',
   '    return a_long_long + b_long_long + c_long_long + d_long_long',
   '',
   '',
   'def main():')),
 (6,
  ('    e = func1(a_long_long=999, b_long_long=2000, c_long_long=3000, d_long_long=4000)',
   '    f = func1(a_long_long=1001, b_long_long=2001, c_long_long=3001, d_long_long=4001)'),
  ('    e = func1(',
   '        a_long_long=999, b_long_long=2000, c_long_long=3000, d_long_long=4000',
   '    )',
   '    f = func1(',
   '        a_long_long=1001, b_long_long=2001, c_long_long=3001, d_long_long=4001',
   '    )')),
 (8,
  ('    print(e, f)', '', '', 'if __name__ == "__main__":', '    main()'),
  ('    print(e, f)', '', '', 'if __name__ == "__main__":', '    main()'))]

akaihola · 2022-08-29T18:37:05Z

Thanks @wkentaro, this is a great summary and example case for a tricky problem. Since difflib doesn't understand the Python syntax, it's impossible for it to reliably tell where statement boundaries are.

In #221, I attempted a more involved approach by digging into the internals of Black and getting a stream of original/reformatted chunks directly from there, but I very quickly ran into a dead end. There was a fundamental reason why this can't be done in a foolproof way even with Black having access to the AST. Unfortunately I didn't write down the details of it, but it would probably be obvious if we went through it again. It had something to do with indented blocks and the extents of AST nodes.

If you're interested to ponder about the problem, we could continue discussion here or maybe in a call.

akaihola · 2022-08-29T18:48:01Z

Black has evolved quite a bit since I worked on #221, but essentially I tried to reimplement what today is in black._format_str_once() and turn it into a generator of (<original line(s)>, <reformatted line(s)>) pairs instead of collecting all reformatted lines into a string.

I was hoping to get a more granular list of chunks than what difflib can give us by comparing the results of reformatting to the original.

But as said, I failed (or at least was defeated by something that seemed insurmountable).

akaihola · 2022-08-29T18:49:05Z

Also, having to reimplement parts of Black internals would of course make maintenance of Darker vastly heavier for obvious reasons – at least if we'd like to allow users to benefit from features of new Black versions going forward.

akaihola · 2022-08-29T19:15:30Z

Oh, I think I now remember a very very difficult case: non-standard indentation.

Let's say you have this file:

if True:
 for x in range(5):
  try:
   print(10 / x)
  except ZeroDivisionError:
   print("Can't divide")

Currently difflib bundles all the reindented lines into one chunk, and if you e.g. change only the fourth line to print(20/x), Darker will simply still reformat the whole file.

The dream with more granular handling would of course be that the "wrong" indentation be retained and other reformatting applied. A naïve implementation would simply cause the indentation to not match the surrounding code and be either invalid or not match the original AST. The outermost indentation level could perhaps be reverted in a post-processing step to match surrounding code, but somehow I feel there are going to be nasty edge cases where this would break apart.

akaihola added the enhancement New feature or request label Aug 29, 2022

akaihola added this to To do in Akaihola's Open source work via automation Aug 29, 2022

wkentaro linked a pull request Sep 4, 2022 that will close this issue

Apply more granular changes by using # fmt: skip #391

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Darker changes lines when the previous lines got changed #388

Darker changes lines when the previous lines got changed #388

wkentaro commented Aug 29, 2022

wkentaro commented Aug 29, 2022

akaihola commented Aug 29, 2022

akaihola commented Aug 29, 2022

akaihola commented Aug 29, 2022

akaihola commented Aug 29, 2022

Darker changes lines when the previous lines got changed #388

Darker changes lines when the previous lines got changed #388

Comments

wkentaro commented Aug 29, 2022

wkentaro commented Aug 29, 2022

akaihola commented Aug 29, 2022

akaihola commented Aug 29, 2022

akaihola commented Aug 29, 2022

akaihola commented Aug 29, 2022