Import format changes based on line ending type #1566

VakarisZ · 2020-10-16T13:12:51Z

Import

from impacket.smb3structs import (
    SMB2_CREATE, SMB2_FLAGS_DFS_OPERATIONS, SMB2_IL_IMPERSONATION, SMB2_OPLOCK_LEVEL_NONE, SMB2Create,
    SMB2Create_Response, SMB2Packet)

checked with command python -m isort . -c -l 120 --wl 120 passes on dos line endings, but with unix line endings it fails because it expects

from impacket.smb3structs import (SMB2_CREATE, SMB2_FLAGS_DFS_OPERATIONS, SMB2_IL_IMPERSONATION, SMB2_OPLOCK_LEVEL_NONE,
                                  SMB2Create, SMB2Create_Response, SMB2Packet)

Visually, both line endings look the same, so I'd expect them to be treated the same (either as 1 or as 2 characters).

The text was updated successfully, but these errors were encountered:

anirudnits · 2020-10-17T07:41:59Z

From my brief inspection, I believe the issue lies here

isort/isort/io.py

Line 44 in 6807cfb

text = TextIOWrapper(buffer, encoding, line_buffering=True, newline="")

with newline="". From python docs:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n',
'\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled,but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

I'm not sure if this the exact reason and I would delve a bit deeper when I get time.

areeh · 2020-10-23T13:55:56Z

@anirudnits
Bear in mind that setting relates to isort's history of changing newlines, so while that might fix the 1 vs 2 character issue, it might bring back the changing line endings issue. See eg: #493 #671

areeh · 2020-10-23T14:16:30Z

For reference, here's how Black appears to solve both issues at once (edited heavily for brevity):

with open(src, "rb") as buf:
    src_contents, encoding, newline = decode_bytes(buf.read())

dst_contents = format_file_contents(src_contents)

with open(src, "w", encoding=encoding, newline=newline) as f:
    f.write(dst_contents)

def decode_bytes(src: bytes) -> Tuple[FileContent, Encoding, NewLine]:
    """Return a tuple of (decoded_contents, encoding, newline).
    `newline` is either CRLF or LF but `decoded_contents` is decoded with
    universal newlines (i.e. only contains LF).
    """
    srcbuf = io.BytesIO(src)
    encoding, lines = detect_encoding(srcbuf.readline)
    if not lines:
        return "", encoding, "\n"

    newline = "\r\n" if b"\r\n" == lines[0][-2:] else "\n"
    srcbuf.seek(0)
    with io.TextIOWrapper(srcbuf, encoding) as tiow:
        return tiow.read(), encoding, newline

Note that they preserve newlines for output, and use only LF internally (which should give consistent formatting in this case)

Actual implementation in Black source code:
https://github.com/psf/black/blob/407052724fa1c97ee8bcd4e96de650def00be03e/src/black/__init__.py#L1012
https://github.com/psf/black/blob/407052724fa1c97ee8bcd4e96de650def00be03e/src/black/__init__.py#L806

areeh · 2020-12-02T13:28:22Z

I abandoned my PR for a couple of reasons:

Black handles the newline issue in one place, I couldn't find a way to handle it without adding to StringIOs scattered throughout the lib (as a first time contributor, I want to limit my code changes)
I couldn't find a nice way to set newlines on the output when outputting to stdout (black explicitly decided not to handle this as far as I know)
the approach I was trying would give 1 type of newline for the entire output, which doesn't seem to match tests such as:

assert api.sort_code_string("import A\n\r\nimportA\n\n") == "import A\r\n\r\nimportA\r\n\n"

timothycrosley · 2021-03-20T07:17:22Z

Hi @VakarisZ,

I'm sorry you experienced this error! This has been fixed in develop and will be released to PyPI in the 5.8.0 release of isort.

To address the thread about how newlines are handled that stemmed from this issue: isort can't do things exactly like Black does because it is streaming based, and doesn't load a full file at once. This is much more efficient and scalable but it does come with some costs. That said, isort does a good job of keeping track of the line separator used for a chunk of imports, and generally uses this information when checking line length - I tracked it down to a single place where the parsed line separator was ignored and \n hard coded, I then added an additional test case to ensure this doesn't pop up again.

Thanks!

~Timothy

VakarisZ · 2021-03-23T09:16:00Z

Thanks, @timothycrosley, you rock!

timothycrosley added bug Something isn't working Hacktoberfest help wanted Extra attention is needed labels Oct 17, 2020

areeh mentioned this issue Oct 23, 2020

wip #1566 #1576

Closed

timothycrosley added a commit that referenced this issue Mar 20, 2021

Create failing test for issue #1566

a1f2782

timothycrosley added a commit that referenced this issue Mar 20, 2021

Add changelog entry for issue #1566

844c85c

timothycrosley added a commit that referenced this issue Mar 20, 2021

Fix issue #1566: Fixed single location parsed line separator isn't used

404d809

timothycrosley closed this as completed Mar 20, 2021

timothycrosley added this to the 5.8.0 milestone Mar 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Import format changes based on line ending type #1566

Import format changes based on line ending type #1566

VakarisZ commented Oct 16, 2020 •

edited

anirudnits commented Oct 17, 2020

areeh commented Oct 23, 2020

areeh commented Oct 23, 2020

areeh commented Dec 2, 2020 •

edited

timothycrosley commented Mar 20, 2021

VakarisZ commented Mar 23, 2021

Import format changes based on line ending type #1566

Import format changes based on line ending type #1566

Comments

VakarisZ commented Oct 16, 2020 • edited

anirudnits commented Oct 17, 2020

areeh commented Oct 23, 2020

areeh commented Oct 23, 2020

areeh commented Dec 2, 2020 • edited

timothycrosley commented Mar 20, 2021

VakarisZ commented Mar 23, 2021

VakarisZ commented Oct 16, 2020 •

edited

areeh commented Dec 2, 2020 •

edited