Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LGTM.com errors #2851

Merged
merged 3 commits into from Oct 10, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion nltk/internals.py
Expand Up @@ -242,7 +242,7 @@ def read_str(s, start_position):
try:
return eval(s[start_position : match.end()]), match.end()
except ValueError as e:
raise ReadError("invalid string (%s)" % e) from e
raise ReadError("valid escape sequence", start_position) from e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it say "invalid" instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReadError documentation reads:

    :param expected: What was expected when an error occurred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see. i guess i am used to errors telling me what's wrong, rather than what is expected. but honestly, i don't even think this code works!! from my own poking around, eval-ing a string with an invalid escape sequence does NOT raise a ValueError, it instead raises a DeprecationWarning. This code dates back to 2008, so perhaps then python used raise an error?

There seems to be active discussion in the python community about this, and they may raise an error down the road.

all of this is to say, i know you're just trying to improve small bits of code quality and not anything more. So I added a quick unit test to check other errors work (they seem to), and this looks fine now!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the error message needs to make it easy for people to fix. Doesn't that mean reporting the invalid string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but then that's because the ReadError class is poorly designed. This could be fixed in a different PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadError class is defined further up in the file as:

class ReadError(ValueError):
    """
    Exception raised by read_* functions when they fail.
    :param position: The index in the input string where an error occurred.
    :param expected: What was expected when an error occurred.
    """

    def __init__(self, expected, position):
        ValueError.__init__(self, expected, position)
        self.expected = expected
        self.position = position

    def __str__(self):
        return f"Expected {self.expected} at {self.position}"

This PR fixed a bug where constructor was only passed one argument instead of two and the new string displayed to the user in the context of line 245 will be f"Expected valid escape sequence at {start_position}".



_READ_INT_RE = re.compile(r"-?\d+")
Expand Down
5 changes: 4 additions & 1 deletion nltk/util.py
Expand Up @@ -1013,7 +1013,7 @@ def skipgrams(sequence, n, k, **kwargs):
######################################################################

# inherited from pywordnet, by Oliver Steele
def binary_search_file(file, key, cache={}, cacheDepth=-1):
def binary_search_file(file, key, cache=None, cacheDepth=-1):
"""
Return the line from the file with first word key.
Searches through a sorted file using the binary search algorithm.
Expand All @@ -1036,6 +1036,9 @@ def binary_search_file(file, key, cache={}, cacheDepth=-1):
end = file.tell() - 1
file.seek(0)

if cache is None:
cache = {}

while start < end:
lastState = start, end
middle = (start + end) // 2
Expand Down