Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LGTM.com errors #2851

Merged
merged 3 commits into from Oct 10, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
4 changes: 2 additions & 2 deletions nltk/internals.py
Expand Up @@ -238,11 +238,11 @@ def read_str(s, start_position):
break

# Process it, using eval. Strings with invalid escape sequences
# might raise ValueEerror.
# might raise ValueError.
try:
return eval(s[start_position : match.end()]), match.end()
except ValueError as e:
raise ReadError("invalid string (%s)" % e) from e
raise ReadError("valid escape sequence", start_position) from e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it say "invalid" instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReadError documentation reads:

    :param expected: What was expected when an error occurred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see. i guess i am used to errors telling me what's wrong, rather than what is expected. but honestly, i don't even think this code works!! from my own poking around, eval-ing a string with an invalid escape sequence does NOT raise a ValueError, it instead raises a DeprecationWarning. This code dates back to 2008, so perhaps then python used raise an error?

There seems to be active discussion in the python community about this, and they may raise an error down the road.

all of this is to say, i know you're just trying to improve small bits of code quality and not anything more. So I added a quick unit test to check other errors work (they seem to), and this looks fine now!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the error message needs to make it easy for people to fix. Doesn't that mean reporting the invalid string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but then that's because the ReadError class is poorly designed. This could be fixed in a different PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadError class is defined further up in the file as:

class ReadError(ValueError):
    """
    Exception raised by read_* functions when they fail.
    :param position: The index in the input string where an error occurred.
    :param expected: What was expected when an error occurred.
    """

    def __init__(self, expected, position):
        ValueError.__init__(self, expected, position)
        self.expected = expected
        self.position = position

    def __str__(self):
        return f"Expected {self.expected} at {self.position}"

This PR fixed a bug where constructor was only passed one argument instead of two and the new string displayed to the user in the context of line 245 will be f"Expected valid escape sequence at {start_position}".



_READ_INT_RE = re.compile(r"-?\d+")
Expand Down
20 changes: 20 additions & 0 deletions nltk/test/internals.doctest
Expand Up @@ -138,3 +138,23 @@ It works for classic classes, too:
False
>>> overridden(D().f)
True


read_str()
~~~~~~~~~~~~
>>> from nltk.internals import read_str

Test valid scenarios

>>> read_str("'valid string'", 0)
('valid string', 14)

Now test invalid scenarios
>>> read_str("should error", 0)
Traceback (most recent call last):
...
nltk.internals.ReadError: Expected open quote at 0
>>> read_str("'should error", 0)
Traceback (most recent call last):
...
nltk.internals.ReadError: Expected close quote at 1
5 changes: 4 additions & 1 deletion nltk/util.py
Expand Up @@ -1013,7 +1013,7 @@ def skipgrams(sequence, n, k, **kwargs):
######################################################################

# inherited from pywordnet, by Oliver Steele
def binary_search_file(file, key, cache={}, cacheDepth=-1):
def binary_search_file(file, key, cache=None, cacheDepth=-1):
"""
Return the line from the file with first word key.
Searches through a sorted file using the binary search algorithm.
Expand All @@ -1036,6 +1036,9 @@ def binary_search_file(file, key, cache={}, cacheDepth=-1):
end = file.tell() - 1
file.seek(0)

if cache is None:
cache = {}

while start < end:
lastState = start, end
middle = (start + end) // 2
Expand Down