New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: refactor prettyprinter to be more readable #2893
Conversation
nltk/tree/prettyprinter.py
Outdated
@@ -207,7 +207,7 @@ def dumpmatrix(): | |||
raise ValueError("All leaves must be integer indices.") | |||
if len(leaves) != len(set(leaves)): | |||
raise ValueError("Indices must occur at most once.") | |||
if not all(0 <= n < len(sentence) for n in leaves): | |||
if not all(n in range(0, len(sentence)) for n in leaves): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey mohaned! this actually felt a bit better before. plus now it's constructing a list for each element in leaves
which feels a bit inefficient. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right.
Thanks.
nltk/tree/prettyprinter.py
Outdated
@@ -75,7 +75,7 @@ def __init__(self, tree, sentence=None, highlight=()): | |||
leaves = tree.leaves() | |||
if ( | |||
leaves | |||
and not any(len(a) == 0 for a in tree.subtrees()) | |||
and any(len(a) > 0 for a in tree.subtrees()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The replacement contains a logical error. NB:
>>> not any(len(a) == 0 for a in [[1], []])
False
>>> any(len(a) > 0 for a in [[1], []])
True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah right, my bad.
I believe it should be replaced with all instead of any.
Considering the below test cases:-
>>> not any(len(a) == 0 for a in [[1], []])
False
>>> all(len(a) > 0 for a in [[1], []])
False
>>> not any(len(a) == 0 for a in [[], []])
False
>>> all(len(a) > 0 for a in [[], []])
False
>>> not any(len(a) == 0 for a in [[1], [1]])
True
>>> all(len(a) > 0 for a in [[1], [1]])
True
@@ -291,7 +291,7 @@ def dumpmatrix(): | |||
matrix[rowidx][i] = ids[m] | |||
nodes[ids[m]] = tree[m] | |||
# add column to the set of children for its parent | |||
if m != (): | |||
if len(m) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(m) > 0: | |
if len(m): |
I believe len(m)
would be equivalent to len(m) > 0
in truthy context. Both only evaluate to False if m
is an empty collection. (That said, I understand the argument that len(m) > 0
could be simpler to understand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree, but like you said len(m) > 0
is simpler to understand, so I would personally go with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understandable!
Thanks @12mohaned |
This pull request refactors the TreePrettyPrinter class to be more readable and clean. For example, line 78 can be simply expressed as:
if (... and len(a) > 0 and ..)
instead of
if (... and not any(len(a) == 0 and ...)
I believe it is more confusing to read the latter one.
Also, there is a lot of magic numbers and variables(i.e, scale = 2) without any comments explaining what they do, which is considered bad practice since it does not explain what it does represents.
Regards,
Mohaned Mashaly.