-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docstring tests #3050
Docstring tests #3050
Conversation
…all.py and causal.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quickly added some comments. Most of them related to ...
and # doctest: +ELLIPSIS
. I'm in favor of using ...
where possible rather than round()
, slicing a string to e.g. only print 6 words, or putting the entire output in the doctest output.
@ekaf @stevenbird what are your thoughts on the two "styles"?
nltk/metrics/agreement.py
Outdated
>>> round(t.S(), 3) | ||
0.82 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd, rounding to 3 gives 2 values after the dot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because if not rounded we have 0.8199999999999998 :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, that's rough, then 0.82...
wouldn't work either (presumably). Rounding is acceptable then, otherwise the output (e.g. 0.819...
) may change in the future or on different devices (e.g. to 0.820...
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't round(t.S(), 2) be more appropriate, then?
nltk/corpus/__init__.py
Outdated
>>> print(", ".join(brown.words())) | ||
The, Fulton, County, Grand, Jury, said, ... | ||
>>> print(", ".join(brown.words()[:6])) # only first 6 words | ||
The, Fulton, County, Grand, Jury, said |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was my understanding that ...
is valid for tests, as long as you use # doctest: +ELLIPSIS
. That seems cleanest, especially for cases where the alternative is really long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I agree that's the easier and cleaner solution. How do I update this? Do I have to create a new pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need! You can push new commits to the docstring-tests
branch on your nltk fork, and they will automatically be included in this PR.
nltk/corpus/reader/framenet.py
Outdated
>>> fn.frame('Imposing_obligation') | ||
frame (1494): Imposing_obligation... | ||
>>> fn.frame('Imposing_obligation') # doctest: +NORMALIZE_WHITESPACE | ||
frame (1494): Imposing_obligation | ||
<BLANKLINE> | ||
[URL] https://framenet2.icsi.berkeley.edu/fnReports/data/frame/Imposing_obligation.xml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, can we use # doctest: +ELLIPSIS
here?
nltk/draw/util.py
Outdated
>>> cn['color'] | ||
>>> print(cn['color']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? Does it not correctly convert to string otherwise or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I remove the print() the test fails:
Failed example: cn['color'] Expected: red Got: 'red'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that makes sense! It's expecting a variable, not a string, because that's what we've put as the expected answer. Changing red
to "red"
should also work, but it does not matter much.
nltk/probability.py
Outdated
>>> cpdist['passed'].prob('VBD') | ||
0.423... | ||
>>> round(cpdist['passed'].prob('VBD'), 3) | ||
0.423 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, can this be fixed with ellipses? It feels a bit cleaner to me.
nltk/tag/brill_trainer.py
Outdated
>>> baseline.accuracy(gold_data) #doctest: +ELLIPSIS | ||
0.2450142... | ||
>>> round(baseline.accuracy(gold_data), 7) | ||
0.2433862 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the prior only failed because it is actually e.g. 0.243...
I resolved a quick merge conflict in |
@tomaarsen I'm with you in preferring ellipses in the output, rather than modifying the code in ways that no-one would do IRL |
@tomaarsen (#3050 (review)) I agree that your preferred style is cleaner and generally more robust. |
91f20c0
to
2c7a4ab
Compare
I updated the code according to the suggestions 👍 |
Indeed, this PR significantly reduces the number of doctest failures:
Note that #3048 fixes the remaining failures in nltk/parse/dependencygraph.py and nltk/tag/hunpos.py. |
There are 3 easy ways of solving the remaining failure in nltk/corpus/reader/framenet.py : either use print instead of pprint in line 1657, or doctest: +NORMALIZE_WHITESPACE, or format the expected test output with pprint:
|
After merging @ekaf's #3048 and then checking out this PR, I get the following failing tests still:
This differs slightly from the previous list. Do we want to get this merged now, and then look at those remaining tests in another issue/PR, or do we want to fix these in this PR too? I recognize that the remaining issues are a bit more challenging. |
@tomaarsen (#3050 (comment)) the two additional failures with nltk/test/unit/test_downloader.py in the previous list were due to being offline. When testing online I get the same list as you. |
I don't think this PR should try to fix everything, but maybe the easy framenet.py fix could fit here. |
As suggested by @ekaf, I've quickly fixed the framenet test. The failing tests are now:
I'll merge this once the tests go green. Thanks for the work @Madnex & @ekaf! |
I believe the original typo was misinterpreted and changed to something that was not originally intended.
* fixed pytests * fixed more pytests * fixed more pytest and changed multiline pytest issues fixes for snowball.py and causal.py * fixed pytests (mainly multiline or rounding issues) * fixed treebank pytests, removed test for return_string=True (deprecated) * fixed destructive.py pytests, removed test for return_string=True (deprecated) * fixed pytest (rounding issues) * fixed pytest (initialised missing object) * fixed pytest (formatting issues) * fixed pytest (formatting issues) * fixed pytest (formatting issues) * added pytest +SKIP for deprecated module stanford * updated AUTHORS.md * changed docstring corrections by usage of ELLIPSIS and different roundings * fixed AUTHORS.md to be consistent * Fix framenet doctest formatting with pprint * Change docstring on MultiListBox.__init__ I believe the original typo was misinterpreted and changed to something that was not originally intended. Co-authored-by: Jan Lennartz <jan.lennartz@ing.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
Fixed multiple mostly small pytest errors. The tests regarding the StanfordParser are set to be skipped:
After this PR most of the files in the project should pass the pytest docstring tests.