Docstring tests #3050

Madnex · 2022-09-12T13:35:38Z

Fixed multiple, mostly small, pytest errors. The tests for the StanfordParser are set to be skipped in:

nltk/parse/stanford.py
nltk/tag/stanford.py
nltk/tokenize/stanford.py
nltk/tokenize/stanford_segmenter.py

After this PR most of the files in the project should pass the pytest docstring tests.
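A minimal sketch of how a doctest example can be skipped with the `+SKIP` directive, which is the mechanism the commit "added pytest +SKIP for deprecated module stanford" names. The `load_external_parser` call is a hypothetical placeholder, not an NLTK function; it is never executed because the example is skipped.

```python
import doctest

# Hypothetical docstring content: load_external_parser() does not exist,
# but the +SKIP directive means doctest never tries to run that line.
SOURCE = '''
>>> parser = load_external_parser()  # doctest: +SKIP
>>> 1 + 1
2
'''

test = doctest.DocTestParser().get_doctest(SOURCE, {}, "skip_demo", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)

# Only the second example is attempted; the skipped one is not counted as a failure.
print(results.failed, results.attempted)
```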

tomaarsen

I quickly added some comments. Most of them are related to ... and # doctest: +ELLIPSIS. I'm in favor of using ... where possible, rather than using round(), slicing a string to e.g. only print 6 words, or putting the entire output in the doctest output.

@ekaf @stevenbird what are your thoughts on the two "styles"?

tomaarsen · 2022-09-12T14:31:24Z

nltk/metrics/agreement.py

+    >>> round(t.S(), 3)
+    0.82


This seems odd, rounding to 3 gives 2 values after the dot?

Yes, because if not rounded we have 0.8199999999999998 :P

Oooh, that's rough, then 0.82... wouldn't work either (presumably). Rounding is acceptable then, otherwise the output (e.g. 0.819...) may change in the future or on different devices (e.g. to 0.820...).

Wouldn't round(t.S(), 2) be more appropriate, then?
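The rounding behaviour discussed here can be checked in isolation. This sketch uses only the unrounded value quoted in the thread; it does not call the NLTK agreement code.

```python
value = 0.8199999999999998  # the unrounded result quoted in the thread

rounded3 = round(value, 3)
rounded2 = round(value, 2)

# Both roundings produce the same float, and Python prints floats with the
# shortest repr that round-trips, so no trailing zero is shown.
print(repr(value), repr(rounded3), repr(rounded2))
```

Under these assumptions, `round(..., 2)` and `round(..., 3)` give the same expected doctest output for this particular value, so the choice between them is about intent rather than the printed result.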

tomaarsen · 2022-09-12T14:32:55Z

nltk/corpus/__init__.py

-    >>> print(", ".join(brown.words()))
-    The, Fulton, County, Grand, Jury, said, ...
+    >>> print(", ".join(brown.words()[:6])) # only first 6 words
+    The, Fulton, County, Grand, Jury, said


It was my understanding that ... is valid for tests, as long as you use # doctest: +ELLIPSIS. That seems cleanest, especially for cases where the alternative is really long.

Okay, I agree that's the easier and cleaner solution. How do I update this? Do I have to create a new pull request?

No need! You can push new commits to the docstring-tests branch on your nltk fork, and they will automatically be included in this PR.
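As a rough illustration of the `+ELLIPSIS` style being recommended, the following sketch runs a small doctest through the standard `doctest` module. The word list is a short stand-in for `brown.words()`, which is an assumption made so the snippet runs without the corpus.

```python
import doctest

# Stand-in data: a short list instead of the Brown corpus.
SOURCE = '''
>>> words = ["The", "Fulton", "County", "Grand", "Jury", "said", "Friday", "an"]
>>> print(", ".join(words))  # doctest: +ELLIPSIS
The, Fulton, County, Grand, Jury, said, ...
'''

test = doctest.DocTestParser().get_doctest(SOURCE, {}, "ellipsis_demo", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)

# With +ELLIPSIS, "..." in the expected output matches any remaining text.
print(results.failed, results.attempted)
```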

tomaarsen · 2022-09-12T14:33:14Z

nltk/corpus/reader/framenet.py

-        >>> fn.frame('Imposing_obligation')
-        frame (1494): Imposing_obligation...
+        >>> fn.frame('Imposing_obligation') # doctest: +NORMALIZE_WHITESPACE
+        frame (1494): Imposing_obligation
+        <BLANKLINE>
+        [URL] https://framenet2.icsi.berkeley.edu/fnReports/data/frame/Imposing_obligation.xml


Again, can we use # doctest: +ELLIPSIS here?

tomaarsen · 2022-09-12T14:33:55Z

nltk/draw/util.py

-        >>> cn['color']
+        >>> print(cn['color'])


Is this necessary? Does it not correctly convert to string otherwise or something?

When I remove the print() the test fails:

Failed example:
    cn['color']
Expected:
    red
Got:
    'red'

Oh, that makes sense! Without print(), the doctest compares against the repr of the result, which is 'red' with quotes, while the expected answer was written as red without them. Changing red to 'red' should also work, but it does not matter much.
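The difference between the two forms can be reproduced with a plain dictionary. This is only a sketch: `cn` here is a stand-in dict, not the canvas widget from nltk/draw/util.py, and `run` is a hypothetical helper.

```python
import doctest


def run(source):
    """Run a doctest snippet and return (failed, attempted)."""
    globs = {"cn": {"color": "red"}}  # stand-in for the real widget
    test = doctest.DocTestParser().get_doctest(source, globs, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test, out=lambda text: None)  # silence the failure report
    return result.failed, result.attempted


# A bare expression is echoed as its repr, so the quotes are part of the output.
bare_unquoted = run(">>> cn['color']\nred\n")
bare_quoted = run(">>> cn['color']\n'red'\n")
# print() writes the string itself, without quotes.
printed = run(">>> print(cn['color'])\nred\n")

print(bare_unquoted, bare_quoted, printed)
```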

tomaarsen · 2022-09-12T14:35:16Z

nltk/probability.py

-        >>> cpdist['passed'].prob('VBD')
-        0.423...
+        >>> round(cpdist['passed'].prob('VBD'), 3)
+        0.423



Again, can this be fixed with ellipses? It feels a bit cleaner to me.

tomaarsen · 2022-09-12T14:36:05Z

nltk/tag/brill_trainer.py

-        >>> baseline.accuracy(gold_data) #doctest: +ELLIPSIS
-        0.2450142...
+        >>> round(baseline.accuracy(gold_data), 7)
+        0.2433862


I believe the previous version only failed because the value is actually e.g. 0.243..., so the expected output no longer matched.
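A small sketch of why the old expectation failed even with `+ELLIPSIS`: the literal text before `...` still has to match. The full `got` string below is hypothetical; only its `0.2433862` prefix comes from the diff above.

```python
import doctest

checker = doctest.OutputChecker()

# Hypothetical full value; only the 0.2433862 prefix appears in the thread.
got = "0.2433862433862434\n"

# The stale expectation has a different literal prefix, so "..." cannot rescue it.
stale = checker.check_output("0.2450142...\n", got, doctest.ELLIPSIS)
# A shorter prefix that agrees with the actual value matches.
short = checker.check_output("0.243...\n", got, doctest.ELLIPSIS)

print(stale, short)
```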

tomaarsen · 2022-09-12T14:42:25Z

I resolved a quick merge conflict in AUTHORS.md in 91f20c0.

stevenbird · 2022-09-13T04:00:12Z

@tomaarsen I'm with you in preferring ellipses in the output, rather than modifying the code in ways that no-one would do IRL

ekaf · 2022-09-13T08:06:10Z

@tomaarsen (#3050 (review)) I agree that your preferred style is cleaner and generally more robust.

Madnex · 2022-09-13T11:50:58Z

I updated the code according to the suggestions 👍

ekaf · 2022-09-18T09:46:24Z

Indeed, this PR significantly reduces the number of doctest failures:


=========================== short test summary info ============================
FAILED nltk/text.py::nltk.text.Text.collocation_list
FAILED nltk/corpus/reader/framenet.py::nltk.corpus.reader.framenet.FramenetCorpusReader.lu
FAILED nltk/draw/table.py::nltk.draw.table.MultiListbox.__init__
FAILED nltk/draw/table.py::nltk.draw.table.MultiListbox.configure
FAILED nltk/draw/table.py::nltk.draw.table.Table
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.CoreNLPDependencyParser
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.CoreNLPParser
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.GenericCoreNLPParser.tag
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.GenericCoreNLPParser.tokenize
FAILED nltk/parse/dependencygraph.py::nltk.parse.dependencygraph.DependencyGraph._repr_svg_
FAILED nltk/tag/hunpos.py::nltk.tag.hunpos.HunposTagger
FAILED nltk/test/unit/test_downloader.py::test_downloader_using_existing_parent_download_dir
FAILED nltk/test/unit/test_downloader.py::test_downloader_using_non_existing_parent_download_dir
FAILED nltk/tokenize/punkt.py::nltk.tokenize.punkt.PunktSentenceTokenizer._match_potential_end_contexts
= 14 failed, 681 passed, 33 skipped, 9 xfailed, 22 warnings in 414.32s (0:06:54) =

Note that #3048 fixes the remaining failures in nltk/parse/dependencygraph.py and nltk/tag/hunpos.py.

ekaf · 2022-09-18T09:53:58Z

There are three easy ways to fix the remaining failure in nltk/corpus/reader/framenet.py: use print instead of pprint in line 1657, add # doctest: +NORMALIZE_WHITESPACE, or format the expected test output the way pprint does:

________ [doctest] nltk.corpus.reader.framenet.FramenetCorpusReader.lu _________
1648         Usage examples:
1649 
1650         >>> from nltk.corpus import framenet as fn
1651         >>> fn.lu(256).name
1652         'foresee.v'
1653         >>> fn.lu(256).definition
1654         'COD: be aware of beforehand; predict.'
1655         >>> fn.lu(256).frame.name
1656         'Expectation'
1657         >>> pprint(list(map(PrettyDict, fn.lu(256).lexemes)))
Expected:
    [{'POS': 'V', 'breakBefore': 'false', 'headword': 'false', 'name': 'foresee', 'order': 1}]
Got:
    [{'POS': 'V',
      'breakBefore': 'false',
      'headword': 'false',
      'name': 'foresee',
      'order': 1}]

nltk/corpus/reader/framenet.py:1657: DocTestFailure
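The mismatch above comes from pprint wrapping output that exceeds its default width of 80 characters. The sketch below reproduces it with a plain dict in place of NLTK's `PrettyDict` (an assumption made to keep it self-contained) and shows that `NORMALIZE_WHITESPACE` makes the single-line expectation match.

```python
import doctest
from pprint import pformat

# Plain dict standing in for the PrettyDict objects in the real doctest.
lexemes = [{"POS": "V", "breakBefore": "false", "headword": "false",
            "name": "foresee", "order": 1}]

want = ("[{'POS': 'V', 'breakBefore': 'false', 'headword': 'false', "
        "'name': 'foresee', 'order': 1}]\n")
got = pformat(lexemes) + "\n"  # wrapped, because the one-line form is wider than 80

checker = doctest.OutputChecker()
exact = checker.check_output(want, got, 0)
normalized = checker.check_output(want, got, doctest.NORMALIZE_WHITESPACE)

print(exact, normalized)
```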

tomaarsen · 2022-09-19T14:48:44Z

After merging @ekaf's #3048 and then checking out this PR, I still get the following failing tests:

====================================================================================== short test summary info ====================================================================================== 
FAILED nltk/text.py::nltk.text.Text.collocation_list
FAILED nltk/corpus/reader/framenet.py::nltk.corpus.reader.framenet.FramenetCorpusReader.lu
FAILED nltk/draw/table.py::nltk.draw.table.MultiListbox.__init__
FAILED nltk/draw/table.py::nltk.draw.table.MultiListbox.configure
FAILED nltk/draw/table.py::nltk.draw.table.Table
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.CoreNLPDependencyParser
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.CoreNLPParser
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.GenericCoreNLPParser.tag
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.GenericCoreNLPParser.tokenize
FAILED nltk/parse/transitionparser.py::nltk.parse.transitionparser.demo
FAILED nltk/tokenize/punkt.py::nltk.tokenize.punkt.PunktSentenceTokenizer._match_potential_end_contexts
========================================================== 11 failed, 688 passed, 29 skipped, 9 xfailed, 22 warnings in 125.34s (0:02:05) ===========================================================

This differs slightly from the previous list.

Do we want to get this merged now, and then look at those remaining tests in another issue/PR, or do we want to fix these in this PR too? I recognize that the remaining issues are a bit more challenging.

ekaf · 2022-09-20T06:47:34Z

@tomaarsen (#3050 (comment)) the two additional failures with nltk/test/unit/test_downloader.py in the previous list were due to being offline. When testing online I get the same list as you.

ekaf · 2022-09-20T11:40:33Z

I don't think this PR should try to fix everything, but maybe the easy framenet.py fix could fit here.

tomaarsen · 2022-09-21T13:18:44Z

As suggested by @ekaf, I've quickly fixed the framenet test. The failing tests are now:

====================================================================================== short test summary info ====================================================================================== 
FAILED nltk/text.py::nltk.text.Text.collocation_list
FAILED nltk/draw/table.py::nltk.draw.table.MultiListbox.__init__
FAILED nltk/draw/table.py::nltk.draw.table.MultiListbox.configure
FAILED nltk/draw/table.py::nltk.draw.table.Table
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.CoreNLPDependencyParser
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.CoreNLPParser
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.GenericCoreNLPParser.tag
FAILED nltk/parse/corenlp.py::nltk.parse.corenlp.GenericCoreNLPParser.tokenize
FAILED nltk/parse/transitionparser.py::nltk.parse.transitionparser.demo
FAILED nltk/tokenize/punkt.py::nltk.tokenize.punkt.PunktSentenceTokenizer._match_potential_end_contexts
========================================================== 10 failed, 689 passed, 29 skipped, 9 xfailed, 22 warnings in 143.19s (0:02:23) ===========================================================

I'll merge this once the tests go green. Thanks for the work @Madnex & @ekaf!

* fixed pytests
* fixed more pytests
* fixed more pytest and changed multiline pytest issues fixes for snowball.py and causal.py
* fixed pytests (mainly multiline or rounding issues)
* fixed treebank pytests, removed test for return_string=True (deprecated)
* fixed destructive.py pytests, removed test for return_string=True (deprecated)
* fixed pytest (rounding issues)
* fixed pytest (initialised missing object)
* fixed pytest (formatting issues)
* fixed pytest (formatting issues)
* fixed pytest (formatting issues)
* added pytest +SKIP for deprecated module stanford
* updated AUTHORS.md
* changed docstring corrections by usage of ELLIPSIS and different roundings
* fixed AUTHORS.md to be consistent
* Fix framenet doctest formatting with pprint
* Change docstring on MultiListBox.__init__

I believe the original typo was misinterpreted and changed to something that was not originally intended.

Co-authored-by: Jan Lennartz <jan.lennartz@ing.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>

jingbels added 14 commits August 1, 2022 08:10

fixed pytests

478beda

fixed more pytests

975a544

fixed more pytest and changed multiline pytest issues fixes for snowball.py and causal.py

a50342e

fixed pytests (mainly multiline or rounding issues)

20ed4bb

fixed treebank pytests, removed test for return_string=True (deprecated)

111addb

fixed destructive.py pytests, removed test for return_string=True (deprecated)

4ec0b21

fixed pytest (rounding issues)

9e9a10c

fixed pytest (initialised missing object)

25d25a7

fixed pytest (formatting issues)

c2bf61a

fixed pytest (formatting issues)

b35f079

fixed pytest (formatting issues)

c213a51

Merge branch 'develop' of https://github.com/nltk/nltk into docstring-tests

b9058a3

added pytest +SKIP for deprecated module stanford

a4df551

updated AUTHORS.md

6e85f00

Madnex mentioned this pull request Sep 12, 2022

Failing tests in DocStrings #2989

Closed

tomaarsen reviewed Sep 12, 2022

jingbels added 2 commits September 13, 2022 13:03

changed docstring corrections by usage of ELLIPSIS and different roundings

c2b26e9

changed docstring corrections by usage of ELLIPSIS and different roundings

2c7a4ab

Madnex force-pushed the docstring-tests branch from 91f20c0 to 2c7a4ab Compare September 13, 2022 11:42

fixed AUTHORS.md to be consistent

e7b189a

Merge branch 'develop' into docstring-tests

8eef3b6

Fix framenet doctest formatting with pprint

ddcb327

Change docstring on MultiListBox.__init__

e86eebc

I believe the original typo was misinterpreted and changed to something that was not originally intended.

tomaarsen merged commit 8a4cf5d into nltk:develop Sep 21, 2022
