Prevent `LazyModule` from increasing the size of `nltk.__dict__` #3033
Hello!
Pull request overview
Resolve an issue in `LazyModule` that would increase the size of `nltk.__dict__`, which may prevent programs such as `pytest --doctest-modules` from iterating over all modules.
Details
NLTK uses 5 `LazyModule` instances, defined in nltk/nltk/__init__.py, lines 164 to 168 in 3b72fa2.
When inspecting `nltk` after importing it with `import nltk`, you can see that `nltk.__dict__["toolbox"]` refers to a `LazyModule`. Then, after interacting with `nltk.toolbox`, a new entry is created in `nltk.__dict__`, accessible as `nltk.__dict__["nltk.toolbox"]`. This entry points to the actually loaded toolbox module.
Certain uses of `nltk.toolbox.bla` will then still go through `LazyModule`'s `__getattr__`, even though the `LazyModule` should have replaced itself completely.
The primary reason this is an issue is that tools that iterate over `nltk.__dict__` will get `RuntimeError: dictionary changed size during iteration`:
Changes
After this PR, rather than setting `nltk.__dict__["nltk.toolbox"]` to be the toolbox module, `nltk.__dict__["toolbox"]` is overridden to be the toolbox module directly.
The output of the program above is now:
Which is my expected behaviour.
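To illustrate the difference between the two behaviours, here is a minimal sketch using a simplified stand-in for a lazy module placeholder. It is not NLTK's actual `LazyModule` implementation: the names `LazyStandIn`, `make_namespace`, `touch_all` and `pkg.toolbox` are hypothetical, and a plain dict stands in for `nltk.__dict__`.

```python
import types


class LazyStandIn:
    """Hypothetical, simplified placeholder; NOT NLTK's LazyModule."""

    def __init__(self, local_name, full_name, namespace, loader, replace_in_place):
        self._local_name = local_name
        self._full_name = full_name
        self._namespace = namespace
        self._loader = loader
        self._replace_in_place = replace_in_place

    def __getattr__(self, attr):
        # Only reached for attributes that are not set on the placeholder.
        if attr.startswith("_"):
            raise AttributeError(attr)
        module = self._loader()
        # Old behaviour: register under the full name, which adds a new key.
        # New behaviour: overwrite the existing local key, so no key is added.
        key = self._local_name if self._replace_in_place else self._full_name
        self._namespace[key] = module
        return getattr(module, attr)


def make_namespace(replace_in_place):
    """Build a dict that stands in for a package namespace."""
    namespace = {}
    real_module = types.SimpleNamespace(value=42)
    namespace["toolbox"] = LazyStandIn(
        "toolbox", "pkg.toolbox", namespace, lambda: real_module, replace_in_place
    )
    namespace["other"] = "unrelated entry"
    return namespace


def touch_all(namespace):
    """Iterate over the namespace and touch an attribute on every entry."""
    try:
        for _name, obj in namespace.items():
            getattr(obj, "value", None)
    except RuntimeError as exc:
        return str(exc)
    return "ok"


print(touch_all(make_namespace(replace_in_place=False)))
# dictionary changed size during iteration
print(touch_all(make_namespace(replace_in_place=True)))
# ok
```

Overwriting the value of an existing key does not change the size of the dict, which is why the second variant can be iterated safely while the placeholder resolves itself.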
Beyond that, `pytest --doctest-modules ./nltk` works now:
This is a very relevant and important change for #2989, as it will allow us to execute our doctests!
Note also that this change has no negative side effects, i.e. the modules behind `LazyModule` are still accessible as before.
I also moved the Markdown corpus reader imports into the `__init__`, to prevent the test suite from having to import those as dependencies. I'm open to alternatives here, as I'm not sure that's the best solution.