New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partition tree.py
module into tree
package + pickle fix
#2863
Conversation
FYI @edloper |
Yes, I like this solution. How about writing pass-through functions and immediately deprecating them. |
That works. I'll throw it on my todo list for when I've got some more spare time. |
The old import still works, but throws a warning. Perfect!
I've expanded this PR to also tackle an issue very similar to #1324. The bug in question: from nltk.tree import ParentedTree
ptree = ParentedTree.fromstring("(TOP (S (NP (NNP Bell,)) (NP (NP (DT a) (NN company)) (SBAR (WHNP (WDT which)) (S (VP (VBZ is) (VP (VBN based) (PP (IN in) (NP (NNP LA,)))))))) (VP (VBZ makes) (CC and) (VBZ distributes) (NP (NN computer))) (. products.)))")
ptree.copy(deep=False) which throws: Traceback (most recent call last):
File "[sic]\nltk_2863.py", line 12, in <module>
ptree.copy(deep=False)
File "[sic]\nltk\tree.py", line 562, in copy
return type(self)(self._label, self)
File "[sic]\nltk\tree.py", line 1277, in __init__
super().__init__(node, children)
File "[sic]\nltk\tree.py", line 1047, in __init__
self._setparent(child, i, dry_run=True)
File "[sic]\nltk\tree.py", line 1371, in _setparent
raise ValueError("Can not insert a subtree that already " "has a parent.")
ValueError: Can not insert a subtree that already has a parent. The actual issue reported in #1324 is about a deepcopy, but I have no issues with a deepcopy. But upon some investigation, this is actually not really much of an implementation issue - it's a small design oversight. My commit 2d5ef12 has added these fixes, and some doctests with documentation on how shallow copying works for different Tree subclasses. Also, I've now moved I'd like to link this PR to #1324 too, so that issue gets closed if this gets merged.
|
I've moved forward with the changes suggested in #2875. I figured this PR would be a good place for it, as this has kind of become a PR that completely overhauls the All that remains is discussing if we want to add it to |
Looks good to me. I agree about not adding |
Thanks for this major reorganisation + many improvements! |
Resolves #2478, resolves #1324
Hello!
Pull request overview
ParentedTree
objects for Python 3.7+. Please see 938a64b to inspect only these changes.tree
package.Details
This PR contains two main changes, which are fairly unrelated beyond their location. As a result, if one change is welcomed, while the other isn't, then I'll happily remove one of the sections.
Allow unpickling of
ParentedTree
objects for Python 3.7+For context, this is in response to #2478, which shows that
will throw an error for Python 3.7+. The
pickle
unpickling behaviour seems very strange, but my understanding of it seems to be that in Python 3.7+ onwards, the__new__
method gets called with some bogus parameters.Beyond that, the
_parent
and_label
parameters aren't set properly on the child of some parent object (yet), and then_setparent
will complain.The long and short of it is that adding
__getnewargs__
as used bypickle
(see https://docs.python.org/3.9/library/pickle.html#pickling-class-instances) will specify the correct parameters for__new__
when unpickling, and then checking whether the child even has a_parent
attribute further fixes the issue entirely.I've added a doctest for this under
tree.doctest
which shows that this now works correctly for all supported Python versions.You can check these changes in 938a64b. I would recommend doing so, as this PR is kind of large otherwise.
Partitioning the
tree.py
module into atree
packageI noticed that
tree.py
was a collection of many Tree related classes, which together made up this really long file. I'm not a particular fan of this - as I prefer different kinds of classes to be in different files. Currently,nltk.tree
provides the following classes and methods:After this change, these methods and classes arrive from these files:
It's important to note that all of these methods and classes can still be imported with either
from nltk import ParentedTree
orntk.tree import ParentedTree
. This change is purely for code quality and restructuring the codebase a bit, and should not affect users (unless I made a mistake!).Beyond moving these classes to files, the only real changes I made was to move the import of the classes from
nltk.tree.immutable
into the_frozen_class
method. For example forTree
:This is to avoid a circular import.
Discussion
There are two other tree-related files which perhaps ought to be moved into the new
tree
package:treeprettyprinter.py
andtreetransforms.py
. If we didn't have to worry about people's codebases breaking upon updating, then I would move both of these files intotree
, and rename them toprettyprinter.py
andtransforms.py
respectively.However, doing so would mean that these files need to be imported in a new way, which is a breaking change. Instead of the current way of:
Users would need to do
Perhaps there is a way to move the files while still supporting the old imports? For example by writing some hack into
nltk/__init__.py
?Thank you @movabo for letting us know about the bug, and thank you @hitt08 for pointing me to it while I had some spare time to look into it.