Deprecate 'return_str' parameter in NLTKWordTokenizer and TreebankWordTokenizer #2883
I have two small comments on your work. Beyond that, looks great!
Eventually we'll want the deprecation of return_str to be described in the method documentation, but the PR for that documentation is still open: #2878. That's something for when both this and that PR are merged (or at least one of the two).
Force-pushed from 6b78e28 to 7f6ebf8.
@tomaarsen thanks for the review! I have amended the commit to contain your suggestions.
Looks good to me! Thank you.
I'll leave this open for a little bit so others can have a look at it.
I have a nit comment: I would suggest refactoring the if's in Treebank and WordTokenizer to be:
instead of
It is more readable and faster. Also, it is more pythonic and follows the PEP8 standard.
as suggested by 12mohaned
Thanks @adamjhawley, @12mohaned, @tomaarsen
Resolves #2879
As per the discussion on #2879, I have deprecated the use of the 'return_str' parameter in the tokenize method of both NLTKWordTokenizer and TreebankWordTokenizer.
I have currently done this in a way that raises a DeprecationWarning if a non-default value is passed to the parameter.
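For readers unfamiliar with the pattern, here is a minimal sketch of warning on a non-default value of a deprecated parameter. It is not the code from this PR: SketchTokenizer, the warning message, the whitespace split, and the choice to always return a list are assumptions made for illustration. Note that warnings.warn issues the warning rather than raising an exception, so the call still returns normally unless the caller has configured warnings as errors.

```python
import warnings
from typing import List


class SketchTokenizer:
    """Hypothetical stand-in for illustration; not the NLTK class."""

    def tokenize(self, text: str, return_str: bool = False) -> List[str]:
        # Warn only when the caller passes a non-default value.
        if return_str is not False:
            warnings.warn(
                "Parameter 'return_str' has been deprecated and should "
                "no longer be used.",  # assumed wording
                category=DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
        # Whitespace splitting stands in for the real tokenization rules.
        return text.split()


# Record warnings so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is often hidden by default
    default_tokens = SketchTokenizer().tokenize("Hello world")
    flagged_tokens = SketchTokenizer().tokenize("Hello world", return_str=True)
```

In this sketch only the second call adds an entry to caught, and both calls return the same list of tokens.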