Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate 'return_str' parameter in NLTKWordTokenizer and TreebankWordTokenizer #2883

Merged
merged 2 commits into from Nov 18, 2021

Conversation

adamjhawley
Copy link
Contributor

Resolves #2879

As per the discussion on #2879, I have deprecated the use of the 'return_str' parameter in the tokenize method of both NLTKWordTokenizer and TreebankWordTokenizer.

I have currently done this in a way that raises a DeprecationWarning if a non-default value is passed to the parameter.

Copy link
Member

@tomaarsen tomaarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two small comments on your work. Beyond that, looks great!

Eventually we'll want the deprecation of return_str to be described in the method documentation, but the PR for that documentation is still open: #2878. That's something for when both this and that PR are merged (or at least one of the two).

nltk/tokenize/destructive.py Outdated Show resolved Hide resolved
nltk/tokenize/treebank.py Outdated Show resolved Hide resolved
@adamjhawley
Copy link
Contributor Author

@tomaarsen thanks for the review! I have amended the commit to contain your suggestions

Copy link
Member

@tomaarsen tomaarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thank you.
I'll leave this open for a little bit so others can have a look at it.

@12mohaned
Copy link
Contributor

I have a nit comment, I would suggest refactoring the if's in Treebank and WordTokenizer to be:

if return_str:
  ..........

instead of

if return_str is not False:
  ..........

It is more readable and faster. Also, it is more pythonic and follows the PEP8 standard.

as suggested by 12mohaned
@stevenbird stevenbird merged commit e629d7e into nltk:develop Nov 18, 2021
@stevenbird
Copy link
Member

Thanks @adamjhawley, @12mohaned, @tomaarsen

@adamjhawley adamjhawley deleted the feature/deprecate_return_str branch November 18, 2021 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecate return_str parameter in NLTKWordTokenizer and TreebankWordTokenizer
4 participants