Deprecate 'return_str' parameter in NLTKWordTokenizer and TreebankWordTokenizer #2883

adamjhawley · 2021-11-10T19:20:43Z

Resolves #2879

As per the discussion on #2879, I have deprecated the use of the 'return_str' parameter in the tokenize method of both NLTKWordTokenizer and TreebankWordTokenizer.

For now, I have implemented this so that a DeprecationWarning is raised whenever a non-default value is passed to the parameter.
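A minimal sketch of the pattern described above, for readers who want to see its shape. `SketchTokenizer`, the warning message, and the whitespace-splitting body are assumptions made for illustration only; they are not copied from the NLTK source, and this sketch does not address whether the real method still honours the flag.

```python
import warnings


class SketchTokenizer:
    """Hypothetical stand-in for a tokenizer; not the NLTK implementation."""

    def tokenize(self, text, return_str=False):
        # Warn only when the caller passes something other than the default.
        if return_str is not False:
            warnings.warn(
                "Parameter 'return_str' has been deprecated and should no "
                "longer be used.",  # assumed wording
                category=DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
        # Placeholder tokenization: a plain whitespace split.
        return text.split()
```

Calling `SketchTokenizer().tokenize("a b")` stays silent, while passing `return_str=True` emits the warning. Note that Python hides DeprecationWarning by default outside of `__main__` and test runners, so callers may need `warnings.simplefilter("always")` or `-W always` to see it.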

tomaarsen

I have two small comments on your work. Beyond that, looks great!

Eventually we'll want the deprecation of return_str to be described in the method documentation, but the PR for that documentation is still open: #2878. That's something for when both this and that PR are merged (or at least one of the two).


adamjhawley · 2021-11-10T22:41:30Z

@tomaarsen thanks for the review! I have amended the commit to include your suggestions.

tomaarsen

Looks good to me! Thank you.
I'll leave this open for a little bit so others can have a look at it.

12mohaned · 2021-11-15T09:30:16Z

I have a nit: I would suggest refactoring the if statements in TreebankWordTokenizer and NLTKWordTokenizer to be:

if return_str:
    ...

instead of

if return_str is not False:
    ...

It is more readable and faster. It is also more Pythonic and follows PEP 8.
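One caveat for anyone applying the same refactor elsewhere: the two conditions agree for real booleans but differ for other falsy values such as `None`, `0`, or `""`. The helper names below are hypothetical and exist only to compare the two checks side by side.

```python
def triggers_with_identity_check(return_str=False):
    # Original form: anything that is not the False singleton triggers.
    return return_str is not False


def triggers_with_truthiness_check(return_str=False):
    # Suggested form: only truthy values trigger.
    return bool(return_str)


if __name__ == "__main__":
    for value in (False, True, None, 0, "", "yes"):
        print(
            repr(value),
            triggers_with_identity_check(value),
            triggers_with_truthiness_check(value),
        )
```

For a parameter that is only ever passed `True` or `False`, the simpler truthiness check behaves identically, which is why the suggestion is a reasonable one here.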

stevenbird · 2021-11-18T12:55:22Z

Thanks @adamjhawley, @12mohaned, @tomaarsen

adamjhawley mentioned this pull request Nov 10, 2021

Deprecate return_str parameter in NLTKWordTokenizer and TreebankWordTokenizer #2879

Closed

tomaarsen added the deprecation and tokenizer labels Nov 10, 2021

tomaarsen requested changes Nov 10, 2021

nltk/tokenize/destructive.py (outdated, resolved)

nltk/tokenize/treebank.py (outdated, resolved)

Deprecated 'return_str' parameter in NLTKWordTokenizer and TreebankWordTokenizer (7f6ebf8)

adamjhawley force-pushed the feature/deprecate_return_str branch from 6b78e28 to 7f6ebf8 Compare November 10, 2021 22:31

tomaarsen approved these changes Nov 11, 2021

stevenbird approved these changes Nov 12, 2021

tomaarsen reviewed Nov 15, 2021

nltk/tokenize/destructive.py (outdated, resolved)

Simplified if-statement as suggested by 12mohaned (ac092f8)

stevenbird merged commit e629d7e into nltk:develop Nov 18, 2021

adamjhawley deleted the feature/deprecate_return_str branch November 18, 2021 13:00
