Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: perfom linting for punkt.py #2830

Merged
merged 4 commits into from Oct 9, 2021

Conversation

12mohaned
Copy link
Contributor

@12mohaned 12mohaned commented Sep 30, 2021

This pull request refactors the punks.py file that contain PunktTokenizer class by doing the following:-

  • Removing un-used variables.
  • Renaming variables to follow PEP8 standard.
  • Removing dead code or un-used keywords and variables.
  • Removing un-necessary else and if's.
  • Replace '==' in checking boolean values with the variable value. For example refactor if value == False with if not value: (performance reasons).

Copy link
Member

@tomaarsen tomaarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the look of this PR! I do have some comments though. Also, it seems that there was a small error in the PR somewhere causing the CI to fail. You can check the outputs for debugging. Perhaps you've renamed a variable to something that was already being used for an existing variable?

nltk/tokenize/punkt.py Outdated Show resolved Hide resolved
nltk/tokenize/punkt.py Outdated Show resolved Hide resolved
@@ -570,8 +570,8 @@ def _tokenize_words(self, plaintext):
yield self._Token(tok, parastart=parastart, linestart=True)
parastart = False

for t in line_toks:
yield self._Token(t)
for token in line_toks:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to be consistent with abbreviations, e.g. for tok in line_toks or for token in line_tokens

@stevenbird
Copy link
Member

(FYI @jnothman)

@jnothman
Copy link
Contributor

jnothman commented Oct 3, 2021

(FYI @jnothman)

No issues on my side. Good to see the code receiving some TLC

@stevenbird stevenbird merged commit 82ceb20 into nltk:develop Oct 9, 2021
@stevenbird
Copy link
Member

Thanks @12mohaned

@12mohaned 12mohaned deleted the lint_tokenize branch November 21, 2021 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants