Description
The TreebankWordDetokenizer().detokenize() method introduces extra spaces before periods when periods are treated as separate tokens in the input. The issue arises from the spaces added here:
nltk/nltk/tokenize/treebank.py
Line 362 in d7b428d
which are not properly removed when there are words following the period.
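To make the mechanism concrete, here is a minimal sketch using only the standard library. It is not NLTK's code: `naive_detokenize` and `FINAL_PERIOD` are hypothetical names, the token list is illustrative, and the sketch assumes that the padding before a period is stripped only when that period ends the text.

```python
import re

# Hypothetical stand-in for the detokenizer's period handling (an assumption,
# not NLTK's actual rule set): the space before a period is removed only when
# that period is the last non-whitespace character of the text.
FINAL_PERIOD = re.compile(r"([^.])\s(\.)\s*$")


def naive_detokenize(tokens):
    """Join tokens with single spaces, then unpad a text-final period."""
    text = " ".join(tokens)  # every token, including ".", gets a space before it
    return FINAL_PERIOD.sub(r"\1\2", text)


tokens = ["Hello", ".", "World", "."]  # illustrative input
print(naive_detokenize(tokens))  # prints: Hello . World.
```

In this sketch the end-of-text anchor is what leaves the first period padded: only the final period matches, so any period followed by more words keeps the space that the join introduced.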
Reproducible code
This code snippet produces the following output:
which contains an unexpected space before the first period.
Expected behavior
The expected output from TreebankWordDetokenizer().detokenize() should be:
Environment
OS: macOS 14.1.1
Python: 3.11.6
nltk: 3.8.1