Hi, I'm not sure whether this is expected behavior, but I have run into this issue on real-world data. Some of my input strings begin with EOS (end-of-sentence: `!?.`) punctuation, which causes the `_match_potential_end_contexts` method to fail.
To reproduce:

```python
from nltk import sent_tokenize

text1 = "!!Fails to be tokenized."
text2 = "! Fails to be tokenized."

tokenized_text1 = sent_tokenize(text1)  # fails
tokenized_text2 = sent_tokenize(text2)  # fails
```
Is string preprocessing before tokenization the only solution? Thanks.
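For what it's worth, a minimal preprocessing workaround is to strip leading EOS punctuation before calling `sent_tokenize`. This is only a sketch of such a stopgap (the helper name and regex are my own, not part of NLTK):

```python
import re

# Matches one or more leading EOS characters (!?.) plus any
# whitespace that follows them at the start of the string.
_LEADING_EOS = re.compile(r"^[!?.]+\s*")

def strip_leading_eos(text):
    """Remove EOS punctuation at the very start of the string."""
    return _LEADING_EOS.sub("", text)

# After stripping, the strings no longer begin with EOS punctuation:
#   strip_leading_eos("!!Fails to be tokenized.")  -> "Fails to be tokenized."
#   strip_leading_eos("! Fails to be tokenized.")  -> "Fails to be tokenized."
```

Text without leading EOS punctuation passes through unchanged, so the helper can be applied unconditionally before tokenization.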