Commit

Fixing custom_tokens input
Fix the custom_tokens input so that the longer token is taken first when tokens conflict (two or more custom tokens begin with the same word, but one of them is longer).
GharudxD committed Apr 16, 2023
1 parent 489b040 commit 720f603
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion nltk/tokenize/destructive.py
@@ -210,7 +210,8 @@ def custom_tokenize(self, text: str, custom_tokens: List) -> List[str]:
:return: List of tokens from `text`.
:rtype: List[str]
"""


custom_tokens = sorted(custom_tokens,reverse = True)
# Making unique SEP token for adding in text
# So that we can't split it
def get_sep_token(sep_token: str,text: str) -> str:
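
The ordering effect of the added `sorted(custom_tokens, reverse=True)` line can be checked in isolation. The sketch below is only an illustration, not part of the commit: the helper names `order_reverse_lexicographic` and `order_longest_first` and the sample tokens are hypothetical. It shows that a reverse lexicographic sort places a longer token ahead of any token that is its prefix, which matches the conflict described in the commit message. The sort is not a length ordering in general, so a shorter token that is not a prefix (for example `"York"` against `"New York"`) can still come first; sorting with `key=len` is shown as one possible alternative under that assumption.

```python
from typing import List


def order_reverse_lexicographic(custom_tokens: List[str]) -> List[str]:
    """Same ordering as the line added in the commit."""
    return sorted(custom_tokens, reverse=True)


def order_longest_first(custom_tokens: List[str]) -> List[str]:
    """Hypothetical alternative: order strictly by length, longest first."""
    return sorted(custom_tokens, key=len, reverse=True)


# Prefix conflict: "New York" is a prefix of "New York City".
prefix_tokens = ["New York", "New York City", "Zoo"]
by_lex = order_reverse_lexicographic(prefix_tokens)
by_len = order_longest_first(prefix_tokens)

# Non-prefix overlap: "York" occurs inside "New York" but is not its prefix.
overlap_tokens = ["New York", "York"]
overlap_by_lex = order_reverse_lexicographic(overlap_tokens)
overlap_by_len = order_longest_first(overlap_tokens)

print(by_lex)          # longer prefix-sharing token precedes the shorter one
print(by_len)          # tokens ordered purely by length
print(overlap_by_lex)  # shorter "York" sorts first here
print(overlap_by_len)  # length ordering puts "New York" first
```

Whether the length-based ordering is preferable depends on how `custom_tokenize` consumes the list, which the visible hunk does not show.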
