Added new custom token function #3142

Closed
wants to merge 4 commits into from

GharudxD

Added a function that creates custom tokens from a given list of custom tokens.
Anyone who wants custom tokens can create them by passing a list; otherwise tokenization remains the same.

@stevenbird
Member

@GharudxD please make the case for this functionality being added to NLTK. There needs to be some existing demand otherwise we cannot justify adding to the codebase.

@stevenbird self-assigned this Apr 17, 2023

@GharudxD
Author

> @GharudxD please make the case for this functionality being added to NLTK. There needs to be some existing demand otherwise we cannot justify adding to the codebase.

Yes, you are right, we can't add everything to the codebase. My reasoning was that tokenization should not always split purely on whitespace: some multi-word expressions, such as expanded abbreviations or organisation names, should be treated as a single token. That is why I created this function. I thought that if this feature were in NLTK, people would not have to do this manually.

@GharudxD force-pushed the created-new-custom-tokenizer branch from 720f603 to a8b534a Compare April 22, 2023 06:20

@stevenbird
Member

Sorry @GharudxD, this is too obscure. A better work-around is to pre-process the problematic input.
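
For illustration only, here is a minimal sketch of what such pre-processing could look like. It is not the code from this PR and it does not use any NLTK API. The helper names `protect_phrases` and `tokenize`, the underscore joiner, and the plain whitespace split are all assumptions made for the example. It also assumes every phrase begins and ends with a word character, so that `\b` matches at its edges.

```python
import re


def protect_phrases(text, phrases, joiner="_"):
    """Rewrite each listed multi-word phrase in `text` as one whitespace-free unit.

    Hypothetical helper for this sketch; not part of NLTK.
    """
    if not phrases:
        return text
    # Try longer phrases first so a long phrase wins over its own prefix.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b"
    )
    # Replace the internal whitespace of every match with the joiner.
    return pattern.sub(lambda m: joiner.join(m.group(0).split()), text)


def tokenize(text, phrases):
    """Pre-process, then tokenize.

    The whitespace split is only a stand-in for whichever tokenizer is
    actually in use (an assumption of this sketch).
    """
    return protect_phrases(text, phrases).split()


tokens = tokenize(
    "She works at the United Nations in New York",
    ["United Nations", "New York"],
)
print(tokens)
# ['She', 'works', 'at', 'the', 'United_Nations', 'in', 'New_York']
```

Because the phrases are joined before tokenizing, the tokenizer itself stays unchanged. The joiner can be mapped back to a space afterwards if the original surface form is needed.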

@stevenbird closed this Jun 2, 2023