
I am not able to download the punkt.zip file for tokenization #202

Open
sumitsharmatops opened this issue Oct 2, 2023 · 4 comments

@sumitsharmatops

Hi, I am working on an NLP project that uses NLTK. Previously I downloaded punkt via the API (nltk.download('punkt')), and now I want to download it manually instead, but neither approach works: I can't download it manually or via the API. How can I do this? Please help me out.

@tomaarsen
Member

Hello!
There is a CDN issue with the "Jio" internet provider, which prevents it from accessing the NLTK data, e.g.: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.xml.
There are some workarounds described here: nltk/nltk#3146

The workaround that helped most people was temporarily switching to a mobile hotspot.
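Before switching networks, it can help to confirm that the problem is reachability of the data host rather than NLTK itself. Below is a minimal stdlib-only sketch, assuming Python 3.9+; `can_reach` and `PUNKT_XML_URL` are hypothetical names, and a failed check only shows the host is unreachable from the current network, not why.

```python
import urllib.request

# The punkt index entry URL quoted in the comment above.
PUNKT_XML_URL = (
    "https://raw.githubusercontent.com/nltk/nltk_data/"
    "gh-pages/packages/tokenizers/punkt.xml"
)


def can_reach(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET of `url` succeeds within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (DNS failures, refused
        # connections, HTTP error codes) and socket timeouts;
        # ValueError covers malformed URLs.
        return False
```

Running `can_reach(PUNKT_XML_URL)` on the affected network and getting False, while the same call succeeds over a mobile hotspot, would be consistent with the provider-side CDN issue described above.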

  • Tom Aarsen

@DaveParr

DaveParr commented Nov 5, 2023

I wasn't able to either, though in my case I could easily download the file manually on the same machine, from the same network, with no configuration change.

I looked up the URL in the index file, opened it in my browser, and then moved the downloaded file to the location where NLTK looks for it.

Obviously this is manual, but if it suits your use case, it may work.
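The manual steps above can be partly scripted. Below is a minimal stdlib-only sketch, assuming NLTK's index file (https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml) lists each package as a `<package>` element with `id`, `subdir` and `url` attributes, and that you have already saved the zip via your browser. `find_package` and `install_zip` are hypothetical helper names, not part of NLTK.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def find_package(index_xml: str, package_id: str) -> tuple[str, str]:
    """Return (url, subdir) for `package_id` from the text of index.xml.

    Assumes entries shaped like
    <package id="punkt" subdir="tokenizers" url="..."/>.
    """
    root = ET.fromstring(index_xml)
    for pkg in root.iter("package"):
        if pkg.get("id") == package_id:
            return pkg.get("url", ""), pkg.get("subdir", "")
    raise KeyError(f"package {package_id!r} not found in index")


def install_zip(zip_path: Path, nltk_data_dir: Path, subdir: str) -> Path:
    """Unpack a manually downloaded package zip into <nltk_data_dir>/<subdir>/."""
    target = nltk_data_dir / subdir
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    return target
```

For punkt, this would unpack into something like `~/nltk_data/tokenizers/`. `~/nltk_data` is one of the directories NLTK searches by default; other locations can be added via the `NLTK_DATA` environment variable or `nltk.data.path`.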

@dvnasutosh

The only solution I found was cloning the whole nltk_data repository; that is what I did today for my project. It doesn't solve the underlying problem, but I hope it gives you a workaround.
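With a local clone, installing a package is a copy plus an unzip. This assumes the package zips sit under `packages/<subdir>/` in the checkout, matching the gh-pages URL layout quoted earlier in this thread (`packages/tokenizers/punkt.xml`). Below is a minimal stdlib-only sketch; `install_from_clone` is a hypothetical helper, not an NLTK function.

```python
import shutil
import zipfile
from pathlib import Path


def install_from_clone(clone_dir: Path, nltk_data_dir: Path,
                       subdir: str, package_id: str) -> Path:
    """Copy <clone_dir>/packages/<subdir>/<package_id>.zip into
    <nltk_data_dir>/<subdir>/ and unpack it next to the copied zip."""
    source = clone_dir / "packages" / subdir / f"{package_id}.zip"
    if not source.is_file():
        raise FileNotFoundError(source)
    target_dir = nltk_data_dir / subdir
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 returns the path of the new file inside target_dir.
    copied = Path(shutil.copy2(source, target_dir))
    with zipfile.ZipFile(copied) as zf:
        zf.extractall(target_dir)
    return copied
```

For example, `install_from_clone(Path("nltk_data_clone"), Path.home() / "nltk_data", "tokenizers", "punkt")` would place punkt where NLTK looks by default.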

@dvnasutosh

Or you could go to
https://github.dev/nltk/nltk_data
and download the file from there. This seems to be a much better solution.
