New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replaced http
with https
in most URLs
#2852
Conversation
It seems there was an obscure bug with the CoreNLP functionality that fails if ran with |
@tomaarsen I think all localhost urls should remain http, it does not make sense to use https in a local context where domain name can not be verified. |
I agree. I believe to have reverted all cases back to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently I went through this PR but hadn't submitted the review, so all my comments are still pending
, doing this now to see what is still relevant.
This way, the website can still be recovered using an internet archive
In short, all websites that give a 404, 403, 401, DNS error, timeout, etc. were kept as what they were (generally Thanks a ton for reviewing this @purificant!! |
Thanks @tomaarsen and @purificant – this was long overdue! |
Hello!
Pull request overview
http://
withhttps://
in certain URLs.http://
withhttps://www.
.nltk.github.com
with links tohttps://www.nltk.org
.http://nltk.sourceforge.net
with links tohttps://www.nltk.org
.Workflow
I followed this workflow to find out which URLs to update:
http://
in the codebase.https://
.http://
withhttps://
.(Keep in mind, this also means if both websites give a 404.)
http://
doesn't load, but thehttps://
does, then usehttps://
. (This ended up not happening in 500+ links)http://
if thehttp://
loads, and thehttps://
doesn't!Exceptions
LICENSE.txt
.http://
in tweets andtwitter.ipynb
.Other changes
I also removed this link, it seems to link to shady websites.
nltk/nltk/corpus/reader/chasen.py
Line 7 in f8a179b
Notes
There are 454 more occurrences of
http://
in the entire codebase, across just 35 files. Most of these are in the aforementionedtwitter.ipynb
and the unit test tweets. The remainder are across the entire codebase.I also removed all uses of
http
in the website. Most important of this isExample Usage <http://www.nltk.org/howto>
in the left menu of the website.What now?
@stevenbird
Upon merging this PR, you might want to enforce HTTPS for the website. It's simply a button under Settings > Pages in https://github.com/nltk/nltk.github.com, called Enforce HTTPS. I would do this myself, but I don't have the permissions.
Future work
Perhaps we would benefit from making a script that crawls the codebase and queries all URLs. That way, we know which ones work and which ones don't. That said, that information in itself is not immediately useful. We would need to find internet archives of the old websites, or find the new locations of the original websites, if they moved.
cc: @iliakur @purificant