Protocol-okhttp: implement IP filter #1107

Open
jnioche opened this issue Oct 14, 2023 · 3 comments
Comments

Contributor

jnioche commented Oct 14, 2023

See NUTCH-2930

To avoid leaking information into a public search index or web archive, it should be possible to configure Nutch so that no content is fetched from localhost, loop-back addresses, or private address spaces.

NUTCH-2527 adds the configuration snippets to exclude URLs pointing to private addresses.

However, filtering URLs isn't enough, because the DNS entry of an arbitrary host name may point to a private IP address. Blocking must happen at the protocol level because the IP address is only known in the protocol implementation. I'll add an implementation for protocol-okhttp.
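
To illustrate the kind of check meant here, below is a minimal sketch using only the JDK. It is not the actual protocol-okhttp implementation; the class and method names (`PrivateAddressCheck`, `isBlocked`, `resolvesToBlocked`) are hypothetical. It resolves a host name and rejects it if any resolved address is loopback, wildcard, link-local, site-local, or an IPv6 unique local address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, for illustration only.
class PrivateAddressCheck {

    /** Returns true if the address is one a public crawl should not fetch from. */
    static boolean isBlocked(InetAddress addr) {
        if (addr.isLoopbackAddress()          // 127.0.0.0/8, ::1
                || addr.isAnyLocalAddress()   // 0.0.0.0, ::
                || addr.isLinkLocalAddress()  // 169.254.0.0/16, fe80::/10
                || addr.isSiteLocalAddress()) { // 10/8, 172.16/12, 192.168/16, fec0::/10
            return true;
        }
        // IPv6 unique local addresses (fc00::/7) are not covered by the JDK helpers above.
        byte[] raw = addr.getAddress();
        return raw.length == 16 && (raw[0] & 0xfe) == 0xfc;
    }

    /** Resolves the host and reports whether any resolved address is blocked. */
    static boolean resolvesToBlocked(String host) {
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                if (isBlocked(addr)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve " + host, e);
        }
    }
}
```

With OkHttp, one plausible place to run such a check is a custom `Dns` implementation passed to `OkHttpClient.Builder.dns(...)`, so that the addresses that are checked are the same ones the client connects to. Whether the real implementation hooks in there is an assumption.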

Contributor

rzo1 commented Oct 14, 2023

Sounds useful. It might also be useful to be able to add addresses dynamically during a crawl, in order to deal with abuse requests, etc.

Contributor Author

jnioche commented Oct 15, 2023

NUTCH-2527 -> #543

Contributor Author

jnioche commented Oct 15, 2023
