StageException: Could not download url https://urlhaus.abuse.ch/downloads/csv_recent/: HTTPSConnectionPool(host='... #61

Open
sentry-io bot opened this issue Dec 14, 2020 · 7 comments
Labels
Backend bug Something isn't working URLHaus

Comments

@sentry-io

sentry-io bot commented Dec 14, 2020

Sentry Issue: CHECKMATE-7

timeout: The read operation timed out
(6 additional frame(s) were not displayed)
...
  File "http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "ssl.py", line 1012, in recv_into
    return self.read(nbytes, buffer)
  File "ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)

ReadTimeoutError: HTTPSConnectionPool(host='urlhaus.abuse.ch', port=443): Read timed out. (read timeout=10)
(3 additional frame(s) were not displayed)
...
  File "urllib3/packages/six.py", line 735, in reraise
    raise value
  File "urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "newrelic/hooks/external_urllib3.py", line 32, in _nr_wrapper_make_request_
    return wrapped(*args, **kwargs)
  File "urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "urllib3/connectionpool.py", line 337, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value

ReadTimeout: HTTPSConnectionPool(host='urlhaus.abuse.ch', port=443): Read timed out. (read timeout=10)
(4 additional frame(s) were not displayed)
...
  File "requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "newrelic/api/external_trace.py", line 101, in dynamic_wrapper
    return wrapped(*args, **kwargs)
  File "requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)

StageException: Could not download url https://urlhaus.abuse.ch/downloads/csv_recent/: HTTPSConnectionPool(host='urlhaus.abuse.ch', port=443): Read timed out. (read timeout=10)
  File "checkmate/async/tasks.py", line 61, in sync_urlhaus
    synced = URLHaus(request.db).update_db()
  File "checkmate/checker/url/url_haus.py", line 64, in update_db
    return self._update(self.UPDATE_FEED)
  File "checkmate/checker/url/url_haus.py", line 70, in _update
    values=(self._value_from_row(row) for row in feed(working_dir)),
  File "checkmate/checker/pipeline/core.py", line 49, in __call__
    source = stage(working_dir, source)
  File "checkmate/checker/pipeline/web.py", line 22, in __call__
    raise StageException(f"Could not download url {self._url}: {err}") from err
@seanh seanh added this to the 1 - Implement a checking service milestone Dec 14, 2020
@seanh
Contributor

seanh commented Dec 14, 2020

This has happened twice, once on QA and once on prod, both at exactly the same time. I think it's unreliability on URLHaus's end, so I think we should change Checkmate's code to not send these to Sentry. As long as we have alarms for when ingesting blocklist patterns stops for a long time, we'll be fine.

@jon-betts jon-betts self-assigned this Dec 14, 2020
@jon-betts jon-betts added the bug Something isn't working label Dec 14, 2020
@jon-betts
Contributor

A couple of suggestions:

  • Increasing timeout
  • This probably needs some retries

@seanh
Contributor

seanh commented Dec 14, 2020

I think increasing the timeout and adding a finite number of retries is a good idea; we should do those. But given that we're gonna have an alarm if blocklist ingesting stops working (#56), I think we should also filter these timeout exceptions out from Sentry.
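
One possible way to do that filtering, sketched under assumptions: the Sentry Python SDK accepts a `before_send` callback in `sentry_sdk.init`, and returning `None` from it drops the event. The function names `is_timeout` and `drop_timeouts` are made up for this example, and it assumes the exception chain bottoms out in a socket-level timeout, as the traceback above suggests. A real filter might also need to check the `requests` timeout classes directly.

```python
import socket


def is_timeout(exc):
    """Return True if exc, or anything in its cause/context chain, is a timeout."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # Guard against cycles in the exception chain.
        if isinstance(exc, (TimeoutError, socket.timeout)):
            return True
        exc = exc.__cause__ or exc.__context__
    return False


def drop_timeouts(event, hint):
    """before_send hook: returning None tells Sentry to discard the event."""
    exc_info = hint.get("exc_info")
    if exc_info and is_timeout(exc_info[1]):
        return None
    return event


# Wiring (needs the sentry_sdk package installed):
#   sentry_sdk.init(dsn=..., before_send=drop_timeouts)
```

Walking the chain matters here because the error Sentry receives is the outer `StageException`, with the timeout only reachable through the `raise ... from err` links.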

@jon-betts jon-betts modified the milestones: 1 - Implement a checking service, 2A - Checkmate follow up work Jan 6, 2021
@seanh seanh modified the milestones: 2A - Checkmate follow up work, Add a blocklist to public Via and LMS's Via Jan 14, 2021
@jon-betts
Contributor

A retry has been added and I'll close the issue in Sentry so we can see if it happens again.

@jon-betts
Contributor

We're optimistically closing this as the error hasn't come back. We can keep an eye on Sentry to see if it returns.

@seanh
Contributor

seanh commented Feb 16, 2021

This is still happening

@seanh seanh reopened this Feb 16, 2021
@seanh seanh added the Backend label Feb 16, 2021
@seanh seanh added this to Backlog in Secure Via via automation Feb 16, 2021
@seanh seanh added the URLHaus label Mar 5, 2021
@seanh seanh removed this from Backlog in Secure Via Mar 5, 2021

3 participants