interrupted download reports as hash failure #11153

Open
1 task done
RonnyPfannschmidt opened this issue May 31, 2022 · 7 comments
Labels
C: error messages Improving error messages C: network connectivity help wanted For requesting inputs from other members of the community state: blocked Can not be done until something else is done type: bug A confirmed bug or unintended behavior

Comments

@RonnyPfannschmidt
Contributor

Description

follow-up to #4930

When a large package download is interrupted on a bad link, pip reports a hash mismatch instead of reporting that the download was interrupted.
This leads to misidentifying the problem at first.

Collecting $REDACTED
  Downloading $REDACTED
     ━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.8/52.4 MB 51.3 kB/s eta 0:12:13

ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    $REDACTED from $REDACTED#md5=04b4d65eda8bf72ae203d40031aa76a3:
        Expected md5 04b4d65eda8bf72ae203d40031aa76a3
             Got        c66b2d113159da2c6911c475ec00b26f

Expected behavior

pip should report the download as interrupted, to indicate the actual problem.
Of course the hash will differ if you hash a subset instead of all the data;
however, the error happens while obtaining the data, so failing at the hash check is misleading.

I was earnestly trying to figure out where my data had gotten corrupted until I realized that the progress bar had not actually finished.
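To make the point concrete: the digest of a truncated prefix can never match, so checking the received size before the hash would let the error name the real cause. This is a minimal sketch, assuming the expected size is known up front (for example from Content-Length); `classify_download` is a hypothetical helper, not part of pip.

```python
import hashlib


def classify_download(data: bytes, expected_size: int, expected_md5: str) -> str:
    """Explain why a payload is unusable, checking size before hash.

    Hypothetical helper for illustration only; not pip's API.
    """
    # An interrupted transfer shows up as missing bytes. Report that first,
    # because the hash of a partial payload cannot match anyway.
    if len(data) < expected_size:
        return f"incomplete download: got {len(data)} of {expected_size} bytes"
    digest = hashlib.md5(data).hexdigest()
    if digest != expected_md5:
        return f"hash mismatch: expected {expected_md5}, got {digest}"
    return "ok"


payload = b"x" * 1000
good_md5 = hashlib.md5(payload).hexdigest()
print(classify_download(payload[:370], len(payload), good_md5))
# incomplete download: got 370 of 1000 bytes
```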

pip version

22.1.1

Python version

3.8

OS

Fedora

How to Reproduce

Unfortunately, I cannot quickly provide a reproducer for a broken network.

Output

No response

Code of Conduct

@RonnyPfannschmidt RonnyPfannschmidt added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels May 31, 2022
@uranusjr
Member

uranusjr commented Jun 1, 2022

I wonder why pip treats the download as successfully completed in the first place. Is this a limitation in requests or even urllib3?

@Mr-Pepe
Contributor

Mr-Pepe commented May 31, 2023

pip reads directly from Response.raw.stream and it seems that urllib3 does not raise an error if the connection gets closed while reading chunks. I don't know enough about urllib3 to tell whether it should raise an error or not. However, what pip can do is keep count of the downloaded bytes, compare them to the response's Content-Length header before checking hashes, and let the user know that the download was not successful. That seems like a fairly small change and would prevent confusion for the user. I can open an initial PR, unless you think this should be handled by urllib3.
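A minimal sketch of that proposal, assuming the chunks are the raw (undecoded) body bytes: Content-Length counts bytes as sent on the wire, so with a compressed Content-Encoding the decoded chunks would not add up to it. `checked_chunks` and `IncompleteDownloadError` are hypothetical names, not pip's API. The length check runs as soon as the stream is exhausted, so the error surfaces before any hash comparison.

```python
from typing import Iterable, Iterator, Optional


class IncompleteDownloadError(Exception):
    """Hypothetical error: the body ended before Content-Length bytes arrived."""

    def __init__(self, received: int, expected: int) -> None:
        super().__init__(f"download interrupted: received {received} of {expected} bytes")
        self.received = received
        self.expected = expected


def checked_chunks(chunks: Iterable[bytes], content_length: Optional[str]) -> Iterator[bytes]:
    """Pass chunks through unchanged, then compare the total to Content-Length.

    `content_length` is the raw header value, or None if the server sent
    none (in which case truncation cannot be detected this way).
    """
    expected = int(content_length) if content_length is not None else None
    received = 0
    for chunk in chunks:
        received += len(chunk)
        yield chunk
    # Reached only after the underlying stream is exhausted, i.e. before the
    # caller gets a chance to hash the (possibly partial) file.
    if expected is not None and received < expected:
        raise IncompleteDownloadError(received, expected)


# Usage: a stream that stops after 14 of 52 advertised bytes.
parts = [b"x" * 10, b"x" * 4]
try:
    for _ in checked_chunks(parts, "52"):
        pass
except IncompleteDownloadError as exc:
    print(exc)  # download interrupted: received 14 of 52 bytes
```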

@rahul-theorem

rahul-theorem commented Jul 21, 2023

We're consistently seeing this when downloading whls/artifacts that are ~20MB+. We can look into what's causing the networking flakes but this has been a confusing error that we're regularly seeing. Would be very supportive of this change.

@uranusjr uranusjr added help wanted For requesting inputs from other members of the community C: network connectivity C: error messages Improving error messages and removed S: needs triage Issues/PRs that need to be triaged labels Jul 23, 2023
@uranusjr
Member

I’m marking this as help wanted since it requires someone that can reliably reproduce this to look into how urllib3 marks the download as complete, and how to perform further sniffing in pip’s networking code to work around this. I would strongly suggest anyone reaching here to attempt to dig deeper into urllib3 to figure out what exactly went wrong and work on a pull request.

@dimbleby
Copy link

psf/requests#4956 perhaps

@zweger

zweger commented Aug 21, 2023

I've also been seeing this issue occasionally in CI builds and have started to investigate it.
I've set up an intentionally broken local Flask server that proxies PyPI but randomly truncates the response, and with it I can reproduce this error.
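The Flask proxy itself wasn't posted. As a substitute, here is a stdlib-only sketch (using `http.server` instead of Flask) of the same failure mode: the handler advertises the full Content-Length, writes only the first few bytes, and closes the connection. `TruncatingHandler`, `fetch_truncated`, `BODY`, and `SENT` are hypothetical names. Python's own `http.client` reports this as `IncompleteRead` because it insists on reading exactly Content-Length bytes, much like the check urllib3 v2 enforces.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"0123456789" * 100  # 1000-byte stand-in for a wheel file
SENT = 10  # bytes actually written before the connection is dropped


class TruncatingHandler(BaseHTTPRequestHandler):
    """Advertise the full Content-Length but send only SENT bytes."""

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY[:SENT])
        # Close the socket once this handler returns, so the client hits EOF
        # long before it has received Content-Length bytes.
        self.close_connection = True

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep the demo quiet


def fetch_truncated() -> http.client.IncompleteRead:
    """Serve one truncated response on localhost and return the client error."""
    server = HTTPServer(("127.0.0.1", 0), TruncatingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        try:
            conn.request("GET", "/file.whl")
            resp = conn.getresponse()
            resp.read()  # must read exactly Content-Length bytes or fail
        except http.client.IncompleteRead as exc:
            return exc
        finally:
            conn.close()
        raise RuntimeError("server did not truncate the body")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    err = fetch_truncated()
    print(len(err.partial), err.expected)  # bytes received, bytes still missing
```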

It's true that this is related to the linked requests/urllib3 enforce_content_length issue, which is resolved as of urllib3 v2.0. Unfortunately, upgrading urllib3 alone is not sufficient to resolve this issue (although upgrading urllib3 does give a better error message). The problem is that, due to the way pip/requests streams the response from urllib3, the urllib3 retry logic which pip depends on is bypassed. This can actually happen in two places:

Tracebacks

Response truncated downloading package

  Downloading http://127.0.0.1:5000/files/packages/fa/1a/f191d32818e5cd985bdd3f47a6e4f525e2db1ce5e8150045ca0c31813686/Flask-2.3.2-py3-none-any.whl (96 kB)
ERROR: Exception:
Traceback (most recent call last):
  File "./pip/_vendor/urllib3/response.py", line 704, in _error_catcher
    yield
  File "./pip/_vendor/urllib3/response.py", line 829, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
pip._vendor.urllib3.exceptions.IncompleteRead: IncompleteRead(10 bytes read, 96857 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
    status = run_func(*args)
             ^^^^^^^^^^^^^^^
  File "./pip/_internal/cli/req_command.py", line 248, in wrapper
    return func(self, options, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
                      ^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
    result = self._result = resolver.resolve(
                            ^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "./pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
  File "./pip/_vendor/resolvelib/structs.py", line 156, in __bool__
    return bool(self._sequence)
           ^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
           ^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
    candidate = func()
                ^^^^^^
  File "./pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
                                       ^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/candidates.py", line 293, in __init__
    super().__init__(
  File "./pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
    self.dist = self._prepare()
                ^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
    dist = self._prepare_distribution()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/candidates.py", line 304, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/operations/prepare.py", line 540, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/operations/prepare.py", line 611, in _prepare_linked_requirement
    local_file = unpack_url(
                 ^^^^^^^^^^^
  File "./pip/_internal/operations/prepare.py", line 168, in unpack_url
    file = get_http_url(
           ^^^^^^^^^^^^^
  File "./pip/_internal/operations/prepare.py", line 109, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/network/download.py", line 147, in __call__
    for chunk in chunks:
  File "./pip/_internal/cli/progress_bars.py", line 53, in _rich_progress_bar
    for chunk in iterable:
  File "./pip/_internal/network/utils.py", line 63, in response_chunks
    for chunk in response.raw.stream(
  File "./pip/_vendor/urllib3/response.py", line 934, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/urllib3/response.py", line 873, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/urllib3/response.py", line 807, in _raw_read
    with self._error_catcher():
  File "/usr/lib64/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "./pip/_vendor/urllib3/response.py", line 721, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
pip._vendor.urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(10 bytes read, 96857 more expected)', IncompleteRead(10 bytes read, 96857 more expected))

Response truncated getting package metadata


http://127.0.0.1:5000 "GET /pypi/simple/flask/ HTTP/1.1" 200 39262
ERROR: Could not install packages due to an OSError.
Traceback (most recent call last):
  File "./pip/_vendor/urllib3/response.py", line 704, in _error_catcher
    yield
  File "./pip/_vendor/urllib3/response.py", line 829, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
pip._vendor.urllib3.exceptions.IncompleteRead: IncompleteRead(10 bytes read, 39252 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./pip/_vendor/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "./pip/_vendor/urllib3/response.py", line 934, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/urllib3/response.py", line 905, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/urllib3/response.py", line 807, in _raw_read
    with self._error_catcher():
  File "/usr/lib64/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "./pip/_vendor/urllib3/response.py", line 721, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
pip._vendor.urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(10 bytes read, 39252 more expected)', IncompleteRead(10 bytes read, 39252 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
                      ^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
    result = self._result = resolver.resolve(
                            ^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "./pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
  File "./pip/_vendor/resolvelib/structs.py", line 156, in __bool__
    return bool(self._sequence)
           ^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
           ^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/resolution/resolvelib/found_candidates.py", line 44, in _iter_built
    for version, func in infos:
  File "./pip/_internal/resolution/resolvelib/factory.py", line 279, in iter_index_candidate_infos
    result = self._finder.find_best_candidate(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/package_finder.py", line 890, in find_best_candidate
    candidates = self.find_all_candidates(project_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/package_finder.py", line 831, in find_all_candidates
    page_candidates = list(page_candidates_it)
                      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/sources.py", line 134, in page_candidates
    yield from self._candidates_from_page(self._link)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/package_finder.py", line 791, in process_project_url
    index_response = self._link_collector.fetch_response(project_url)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/collector.py", line 461, in fetch_response
    return _get_index_content(location, session=self.session)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/collector.py", line 364, in _get_index_content
    resp = _get_simple_response(url, session=session)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/index/collector.py", line 135, in _get_simple_response
    resp = session.get(
           ^^^^^^^^^^^^
  File "./pip/_vendor/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_internal/network/session.py", line 519, in request
    return super().request(method, url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/requests/sessions.py", line 747, in send
    r.content
  File "./pip/_vendor/requests/models.py", line 899, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./pip/_vendor/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
pip._vendor.requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(10 bytes read, 39252 more expected)', IncompleteRead(10 bytes read, 39252 more expected))

(This second error would be reported as JSONDecodeError in the current version of pip.)

There's some related retry discussion here: urllib3/urllib3#542

In essence, in this specific scenario pip, requests, and urllib3 don't cooperate well when it comes to retrying failed requests. I suspect that fixing this issue will require some changes outside of pip.

@zweger

zweger commented Aug 23, 2023

There are a lot of moving pieces here, so I'll just outline the steps that I believe are required to resolve these issues.

  1. Upgrading to urllib3 v2, which checks that Content-Length matches the body length.
  2. [Improvement] Pip could resume download package at halfway the connection is poor #4796 / Resume incomplete download #11180, which adds some retry functionality into pip for downloading packages. (For downloading files, pip uses urllib3 directly. For other things, pip uses requests.)
  3. Requests are not retried when received body length is shorter than Content-Length psf/requests#6512, which would allow pip to retry other failed requests.
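Item 2 could look roughly like the following sketch, which is not taken from #11180. It assumes the server honors `Range: bytes=<offset>-` with `206 Partial Content` and falls back to a full `200` response otherwise, and it models the transport as a hypothetical `get(headers)` callable returning `(status, chunks)` so the resume logic can be shown without a network. `download_with_resume` and `flaky_get` are illustrative names only.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical transport: performs a GET with the given extra headers and
# returns (status_code, body_chunks). The chunk iterable may end early when
# the connection breaks, which is the failure mode in this issue.
Transport = Callable[[Dict[str, str]], Tuple[int, Iterable[bytes]]]


def download_with_resume(get: Transport, total: int, max_attempts: int = 5) -> bytes:
    """Re-request only the missing tail until `total` bytes have arrived."""
    received = bytearray()
    for _ in range(max_attempts):
        # Ask for the remainder only once we already hold some bytes.
        headers = {"Range": f"bytes={len(received)}-"} if received else {}
        status, chunks = get(headers)
        if status == 200:
            # Full response: either the first request, or a server that
            # ignored Range. Either way the body starts at byte 0.
            received.clear()
        elif status != 206:
            raise OSError(f"unexpected HTTP status {status}")
        for chunk in chunks:
            received += chunk
        if len(received) >= total:
            return bytes(received)
    raise OSError(
        f"download incomplete after {max_attempts} attempts: "
        f"received {len(received)} of {total} bytes"
    )


# Usage with a fake server that honors Range but drops every connection
# after 4 bytes.
BLOB = b"abcdefghij"


def flaky_get(headers: Dict[str, str]) -> Tuple[int, Iterable[bytes]]:
    if "Range" in headers:
        start = int(headers["Range"][len("bytes="):-1])  # "bytes=4-" -> 4
        return 206, [BLOB[start:start + 4]]
    return 200, [BLOB[:4]]


assert download_with_resume(flaky_get, len(BLOB)) == BLOB
```

A real implementation would also want to validate the `Content-Range` response header and send `If-Range` (with an ETag or Last-Modified value) so that it never splices together bytes from two different versions of a file.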


7 participants