
Add --retry and --retry-timeout #1017

Open

ashwoods opened this issue Oct 5, 2023 · 4 comments


ashwoods commented Oct 5, 2023

The Issue

We are using GitHub Actions and uploads have been failing fairly consistently. We have already reported this to PyPI, but we would like to ask if you'd consider a PR adding --retry and --retry-timeout, similar to curl's options. Thanks!

@sigmavirus24 (Member)

Thoughts on --retry

I'm vaguely -0 on --retry. The only reason I'm considering it, however, is that I know the maintained pypi-publish action is widely used and isn't something people typically instrument themselves.

I would otherwise suggest something like:

for counter in $(seq 1 "$RETRY_MAX"); do
  twine upload --skip-existing dist/*
  exit_code=$?
  if (( exit_code == 0 )); then
    break
  fi
  echo "Twine upload failed... retrying $counter of $RETRY_MAX"
done

Further, what classes of errors are worthy of a retry? Intermittent network errors can appear to be a number of things (connection reset by peer, EOF occurred in violation of protocol, etc.) and then there are other things that can look like intermittent network errors that aren't. Distinguishing them can be incredibly difficult and you may end up lumping in things we probably shouldn't bother retrying with things we ostensibly should. And that will cause other problems for folks.
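
To make that ambiguity concrete, here is a naive sketch (not anything Twine does today) of how one might try to classify requests exceptions as retryable; the grouping itself is exactly the judgement call being described:

import requests

# Illustration only -- not Twine code. The exception types are real
# `requests` exceptions, but the grouping is a judgement call: a
# ConnectionError covers both a transient "connection reset by peer" and a
# DNS failure that no amount of retrying will fix, and an SSLError covers
# both "EOF occurred in violation of protocol" (plausibly transient) and a
# genuine certificate problem (definitely not).
def looks_retryable(exc: Exception) -> bool:
    # SSLError subclasses ConnectionError, so it must be checked first.
    if isinstance(exc, requests.exceptions.SSLError):
        # Conservative: give up, even though some SSL errors are transient.
        return False
    return isinstance(
        exc, (requests.exceptions.ConnectionError, requests.exceptions.Timeout)
    )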

Finally, when do retries start counting and do they reset? If we have 5 artifacts we're uploading, and 3 of those need to retry, does each get a max of --retry or do all 5 have a max number of retries set to --retry? We can decide that, but no matter which we pick, someone's going to expect something different and it's going to lead to stubbed toes and invalid bug reports.

The question on --retry is whether it will benefit more than 95% of our current users. So far, it seems you and your colleagues are a very vocal minority. To me, that seems like a "no for now, we can re-evaluate later", but I don't think we have sufficient data to say "no, we'll never do that".

Finally, pip's --retries option works per package it attempts to download/install, but I've seen people get endlessly frustrated with that too: it means pip can take on the order of num_packages * num_retries attempts (and a correspondingly long time) to complete, which is far longer than they otherwise expected.


Thoughts on --retry-timeout

I'm a strong -1 on --retry-timeout because I believe the option you're thinking of in curl is --retry-max-time. A --retry-timeout would require re-architecting how Twine works today, which would be a significant amount of work for diminishing returns and would likely increase the number of bugs and the instability. To implement something like curl's --retry-max-time, we'd need to spin the work off into a separate thread that we can monitor and kill after the allotted amount of time. This would work fine given that everything in Twine is IO bound, so threading wouldn't be affected by the GIL. That said, you now have to consider the semantics of what that option means for someone doing twine upload dist/*.
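
For concreteness, a minimal sketch of that threading idea, under assumptions: upload_one is a hypothetical stand-in for a single-artifact upload (not a Twine function), and max_time is applied per artifact, which is itself one of the semantic choices discussed below:

import concurrent.futures

def upload_with_max_time(upload_one, artifacts, max_time):
    # Run each upload in a worker thread and bound how long we wait for it.
    # Twine's work is IO bound, so the GIL is not the limiting factor here.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        for artifact in artifacts:
            future = pool.submit(upload_one, artifact)
            # Raises concurrent.futures.TimeoutError once max_time elapses.
            # The worker thread itself cannot be killed, though: it keeps
            # running in the background, which is part of the extra
            # complexity and instability described above.
            future.result(timeout=max_time)
    finally:
        pool.shutdown(wait=False, cancel_futures=True)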

I say that because, let's say you have 5 artifacts in dist to upload (like the example above). If you upload the first 3, and start retrying on the 4th, when does your timer start on retries? Do you start it when you start retrying? Do you stop the timer once successful? What happens on the 5th if it also needs to start retrying? Do you restart the timer?

In curl, it's an easy semantic because one curl command maps (more or less) to a single HTTP request (ignoring redirects, because by default it will not follow them). Twine has variable behaviour. I'd argue that no matter what semantic we pick, it will not be intuitive for a large percentage of users.

Finally, there are shell ways of doing this as above: http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3
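
For folks scripting a release in Python rather than bash, a rough equivalent is to run twine in a subprocess with a hard timeout (the 300-second ceiling here is an arbitrary illustration, not a recommendation):

import glob
import subprocess

# Bound the total wall-clock time of the upload from outside Twine.
# subprocess.run() kills the child and raises TimeoutExpired if the timeout
# elapses; glob expands dist/* because there is no shell doing it for us.
subprocess.run(
    ["twine", "upload", "--skip-existing", *glob.glob("dist/*")],
    check=True,
    timeout=300,
)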


Summary

I'm (more or less) opposed to implementing this strictly in twine (but open to @bhrutledge convincing me otherwise) because:

  • I don't think we have enough data to warrant implementing this
  • I don't believe the benefits will outweigh the risks and added complexity
  • I don't believe there is one clear semantic for these options that will make sense to most people, regardless of how well we document it.
  • Implementing one semantic for either (or both) of these options is going to lead to someone wanting the other semantic, and a fair amount of strife
  • There are ways to implement this outside of twine today (see references above), but those don't work for folks who aren't using twine directly and are instead using a GitHub action from someone else.

@asottile-sentry

fwiw there are already retries in upload:

twine/twine/repository.py

Lines 190 to 196 in afc37f8

if 500 <= resp.status_code < 600:
    number_of_redirects += 1
    logger.warning(
        f'Received "{resp.status_code}: {resp.reason}"'
        "\nPackage upload appears to have failed."
        f" Retry {number_of_redirects} of {max_redirects}."
    )

I think the actual ask from @ashwoods would be to also retry on a few other protocol-level errors.
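
Roughly what that might look like (illustration only; the real loop lives in twine/repository.py and is structured differently -- post_once stands in for whatever performs the single POST, and logger / max_redirects mirror the names in the snippet above):

import requests

# Protocol-level exceptions that would be treated like a 5xx response.
RETRYABLE = (
    requests.exceptions.ConnectionError,       # e.g. connection reset by peer
    requests.exceptions.ChunkedEncodingError,  # response cut off mid-stream
)

def upload_with_retries(post_once, logger, max_redirects):
    for attempt in range(1, max_redirects + 1):
        try:
            resp = post_once()
        except RETRYABLE as exc:
            logger.warning(f"Upload raised {exc!r}. Retry {attempt} of {max_redirects}.")
            continue
        if not (500 <= resp.status_code < 600):
            return resp
        logger.warning(
            f'Received "{resp.status_code}: {resp.reason}". '
            f"Retry {attempt} of {max_redirects}."
        )
    raise RuntimeError("Upload did not succeed within the retry budget.")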

@bhrutledge (Contributor)

If this is a matter of tweaking the existing retry logic as @asottile-sentry suggests, I'm game to review a PR, but otherwise, I agree with @sigmavirus24's summary.

@sigmavirus24 (Member)

> If this is a matter of tweaking the existing retry logic as @asottile-sentry suggests, I'm game to review a PR

Hard to say because that's not at all what was asked for.
