Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can Twine improve feedback for existing uploads? #919

Open
zzzeek opened this issue Aug 9, 2022 · 8 comments
Open

Can Twine improve feedback for existing uploads? #919

zzzeek opened this issue Aug 9, 2022 · 8 comments

Comments

@zzzeek
Copy link

zzzeek commented Aug 9, 2022

So the context is yesterday pypi had a problem and the effect was that new project pages, that is after twine does its upload and gives you a new URL to click, were throwing a 500 error.

So from my end, I assumed it had something to do with my package being one of those new "critical" packages, or my two factor wasn't being used for twine (wasn't sure if I had created an API token, turns out I had), or anything like that.

Then what happened was I basically panicked, because SQLAlchemy's URL was throwing a 500 error, repeatedly calling twine upload was seemingly re-uploading the same file over and over, and pypi's status was just happily green "operational".

Basically I had no idea what was going on, because twine kept succeeding, over and over, even though we all know that once you upload a file to pypi, it can't be replaced. So I went nuts making new API tokens, changing configs, and all of that, assuming here comes another "Didn't you know? pypi did BLAH BLAH BLAH you have to change your BLAH BLAH BLAH!" moment, and I could only assume that twine was silently failing in some way.

the actual issue was kind of stupid which is just that the web page itself on pypi couldn't display correctly. it was a display issue in the UX. the file was there, it was fine. pypi veterans knew to look under /simple/ and see that the package was there. If I knew that, I would have just been like, OK great, I'll wait til they fix that. but without any feedback i had no idea what was going on and then you get the big "zzzeek is tweeting at @pypi like a fool' etc. thing.

if you've read this far, you know where I'm going!

is there any way that twine can give some kind of feedback if the file is already there ? Like an option so that it will check for /simple/ to see that the package was uploaded already, and just tell me, or refuse to do the upload if the file was already there. I get that "idempotent" behavior is cool and all but I dont understand the rationale for it in this case. like architecturally I dont actually understand why Pypi isnt using POST semantics for uploading a release and not PUT. Seeing twine actually move the bytes up the wire over and over again, just for the bytes to be thrown in the garbage each time and a happy colorful "all done!" coming back, that seems like not the ideal interface for what is actually happening here.

thanks for listening!

@zzzeek zzzeek changed the title pypi' issue in progress, somehow the "enter" key just submitted it. editing now Aug 9, 2022
@zzzeek zzzeek changed the title issue in progress, somehow the "enter" key just submitted it. editing now pypi's "idempotent" behavior does not give users enough information about what's going on, can twine improve this w/ a check for existing upload? Aug 9, 2022
@dstufft
Copy link
Member

dstufft commented Aug 9, 2022

I get that "idempotent" behavior is cool and all but I dont understand the rationale for it in this case. like architecturally I dont actually understand why Pypi isnt using POST semantics for uploading a release and not PUT. Seeing twine actually move the bytes up the wire over and over again, just for the bytes to be thrown in the garbage each time and a happy colorful "all done!" coming back, that seems like not the ideal interface for what is actually happening here.

FWIW, the reason PyPI does that, is because it didn't used to, but people ran into problems where they hadn't deleted their dist/ directory between releases, so they created a new release and did twine upload dist/* and attempted to re-upload both the old and the new release.

This wasn't a problem prior to twine, because setup.py upload only uploaded the artifact that you created within that same evocation (but that had other problems, like you couldn't test the artifact before uploading, since uploading required creating the artifact in the same command).

I'll leave discussions about what else twine could or couldn't do here for the twine devs to worry about :) Just wanted to mention why it is the way it is on PyPI.

@zzzeek
Copy link
Author

zzzeek commented Aug 9, 2022

I get that "idempotent" behavior is cool and all but I dont understand the rationale for it in this case. like architecturally I dont actually understand why Pypi isnt using POST semantics for uploading a release and not PUT. Seeing twine actually move the bytes up the wire over and over again, just for the bytes to be thrown in the garbage each time and a happy colorful "all done!" coming back, that seems like not the ideal interface for what is actually happening here.

FWIW, the reason PyPI does that, is because it didn't used to, but people ran into problems where they hadn't deleted their dist/ directory between releases, so they created a new release and did twine upload dist/* and attempted to re-upload both the old and the new release.

OK....so....if that were me, Id say to them, "do you rm * when you want to delete a file? then why are you twine uploading *?" why make a foundational architecture decision to support what is basically simple user error, explicit is better than implicit, etc?

pypi could add a new POST-oriented API and twine could use that, unless there's some whole other dimension to this that I'm not seeing (like, everyone runs their own personal pypi and they dont have that feature, somethign like that).

@CaselIT
Copy link

CaselIT commented Aug 9, 2022

Personally I think it would be fine to just inform the user that the file was already there so nothing was done. twine could keep exiting with success, just mention after the upload that nothing was saved because the file already existed

@zzzeek
Copy link
Author

zzzeek commented Aug 9, 2022

Personally I think it would be fine to just inform the user that the file was already there so nothing was done. twine could keep exiting with success, just mention after the upload that nothing was saved because the file already existed

if twine would check and the file is there, it should skip the file / exit (and yes tell us that's what it did). it shouldn't upload the data if it already knows it will be silently skipped

@CaselIT
Copy link

CaselIT commented Aug 9, 2022

it shouldn't upload the data if it already knows it will be silently skipped

my reasoning was that maybe the pypi backend would do additional processing on the uploaded file, like hypothetically returning 200 instead of 201 if the file already exists and the hash is the same, while reporting a 409 error or similar if the file exists but the hash does not match (since files cannot be replaced)

@bhrutledge
Copy link
Contributor

I thought PyPI returned an error response if a file was already uploaded. There's even an option called --skip-existing to allow previously-uploaded files still in dist/, which inspects the response for an indication of the existing files.

However, I just verified that uploading the same file a second time shows the same success output as the first upload, which I agree is confusing. What does cause an error is uploading a file with the same name, but different contents (i.e. a change in the source, but not the version number). @dstufft or @di, is that an accurate description of PyPI's behavior? Has that changed recently? Is it documented somewhere?

if twine would check and the file is there, it should skip the file / exit (and yes tell us that's what it did). it shouldn't upload the data if it already knows it will be silently skipped

There's already logic to check if a file exists used in conjunction with --skip-existing before uploading:

# Note: The skip_existing check *needs* to be first, because otherwise
# we're going to generate extra HTTP requests against a hardcoded
# URL for no reason.
if upload_settings.skip_existing and repository.package_is_uploaded(package):
logger.warning(skip_message)
continue

(Note that this extra check is only done for PyPI, i.e. not TestPyPI, though it's probably trivial to add that).

I'm reluctant to make this the default behavior, and instead let PyPI be the source of truth on what's allowed and how it should be handled.

twine could keep exiting with success, just mention after the upload that nothing was saved because the file already existed

I don't think there's a way for Twine to know this based solely on the upload response. It looks like it's an empty 200 either way. I agree that using different HTTP status codes would be nice, but in absence of that, maybe the 200 response could have content indicating the file already exists? I'm also aware that a new upload API is in the works, so maybe this should be tabled until that's available. Again, @dstufft or @di, what do you think?

@zzzeek
Copy link
Author

zzzeek commented Aug 13, 2022

skip existing seems useful at least, I hadn't noticed that option. that in itself would likely have helped me assuming it displays in the output that the package was detected.

also it looks like --skip-existing would avoid the issue that @dstufft describes above with people saying "twine upload *".

from my POV it seems like if pypi had a POST interface that would make things cleaner, obviously this would all be new optional stuff for those of us with release scripts.

@CaselIT
Copy link

CaselIT commented Aug 13, 2022

twine could keep exiting with success, just mention after the upload that nothing was saved because the file already existed

I don't think there's a way for Twine to know this based solely on the upload response. It looks like it's an empty 200 either way. I agree that using different HTTP status codes would be nice, but in absence of that, maybe the 200 response could have content indicating the file already exists? I'm also aware that a new upload API is in the works, so maybe this should be tabled until that's available. Again, @dstufft or @di, what do you think?

Mine was just a suggestion, providing this information in the content would also work fine.

@bhrutledge bhrutledge changed the title pypi's "idempotent" behavior does not give users enough information about what's going on, can twine improve this w/ a check for existing upload? Can Twine improve feedback for existing uploads? Aug 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants