Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't bundle new version of Sidekiq Pro #4514

Closed
mperham opened this issue Apr 8, 2021 · 27 comments · Fixed by #4563
Closed

Can't bundle new version of Sidekiq Pro #4514

mperham opened this issue Apr 8, 2021 · 27 comments · Fixed by #4563

Comments

@mperham
Copy link

mperham commented Apr 8, 2021

Hi, I released Sidekiq Pro 5.2.2 yesterday and people are telling me that they can't get it with Bundler 2.2.15. I noticed that ~/.bundle/cache/compact_index/*/versions is out of date on my machine. The cached version is from Feb.

~/.bundle/cache/compact_index/enterprise.contribsys.com.xxxxxxx.443.cdac25812e55451c9ec774c09cafb44f$ more versions 
created_at: 2021-02-04T22:11:58+00:00

My server version is from yesterday.

/var/www/enterprise$ more versions 
created_at: 2021-04-07T20:37:17+00:00

Even if I put gem 'sidekiq-pro', '5.2.2', Bundler tells me that version is not available.

Is there any debug logging I can enable or other way to diagnose the problem? Can you reproduce the issue with your Sidekiq Pro credentials?

@mperham mperham added the Bundler label Apr 8, 2021
@mperham
Copy link
Author

mperham commented Apr 8, 2021

If I rm -rf ~/.bundle/cache/compact_index/*contribsys* then everything installs fine.

@deivid-rodriguez
Copy link
Member

Hi @mperham, not sure what's going on :(

I tried and I can't repro, no.

You can use the --verbose flag to get more information. Which by the way I just noticed it prints unredacted URLs which seems not great, but that's a separate issue.

@weppos
Copy link

weppos commented Apr 12, 2021

I can reproduce the issue, in fact most of my team members are affected. If it helps, I can post the verbose output. It contains some credentials though which I will have to hide, as well other info I'd prefer to keep private.

What's the absolutely minimal info you need from it @deivid-rodriguez to be able to investigate? I'll be happy to help you with the data you need.

@deivid-rodriguez
Copy link
Member

To be honest, I'm a bit lost on this one. It could be some server side issue as well, not really sure. The more info you can provide the better, but it might still not be enough for me to figure out what's going on. Since you can reproduce the error consistently, it would also be helpful if you could run a bisection to figure out which change in bundler broke things for you. From a rubygems clone, you can get the HEAD version of bundler installed by running cd bundler && bin/rake install.

@weppos
Copy link

weppos commented Apr 12, 2021

To be honest, I'm a bit lost on this one. It could be some server side issue as well, not really sure. The more info you can provide the better, but it might still not be enough for me to figure out what's going on. Since you can reproduce the error consistently, it would also be helpful if you could run a bisection to figure out which change in bundler broke things for you. From a rubygems clone, you can get the HEAD version of bundler installed by running cd bundler && bin/rake install.

Running a bisect would require me to have a working version. But actually it never worked, neither in Bundler 2.2.12 (previous version I had), nor in 2.2.16 (which I just installed). So I'm not even sure what I can bisect (in fact, I'm not even sure who's the culprit here).

@mperham did anything change in the way you distribute packages? I know you built your own registry to allow distributing gems only to paid account through private credentials.

@mperham
Copy link
Author

mperham commented Apr 12, 2021 via email

@weppos
Copy link

weppos commented Apr 12, 2021

I could not use proper git bisect as when I went all the way back to Bundler 2.1 I entered a dependency hell and I had to get out the rabbit whole. But I did a sort of Bundler bisect. Here's my findings:

  • Last working Bundler version was Bundler 2.1.4. When I installed it, a bundle update sidekiq-pro immediately found the new version and updated the Gemfile from 5.2.1 to 5.2.2.
  • First not working Bundler version was Bundler 2.2.0.
  • Last working Sidekiq version was sidekiq-pro =< 5.2.1
  • To properly reproduce the issue, you must absolutely NOT have sidekiq-pro-5.2.2 installed. With sidekiq-pro-5.2.2 already installed locally, a Bundler 2.2.x run with a Gemfile listing 2.2.1 will still successfully bump to sidekiq-pro-5.2.2.

Which means:

  • Any Bundler version will work with Sidekiq-pro versions <= 5.2.1
  • Any Sidekiq version will work with Bundler versions <= 2.1.4

I can't tell what changed in Bundler as of 2.2.0 and/or Sidekiq Pro as of 5.2.2, and if the issue is on one side or either sides. Hopefully my findings can provide some starting point to get this investigated with someone with more knowledge than my on either sides.

@mperham
Copy link
Author

mperham commented Apr 12, 2021 via email

@deivid-rodriguez
Copy link
Member

I had a quick look at bundller's 2.2.0 changelog and I found #3865, which was supposed to actually fix things for you, not break them 😬.

Pinging @sonalkr132 and @indirect too since they were involved in sidekiq/sidekiq#4158 and know the server side of things better than me.

@indirect
Copy link
Member

oh no 😬 maybe I somehow made it the opposite, and a missing etag skips downloading entirely???

@deivid-rodriguez
Copy link
Member

Looking at the patch, it looks correct to me. As I read it, the patch would change bundler from raising a checksum mismatch error to always updating the cache when the response does not come with an etag. And that's what was intended.

Anyways, best way to make sure whether that's the culprit or not would be someone hitting the issue manually reverting the change.

@mperham
Copy link
Author

mperham commented Apr 15, 2021

No ETag. No updated cache.

❯ curl -i https://gems.contribsys.com/versions
HTTP/1.1 200 OK
Date: Thu, 15 Apr 2021 15:15:25 GMT
Server: Apache/2.4.29 (Ubuntu)
Last-Modified: Wed, 07 Apr 2021 20:37:16 GMT
Accept-Ranges: bytes
Content-Length: 453

created_at: 2021-04-07T20:37:16+00:00
---
sidekiq-pro 5.2.1,2.1.4,2.0.5,4.0.5,5.1.1,1.7.1,3.2.1,3.0.3,2.1.5,1.9.1,2.1.3,1.9.3,3.0.6,5.2.0,3.0.4,4.0.3,3.6.1,3.5.4,3.0.1,3.3.3,2.1.0,2.1.2,4.0.1,3.2.2,2.0.8,3.4.5,3.0.5,3.5.3,3.0.2,4.0.2,1.7.5,3.7.1,2.0.3,5.2.2,3.1.0,3.4.2,2.0.1,5.1.0,3.3.2,4.0.0,3.4.1,3.5.1,3.5.0,3.5.2,2.0.4,3.4.4,3.4.3,1.7.6,3.0.0,3.3.1,3.4.0,5.0.0,1.9.2,5.0.1,4.0.4,3.6.0,1.5.1,2.1.1,3.7.0,1.6.0,1.2.4 96f4bd708415e72383c58c731142ae51

@deivid-rodriguez
Copy link
Member

@mperham Everything looks ok to me. 5.2.2 is in there, and the checksum given matches the checksum of the content given by the /info/sidekiq-pro endpoint. Only thing that looks weird is that there seems to be no sorting criteria in there whereas the rubygems.org API gives gems in increasing release date order, I believe. But not sure how that can affect results.

I remembered there's the DEBUG_COMPACT_INDEX environment variable that you can turn on to diagnose this kind of issues. Maybe it sheds some light. Sorry I can't help more but things work just fine consistenly for me 😬.

@sonalkr132
Copy link
Member

The issue here that this script assumes that your server doesn't support etag and partial content. It creates new files from scratch (oppose to appending which would support partial content).
compact_index client will make the request with range headers if the cache file exists on local. You need to ensure from the server-side that you always reply with 200 (instead of 206). ie curl -vvv -r -20 https://gems.contribsys.com/versions (and /info/sidekiq-pro) should return the complete file with 200 status.
I realize that this is hacky but the client relies on Etag to invalidate the local cache.

@indirect
Copy link
Member

Ahhhhh, good catch, thanks for the explanation!

@mperham in case that wasn’t clear, Bundler will by default try to do a range request to get only the end of the file that it hasn’t seen yet. If the content of the file isn’t “newest last”, it might not get the newest info in the ending bytes. Since Bundler uses the etag (which isn’t provided) to know whether to discard the local file and replace it, you’ll need to return the entire file and a 200 response even to range requests. Sorry my fix didn’t account for that!

Probably worth opening a bug that Bundler should discard any partial range response and the local file if the response comes back without an etag and download the entire thing.

@deivid-rodriguez
Copy link
Member

deivid-rodriguez commented Apr 22, 2021

Wait, after yet another look, won't reverting #3865 do the trick here? I think I misunderstood that patch the first time I read it, and now I think it didn't do the right thing.

The previous behaviour seems exactly what we want and has been requested in #4543, namely, if the response would come without an etag, we would still append the partial range response to the temporary versions file, but we would end up not "committing" the result because the etag for the temporary versions would never match the given "empty etag". So we would discard that version file and retry with a full (not partial content) request.

This explains why this got broken with bundler 2.2.0, being the first release to include #3865.

What I think #3865 actually implemented was to accept the partial-range request "blindly" if the response didn't come with an ETag, and always append it to the existing versions file.

Let me know if this makes sense.

@deivid-rodriguez
Copy link
Member

oh no 😬 maybe I somehow made it the opposite, and a missing etag skips downloading entirely???

Yes, @indirect, I think that was actually the effect of the patch! With the patch the partial content response would be accepted, and that would skip redownloading the full file.

@indirect
Copy link
Member

If I am remembering correctly, the real life outcome before my patch was that Bundler would return an error and refuse to use the compact index at all if the response did not include an etag. So I think that's the underlying problem that needs fixing, and my patch uh... didn't do that, for the case where there's already a local copy of the compact index file on the client that needs to be updated. 😬

@deivid-rodriguez
Copy link
Member

deivid-rodriguez commented Apr 22, 2021

My understanding is that the old code would raise an error and refuse to update the compact index only if retrying a full update fails.

This is the relevant part of the code:

if retrying.nil? && local_path.file?
SharedHelpers.filesystem_access(local_temp_path) do
FileUtils.cp local_path, local_temp_path
end
headers["If-None-Match"] = etag_for(local_temp_path)
headers["Range"] =
if local_temp_path.size.nonzero?
# Subtract a byte to ensure the range won't be empty.
# Avoids 416 (Range Not Satisfiable) responses.
"bytes=#{local_temp_path.size - 1}-"
else
"bytes=#{local_temp_path.size}-"
end
end
response = @fetcher.call(remote_path, headers)
return nil if response.is_a?(Net::HTTPNotModified)
content = response.body
SharedHelpers.filesystem_access(local_temp_path) do
if response.is_a?(Net::HTTPPartialContent) && local_temp_path.size.nonzero?
local_temp_path.open("a") {|f| f << slice_body(content, 1..-1) }
else
local_temp_path.open("wb") {|f| f << content }
end
end
etag = (response["ETag"] || "").gsub(%r{\AW/}, "")
if etag.length.zero? || etag_for(local_temp_path) == etag
SharedHelpers.filesystem_access(local_path) do
FileUtils.mv(local_temp_path, local_path)
end
return nil
end
if retrying
raise MisMatchedChecksumError.new(remote_path, etag, etag_for(local_temp_path))
end
update(local_path, remote_path, :retrying)

Without short-cuircuiting the no-ETag case in line 62 (#3865), the method would end up being called recursively with the retrying flag in line 73. And if the retrying flag is set, bundler will avoid adding If-None-Match and Range headers to the request, thus requesting a full update (line 33). So I don't quite get what the original problem was, maybe the full update fallback was also failing before @mperham updated his server-side scripts, in which case we would indeed end up raising an error in line 70.

@deivid-rodriguez
Copy link
Member

I opened #4563 with some further analysis. For now it's just a revert of #3865, but I think we can probably keep the current code, but only if we are served a 200 response with full content to our partial range request.

In that case, accepting the response even if we're given no ETag would be fine, and also more efficient since no retry requests would be needed, as long as sidekiq-pro server scripts are changed to return the full response and a 200 status (as suggested in #4514 (comment)).

@deivid-rodriguez
Copy link
Member

So to sum up, I believe the idea in #3865 was fine and would make things more efficient, but it should've only been applied when the response is a full response. If the response is a partial content response, we can't really accept the response as is without an ETag.

@indirect
Copy link
Member

That makes sense! Thanks for investigating and figuring that out.

@mperham
Copy link
Author

mperham commented Apr 29, 2021

I'm trying to determine next steps: what's the conclusion here? Must I make a change server-side or will Bundler handle my server responses?

@deivid-rodriguez
Copy link
Member

I'm working on it right now.

Reverting #3865 as proposed in #4563 should fix the problem. Now that I understand the problem I can recreate the conditions to properly reproduce it, and I plan to add a regression spec to #4563. It will require two requests to update each endpoint of the compact index cache, but things will work fine.

I propose to focus on fixing this first, and then we can consider further improvements on both sides.

@deivid-rodriguez
Copy link
Member

Test is tricky but I can fully verify now manually that the problem happens with #3865, and is fixed by reverting it.

@deivid-rodriguez
Copy link
Member

Alright I have a test now, will update the PR soon.

@deivid-rodriguez
Copy link
Member

Pull request should be ready assuming CI passes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants