Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sidekiq UI responds with internal server error when session cookie is invalid #4671

Closed
basherru opened this issue Aug 19, 2020 · 20 comments · Fixed by #4672
Closed

Sidekiq UI responds with internal server error when session cookie is invalid #4671

basherru opened this issue Aug 19, 2020 · 20 comments · Fixed by #4672

Comments

@basherru
Copy link

basherru commented Aug 19, 2020

Ruby version: ruby 2.7.0p0
Sidekiq / Pro version(s): 6.1.1, 5.1.1
Rails version: 6.0.3.2

We are using Rails 6.0.3.2 with Sidekiq Pro. When the rack.session cookie is corrupted and the digest verification in Rack::Session::Cookie fails, it returns an empty hash as a session data. This leads to sidekiq web trying to get the :csrf key in the empty hash and getting nil, which is then passed to Base64.strict_encode64 sending the unpack1 message to it.

Steps to reproduce:

  1. Visit Sidekiq web UI with default session store
  2. Change like a bit in the rack.session cookie digest (after -- delimiter)
  3. Try to perform any action that require POST request with form-data
NoMethodError: undefined method `unpack1' for nil:NilClass
  from base64.rb:74:in `strict_decode64'
  from sidekiq/web/csrf_protection.rb:145:in `decode_token'
  from sidekiq/web/csrf_protection.rb:141:in `compare_with_real_token'
  from sidekiq/web/csrf_protection.rb:105:in `valid_token?'
  from sidekiq/web/csrf_protection.rb:76:in `accept?'
  from sidekiq/web/csrf_protection.rb:41:in `call'
  from rack/session/abstract/id.rb:266:in `context'
  from rack/session/abstract/id.rb:260:in `call'
  from rack/content_length.rb:17:in `call'
  from rack/urlmap.rb:74:in `block in call'
  from rack/urlmap.rb:58:in `each'
  from rack/urlmap.rb:58:in `call'
  from rack/builder.rb:244:in `call'
  from sidekiq/web.rb:105:in `call'
  from sidekiq/web.rb:110:in `call'
  from action_dispatch/routing/mapper.rb:19:in `block in <class:Constraints>'
  from action_dispatch/routing/mapper.rb:48:in `serve'
  from action_dispatch/journey/router.rb:49:in `block in serve'
  from action_dispatch/journey/router.rb:32:in `each'
  from action_dispatch/journey/router.rb:32:in `serve'
  from action_dispatch/routing/route_set.rb:834:in `call'
  from rack/etag.rb:27:in `call'
  from rack/conditional_get.rb:40:in `call'
  from rack/head.rb:12:in `call'
  from action_dispatch/middleware/callbacks.rb:27:in `block in call'
  from active_support/callbacks.rb:101:in `run_callbacks'
  from action_dispatch/middleware/callbacks.rb:26:in `call'
  from action_dispatch/middleware/actionable_exceptions.rb:17:in `call'
  from action_dispatch/middleware/debug_exceptions.rb:32:in `call'
  from action_dispatch/middleware/show_exceptions.rb:33:in `call'
  from rails/rack/logger.rb:37:in `call_app'
  from rails/rack/logger.rb:26:in `block in call'
  from active_support/tagged_logging.rb:80:in `block in tagged'
  from active_support/tagged_logging.rb:28:in `tagged'
  from active_support/tagged_logging.rb:80:in `tagged'
  from rails/rack/logger.rb:26:in `call'
  from action_dispatch/middleware/remote_ip.rb:81:in `call'
  from action_dispatch/middleware/request_id.rb:27:in `call'
  from rack/runtime.rb:22:in `call'
  from active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
  from action_dispatch/middleware/executor.rb:14:in `call'
  from rack/sendfile.rb:110:in `call'
  from action_dispatch/middleware/ssl.rb:74:in `call'
  from action_dispatch/middleware/host_authorization.rb:76:in `call'
  from rack/cors.rb:100:in `call'
  from raven/integrations/rack.rb:51:in `call'
  from lamian/middleware.rb:16:in `block in call'
  from lamian.rb:49:in `block in run'
  from lamian/logger.rb:27:in `run'
  from lamian.rb:49:in `run'
  from lamian/middleware.rb:16:in `call'
  from rails/engine.rb:527:in `call'
  from puma/configuration.rb:228:in `call'
  from puma/server.rb:713:in `handle_request'
  from puma/server.rb:472:in `process_client'
  from puma/server.rb:328:in `block in run'
  from puma/thread_pool.rb:134:in `block in spawn_thread'
@cavi21
Copy link

cavi21 commented Oct 11, 2020

Hi there this issue is still happening in a very weird and inconsistent way. I experienced when trying to retry some dead jobs, but couldn't debug it due to the fact that it was in production and also because some attempts were successful, not sure why. Maybe have something to do with the puma worker that is answering the request, not sure... really weird.

The gem versions that I'm using are:

sidekiq (6.1.2)
rack (2.2.3)
actionpack (6.0.3.4)
puma (4.3.6)

I will try to reproduce it in development and see if I can find more ideas of what is going on. Any idea or suggestion is welcome.

@mperham
Copy link
Collaborator

mperham commented Oct 11, 2020

You’ll need to give us an app that reproduces the problem.

@cavi21
Copy link

cavi21 commented Oct 12, 2020

Sure thing! I will post here with the details when I found the cases

@szechyjs
Copy link
Contributor

szechyjs commented Nov 3, 2020

I too am still seeing this issue:

NoMethodError: undefined method `unpack1' for nil:NilClass
  from base64.rb:74:in `strict_decode64'
  from sidekiq/web/csrf_protection.rb:150:in `decode_token'
  from sidekiq/web/csrf_protection.rb:146:in `compare_with_real_token'
  from sidekiq/web/csrf_protection.rb:110:in `valid_token?'
  from sidekiq/web/csrf_protection.rb:76:in `accept?'
  from sidekiq/web/csrf_protection.rb:41:in `call'
  from rack/session/abstract/id.rb:266:in `context'
  from rack/session/abstract/id.rb:260:in `call'
  from rack/content_length.rb:17:in `call'
  from rack/urlmap.rb:74:in `block in call'
  from rack/urlmap.rb:58:in `each'
  from rack/urlmap.rb:58:in `call'
  from rack/builder.rb:244:in `call'
  from sidekiq/web.rb:105:in `call'
  from sidekiq/web.rb:110:in `call'
  from action_dispatch/routing/mapper.rb:19:in `block in <class:Constraints>'
  from action_dispatch/routing/mapper.rb:48:in `serve'
  from action_dispatch/journey/router.rb:49:in `block in serve'
  from action_dispatch/journey/router.rb:32:in `each'
  from action_dispatch/journey/router.rb:32:in `serve'
  from action_dispatch/routing/route_set.rb:834:in `call'
  from scout_apm/instruments/rails_router.rb:29:in `call_with_scout_instruments'
  from omniauth/strategy.rb:192:in `call!'
  from omniauth/strategy.rb:169:in `call'
  from omniauth/builder.rb:45:in `call'
  from scout_apm/middleware.rb:17:in `call'
  from lib/include_version_header.rb:10:in `call'
  from rack/session/abstract/id.rb:266:in `context'
  from rack/session/abstract/id.rb:260:in `call'
  from action_dispatch/middleware/cookies.rb:648:in `call'
  from rack/etag.rb:27:in `call'
  from rack/conditional_get.rb:40:in `call'
  from rack/head.rb:12:in `call'
  from action_dispatch/middleware/callbacks.rb:27:in `block in call'
  from active_support/callbacks.rb:101:in `run_callbacks'
  from action_dispatch/middleware/callbacks.rb:26:in `call'
  from action_dispatch/middleware/actionable_exceptions.rb:17:in `call'
  from action_dispatch/middleware/debug_exceptions.rb:32:in `call'
  from action_dispatch/middleware/show_exceptions.rb:33:in `call'
  from lograge/rails_ext/rack/logger.rb:15:in `call_app'
  from rails/rack/logger.rb:26:in `block in call'
  from bp/shared/logger.rb:145:in `block in tagged'
  from bp/shared/logger.rb:23:in `tagged'
  from bp/shared/logger.rb:145:in `tagged'
  from rails/rack/logger.rb:26:in `call'
  from bp/shared/quiet_routes.rb:19:in `call'
  from action_dispatch/middleware/remote_ip.rb:81:in `call'
  from request_store/middleware.rb:19:in `call'
  from action_dispatch/middleware/request_id.rb:27:in `call'
  from rack/runtime.rb:22:in `call'
  from active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
  from action_dispatch/middleware/executor.rb:14:in `call'
  from rack/sendfile.rb:110:in `call'
  from action_dispatch/middleware/ssl.rb:74:in `call'
  from action_dispatch/middleware/host_authorization.rb:76:in `call'
  from raven/integrations/rack.rb:51:in `call'
  from rack/cors.rb:100:in `call'
  from scout_apm/instruments/middleware_summary.rb:58:in `call'
  from rails/engine.rb:527:in `call'
  from puma/configuration.rb:228:in `call'
  from puma/server.rb:713:in `handle_request'
  from puma/server.rb:472:in `process_client'
  from puma/server.rb:328:in `block in run'
  from puma/thread_pool.rb:134:in `block in spawn_thread'
  • sidekiq: 6.1.2
  • rails: 6.0.3.3
  • puma: 4.3.5
  • rack: 2.2.3

@gingerlime
Copy link

gingerlime commented Jan 26, 2021

Thanks for the fix @szechyjs !

I still wonder how/what influences the CSRF token refresh on the web interface of Sidekiq? It seems like I now get Forbidden errors fairly regularly when I try to e.g. delete a job from the retry queue. I'm not 100% sure, but this seems to happen when I quickly go to the page, rather than if I wait a bit or refresh the page first...

Any tips on how to investigate this? or how to make sure I get the CSRF token "loaded" correctly?

EDIT: @cavi21's bewilderment matches mine as well :)

Hi there this issue is still happening in a very weird and inconsistent way. I experienced when trying to retry some dead jobs, but couldn't debug it due to the fact that it was in production and also because some attempts were successful, not sure why. Maybe have something to do with the puma worker that is answering the request, not sure... really weird.

EDIT2: very strange, but I'm getting 403s nearly always on our staging environment, but some times it works. On my dev environment I cannot reproduce it at all, and never get a 403 :-/

@gingerlime
Copy link

@cavi21 I think your intuition was right. I debugged a bit on our staging environment, and I think it's somehow related to the puma processes. If I run only one process, it's working fine, but the more puma processes we have it seems the likelihood of failing the CSRF check increases... Perhaps the rack session isn't shared across all puma processes?

We're not using cookies for the sessions, but rather storing them in redis ... I imagine it works for most people who rely on cookies for storing the sessions? but then fails in our case...

@cavi21
Copy link

cavi21 commented Jan 26, 2021

Hi @gingerlime we also have the session on redis, so maybe there is something around that configuration. I narrowed down, as you did, to the way puma workers handle the request, but did not have the time to keep digging.

If you found something or want me to try anything on my config just let me know!

@gingerlime
Copy link

gingerlime commented Jan 29, 2021

An update... did a bit more digging, and found a workaround, although I'm still not sure if it's the best one, nor why precisely it's not working.

TL;DR: using redis-rack seems to work around the issue here, even with multiple puma workers. Only tested in development so far though (but with multiple puma processes).

What I found so far:

  • sidekiq seems to use Rack::Session which isn't the same as the rails session. It defaults to cookies as far as I can tell.
  • my rack.session cookie is being set, but I also noticed that the cookie value keeps changing, e.g. when I'm on the sidekiq main page, there's a request to stats every couple of seconds or so, and each request seems to change the cookie value slightly
  • When I decode the rack.session cookie values before and after a stats request, I can see that both the session_id and csrf values are different (that in itself probably isn't a problem? I guess every request will refresh the csrf cookie?)
  • When I tried to debug requests that were successful or failed. Somehow however, some times env["rack.session"] has the csrf token present, and then the request is successful, and other times env["rack.session"] only has the session id, with no csrf token, and then obviously it fails. In both cases, the rack.session cookie is being sent and if I inspect env I can see it (e.g. env["HTTP_COOKIE"].split("; ").find { |x| x.match /rack.session/ } shows the cookie is part of the request).

Given that the session data is actually stored in a cookie anyway, I'm really confused how it relates to different puma workers, but somehow it seems to... not clue why.

Hope someone else can shed some light or has some ideas what specifically to test?

@gingerlime
Copy link

gingerlime commented Jan 30, 2021

@cavi21 are you using the clearance gem by any chance? (EDIT: or more likely, not using Devise?)

@gingerlime
Copy link

Another solution/workaround seems to be setting the sidekiq web session secret key ... no clue why it works without it in some cases or with a single puma worker process, but fails otherwise though.

@gingerlime
Copy link

Ok, I think I'm getting to the bottom of this, but still have a few questions :)

It looks like setting the secret key will prevent it from being randomly set here https://github.com/mperham/sidekiq/blob/3b5ae30c4e5e9e760268243ab5c14664a2f8d236/lib/sidekiq/web.rb#L165-L169 ...

This explains why:

  1. setting the sidekiq web session secret key works. It prevents sidekiq from randomly loading a new one for each process.
  2. a single puma process works, because the secret key is being set once, and then used on all subsequent requests.

I'm not familiar with the sidekiq codebase though, so wasn't sure why using?(::Rack::Session::Cookie) returns false in my case... When I tested it with Devise, the problem was gone, so I'm pretty sure that Devise adds this middleware, whereas Clearance (or other gems) don't.

I found a rather old note in the changelog about it. So I guess another solution is to manually add this middleware? even though mine is a rails app...

Should sidekiq warn about this? setting a random secret is a nice failsafe fallback, and works in some cases, but obviously not in a multi-process environment...

@mperham
Copy link
Collaborator

mperham commented Jan 30, 2021

Here's a quick summary of my knowledge. Sidekiq needs a valid, working Rack session to implement CSRF protection.

  • Easiest workaround is Sidekiq::Web.sessions = false. This disables security and opens you up to CSRF attacks. Not good.
  • Next, Sidekiq must work as a standalone Rack app when called as run Sidekiq::Web. This is why it injects a Rack::Session::Cookie middleware if not already configured, in order to get the simplest safe session possible.
  • Most commonly, you can configure Sidekiq::Web to run next to a Rails app. But Sidekiq::Web must know how to decrypt and read the session used by Rails. Either you must put the Rails session middleware "in front of" Sidekiq::Web or you must configure Sidekiq::Web so it can decipher the Rails session cookie too. That's where you set Sidekiq::Web.session_secret = ...; Sidekiq uses Rack::Session::Cookie to decipher the Rails session cookie directly from the HTTP header.
  • If you do not configure the session_secret, Sidekiq and Rails will constantly reset each others sessions, which is why some people get logged out if you log into your app, visit the Sidekiq UI and then try to return to the app.

Should sidekiq warn about this? setting a random secret is a nice failsafe fallback, and works in some cases, but obviously not in a multi-process environment...

There's no way for the Sidekiq UI know it is running in a multi-process environment.

This is why I'm pretty silent on session issues. The situation is quite complex, there's no good solution, every app has a different mix of gems, session configuration and Rack middleware. The possible setups are so varied that I can't debug and diagnose everyone's app. So I stay silent and let people debug their own issue.

@mperham
Copy link
Collaborator

mperham commented Jan 30, 2021

If you have a Rails app, make sure you are mounting the web UI "inside" the Rails routes, like shown here, so it can reuse the Rails application session:

https://github.com/mperham/sidekiq/wiki/Monitoring#rails

@gingerlime
Copy link

gingerlime commented Jan 31, 2021

Thanks for taking time to explain things with so much detail. I really appreciate it. I think I understand the complexity here better now, and your approach makes sense.

First off, yes, we mount Sidekiq web in our Rails routes. However, we don't use the Rails' CookieStore, which I imagine is the default one. Neither are we using Devise, which might also load the middleware, although I couldn't find anything specific yet. So I imagine for most people who are on the Rails "happy path", this isn't going to be an issue out of the box.

I still wonder if there's a way to avoid these awkward scenarios in some way? to help some of those odd setups. A few thoughts:

  • force a secret: e.g. raise an exception instead of setting a random secret key / loading the middleware. i.e. force people to provide a secret (or sessions = false). This is obviously a big breaking change, and can throw off people who might start with sidekiq. However, it's least likely to cause problems later down the line. Many can happily start in development where everything is rosy, but then try to get this live, and suddenly things are failing. This is frustrating and much harder to debug.
  • log a server warning: this might be helpful at least for debugging things in production, so whenever a secret is auto-generated there's a higher chance of knowing this isn't an ideal scenario
  • log more info with the forbidden message: here's a quick ugly example to update csrf_protection.deny with
def deny(env)
  # the digest allows us to "identify" the secret without exposing it
  # so if there are multiple random secrets, the forbidden message + log would make it clear
  digest = OpenSSL::Digest.new("sha1").hexdigest(env["rack.session.options"][:secret])[0...8] rescue nil
  logger(env).warn "attack prevented by #{self.class} #{digest}"
  [403, {"Content-Type" => "text/plain"}, ["Forbidden #{digest}"]]
end
  • auto-detect and use the rails secret key base: this might require checking for different versions of Rails to fetch the secret. If it can be detected however, it can be used instead of a random key and therefore shared across process and server boundaries

mperham added a commit that referenced this issue Jan 31, 2021
Remove all of the hacks and support infrastructure around Rack sessions. Rails provides this by default so we don't need it for 90% of users. The other 10% should know and provide a Rack session.

This is a big change and has the potential to break many installs. It will be part of the 7.0 major version bump and require a lengthy beta period to ensure we document as many edge cases and solutions as possible.

See also #4671, #4728 and many others.
@mperham
Copy link
Collaborator

mperham commented Jan 31, 2021

My biggest issue in fixing this problem is the large number of edge cases. I'd like to tear apart the current build_session method and make it as simple as possible but I don't have 50-100 different apps to test different scenarios to add warnings and log output (as you show) to handle the major edge cases.

I'm thinking this will be the big refactoring for Sidekiq 7.0 and we'll have a public beta period where people will need to open issues to get the Web UI configured properly. If you want to try this today, you can read through #4791, run the new better_sessions branch and report back any issues. Just put this in your Gemfile:

gem 'sidekiq', github: 'mperham/sidekiq', branch: 'better_sessions'

@gingerlime
Copy link

I'd love to help testing the better_sessions branch. I had a very quick glimpse and it seems like a good direction to go to.

I had a couple of other ideas/questions though, just want to throw it out for now:

  • since sidekiq depends on redis, and already stores some metadata etc, maybe it can store the random secret key in redis? then it's shared across process/server boundaries anyway
  • another thought: the CSRF token doesn't need to be stored in a signed or encrypted cookie. It's just a random value that needs to be on the form, and then checked on POST. I'm not that familiar with Rack, but I imagine you can add a completely independent cookie, i.e. without necessarily relying on the rack session? This could make Sidekiq web self-contained and independent of anything but Rack ... unless I'm missing something here.

@cavi21
Copy link

cavi21 commented Feb 1, 2021

Hi @gingerlime sorry for the late reply! happy to know that you figure it out what was the issue and that Mike share his thoughts about it! also happy to help testing on the better_sessions branch if needed.

About your question:

@cavi21 are you using the clearance gem by any chance? (EDIT: or more likely, not using Devise?)

we're using authlogic for authentication.

@mperham
Copy link
Collaborator

mperham commented Feb 1, 2021

I had a couple of other ideas/questions though, just want to throw it out for now:

  • since sidekiq depends on redis, and already stores some metadata etc, maybe it can store the random secret key in redis? then it's shared across process/server boundaries anyway

I like the direction you're thinking. Today Sidekiq doesn't control the session. The secret key is part of the session config.

  • another thought: the CSRF token doesn't need to be stored in a signed or encrypted cookie. It's just a random value that needs to be on the form, and then checked on POST. I'm not that familiar with Rack, but I imagine you can add a completely independent cookie, i.e. without necessarily relying on the rack session? This could make Sidekiq web self-contained and independent of anything but Rack ... unless I'm missing something here.

These are really interesting ideas. I could roll my own CSRF handling using an encrypted cookie directly, independent of any Rack session. We'd still need to deprecate all of the Sidekiq::Web session stuff but the user would not need to supply any session. I like this.

@gingerlime
Copy link

Thank @mperham !

The secret key is part of the session config.

Sorry, I guess I wasn't clear. Still thinking about the existing codebase. Currently if Sidekiq web doesn't have a secret available / the middleware isn't used, then it picks a random secret and sets it. With multi-process / multi-node environment, this means that each process will have its own secret, and then we get the problem that manifested here. If instead, the secret is randomly picked, but then stored in redis, then at least all processes / nodes will share the same secret. However, this is still a patch... And I like the direction you're going with simplifying the whole CSRF / session management, and making Sidekiq::Web potentially much more self-contained... So I think it's ok to leave this out for now. Just wanted to clarify.

These are really interesting ideas. I could roll my own CSRF handling using an encrypted cookie directly, independent of any Rack session. We'd still need to deprecate all of the Sidekiq::Web session stuff but the user would not need to supply any session. I like this.

I think what I had in mind is more or less the OWASP's double-submit cookie. You might not even strictly need an encrypted cookie. You can use a signed one, or even a plaintext one is secure enough (with the caveat of it potentially breaking if you deal with sub-domains which users control). There are other CSRF protection alternatives using headers etc. Somehow this feels the closest to the current implementation, yet I think you can make it self-contained and not dependent on the parent app.

I'd be happy to help in the process if you're interested. Perhaps best to continue this discussion on #4791 ?

@mperham
Copy link
Collaborator

mperham commented Feb 9, 2021

Hi all, I have a new PR ready for testing which aims to improve the Web UI experience. Check out #4804 and try it out with:

gem 'sidekiq', github: 'mperham/sidekiq', branch: 'better_sessions'

I'm trying to solve all of the hacks people are using and forbidden errors you're seeing.

mperham added a commit that referenced this issue Feb 12, 2021
* Simplify Web UI sessions

Remove all of the hacks and support infrastructure around Rack sessions. Rails provides this by default so we don't need it for 95% of users. The other 5% need to provide a Rack session.

This is a big change and has the potential to break installs so it deserves at least a minor version bump.

See also #4671, #4728 and many others.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants