Implement strict argument checking #5071

kellysutton · 2021-11-29T22:44:42Z

This PR implements a runtime type check for arguments as a follow-up from the discussion here: #5070. The idea is to provide some safeguards in non-production environments for folks about to do something dangerous (i.e. use arguments in a job that cannot (de)serialize reliably to/from JSON).

Open Questions:

Is #validate or #normalize_item the right place for this logic?
Are there other types we'd like to include in the allow list?
This check is relatively strict as written (no symbols, no hashes). Do we want to instead take the approach explored in Should Sidekiq Error when perform_async is called with poorly serializable arguments? #2870?

Notes:

It looks like this idea has been considered before and has been commented out for some time: Should Sidekiq Error when perform_async is called with poorly serializable arguments? #2870 and ...
https://github.com/mperham/sidekiq/blame/848df46c67da926587cd76ab7d78939b4d79714b/lib/sidekiq/client.rb#L222

…rray allocations

… the class

mperham · 2021-11-30T16:24:02Z

validate is the right method.

You're missing Hash, Array and any notion of recursive checking. I found it simpler to use a real JSON roundtrip i.e. the check that's commented out. I think that combined with your global flag is a better solution.

of confirming the safety of job argument payloads. Cleanup commented-out code from a few years back. Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

kellysutton · 2021-11-30T18:02:05Z

Sounds good. We've updated the PR to address the feedback!

mperham · 2021-11-30T19:04:39Z

I'm worried about the equality check. Does that correctly handle deep elements, e.g. an Array of Symbols?

kellysutton · 2021-11-30T19:24:07Z

Let's find out. We'll write a few more interesting test cases.

Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

…se/dump approach and deep structures Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

kellysutton · 2021-11-30T20:37:03Z

Test cases improved. It looks like the JSON.parse/dump approach works to prevent things like symbols either as values or keys be serialized, while letting JSON-friendly hashes/arrays pass through.

mperham · 2021-11-30T20:42:57Z

Ok, now:

when should it be enabled by default? if env == 'development'?
should we provide an API for it? Sidekiq.strict_mode(true/false)?
should the raised error note how to enable/disable it?
where do we document this, if at all, in the wiki?

kellysutton · 2021-11-30T21:24:41Z

when should it be enabled by default? if env == 'development'?

My thinking: It should be configurable in 6.x but maybe become the default in env == 'development' starting in 7.x. Maybe with a deprecation warning closer to the release of 7.x to smooth the upgrade path. How does that sound? Too complicated?

should we provide an API for it? Sidekiq.strict_mode(true/false)?

Seems like it could be a nice way of simplifying the API, but I can go either way. My assumption here is that calling Sidekiq.strict_mode! would set a value in the options hash. Happy to get that whipped up for whichever way you think is best.

should the raised error note how to enable/disable it?

Yep. I'll push a commit to include a note about that once a decision is made for (2).

where do we document this, if at all, in the wiki?

For sure in the Best Practices section about simple arguments. If we go with the 6.x/7.x split in (1), the Best Practice can read something like:

If you want this best practice enforced, you can use Sidekiq.strict_mode! in the environments where you'd like to raise an error.

If we make it the default in 7.x, that message will need to change a bit.

mperham · 2021-11-30T23:50:27Z

Maybe a WARN level log in 6.4+ and raise in 7+?

We're really building a hyper-specialized linter here. Is this better off as a Rubocop cop? What do you think about warning about larger argument sizes, e.g. greater than 16kb?

kellysutton · 2021-11-30T23:53:19Z

Maybe a WARN level log in 6.4+ and raise in 7+?

Sounds good. I'll make it so.

We're really building a hyper-specialized linter here. Is this better off as a Rubocop cop? What do you think about warning about larger argument sizes, e.g. greater than 16kb?

The check we're doing is a bit more of a runtime check (whereas I think of linters as static checks), but I think the ideal place for this type of check is statically. I believe Sorbet/RBS definitions would be the best place to enforce this type of thing, since they could do so without actually running the code. Probably something to figure out for the future, though, as the community stabilizes around a type system.

What do you think about warning about larger argument sizes, e.g. greater than 16kb?

Absolutely. I was going to propose that in a future PR :) Happy to roll it into this one if that makes more sense.

For now, I'll get things tidied up based on the conversation so far.

…d how to enable strict_mode and the best practice

…t_mode! method

kellysutton · 2021-12-01T00:27:56Z

Alright, ready for another review at your leisure. Let me know if you'd like me to squish the commits on this PR, since we've accumulated quite a few.

mperham

Looking pretty good. Do we need a way for the user to disable the check completely (to restore current behavior?). Consider if they have an app which breaks the best practices but their code handles it.

lib/sidekiq/job_util.rb

test/test_client.rb

Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

…ead of all at once Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

…imple Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

gingerlime · 2022-01-28T07:22:20Z

Perhaps we're panicking for no reason and the option to disable it will stay available? The Changelog is a bit confusing for me...

Sidekiq will now log a warning if JSON-unsafe arguments are passed to perform_async.
Add Sidekiq.strict_args!(false) to your initializer to disable this warning.
This warning will switch to an exception in Sidekiq 7.0.

Would strict_args!(false) continue to be available in Sidekiq 7.0 ? Or would the switch from warning to an exception also mean that we won't be able to turn this off?

mperham · 2022-01-28T16:46:36Z

It will remain available for backwards compatibility reasons.

gingerlime · 2022-01-29T07:24:19Z

Thanks @mperham. That's very reassuring. 👍

NemyaNation · 2022-02-04T00:13:46Z

Am I missing something or am I in the wrong section:

warning in logs:

app[web] info redis.pipelined do
app[web] info   redis.get("key")
app[web] info end
app[web] info
app[web] info should be replaced by
app[web] info
app[web] info redis.pipelined do |pipeline|
app[web] info   pipeline.get("key")
app[web] info end
app[web] info
app[web] info (called from /app/vendor/bundle/ruby/2.7.0/gems/sidekiq-6.4.0/lib/sidekiq/api.rb:57:in `block in fetch_stats_fast!'}
Feb 4 00:09:38 my-app app[web] info Pipelining commands on a Redis instance is deprecated and will be removed in Redis 5.0.0.

initializers/sidekiq.rb

Sidekiq.configure_server do |config|
	config.redis = {
		url: ENV["REDISCLOUD_URL"]
	}
end

Sidekiq.configure_client do |config|
	config.redis = {
		url: ENV["REDISCLOUD_URL"],
		size: 2
	}
end

Sidekiq.strict_args!(false)

Still clogging up my logs, and this is just from live polling on the dashboard.

mperham · 2022-02-04T00:18:50Z

@NemyaNation Why are you commenting on this issue?

NemyaNation · 2022-02-04T00:25:33Z

@mperham Sorry just realised I commented on the wrong issue.

Shall I create a new issue or is there a simple fix other than downgrading a level.

#5178

tenpaiyomi · 2022-02-09T01:22:41Z

I'm kind of with @bnorton and @gingerlime on this. I can understand the necessity of wanting to ensure users don't send along a whole Ruby object, but the strict enforcement on hashes feels kind of hamfisted. As the change exists currently, I have to go through and change hundreds of lines of code, because a perform function like this:

def perform(user_id, user_event, properties = {})

that is called like this

SomeWorker.perform_async(some_id, some_string, { foo: 'string', bar: 'string' })

is required to be set up like this

SomeWorker.perform_async(some_id, some_string, { 'foo' => 'string', 'bar' => 'string' })

Even though the output of the hash in both cases with JSON.dump is identical,

irb(main):001:0> JSON.dump({foo: 'string', bar: 'string'})
=> "{\"foo\":\"string\",\"bar\":\"string\"}"
irb(main):002:0> JSON.dump({'foo' => 'string', 'bar' => 'string'})
=> "{\"foo\":\"string\",\"bar\":\"string\"}"

But the latter format is no longer Ruby idiomatic and is largely discouraged over utilizing the symbol: value format.

It feels like the json_safe? method is too simplistic. I can understand the check with regards to some things (Date objects, for instance), but other things should be treated a bit more logically, IMO. Some whitelisted basic classes, and adding a recursive check, seems like the more logical approach that doesn't force potentially old style conventions.

mperham · 2022-02-09T15:35:27Z

No one is forcing anyone to use the arg checking. You can use Sidekiq.strict_args!(false) in your initializer to get the old behavior.

tenpaiyomi · 2022-02-09T17:11:12Z

Right, I'm aware of the initializer flag that can be set, but that still leaves the fact that the argument checking will probably be somewhat confusing to individuals who are passing valid JSON serializable arguments, that the wiki says are acceptable, and getting errors about them. Likewise, that is negating the protection of potentially passing actual invalid objects as well.

mperham · 2022-02-09T17:13:10Z

If the wiki says they are acceptable, let's fix the wiki. Symbols are not valid JSON.

tenpaiyomi · 2022-02-09T17:20:36Z

Nor are the hash rockets either. The thing is in this case is not what is valid in JSON, the problem is what is being expected to consider the object valid.

Not valid JSON, not accepted:
{foo: 'bar'}

Not valid JSON, accepted:
{'foo' => 'bar'}

Neither of these are valid JSON objects. If we were going on only accepting valid JSON objects, then only a variant such as this should pass

"{\"foo\": \"bar\"}"

irb(main):020:0> args = "{'foo': 'bar'}"
=> "{'foo': 'bar'}"
irb(main):021:0> JSON.parse(JSON.dump(args)) == args
=> true
irb(main):022:0> args = {'foo' => 'bar'}
=> {"foo"=>"bar"}
irb(main):023:0> JSON.parse(JSON.dump(args)) == args
=> true
irb(main):024:0> args = {'foo':  'bar'}
=> {:foo=>"bar"}
irb(main):025:0> JSON.parse(JSON.dump(args)) == args
=> false

mperham · 2022-02-09T17:21:47Z

Let's be clear here:

perform_async is not a normal Ruby method call. It is an helper for constructing a background job which maps onto a Ruby method call "somewhere else". The wire format for this "remote procedure call" is JSON. We are trying to make that mapping between wire format and the Ruby method call mapping safer by catching unsafe conversions to/from Ruby/JSON. {'foo' => 'bar'} is perfectly legal because it maps safely to JSON.

tenpaiyomi · 2022-02-09T17:29:38Z

Yeah, I get that, but so does {foo: 'bar'}

irb(main):028:0> {'foo' => 'bar'}.to_json
=> "{\"foo\":\"bar\"}"
irb(main):029:0> {foo: 'bar'}.to_json
=> "{\"foo\":\"bar\"}"

There is no scenario I'm aware of where {'foo' => 'bar'} should not be treated synonymous with {foo: 'bar'}. Even the JSON.parse method has an option to specifically map back to the modern idiomatic format, which would make all instances of {'foo' => 'bar'} be considered invalid with the current json_safe? method

irb(main):046:0> args = {'foo' => 'bar'}
=> {"foo"=>"bar"}
irb(main):047:0> JSON.parse(JSON.dump(args)) == args
=> true
irb(main):048:0> JSON.parse(JSON.dump(args), symbolize_names: true) == args
=> false
irb(main):049:0> args = {foo:  'bar'}
=> {:foo=>"bar"}
irb(main):050:0> JSON.parse(JSON.dump(args)) == args
=> false
irb(main):051:0> JSON.parse(JSON.dump(args), symbolize_names: true) == args
=> true

tenpaiyomi · 2022-02-09T17:33:41Z

Just postulating, but would it make sense perhaps to check the validity of parse against both iterations (with and without symbolize_names) to handle both cases?

json_args = JSON.dump(args)
JSON.parse(json_args) == args || JSON.parse(json_args, symbolize_names: true) == args

kellysutton · 2022-02-09T17:42:55Z

The symbolized keys have bitten us a few times before, which is why we introduced this check. While hashes with keys for symbols serializes safely to JSON, it does not deserialize to the same value. Here's a code snippet to illustrate this:

class MyWorker
  include Sidekiq::Worker

  def perform(hash)
    id = hash[:id] # will never exist, but hash['id'] will!
    # ...
  end
end

MyWorker.perform_async({ id: 123 }) #=> becomes the JSON "{ \"id\": 123 }"

tenpaiyomi · 2022-02-09T17:51:21Z

@kellysutton yeah, I can certainly understand that scenario. Unfortunately that is one of the things where symbolize_names on the receiving would resolve that issue, but then it would have the same problem of hash['id'] being nil, and that is not something I'm really advocating to change as it doesn't make sense in this case.

I think it makes sense to say "your received arguments will be always string keyed hashes", but I'm not sure it makes sense to say "you have to adhered to this explicit format for sending arguments," especially when, as noted above, the format doesn't actually dictate the validity of the object serialization, instead just being an arbitrary way that just happens to match how the JSON.dump options are set up. On top of that, as was noted by another commenter, ensuring that all hashes match this explicit arbitrary format is far more difficult than ensuring that all usages in perform just make use of hash['id'] and similar.

tenpaiyomi · 2022-02-09T17:56:30Z

Also, I just want to note that I'm not trying to argue this point out of a sense of stubbornness or anything of the sort. I'm advocating for this perspective because I do feel like this change will cause an undue burden and confusion on the end user, especially with the change several ruby versions back that the hash rocket was the less favored way of generating hashes over the key: value format, and so I feel like it's an important topic to actively discuss the different perspectives of to ensure the best user experience possible. I do greatly appreciate the back and forth dialog.

* Fix Sidekiq warnings about JSON serialization This occurs on every symbol argument we pass, and every symbol key in hashes, because Sidekiq expects strings instead. See sidekiq/sidekiq#5071 We do not need to change how workers parse their arguments because this has not changed and we were already converting to symbols adequately or using `with_indifferent_access`. * Set Sidekiq to raise on unsafe arguments in test mode In order to more easily catch issues that would produce warnings in production code.

Sidekiq 6.4 has started to log warnings (sidekiq/sidekiq#5071) if the job arguments are not strictly JSON-safe i.e. serializing to JSON and deserializing from JSON yield consistent outputs. To what it means, we cannot use a hash with symbol keys since doing the JSON rountrip will change them all to string keys. This fix will do the JSON round trip for the job arguments beforehand so Sidekiq won't complain it (and break it starting version 7.0)

Sidekiq 6.4 has started to log warnings (sidekiq/sidekiq#5071) if the job arguments are not strictly JSON-safe i.e. serializing to JSON and deserializing from JSON yield consistent outputs. To what it means, we cannot use a hash with symbol keys since doing the JSON rountrip will change them all to string keys. This fix will do the JSON round trip for the job argument beforehand so Sidekiq won't complain it (and break it starting version 7.0)

Sidekiq 6.4 has started to log warnings (sidekiq/sidekiq#5071) if the job arguments are not strictly JSON-safe i.e. serializing to JSON and deserializing from JSON yield consistent outputs. To what it means, we cannot use a hash with symbol keys since doing the JSON rountrip will change them all to string keys. This fix will transform the hash argument with symbol keys to string keys so Sidekiq won't complain it (and break it starting version 7.0).

silverdr · 2022-12-02T13:58:04Z

It's a year old thread but maybe somebody could shed a bit of light on this for me as this seems strictly related?

WARN: Job arguments to MyModule::Jobs::MyJob do not serialize to JSON safely. This will raise an error in Sidekiq 7.0. See https://github.com/mperham/sidekiq/wiki/Best-Practices or raise an error today by calling `Sidekiq.strict_args!` during Sidekiq initialization.

[2022-12-02T13:24:06.684+00:00] INFO: Enqueued MyModule::Jobs::MyJob (Job ID: 652e4346-9e09-4111-8795-10acf473040c) to Sidekiq(default) with arguments: {"object_id"=>1, "ip_address"=>"44.44.44.44"}

I pass a simple, two k/v pairs Hash with string keys. Nothing fancy, no symbols or complex objects. The above message confirms getting the hash in question. Even worse: I also get the same warning even if I positionally pass simple primitives (integer and string). What am I doing wrong here?

Info says the version is: Sidekiq 6.5.7

bdewater-thatch · 2023-01-14T00:16:35Z

@silverdr I ran into something similar that looked odd initially, turns out I was serializing a BigDecimal instance as a value when it looked like a string. Open a Rails console, set a breakpoint (ruby/debug or byebug: break Sidekiq::JobUtil#json_safe?), enqueue the job and inspect the actual objects classes there.

kellysutton added 5 commits November 29, 2021 14:22

Add the outline of failing tests

18eb9ac

Implement argument checking

0152412

Move argument checking into Sidekiq::JobUtil#validate

bc41eec

Refactor acceptable class definition into a constant to cut down on a…

7e258b7

…rray allocations

Improve error message, match raise call formatting to other errors in…

fbe4884

… the class

Address feedback in the Pull Request to use the JSON round-trip method

5c63649

of confirming the safety of job argument payloads. Cleanup commented-out code from a few years back. Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

kellysutton and others added 3 commits November 30, 2021 12:32

Swap out JSON.load for JSON.parse per the security CI check

8512cb0

Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

Add a few more tests cases to build up confidence around our JSON.par…

9b4ebc1

…se/dump approach and deep structures Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

Improve test case description

29556fe

kellysutton added 6 commits November 30, 2021 15:59

Warn when job arguments do not serialize safely and point folks towar…

279440f

…d how to enable strict_mode and the best practice

Reconfigure the options-hash based approach to a global Sidekiq.stric…

9c6d075

…t_mode! method

Add a note in the raised error on how to disable the error

13b7b77

Let the error message breathe a little bit

7639a5e

Toggle strict_mode! off to suss out a test flake

ef45a30

Capitalize the start of a sentence

d784272

mperham reviewed Dec 2, 2021

View reviewed changes

lib/sidekiq/job_util.rb Outdated Show resolved Hide resolved

test/test_client.rb Outdated Show resolved Hide resolved

kellysutton and others added 4 commits December 2, 2021 09:55

Rename job_is_json_safe to json_safe?

7010bbc

Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

Refactor a few tests to test a single argument at a time

417d656

Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

Break out test cases to exercise each individual intersting case inst…

c5b377c

…ead of all at once Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

Change formatting to be more consistent, tighter when arguments are s…

f13214c

…imple Co-authored-by: Eda Zhou <eda.zhou@gusto.com>

cwrw mentioned this pull request Feb 1, 2022

Disable strict_args warning in Sidekiq DFE-Digital/apply-for-teacher-training#6471

Closed

5 tasks

QQism mentioned this pull request Feb 13, 2022

chore: resolve sidekiq json-unsafe warning RailsEventStore/rails_event_store#1272

Merged

ndbroadbent mentioned this pull request Mar 22, 2022

I think that the json_safe? method should return true for symbols, to avoid the "Job arguments to **** must be native JSON types" error #5252

Closed

bdewater-thatch mentioned this pull request Jan 14, 2023

Stringify money BigDecimal value in Active Job serializer Shopify/money#253

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement strict argument checking #5071

Implement strict argument checking #5071

kellysutton commented Nov 29, 2021 •

edited

mperham commented Nov 30, 2021

kellysutton commented Nov 30, 2021

mperham commented Nov 30, 2021

kellysutton commented Nov 30, 2021

kellysutton commented Nov 30, 2021

mperham commented Nov 30, 2021

kellysutton commented Nov 30, 2021

mperham commented Nov 30, 2021

kellysutton commented Nov 30, 2021 •

edited

kellysutton commented Dec 1, 2021

mperham left a comment

gingerlime commented Jan 28, 2022 •

edited

mperham commented Jan 28, 2022

gingerlime commented Jan 29, 2022

NemyaNation commented Feb 4, 2022

mperham commented Feb 4, 2022

NemyaNation commented Feb 4, 2022 •

edited

tenpaiyomi commented Feb 9, 2022

mperham commented Feb 9, 2022

tenpaiyomi commented Feb 9, 2022

mperham commented Feb 9, 2022

tenpaiyomi commented Feb 9, 2022

mperham commented Feb 9, 2022

tenpaiyomi commented Feb 9, 2022 •

edited

tenpaiyomi commented Feb 9, 2022

kellysutton commented Feb 9, 2022 •

edited

tenpaiyomi commented Feb 9, 2022

tenpaiyomi commented Feb 9, 2022 •

edited

silverdr commented Dec 2, 2022

bdewater-thatch commented Jan 14, 2023 •

edited

Implement strict argument checking #5071

Implement strict argument checking #5071

Conversation

kellysutton commented Nov 29, 2021 • edited

mperham commented Nov 30, 2021

kellysutton commented Nov 30, 2021

mperham commented Nov 30, 2021

kellysutton commented Nov 30, 2021

kellysutton commented Nov 30, 2021

mperham commented Nov 30, 2021

kellysutton commented Nov 30, 2021

mperham commented Nov 30, 2021

kellysutton commented Nov 30, 2021 • edited

kellysutton commented Dec 1, 2021

mperham left a comment

Choose a reason for hiding this comment

gingerlime commented Jan 28, 2022 • edited

mperham commented Jan 28, 2022

gingerlime commented Jan 29, 2022

NemyaNation commented Feb 4, 2022

mperham commented Feb 4, 2022

NemyaNation commented Feb 4, 2022 • edited

tenpaiyomi commented Feb 9, 2022

mperham commented Feb 9, 2022

tenpaiyomi commented Feb 9, 2022

mperham commented Feb 9, 2022

tenpaiyomi commented Feb 9, 2022

mperham commented Feb 9, 2022

tenpaiyomi commented Feb 9, 2022 • edited

tenpaiyomi commented Feb 9, 2022

kellysutton commented Feb 9, 2022 • edited

tenpaiyomi commented Feb 9, 2022

tenpaiyomi commented Feb 9, 2022 • edited

silverdr commented Dec 2, 2022

bdewater-thatch commented Jan 14, 2023 • edited

kellysutton commented Nov 29, 2021 •

edited

kellysutton commented Nov 30, 2021 •

edited

gingerlime commented Jan 28, 2022 •

edited

NemyaNation commented Feb 4, 2022 •

edited

tenpaiyomi commented Feb 9, 2022 •

edited

kellysutton commented Feb 9, 2022 •

edited

tenpaiyomi commented Feb 9, 2022 •

edited

bdewater-thatch commented Jan 14, 2023 •

edited