Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement strict argument checking #5071

Merged
merged 20 commits into from Dec 7, 2021

Conversation

kellysutton
Copy link
Contributor

@kellysutton kellysutton commented Nov 29, 2021

This PR implements a runtime type check for arguments as a follow-up from the discussion here: #5070. The idea is to provide some safeguards in non-production environments for folks about to do something dangerous (i.e. use arguments in a job that cannot (de)serialize reliably to/from JSON).

Open Questions:

Notes:

@mperham
Copy link
Collaborator

mperham commented Nov 30, 2021

validate is the right method.

You're missing Hash, Array and any notion of recursive checking. I found it simpler to use a real JSON roundtrip i.e. the check that's commented out. I think that combined with your global flag is a better solution.

of confirming the safety of job argument payloads.

Cleanup commented-out code from a few years back.

Co-authored-by: Eda Zhou <eda.zhou@gusto.com>
@kellysutton
Copy link
Contributor Author

Sounds good. We've updated the PR to address the feedback!

@mperham
Copy link
Collaborator

mperham commented Nov 30, 2021

I'm worried about the equality check. Does that correctly handle deep elements, e.g. an Array of Symbols?

@kellysutton
Copy link
Contributor Author

Let's find out. We'll write a few more interesting test cases.

kellysutton and others added 3 commits November 30, 2021 12:32
Co-authored-by: Eda Zhou <eda.zhou@gusto.com>
…se/dump approach and deep structures

Co-authored-by: Eda Zhou <eda.zhou@gusto.com>
@kellysutton
Copy link
Contributor Author

Test cases improved. It looks like the JSON.parse/dump approach works to prevent things like symbols either as values or keys be serialized, while letting JSON-friendly hashes/arrays pass through.

@mperham
Copy link
Collaborator

mperham commented Nov 30, 2021

Ok, now:

  1. when should it be enabled by default? if env == 'development'?
  2. should we provide an API for it? Sidekiq.strict_mode(true/false)?
  3. should the raised error note how to enable/disable it?
  4. where do we document this, if at all, in the wiki?

@kellysutton
Copy link
Contributor Author

  1. when should it be enabled by default? if env == 'development'?

My thinking: It should be configurable in 6.x but maybe become the default in env == 'development' starting in 7.x. Maybe with a deprecation warning closer to the release of 7.x to smooth the upgrade path. How does that sound? Too complicated?

  1. should we provide an API for it? Sidekiq.strict_mode(true/false)?

Seems like it could be a nice way of simplifying the API, but I can go either way. My assumption here is that calling Sidekiq.strict_mode! would set a value in the options hash. Happy to get that whipped up for whichever way you think is best.

  1. should the raised error note how to enable/disable it?

Yep. I'll push a commit to include a note about that once a decision is made for (2).

  1. where do we document this, if at all, in the wiki?

For sure in the Best Practices section about simple arguments. If we go with the 6.x/7.x split in (1), the Best Practice can read something like:

If you want this best practice enforced, you can use Sidekiq.strict_mode! in the environments where you'd like to raise an error.

If we make it the default in 7.x, that message will need to change a bit.

@mperham
Copy link
Collaborator

mperham commented Nov 30, 2021

Maybe a WARN level log in 6.4+ and raise in 7+?

We're really building a hyper-specialized linter here. Is this better off as a Rubocop cop? What do you think about warning about larger argument sizes, e.g. greater than 16kb?

@kellysutton
Copy link
Contributor Author

kellysutton commented Nov 30, 2021

Maybe a WARN level log in 6.4+ and raise in 7+?

Sounds good. I'll make it so.

We're really building a hyper-specialized linter here. Is this better off as a Rubocop cop? What do you think about warning about larger argument sizes, e.g. greater than 16kb?

The check we're doing is a bit more of a runtime check (whereas I think of linters as static checks), but I think the ideal place for this type of check is statically. I believe Sorbet/RBS definitions would be the best place to enforce this type of thing, since they could do so without actually running the code. Probably something to figure out for the future, though, as the community stabilizes around a type system.

What do you think about warning about larger argument sizes, e.g. greater than 16kb?

Absolutely. I was going to propose that in a future PR :) Happy to roll it into this one if that makes more sense.

For now, I'll get things tidied up based on the conversation so far.

@kellysutton
Copy link
Contributor Author

Alright, ready for another review at your leisure. Let me know if you'd like me to squish the commits on this PR, since we've accumulated quite a few.

Copy link
Collaborator

@mperham mperham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good. Do we need a way for the user to disable the check completely (to restore current behavior?). Consider if they have an app which breaks the best practices but their code handles it.

lib/sidekiq/job_util.rb Outdated Show resolved Hide resolved
test/test_client.rb Outdated Show resolved Hide resolved
kellysutton and others added 4 commits December 2, 2021 09:55
Co-authored-by: Eda Zhou <eda.zhou@gusto.com>
Co-authored-by: Eda Zhou <eda.zhou@gusto.com>
…ead of all at once

Co-authored-by: Eda Zhou <eda.zhou@gusto.com>
…imple

Co-authored-by: Eda Zhou <eda.zhou@gusto.com>
@gingerlime
Copy link

gingerlime commented Jan 28, 2022

Perhaps we're panicking for no reason and the option to disable it will stay available? The Changelog is a bit confusing for me...

Sidekiq will now log a warning if JSON-unsafe arguments are passed to perform_async.
Add Sidekiq.strict_args!(false) to your initializer to disable this warning.
This warning will switch to an exception in Sidekiq 7.0.

Would strict_args!(false) continue to be available in Sidekiq 7.0 ? Or would the switch from warning to an exception also mean that we won't be able to turn this off?

@mperham
Copy link
Collaborator

mperham commented Jan 28, 2022

It will remain available for backwards compatibility reasons.

@gingerlime
Copy link

Thanks @mperham. That's very reassuring. 👍

@NemyaNation
Copy link

Am I missing something or am I in the wrong section:

warning in logs:

app[web] info redis.pipelined do
app[web] info   redis.get("key")
app[web] info end
app[web] info
app[web] info should be replaced by
app[web] info
app[web] info redis.pipelined do |pipeline|
app[web] info   pipeline.get("key")
app[web] info end
app[web] info
app[web] info (called from /app/vendor/bundle/ruby/2.7.0/gems/sidekiq-6.4.0/lib/sidekiq/api.rb:57:in `block in fetch_stats_fast!'}
Feb 4 00:09:38 my-app app[web] info Pipelining commands on a Redis instance is deprecated and will be removed in Redis 5.0.0.

initializers/sidekiq.rb

Sidekiq.configure_server do |config|
	config.redis = {
		url: ENV["REDISCLOUD_URL"]
	}
end

Sidekiq.configure_client do |config|
	config.redis = {
		url: ENV["REDISCLOUD_URL"],
		size: 2
	}
end

Sidekiq.strict_args!(false)

Still clogging up my logs, and this is just from live polling on the dashboard.

@mperham
Copy link
Collaborator

mperham commented Feb 4, 2022

@NemyaNation Why are you commenting on this issue?

@NemyaNation
Copy link

NemyaNation commented Feb 4, 2022

@mperham Sorry just realised I commented on the wrong issue.

Shall I create a new issue or is there a simple fix other than downgrading a level.

#5178

@tenpaiyomi
Copy link

I'm kind of with @bnorton and @gingerlime on this. I can understand the necessity of wanting to ensure users don't send along a whole Ruby object, but the strict enforcement on hashes feels kind of hamfisted. As the change exists currently, I have to go through and change hundreds of lines of code, because a perform function like this:

def perform(user_id, user_event, properties = {})

that is called like this

SomeWorker.perform_async(some_id, some_string, { foo: 'string', bar: 'string' })

is required to be set up like this

SomeWorker.perform_async(some_id, some_string, { 'foo' => 'string', 'bar' => 'string' })

Even though the output of the hash in both cases with JSON.dump is identical,

irb(main):001:0> JSON.dump({foo: 'string', bar: 'string'})
=> "{\"foo\":\"string\",\"bar\":\"string\"}"
irb(main):002:0> JSON.dump({'foo' => 'string', 'bar' => 'string'})
=> "{\"foo\":\"string\",\"bar\":\"string\"}"

But the latter format is no longer Ruby idiomatic and is largely discouraged over utilizing the symbol: value format.

It feels like the json_safe? method is too simplistic. I can understand the check with regards to some things (Date objects, for instance), but other things should be treated a bit more logically, IMO. Some whitelisted basic classes, and adding a recursive check, seems like the more logical approach that doesn't force potentially old style conventions.

@mperham
Copy link
Collaborator

mperham commented Feb 9, 2022

No one is forcing anyone to use the arg checking. You can use Sidekiq.strict_args!(false) in your initializer to get the old behavior.

@tenpaiyomi
Copy link

Right, I'm aware of the initializer flag that can be set, but that still leaves the fact that the argument checking will probably be somewhat confusing to individuals who are passing valid JSON serializable arguments, that the wiki says are acceptable, and getting errors about them. Likewise, that is negating the protection of potentially passing actual invalid objects as well.

@mperham
Copy link
Collaborator

mperham commented Feb 9, 2022

If the wiki says they are acceptable, let's fix the wiki. Symbols are not valid JSON.

@tenpaiyomi
Copy link

Nor are the hash rockets either. The thing is in this case is not what is valid in JSON, the problem is what is being expected to consider the object valid.

Not valid JSON, not accepted:
{foo: 'bar'}

Not valid JSON, accepted:
{'foo' => 'bar'}

Neither of these are valid JSON objects. If we were going on only accepting valid JSON objects, then only a variant such as this should pass

"{\"foo\": \"bar\"}"

irb(main):020:0> args = "{'foo': 'bar'}"
=> "{'foo': 'bar'}"
irb(main):021:0> JSON.parse(JSON.dump(args)) == args
=> true
irb(main):022:0> args = {'foo' => 'bar'}
=> {"foo"=>"bar"}
irb(main):023:0> JSON.parse(JSON.dump(args)) == args
=> true
irb(main):024:0> args = {'foo':  'bar'}
=> {:foo=>"bar"}
irb(main):025:0> JSON.parse(JSON.dump(args)) == args
=> false

@mperham
Copy link
Collaborator

mperham commented Feb 9, 2022

Let's be clear here:

perform_async is not a normal Ruby method call. It is an helper for constructing a background job which maps onto a Ruby method call "somewhere else". The wire format for this "remote procedure call" is JSON. We are trying to make that mapping between wire format and the Ruby method call mapping safer by catching unsafe conversions to/from Ruby/JSON. {'foo' => 'bar'} is perfectly legal because it maps safely to JSON.

@tenpaiyomi
Copy link

tenpaiyomi commented Feb 9, 2022

Yeah, I get that, but so does {foo: 'bar'}

irb(main):028:0> {'foo' => 'bar'}.to_json
=> "{\"foo\":\"bar\"}"
irb(main):029:0> {foo: 'bar'}.to_json
=> "{\"foo\":\"bar\"}"

There is no scenario I'm aware of where {'foo' => 'bar'} should not be treated synonymous with {foo: 'bar'}. Even the JSON.parse method has an option to specifically map back to the modern idiomatic format, which would make all instances of {'foo' => 'bar'} be considered invalid with the current json_safe? method

irb(main):046:0> args = {'foo' => 'bar'}
=> {"foo"=>"bar"}
irb(main):047:0> JSON.parse(JSON.dump(args)) == args
=> true
irb(main):048:0> JSON.parse(JSON.dump(args), symbolize_names: true) == args
=> false
irb(main):049:0> args = {foo:  'bar'}
=> {:foo=>"bar"}
irb(main):050:0> JSON.parse(JSON.dump(args)) == args
=> false
irb(main):051:0> JSON.parse(JSON.dump(args), symbolize_names: true) == args
=> true

@tenpaiyomi
Copy link

Just postulating, but would it make sense perhaps to check the validity of parse against both iterations (with and without symbolize_names) to handle both cases?

json_args = JSON.dump(args)
JSON.parse(json_args) == args || JSON.parse(json_args, symbolize_names: true) == args

@kellysutton
Copy link
Contributor Author

kellysutton commented Feb 9, 2022

The symbolized keys have bitten us a few times before, which is why we introduced this check. While hashes with keys for symbols serializes safely to JSON, it does not deserialize to the same value. Here's a code snippet to illustrate this:

class MyWorker
  include Sidekiq::Worker

  def perform(hash)
    id = hash[:id] # will never exist, but hash['id'] will!
    # ...
  end
end

MyWorker.perform_async({ id: 123 }) #=> becomes the JSON "{ \"id\": 123 }"

@tenpaiyomi
Copy link

@kellysutton yeah, I can certainly understand that scenario. Unfortunately that is one of the things where symbolize_names on the receiving would resolve that issue, but then it would have the same problem of hash['id'] being nil, and that is not something I'm really advocating to change as it doesn't make sense in this case.

I think it makes sense to say "your received arguments will be always string keyed hashes", but I'm not sure it makes sense to say "you have to adhered to this explicit format for sending arguments," especially when, as noted above, the format doesn't actually dictate the validity of the object serialization, instead just being an arbitrary way that just happens to match how the JSON.dump options are set up. On top of that, as was noted by another commenter, ensuring that all hashes match this explicit arbitrary format is far more difficult than ensuring that all usages in perform just make use of hash['id'] and similar.

@tenpaiyomi
Copy link

tenpaiyomi commented Feb 9, 2022

Also, I just want to note that I'm not trying to argue this point out of a sense of stubbornness or anything of the sort. I'm advocating for this perspective because I do feel like this change will cause an undue burden and confusion on the end user, especially with the change several ruby versions back that the hash rocket was the less favored way of generating hashes over the key: value format, and so I feel like it's an important topic to actively discuss the different perspectives of to ensure the best user experience possible. I do greatly appreciate the back and forth dialog.

jesseplusplus pushed a commit to jesseplusplus/decodon that referenced this pull request Feb 10, 2022
* Fix Sidekiq warnings about JSON serialization

This occurs on every symbol argument we pass, and every symbol key in hashes,
because Sidekiq expects strings instead.

See sidekiq/sidekiq#5071

We do not need to change how workers parse their arguments because this has
not changed and we were already converting to symbols adequately or using
`with_indifferent_access`.

* Set Sidekiq to raise on unsafe arguments in test mode

In order to more easily catch issues that would produce warnings in production
code.
QQism added a commit to QQism/rails_event_store that referenced this pull request Feb 13, 2022
Sidekiq 6.4 has started to log warnings
(sidekiq/sidekiq#5071) if the job arguments are
not strictly JSON-safe i.e. serializing to JSON and deserializing from
JSON yield consistent outputs. To what it means, we cannot use a hash
with symbol keys since doing the JSON rountrip will change them all to
string keys.

This fix will do the JSON round trip for the job arguments beforehand so
Sidekiq won't complain it (and break it starting version 7.0)
QQism added a commit to QQism/rails_event_store that referenced this pull request Feb 13, 2022
Sidekiq 6.4 has started to log warnings
(sidekiq/sidekiq#5071) if the job arguments are
not strictly JSON-safe i.e. serializing to JSON and deserializing from
JSON yield consistent outputs. To what it means, we cannot use a hash
with symbol keys since doing the JSON rountrip will change them all to
string keys.

This fix will do the JSON round trip for the job argument beforehand so
Sidekiq won't complain it (and break it starting version 7.0)
QQism added a commit to QQism/rails_event_store that referenced this pull request Mar 5, 2022
Sidekiq 6.4 has started to log warnings
(sidekiq/sidekiq#5071) if the job arguments are
not strictly JSON-safe i.e. serializing to JSON and deserializing from
JSON yield consistent outputs. To what it means, we cannot use a hash
with symbol keys since doing the JSON rountrip will change them all to
string keys.

This fix will transform the hash argument with symbol keys to string
keys so Sidekiq won't complain it (and break it starting version 7.0).
porbas pushed a commit to RailsEventStore/rails_event_store that referenced this pull request Mar 5, 2022
Sidekiq 6.4 has started to log warnings
(sidekiq/sidekiq#5071) if the job arguments are
not strictly JSON-safe i.e. serializing to JSON and deserializing from
JSON yield consistent outputs. To what it means, we cannot use a hash
with symbol keys since doing the JSON rountrip will change them all to
string keys.

This fix will transform the hash argument with symbol keys to string
keys so Sidekiq won't complain it (and break it starting version 7.0).
@silverdr
Copy link

silverdr commented Dec 2, 2022

It's a year old thread but maybe somebody could shed a bit of light on this for me as this seems strictly related?

WARN: Job arguments to MyModule::Jobs::MyJob do not serialize to JSON safely. This will raise an error in Sidekiq 7.0. See https://github.com/mperham/sidekiq/wiki/Best-Practices or raise an error today by calling `Sidekiq.strict_args!` during Sidekiq initialization.

[2022-12-02T13:24:06.684+00:00] INFO: Enqueued MyModule::Jobs::MyJob (Job ID: 652e4346-9e09-4111-8795-10acf473040c) to Sidekiq(default) with arguments: {"object_id"=>1, "ip_address"=>"44.44.44.44"}

I pass a simple, two k/v pairs Hash with string keys. Nothing fancy, no symbols or complex objects. The above message confirms getting the hash in question. Even worse: I also get the same warning even if I positionally pass simple primitives (integer and string). What am I doing wrong here?

Info says the version is: Sidekiq 6.5.7

@bdewater-thatch
Copy link

bdewater-thatch commented Jan 14, 2023

@silverdr I ran into something similar that looked odd initially, turns out I was serializing a BigDecimal instance as a value when it looked like a string. Open a Rails console, set a breakpoint (ruby/debug or byebug: break Sidekiq::JobUtil#json_safe?), enqueue the job and inspect the actual objects classes there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants