Change nested query parsing behavior of "foo[]" from {"foo" => [nil]} to {"foo" => []} #1696

Closed
PragTob opened this issue Jul 20, 2020 · 38 comments · Fixed by #1699

Comments

@PragTob

PragTob commented Jul 20, 2020

Hello lovely folks,

first and foremost, thanks for your excellent work on this project, which practically the entire Ruby webdev ecosystem relies on 💚 🎉 🚀

Problem

I've come here today to propose a change in the parsing behavior of the query string param[].

There is a spec asserting the current behavior so I assume it's intentional

rack/test/spec_utils.rb

Lines 164 to 165 in 649c72b

    Rack::Utils.parse_nested_query("foo[]").
      must_equal "foo" => [nil]

I believe/hope the more correct test would be:

    Rack::Utils.parse_nested_query("foo[]").
      must_equal "foo" => []

I base this on three things:

  • we otherwise have no way to indicate an empty array and an array with nil doesn't seem particularly useful (or intuitive) vs. an empty array
  • the introduction was 12 years ago ( 6e330bd ) and might be a side effect of the implementation due to the splitting behavior exhibited here:
        unless qs.nil? || qs.empty?
          (qs || '').split(d ? (COMMON_SEP[d] || /[#{d}] */n) : DEFAULT_SEP).each do |p|
            k, v = p.split('=', 2).map! { |s| unescape(s) }
            normalize_params(params, k, v, param_depth_limit)
          end
  • the popular HTTP client library faraday (which her and many projects rely upon) changed its way of serializing empty arrays to the query string form above (#800 correct handling parameter value empty array, lostisland/faraday#801)

Especially due to the last point, this creates a sort of dissonance in the Ruby ecosystem where I send an empty array from the one side and it arrives as [nil] on the server side. Of course, the fix might also be on the faraday side.

I have not yet checked the HTTP specs or anything about this to see what's "officially correct"

To have complete traceability, I came here via ruby-grape/grape#2079 :)

Implementation

I'm not sure if it's the complete implementation but it seems to be easily fixable in

      elsif after == "[]"
        params[k] ||= []
        raise ParameterTypeError, "expected Array (got #{params[k].class.name}) for param `#{k}'" unless params[k].is_a?(Array)
        params[k] << v

all it needs is unless v.nil?:

      elsif after == "[]"
        params[k] ||= []
        raise ParameterTypeError, "expected Array (got #{params[k].class.name}) for param `#{k}'" unless params[k].is_a?(Array)
        params[k] << v unless v.nil?

All tests (with exception of the one mentioned above) pass with this change.
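
To make the difference concrete outside of Rack, here is a minimal standalone sketch of just the `[]` branch. The method name `append_array_value` and its `skip_nil` flag are hypothetical, made up for illustration; they are not part of Rack's API, and the real branch lives inside `normalize_params`.

```ruby
# Hypothetical stand-in for the `after == "[]"` branch shown above.
# skip_nil: false mimics the current behaviour, skip_nil: true the proposal.
def append_array_value(params, key, value, skip_nil:)
  params[key] ||= []
  unless params[key].is_a?(Array)
    raise TypeError, "expected Array (got #{params[key].class.name}) for param `#{key}'"
  end
  params[key] << value unless skip_nil && value.nil?
  params
end

# A value-less "foo[]" reaches this branch with value == nil.
current  = append_array_value({}, "foo", nil, skip_nil: false) # "foo" maps to [nil]
proposed = append_array_value({}, "foo", nil, skip_nil: true)  # "foo" maps to []

p current
p proposed
```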

(I have the code, I found the old spec while implementing this/my spec was elsewhere)

Happy to make a PR should it be wanted!

Thanks for all your work! 🎉 🚀

@jeremyevans
Contributor

Can you get a browser to submit that via a form submission? As far as I'm aware, browsers will always include an equal sign, even if the right hand side would be empty. Looking at RFC 1866, it appears that a query string of foo[] is invalid. A valid query string would be foo[]=, which Rack will parse as {"foo"=>[""]}. If we change Rack's behavior in any way in this area, it should be to raise an exception for foo[], or other cases where there is no equal sign. That's my opinion, let's see what other committers think.

@PragTob
Author

PragTob commented Jul 20, 2020

Hi @jeremyevans 👋

Not sure you can get a form to submit like this, I haven't tried though. I looked at RFC3986 and it's... interesting.

So looking at the ABNF for URIs it seems valid that key can go without value:

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   query         = *( pchar / "/" / "?" )
   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   pct-encoded   = "%" HEXDIG HEXDIG
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

However, [ and ] seem invalid 🤷 (hope I didn't miss it). That's clearly not our lived reality though.

edit: Jeremy corrected me that of course they are escaped and hence valid

The RFC also says it's frequently used as key value, but not always:

However, as query components
are often used to carry identifying information in the form of
"key=value" pairs and one frequently used value is a reference to
another URI, it is sometimes better for usability to avoid percent-
encoding those characters.

Seems to me that this way [] is used is more of a thing of how web frameworks frequently use it more than an actual standard. Or I haven't found the standard yet, or it's just a recommendation...

@jeremyevans
Contributor

From a URL perspective, the query portion is open-ended. That's because different URL schemes could have different query formats. However, for HTTP, RFC 1866 applies, as the query for HTTP should be in application/x-www-form-urlencoded format. See https://tools.ietf.org/html/rfc1866#section-8.2.

Rack's specific handling of [] is not part of any standard. However, that doesn't change that foo[] (without the =) appears to not be valid format for application/x-www-form-urlencoded.

@jeremyevans
Contributor

Note that [] in the query part is supposed to be escaped in order to be valid according to the newer URL RFC (3986). So when submitted, the query should be submitted as: foo%5b%5d=. Rack will decode that to foo[]= before attempting to parse it. Rack and most webservers will accept unescaped [] in query, even though it isn't strictly valid. ruby-core even rejected a bug report to URI escape [] (https://bugs.ruby-lang.org/issues/9990), as using the unescaped version is not an issue in practice.
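
As a small illustration of the escaping described here, Ruby's standard library can produce and decode the percent-encoded form. This only shows the stdlib `URI` helpers; it is not the code path Rack itself uses.

```ruby
require "uri"

# Percent-encode the brackets the way a strict client would send them.
escaped = URI.encode_www_form_component("foo[]")
puts escaped # foo%5B%5D

# Decoding accepts lower-case hex digits as well.
unescaped = URI.decode_www_form_component("foo%5b%5d")
puts unescaped # foo[]
```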

@ioquatix
Member

The problem with escaping [] is that it's pedantic and makes URLs super ugly. So I agree with Rails. The only thing I could add is that if foo[] is currently invalid, making it behave as outlined ({foo: []}) shouldn't be a breaking change.

So the only question is what is the use case and how did you create such a query string?

@ioquatix
Member

I'm okay with this change generally, but can you make a PR so we can review?

@jeremyevans
Contributor

@ioquatix can you describe briefly why you are in favor of this change, considering that foo[] is not a valid query string? IMO, this is a bug introduced in faraday, and attempting to support it just encourages the bug to spread to more libraries.

@ioquatix
Member

ioquatix commented Jul 20, 2020

  • It shouldn't break existing code if it was previously undefined. However, now that I think about it, code could potentially depend on the currently unspecified behaviour.
  • Query parsing is poorly defined so I'm not at odds with being more robust and having sensible interpretations of otherwise ambiguous strings.
  • Throwing an exception could be a worse option.

Regarding this particular behaviour, if it's a bug in faraday, I'm less inclined to change it. But I don't think it's an unreasonable request. So, I want to see an actual PR so we can see what specs need to be changed and the potential impact. This is on the OP to deliver.

Regarding consistency, PHP behaves similarly to Rack as currently implemented:

php > parse_str("foo[]", $result);
php > var_dump($result);
array(1) {
  ["foo"]=>
  array(1) {
    [0]=>
    string(0) ""
  }
}

The difference is PHP sets the variable to an empty string, so a lack of = is the same as =.

@PragTob can you survey other implementations to see what is the normal behaviour? Can you also comment on whether this is a bug with Faraday?

cc @olleolleolle

@jeremyevans
Contributor

I'm not totally opposed to foo[] being the same as foo[]=. The fact that it isn't currently the same is an implementation detail in my mind. I still think a warning in the next version followed by an exception in future versions is the better way to handle this.

As far as I can tell from the faraday issue, they chose this path because it appeared to work with Rails. I'm not sure whether the Rails behavior is intentional or just an implementation detail in Rails.

There's a big problem with accepting invalid input. Invalid input becomes cancerous. If a server accepts invalid input, clients will send invalid input considering it valid, and then try to use the invalid input with other servers, and complain to other servers that don't accept it. That's basically what is happening here.

See https://tools.ietf.org/id/draft-thomson-postel-was-wrong-03.html and https://news.ycombinator.com/item?id=9824638.

@ioquatix
Member

I agree that we shouldn't accept invalid input. But the RFC is not clear that this is invalid input? If we can confirm it's invalid according to the RFC, I would follow that.

Does the query parser throw exceptions for any other kind of input?

@jeremyevans
Contributor

@ioquatix https://tools.ietf.org/html/rfc1866#section-8.2 describes the application/x-www-form-urlencoded format. Section 8.2.1.2 states: The fields are listed in the order they appear in the document with the name separated from the value by '='. It does not appear to support key/value entries without an = separating the key and value. If a value is null (there is no value), then the whole pair should be omitted.

Yes, the parser already raises exceptions in some cases:

ruby -r rack -e 'p Rack::Utils.parse_nested_query("foo[bar]=&foo[]=")'
lib/rack/query_parser.rb:104:in `normalize_params': expected Array (got Rack::QueryParser::Params) for param `foo' (Rack::QueryParser::ParameterTypeError)

@jeremyevans
Contributor

It does appear the Rails behavior is by design: https://github.com/rails/rails/blob/master/actionpack/test/dispatch/request/query_string_parsing_test.rb#L96-L103

Fun fact, the stripping of nils from parameter arrays was introduced in Rails as a security fix for a bug in ActiveRecord: rails/rails@060c91c

This was later changed to just remove nils from arrays, leaving the array empty, and only because ActiveRecord had been fixed to handle things correctly: rails/rails@8f8ccb9
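
For readers who have not seen that behaviour, the effect of "removing nils from arrays, leaving the array empty" can be sketched as below. This is not Rails' actual code; `strip_nils` is a hypothetical helper written only to illustrate the outcome on parsed params.

```ruby
# Hypothetical helper: recursively remove nil elements from arrays,
# leaving empty arrays in place and leaving scalar nils untouched.
def strip_nils(value)
  case value
  when Array then value.compact.map { |v| strip_nils(v) }
  when Hash  then value.transform_values { |v| strip_nils(v) }
  else value
  end
end

params  = { "foo" => [nil], "bar" => { "ids" => [nil, "1"] }, "baz" => nil }
cleaned = strip_nils(params)

p cleaned # "foo" becomes [], "ids" becomes ["1"], "baz" stays nil
```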

@PragTob
Author

PragTob commented Jul 21, 2020

👋

Hey, thanks for all the input, and thanks Jeremy for the corrections to my erroneous view of the RFC 💚

I can look at some implementations in other languages and see what they do sure :)

I can provide a PR, I already had/have the code - it only changes 2 lines and has everything passing so it's easy.

As for Faraday, it was a bugfix for lostisland/faraday#800 which itself was basically a bug fix for a json_api_client issue: JsonApiClient/json_api_client#278 - so it was definitely an intended bugfix. I'll check the JSON API spec to see if I can quickly find if they consider that valid or if it was just a wrong interpretation.

As an aside, I'd like it if there was a way to represent an empty array vs. array with an empty string. Standard conformity of course trumps how I feel though 😅

Also, thanks for the quick feedback everyone! 🚀

PragTob added a commit to PragTob/rack that referenced this issue Jul 21, 2020
Pending discussion in rack#1696 and if merged also fixes rack#1696
@PragTob
Author

PragTob commented Jul 21, 2020

PR sits at #1697. I lied, it's 3 lines that need changing: one more assertion was in the same test, right after the one I thought was the only one that needed changing.

@PragTob
Author

PragTob commented Jul 21, 2020

Parsing/Deserialization

JavaScript - qs

> var qs = require('qs');
undefined
> qs.parse('foo[]');
{ foo: [ '' ] }

JavaScript - query-string

> const queryString = require('query-string');
undefined
> queryString.parse('foo[]', {arrayFormat: 'bracket'})
[Object: null prototype] { foo: [ null ] }
> queryString.parse('foo[]=', {arrayFormat: 'bracket'})
[Object: null prototype] { foo: [ '' ] }

Elixir - stdlib

iex(1)> URI.decode_query("foo[]")
%{"foo[]" => nil}
iex(2)> URI.decode_query("foo[]=")
%{"foo[]" => ""}

Elixir - Plug (Rack equivalent)

iex(2)>  Plug.Conn.Query.decode("foo[]")
%{"foo" => []}
iex(3)>  Plug.Conn.Query.decode("foo[]=")
%{"foo" => [""]}

Python - stdlib

Python 3.8.4 (default, Jul 21 2020, 09:25:23) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import urlparse
>>> from urllib.parse import parse_qs
>>> parse_qs("foo[]")
{}
>>> parse_qs("foo[]=")
{}
>>> parse_qs("foo[]=1")
{'foo[]': ['1']}

Python - querystring-parser

>>> from querystring_parser import parser
>>> parser.parse("foo[]")
Traceback (most recent call last):
  File "/home/tobi/.asdf/installs/python/3.8.4/lib/python3.8/site-packages/querystring_parser/parser.py", line 137, in parse
    (var, val) = element.split("=")
ValueError: not enough values to unpack (expected 2, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tobi/.asdf/installs/python/3.8.4/lib/python3.8/site-packages/querystring_parser/parser.py", line 146, in parse
    raise MalformedQueryStringError
querystring_parser.parser.MalformedQueryStringError
>>> parser.parse("foo[]=")
{'foo': {'': ''}}
>>> parser.parse("foo[]=1")
{'foo': {'': 1}}

From the surveyed implementations, only plug from elixir parses it as this issue suggests. Behavior is generally all over the place.

Serialization

It seemed fun/interesting to check what some implementations think how it should be serialized (aka is faraday doing the right thing)

Elixir - Plug

iex(1)> Plug.Conn.Query.encode(%{foo: []})
""
iex(3)> Plug.Conn.Query.encode(%{foo: [""]}) 
"foo[]="

JavaScript - jQuery

$.param({foo: []})
""
$.param({foo: [""]})
"foo%5B%5D="

JavaScript - qs

> var qs = require('qs');
undefined
> qs.stringify({foo: []})
''
> qs.stringify({foo: ['']})
'foo%5B0%5D='

The surveyed serializers all seem to serialize it as empty string suggesting that the old faraday behavior was correct and the new behavior might be a bug.


Any big library we should check? I didn't find a good Java answer but my Java package days are also way behind me. I'd be interested to know what Apache thinks/does but I'm not sure how to get to that, if Apache itself even cares about that :)

@iMacTia

iMacTia commented Jul 21, 2020

Thanks everyone and @PragTob in particular for the discussion around this.
I'd just like to voice my view as a Faraday maintainer and subscribe to this for future updates.
I personally agree with @PragTob on this one:

I'd like it if there was a way to represent an empty array vs. array with an empty string.

My understanding is that this particular case is not strictly defined in the RFC spec (but I'll join your search for references to it) and existing libraries seem to converge towards the foo[] => foo: [nil].
However, when I reviewed and approved lostisland/faraday#801, I thought that actually foo[] => foo: [] makes more sense.

Without that change, parsing foo: [] from qs would otherwise be impossible, and I believed there are definitely more use-cases to provide an empty array to the server, compared to providing an array that contains only nil 😄
Interestingly, it seems like Elixir's Plug took our same approach, although serialisation is inconsistent for them, so that looks more like a bug...

That is simply my view of course, but I'd really hate to roll back to an illogical implementation even though there's no official RFC asking for it, and simply because other libraries have adopted it over the years.
I'll gladly follow this thread and check how things evolve

@PragTob
Author

PragTob commented Jul 21, 2020

Thanks @iMacTia for chiming in 👋

I skimmed the latest JSON API spec (as the original request to faraday came from a JSON API client bug) and couldn't find any defined usage of the foo[] query string syntax

@iMacTia

iMacTia commented Jul 21, 2020

@PragTob I didn't have much time to go through it, but I found this one: https://jsonapi.org/format/#crud-updating-to-many-relationships (check 2nd example after "And the following request clears every tag for an article").

Now, I know this is a PATCH call and as such the "data": [] param should come from the body.
However rails and other modern web frameworks allow to provide parameters in different ways, including qs.

If we considered GET calls only, then I agree the foo[] syntax will hardly be useful, as arrays in qs are usually comma-separated values anyway. But then again we're back to the point where we don't really have any clear guideline on how to implement that.

@PragTob
Author

PragTob commented Jul 21, 2020

@iMacTia yeah but as you say that should be in the body and it's also JSON so I don't think it's too important here. I had hoped that their filtering section or something had something about how index end points should react/what the format should be there (as those would commonly be query string parameters) but it's intentionally left open and without rules.

@PragTob
Author

PragTob commented Jul 21, 2020

Alright, I might have found something!

I started at RFC 1866, but that's obsolete as it's for HTML 2, so I went to the newest HTML standard I found, which is 5.2. The form submission section eventually links to the URL standard's application/x-www-form-urlencoded section.

First, I want everyone to read the introduction, because well... 🤣:

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms

With that out of the way, it specifies an algorithm for parsing, which simplified is:

  • split input by &
  • Iterate over all elements
    • if there's nothing, continue
    • if there is = then name is before = value is after =
    • if there is no =, basically name is the string and value is empty string, or in their words (this is 3.3.):

Otherwise, let name have the value of bytes and let value be the empty byte sequence.

Which imo speaks for the correctly parsed value being {"foo" => [""]}... although that entire specification never talks about nesting aspect so no usage of foo[] or whatsoever. So technically it'd be {"foo[]" => ""}. I'm not sure if there's any specification anywhere that talks about the nested query parsing/serialization.
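
The simplified steps above can be sketched in Ruby roughly as follows. This is only an approximation of the flat algorithm under the assumption that the input is already a Ruby string; `parse_urlencoded` is a hypothetical name, and the sketch knows nothing about nesting, so `foo[]` stays a plain key.

```ruby
require "uri"

# Flat application/x-www-form-urlencoded parsing, returning [name, value] pairs.
def parse_urlencoded(input)
  input.split("&").each_with_object([]) do |bytes, pairs|
    next if bytes.empty?               # nothing between two "&": skip it
    name, value = bytes.split("=", 2)  # split on the first "=" only
    value ||= ""                       # no "=": value is the empty sequence
    pairs << [URI.decode_www_form_component(name),
              URI.decode_www_form_component(value)]
  end
end

p parse_urlencoded("foo[]")     # one pair: name "foo[]", value ""
p parse_urlencoded("foo[]=")    # the same pair as above
p parse_urlencoded("a=1&&b=x=y")
```

Under this reading, a missing `=` and an `=` with nothing after it are indistinguishable, which is the crux of the argument for `[""]`.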

Maybe there is such a thing somewhere 🤷

@iMacTia

iMacTia commented Jul 21, 2020

That introduction definitely made my day 😂

@jeremyevans
Contributor

@PragTob Thank you for your research on this. With the updated standard, I agree with you that {"foo" => [""]} is the correct way to handle this. I'll submit a PR for that.

@matthewd
Contributor

I think the current behaviour makes the most sense to me, as an extrapolation of general rules.

  • foo (no =) means { "foo" => nil } (this is indeed a departure from the RFC, which says it should be "", per above)
  • Each value of x[]= represents an element in an array (no RFC involved, but I think that's how we'd all describe the general x[]=1&x[]=2?)

Ergo, each value-less instance of foo[] represents a nil element in the array.

That being said, although it requires an explicit extra rule, the proposed change here (which is kinda upstreaming the Rails compact! behaviour) does seem like it's more useful.

jeremyevans added a commit to jeremyevans/rack that referenced this issue Jul 21, 2020
The URL spec section 5.1.3.3 specifies that if = is not present
in the byte sequence, it should be treated as if the byte sequence
is the name of the tuple and the value is the empty string.

This affects all parameters without =, not just arrays:

```ruby
Rack::Utils.parse_nested_query("foo[bar]&baz[]&quux")
{"foo"=>{"bar"=>nil}, "baz"=>[nil], "quux"=>nil} # Before
{"foo"=>{"bar"=>""}, "baz"=>[""], "quux"=>""}    # After
```

Fixes rack#1696
@ioquatix
Member

ioquatix commented Jul 21, 2020

@matthewd I like that you are thinking not just about what aligns with the RFC, but also what makes the most sense for downstream users.

Consider the following:

x
# {x: nil}

x=
# {x: ""}

x[]
# {x: [nil]}

x[]=
# {x: [""]} 

x[]=y
# {x: ["y"]}

The presence of = allows us to control whether it's nil or "" (empty string). Rack::Utils.build_query also reflects that logic:

> Rack::Utils.build_query({x: nil})
=> "x"
> Rack::Utils.build_query({x: ""})
=> "x="

When parsing data from a form, it feels like there should be some way to specify nil in an array. My question would then be, what data structures are we expecting to be able to round trip? Should we expect empty strings to become nil, or vice versa?

Whatever we do here, we should try to be consistent with (1) Published RFCs, (2) Rack::Utils.build_query, and (3) other software systems. We can't change (1) or (3) but we can modify (2).

For example from the OP,

> Rack::Utils.parse_nested_query("foo[]")
=> {"foo"=>[nil]}
> Rack::Utils.build_query({"foo"=>[nil]})
=> "foo"

At least this eventually reaches a fixed point. Based on my interpretation of all the above, foo[] should be a nil value in the array and foo[]= should be an empty string.

One way to support this behaviour without breaking existing systems and without forcing changes would be to make it configurable. i.e. add an option to parse_nested_query.

@PragTob
Author

PragTob commented Jul 22, 2020

@ioquatix I was gonna write something similar but with a different conclusion 😅 I agree that ideally build_query should line up and a configuration option might be a good idea, although it could cause more divergence.

As in, is it really more important to have to be able to specify an array including nil vs. an empty array?

The way I see the options for foo[]:

[nil]

"keep it as is"
+ No breaking compatibility since nothing changes!
+ we can specify nils in arrays

- can't specify empty array

[]

"I want my empty array"
+ interoperable with both rails and faraday
+ we can specify empty arrays

~ You could read the standard of "Otherwise, let name have the value of bytes and let value be the empty byte sequence." as that the empty byte sequence should be the empty value and in the context of non specced nested query params it could be interpreted as empty array

- can't specify nil

[""]

"Let's take the closest interpretation of the standard"
+ arguably "most standard conform" as per behavior described in standard

- can't have nils in arrays & can't have empty arrays
- would probably need a config option as otherwise we'd break rails (unless someone freedom patched compact! to also throw away blank values and if someone did... please don't tell me ;P)

raise an error

"It's ambiguous what that means so rather not do it"
+ we decide it's too ambiguous and communicate that

- lots of breakage, but at least it'll be clearly communicated at the point where it's happening (vs. tracking an unknown value through the app)
- can't specify empty array nor nil
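
To compare the four options side by side, here is a toy parser that only understands `name` and `name[]` keys. It is a sketch, not Rack's parser: `parse_toy` and its `valueless_array` policy argument are hypothetical, and type conflicts (such as `foo=1&foo[]`) are deliberately ignored.

```ruby
# Toy parser: the policy decides what a value-less "name[]" contributes.
def parse_toy(query, valueless_array: :nil)
  query.split("&").each_with_object({}) do |part, params|
    next if part.empty?
    key, value = part.split("=", 2) # value is nil when "=" is absent
    unless key.end_with?("[]")
      params[key] = value
      next
    end
    list = (params[key.delete_suffix("[]")] ||= [])
    if value.nil?
      case valueless_array
      when :nil          then list << nil # "keep it as is"
      when :drop         then nil         # "I want my empty array"
      when :empty_string then list << ""  # closest reading of the standard
      when :raise        then raise ArgumentError, "missing \"=\" in #{part.inspect}"
      else raise ArgumentError, "unknown policy #{valueless_array.inspect}"
      end
    else
      list << value
    end
  end
end

%i[nil drop empty_string].each do |policy|
  puts "#{policy}: #{parse_toy('foo[]', valueless_array: policy).inspect}"
end
```

Note that under the `:drop` policy a query such as `foo[]&foo[]=1` yields a one-element array, so positional information is lost.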


I probably missed points, let me know and I can update it :)

Also I'm this close to writing the WHATWG like "hey, you know this standard you called an 'aberrant monstrosity'? How about we extend it, because lots of web frameworks seem to use it that way, and having no standard makes communication hard both within one language ecosystem and especially outside of it!"

@jeremyevans
Contributor

Now that the updated spec considers the lack of = valid, I think raising an error is definitely out as an option. I think we either should be correct (use empty string in compliance with the standard, #1699) or compatible with Rack 2 (i.e. no change).

It is definitely possible to make this configurable, and I can see good arguments for that. I can prepare a PR for a configurable approach if we want to go that route. Assuming we do go that route, we still need to decide on a default, which I think should be correct or compatible with Rack 2.

@ioquatix
Member

I have to say, despite all the great conversation that's happened here, I really find it frustrating when there is no clear correct path.

  • Breaking compatibility is bad.
  • Not conforming to specifications is bad.

Can we do both? Okay let's add an option to control it and move all the complexity to user code. It's also bad (for users).

My gut tells me we should follow the standard. That means breaking compatibility. But... it's Rack 3.0 - we are allowed.

Please give your opinion on this matter:

  1. Follow standard.
  2. Follow compatibility.
  3. Make user decide.

@matthewd
Contributor

Setting the array behaviour aside for a moment, and considering the base foo -> { foo: "" } vs { foo: nil }:

IMO strict conformance is uninteresting here; it's a technicality that's hard to encounter (because it's not something browsers can generate), nil.to_s is already "", and my reading of the spec is that it normalizes foo and foo= because it's constrained to defining an algorithm that always produces a byte-sequence value (whereas we exist in a world where string-or-nil is, for better or worse, a commonplace distinction). Given those thoughts, it seems hard, to me, to make the case for breaking compatibility (and reducing the set of values that can be round-tripped).

Returning focus to this issue, though, I grow an additional objection: I don't see how it would help address the reported issue. AFAICS, changing to "" -- and by extension [""] -- does not improve anything for anyone currently struggling with the fact they get [nil] when they might prefer []. Thus, I particularly don't like the idea of introducing an incompatible change Just Because, where the only trigger is that we've noticed a change we could theoretically make, without fixing anyone's actual problem.

So, in conclusion: I think foo should continue to parse as { "foo" => nil }, but I'm more on the fence for foo[] between { "foo" => [nil] } (compatible with existing Rack users), and { "foo" => [] } (incompatible if people aren't using compact, but possibly with the slightly redeeming feature that it's more useful / it eliminates a basically-forced compact otherwise).

I'm extremely wary of any configurable option inside a library like this, because all clients that don't control that configuration option (i.e., downstream libraries, as opposed to end-user applications) are then all obliged to handle both possible config scenarios. We've previously been burned on that in Rails, and now fight hard against that inclination especially when it affects return values -- it's the PHP magic quotes problem.

@iMacTia

iMacTia commented Jul 27, 2020

I agree with @matthewd points, I just want to elaborate a bit more on it, take a different view of the same problem and try to justify my preference (and what we did in Faraday).
It looks like we overall all agree on the foo -> { foo: nil } behaviour for simple parameters.
Where we're having a hard time is with arrays/hashes, and the main reason is because there's no clear standard telling us what to do.

We should really stop trying to solve this entirely with standards here because neither the URI RFC nor the HTML standards have a clear set of rules for modern web frameworks "nested" query parameters like arrays and hashes.
In fact, if we really wanted to stick with the standard, then arrays should be encoded as comma-separated lists.

So why did we introduce the "nested" encoding? Well I'd say that was to create a "convention" (much different, but still as important as a standard) to better serve users and developers.
Careful though, that doesn't mean the standard can be ignored, that would be preposterous. We're simply bending it to accommodate more use-cases and better fit the new requirements of modern web applications.
So we still need to stick to the rules, like the fact that qs params start with a ? and are separated by &, and the right format is key=value. But how we structure key and value is pretty much up to us and how we decide to make the convention.

Now, if we agree that we're trying to make life easier for users, let's try putting ourselves in their shoes and pretend someone asked the following:

  • How should we interpret foo[] as a query string parameter.

and let's imagine for a moment that our convention didn't cover that case at all, so we're free to choose whatever interpretation we want for it 😃

Here are the ones we've mentioned already:

  • { foo: "" }: great, now my Array became a string 🙈, what's the point? It also matches foo= so it's confusing...
  • { foo: [""] }: this is equivalent to foo[]=, so again it would be a waste to use the same convention
  • { foo: nil }: this is equivalent to (and indistinguishable from) foo, and thus useless (a waste of the convention)
  • { foo: [nil] }: kinda makes sense if compared to the previous one, but why on earth would I do it? Now I have to run compact on it.
  • { foo: [] }: this ACTUALLY makes sense, and there are also scenarios where I might need this.

So the last option is the one that makes the most sense AND is not already covered by other representations, thus sounds like the most logical one.

Finally, I'd also advise against giving options unless strictly necessary as that will cause chaos very quickly and increase maintenance complexity in future.

@jeremyevans
Contributor

I agree with @matthewd points, I just want to elaborate a bit more on it, take a different view of the same problem and try to justify my preference (and what we did in Faraday).
It looks like we overall all agree on the foo -> { foo: nil } behaviour for simple parameters.

No, we don't all agree on that. It should be {'foo' => ''} according to the spec. I'd prefer switching to that, but I am OK with keeping the current behavior for backwards compatibility.

  • { foo: [""] }: this is equivalent to foo[]=, so again it would be a waste to use the same convention

Note that if you use {'foo' => ''} for foo, then {'foo' => [""]} for foo[] makes the most sense. According to the spec, the omitted = is the same as the = with an empty right hand side, so it is the approach most consistent with the spec.

  • { foo: [nil] }: kinda makes sense if compared to the previous one, but why on earth would I do it? Now I have to run compact on it.

This format is backwards compatible. I don't think we should break backwards compatibility unless we are going to increase compatibility with the spec.

  • { foo: [] }: this ACTUALLY makes sense, and there are also scenarios where I might need this.

I don't see how this "ACTUALLY" makes more sense. It's inconsistent with foo giving {'foo' => nil}. Consider the following code:

first_id, second_id = params['foo']

Let's say I want to pass nil as the first id and I want to pass '1' as the second id. With the current code, you can do foo[]&foo[]=1. With your approach, what would this do?

{'foo' => []} ACTUALLY makes less sense, it just makes a reasonably common scenario slightly easier.
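
The destructuring concern can be spelled out in plain Ruby. The two arrays below are simply the values `params['foo']` would hold for `foo[]&foo[]=1` under each behaviour, written by hand for illustration rather than produced by Rack.

```ruby
with_nil_kept    = [nil, "1"] # current behaviour: value-less foo[] adds nil
with_nil_dropped = ["1"]      # if value-less foo[] added nothing

first_id, second_id = with_nil_kept
kept = [first_id, second_id]

first_id, second_id = with_nil_dropped
dropped = [first_id, second_id]

p kept    # first_id is nil, second_id is "1"
p dropped # first_id is "1", second_id is nil: the ids have swapped places
```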

@iMacTia

iMacTia commented Jul 28, 2020

Let's say I want to pass nil as the first id and I want to pass '1' as the second id. With the current code, you can do foo[]&foo[]=1. With your approach, what would this do?

Let's say I want to send an empty array, neither the existing solution nor the proposed "standard-friendly" one solves this.
At this point if the only change we're willing to consider is to get closer to the standard, that would not bring any tangible benefit and will break backwards compatibility, so it's actually better to leave things as they are

@ioquatix
Member

@jeremyevans
Contributor

https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams shows that my approach (empty string for omitted =) is correct:

var searchParams = new URLSearchParams("q=&topic=api");
searchParams.get("q")
// => ""

var searchParams = new URLSearchParams("q&topic=api");
searchParams.get("q")
// => ""

https://developer.mozilla.org/en-US/docs/Web/API/FormData/FormData converts input to string:

var f = new FormData()
f.append('q', '');
var searchParams = new URLSearchParams(f);
searchParams.get("q")
// => ""

var f = new FormData()
f.append('q', null);
var searchParams = new URLSearchParams(f);
searchParams.get("q")
// => "null"

Considering that nil.to_s in Ruby is the empty string (unlike JS, where String(null) is "null"), this also shows that my approach is the correct one.
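
As a Ruby-side comparison to the JavaScript snippets above, this small sketch uses the standard library's `URI.decode_www_form`, which (as far as I know) also treats a name without `=` as having an empty-string value. It only covers flat parameters; it says nothing about `[]` nesting, which that method does not interpret.

```ruby
require 'uri'

# Flat form decoding with Ruby's standard library (no bracket handling).
with_equals    = URI.decode_www_form('q=&topic=api')
without_equals = URI.decode_www_form('q&topic=api')

p with_equals    # [["q", ""], ["topic", "api"]]
p without_equals # [["q", ""], ["topic", "api"]]

# nil.to_s is the empty string in Ruby, unlike String(null) in JavaScript.
p nil.to_s       # ""
```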

@PragTob
Author

PragTob commented Jul 31, 2020

I wouldn't say that any of those have much merit right now in deciding what's valid here as what we're doing is undefined/unspecified behaviour since none of these deal with arrays or any sort or form of nesting.
They're doing what the standard says should be done. If anything that'd propose a behaviour to how rack parses q (i.e. not to nil but to empty string) but I don't wanna open that discussion here :)
With their parsing semantics/scope there's no way they'd ever be able to advocate for the [] solution because that is out of the realm of possibility for them. All they can argue is nil vs. ""

There is no standard. For all I know, that parsing behaviour was invented in PHP and then adopted in Ruby. I don't know, but at this point it seems as good a guess as any. Well, besides asking the WHATWG whether they want to further refine a standard they called a monstrosity...

Anyhow, let's look at PHP, function seems to be parse_str:

tobi@speedy:~$ php -v
PHP 7.4.8 (cli) (built: Jul 31 2020 21:28:23) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
tobi@speedy:~$ php -a
Interactive shell

php > parse_str("arr[]&bar[]=&foo&zar[]=1", $output); var_dump($output);
array(4) {
  ["arr"]=>
  array(1) {
    [0]=>
    string(0) ""
  }
  ["bar"]=>
  array(1) {
    [0]=>
    string(0) ""
  }
  ["foo"]=>
  string(0) ""
  ["zar"]=>
  array(1) {
    [0]=>
    string(1) "1"
  }
}

So, PHP also parses it to empty string (much to my surprise to be honest :) ).

It's noteworthy though that the top voted comment is someone complaining that PHP does this weirdly and how everyone else does it differently (it's also 13 years old though):

It bears mentioning that the parse_str builtin does NOT process a query string in the CGI standard way, when it comes to duplicate fields. If multiple fields of the same name exist in a query string, every other web processing language would read them into an array, but PHP silently overwrites them:

...
Instead, PHP uses a non-standards compliant practice of including brackets in fieldnames to achieve the same effect.
...
This can be confusing for anyone who's used to the CGI standard, so keep it in mind. As an alternative, I use a "proper" querystring parser function:

Anyone wants to check how CGI parses this query? 😅


Anyhow, this leaves us with:

  • [""] - apparently what a lot of languages do (including PHP) and the "most standard conform" on a close reading. But, it'd be inconsistent with the parsing of just q as it happens now (which is nil) and it's the least backwards compatible
  • [] - not "standard" (although we could still argue about "empty byte"), not widespread, but arguably most useful and backwards compatible at least for rails
  • [nil] - not widespread, arguably not "standard" but it's been around a while

Which I'm still not sure how to resolve :)
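
To compare the three options side by side, here is a minimal sketch of a toy parser with a switchable policy for parameters that have no `=`. The name `parse_with_policy` and the policy symbols are hypothetical; this is not Rack's implementation, it does no unescaping, and it only understands flat keys and a trailing `[]`.

```ruby
# Toy parser, NOT Rack's code: flat keys and `name[]` arrays only, no unescaping.
# policy (hypothetical names):
#   :empty_string - a missing "=" yields ""          (the [""] option)
#   :nil          - a missing "=" yields nil         (the [nil] option)
#   :drop         - a bare `name[]` adds no element  (the [] option)
def parse_with_policy(query, policy)
  query.split('&').each_with_object({}) do |pair, params|
    next if pair.empty?

    key, value = pair.split('=', 2) # value is nil when "=" is absent
    value = '' if value.nil? && policy == :empty_string

    if key.end_with?('[]')
      list = (params[key.delete_suffix('[]')] ||= [])
      list << value unless value.nil? && policy == :drop
    else
      params[key] = value
    end
  end
end

%i[empty_string nil drop].each do |policy|
  puts "#{policy}: #{parse_with_policy('foo[]', policy).inspect}"
end
```

With `:drop`, `foo[]&foo[]=1` collapses to a single-element array, which is the positional trade-off raised earlier in the thread.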

@PragTob PragTob closed this as completed Jul 31, 2020
@PragTob PragTob reopened this Jul 31, 2020
@PragTob
Author

PragTob commented Jul 31, 2020

I have no idea what accidental keyboard shortcuts I hit here. I'm sorry for the close and the empty, now-deleted comment.

@jeremyevans
Contributor

Anyone wants to check how CGI parses this query?

CGI parses everything as an array, does not treat [] in keys specially, and ignores values without an =:

CGI.parse('q=')
# => {"q"=>[""]}
CGI.parse('q')
# => {"q"=>[]}

So to get the value of a parameter, you generally would reference the first value:

CGI.parse('q=')['q'][0]
# => ""
CGI.parse('q')['q'][0]
# => nil
  • [""] - apparently what a lot of languages do (including PHP) and the "most standard conform" on a close reading. But, it'd be inconsistent with the parsing of just q as it happens now (which is nil) and it's the least backwards compatible

Note that if we change the behavior of just q to be correct, as #1699 does, then it isn't inconsistent.

My thoughts on this haven't changed. Either we keep the behavior the same for backwards compatibility, or we change the behavior to be correct. Changing the behavior to a different backwards-incompatible approach, even if it makes a common case easier, is a poor choice IMO.

@ioquatix
Member

Looking at this issue again, I think that we should:

  • Follow the standards.
  • Break backwards compatibility if it's non-standards compliant.
  • More complex usage should probably use a different format (e.g. JSON).

While I really strongly feel for the original logic and rationale, implementing a bespoke parser which doesn't follow the documented standards (no matter how stupid they may be) feels wrong. I know it's hard to reconcile this position given that no one here is "wrong"... but we should fix the standard if that's the problem.

jeremyevans added a commit to jeremyevans/rack that referenced this issue Jan 25, 2022
The URL spec section 5.1.3.3 specifies that if = is not present
in the byte sequence, it should be treated as if the byte sequence
is the name of the tuple and the value is the empty string.

This affects all parameters without =, not just arrays:

```ruby
Rack::Utils.parse_nested_query("foo[bar]&baz[]&quux")
{"foo"=>{"bar"=>nil}, "baz"=>[nil], "quux"=>nil} # Before
{"foo"=>{"bar"=>""}, "baz"=>[""], "quux"=>""}    # After
```

Fixes rack#1696
jeremyevans added a commit that referenced this issue Jan 25, 2022
Fixes #1696
@dblock
Contributor

dblock commented Dec 28, 2022

We're discussing how this change breaks backwards compatibility in Grape in ruby-grape/grape#2298, please check it out and comment if you have time!
