Allow encoding of null values in application/x-www-form-urlencoded. #732

Open
ioquatix opened this issue Jan 4, 2023 · 17 comments

@ioquatix

ioquatix commented Jan 4, 2023

url/url.bs

Lines 2933 to 2934 in fdaa0e5

<li><p>Otherwise, let <var>name</var> have the value of <var>bytes</var>
and let <var>value</var> be the empty byte sequence.

Some systems want to encode null values.

e.g.

parse("x=10&y") => {x => "10", y => nil}

differently from empty strings, e.g.

parse("x=10&y=") => {x => "10", y => ""}

I wonder if we can amend this rule to allow optional special interpretation of keys without = separators to indicate that the value is null.

Consider adding the wording:

"empty byte sequence or null value depending on what is most appropriate for the language environment to represent the lack of a value".

Or something to that effect.

See ruby-grape/grape#2298 and rack/rack#1696 for more details/discussion over the past 2 years.
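
A minimal Python sketch of the difference being requested. The function name `parse_form` and the `bare_key_is_null` flag are hypothetical, chosen only for illustration; they are not part of the URL Standard, Rack, or Grape, and the sketch ignores encoding details that the real parser handles.

```python
from urllib.parse import unquote_plus


def parse_form(query, bare_key_is_null=False):
    """Parse a form-urlencoded string into a list of (name, value) pairs.

    bare_key_is_null is a hypothetical switch for the proposed behaviour:
    when True, a pair without "=" gets None instead of the empty string.
    """
    pairs = []
    for chunk in query.split("&"):
        if not chunk:
            continue  # empty chunks between "&" are skipped
        name, sep, raw = chunk.partition("=")
        if sep:
            value = unquote_plus(raw)
        elif bare_key_is_null:
            value = None  # proposed: no "=" means "no value"
        else:
            value = ""  # current rule: value is the empty byte sequence
        pairs.append((unquote_plus(name), value))
    return pairs


print(parse_form("x=10&y"))                         # [('x', '10'), ('y', '')]
print(parse_form("x=10&y", bare_key_is_null=True))  # [('x', '10'), ('y', None)]
print(parse_form("x=10&y=", bare_key_is_null=True)) # [('x', '10'), ('y', '')]
```

Under the current rule the first and third inputs are indistinguishable after parsing; the flag is what would make them differ.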

@ioquatix
Author

ioquatix commented Jan 4, 2023

After "searching the literature" I found a similar issue here: #427

@annevk
Member

annevk commented Jan 4, 2023

That this is about application/x-www-form-urlencoded is not immediately clear.

This was discussed in #469 and I stand by #469 (comment). If you want null, omit the key.

I can appreciate that it's difficult to migrate existing software though. But also, in theory there's nothing stopping servers from interpreting a URL's query however they wish. They just can't claim to support application/x-www-form-urlencoded then.

@ioquatix
Author

ioquatix commented Jan 4, 2023

To a certain extent, I think your comment is fair, but where that breaks down is:

  1. Encoding arrays with embedded null values. e.g. encode({x: [1, null, 2]}) => x[]=1&x[]&x[]=2
  2. Explicitly specifying something is null: perhaps the remote end assumes a default value unless otherwise specified, and if you can't explicitly specify null, the default is assumed (so the absence of a key doesn't imply null).
  3. Round-tripping the exact data structure. It's impossible to serialise {x: null} and deserialise the same data on the remote end.

With the proposed changes, it's possible to support these use cases. Otherwise, for most other cases that I'm aware of, simply omitting the value is sufficient.
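
The three cases above can be sketched in Python as follows. This assumes the bare-key-means-null interpretation and the "[]" suffix convention for lists, neither of which is defined by the standard; `encode` and `decode` are hypothetical helpers, not Rack's API.

```python
from urllib.parse import quote_plus, unquote_plus


def encode(params):
    """Serialise a dict; None becomes a bare key, lists use a "[]" suffix."""
    parts = []
    for key, value in params.items():
        if isinstance(value, list):
            items = [(key + "[]", item) for item in value]
        else:
            items = [(key, value)]
        for name, item in items:
            encoded_name = quote_plus(name, safe="[]")
            if item is None:
                parts.append(encoded_name)  # no "=" at all
            else:
                parts.append(encoded_name + "=" + quote_plus(str(item)))
    return "&".join(parts)


def decode(query):
    """Inverse of encode, except that every non-null value comes back as a string."""
    result = {}
    for chunk in query.split("&"):
        if not chunk:
            continue
        name, sep, raw = chunk.partition("=")
        name = unquote_plus(name)
        value = unquote_plus(raw) if sep else None
        if name.endswith("[]"):
            result.setdefault(name[:-2], []).append(value)
        else:
            result[name] = value
    return result


print(encode({"x": [1, None, 2]}))  # x[]=1&x[]&x[]=2
print(decode(encode({"x": None})))  # {'x': None}
print(decode("x="))                 # {'x': ''}
```

Note that numbers do not round-trip as numbers here (`1` comes back as `"1"`); only the null versus empty-string distinction is preserved.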

I made this issue because when we standardised Rack (v3) to follow the current application/x-www-form-urlencoded standard, downstream systems were broken. So, to your point - "migrate existing software" can be tricky.

Either way, I don't personally have a strong opinion about it, but it would be convenient and would make the application/x-www-form-urlencoded format more useful and complete. Having a specific documented stance on allowing or disallowing this would also help guide users of URL query parsers on what they should and shouldn't expect, and where they might like to reach for something more elaborate (e.g. JSON). For example, the alternative to this proposal is to explicitly call out as invalid the treatment of query values like x itself as null.

I would like to know, before this specification, was application/x-www-form-urlencoded specified anywhere? It does seem to me that some languages are adopting x itself as a de facto representation of {x: null}. Some research from 2 years ago: rack/rack#1696 (comment)

@ioquatix changed the title from "Allow encoding of null values." to "Allow encoding of null values in application/x-www-form-urlencoded." on Jan 4, 2023
@domenic
Member

domenic commented Jan 4, 2023

The key is that the standard only has a single model: a string-to-string(s) multimap. Each entry in this multimap has a string key, and its value is one or more strings.

This model does not support:

  • The value being null
  • The value being a list containing null
  • The value being a zero-sized list
  • The value being a string, instead of a list of strings
  • Any special treatment of keys that end in "[]"
  • Or anything else, like the value being a number, or boolean, or whatever.

In particular, this model is not by itself capable of representing string-keyed maps to arbitrary values in your target language. The value space for each entry is much more limited.
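
As a rough illustration of that model, here is a Python sketch that builds a string-to-list-of-strings multimap. It uses `urllib.parse.parse_qsl` from the standard library, which is close to but not guaranteed identical to the URL Standard's parser; the helper name `to_multimap` is an assumption made for this example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl


def to_multimap(query):
    """Group parsed pairs by key; every value is a non-empty list of strings."""
    multimap = defaultdict(list)
    for name, value in parse_qsl(query, keep_blank_values=True):
        multimap[name].append(value)
    return dict(multimap)


print(to_multimap("a=1&a=2&b&c[]=null"))
# {'a': ['1', '2'], 'b': [''], 'c[]': ['null']}
```

In this model the bare key `b` maps to a list holding the empty string, `"null"` is just a four-character string, and `c[]` is an ordinary key whose name happens to end in brackets.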

Now. You can build more complicated models on top of the standard's model! You should just do so in a layered fashion, and do so while acknowledging you are going beyond any shared web standard and thus are going to need to do a lot of outreach and consensus building among all consumers/producers you want to interoperate with. In particular, browsers have no need for a more complicated model, and so won't participate. It's easiest if you constrain your model to just one client/server pair, but maybe you want it to expand to "all client/server pairs using a specific library", or even set of collaborating libraries.

Examples of ways you could layer on top of the spec's model include:

  • Treating entries with multiple values for the same key as an error, unless the key ends in "[]", in which case you strip the "[]" suffix. (This one seems common in non-browser libraries. Although I haven't checked their handling of foo=bar&foo=baz; maybe they discard the bar, or discard the baz, or treat it as a list anyway even without the key ending in "[]"?)
  • Treating entries with a single value as a string, instead of a list containing a single string, unless the key ends with "[]". (Also seems common.)
  • Treating entries with a single value "null" as your language's null value, and requiring the string "null" to be encoded some other way.
  • Treating entries with a single value empty string as your language's null value, and requiring the empty string to be encoded some other way.
  • Requiring all values to be JSON-encoded, so you can represent arrays, nulls, booleans, numbers, strings, objects...
  • Using suffixes on the keys to denote data types, so e.g. transforming ("foo!bool", ["false"]) into ("foo", false).

Etc. The main idea is that you have a limited space to work with, when you layer on top of the standard's model. So it'll take some work to get people to agree.
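
One such layer might look like the following Python sketch. It combines three of the ideas listed above (the "[]" suffix, unwrapping single values, and a "!bool" type suffix); the function name `apply_layer` and the exact rules are illustrative assumptions, not an existing library's behaviour.

```python
def apply_layer(multimap):
    """Interpret a string-to-list-of-strings multimap with extra conventions."""
    result = {}
    for key, values in multimap.items():
        if key.endswith("[]"):
            # "[]" suffix: keep the list, strip the suffix.
            result[key[:-2]] = list(values)
        elif len(values) != 1:
            # Repeated key without "[]": treated as an error in this layer.
            raise ValueError(f"multiple values for non-list key {key!r}")
        elif key.endswith("!bool"):
            # Type suffix: ("foo!bool", ["false"]) becomes ("foo", False).
            if values[0] not in ("true", "false"):
                raise ValueError(f"invalid boolean for {key!r}: {values[0]!r}")
            result[key[: -len("!bool")]] = values[0] == "true"
        else:
            # Single value: unwrap the list into a plain string.
            result[key] = values[0]
    return result


print(apply_layer({"foo!bool": ["false"], "tags[]": ["a", "b"], "name": ["x"]}))
# {'foo': False, 'tags': ['a', 'b'], 'name': 'x'}
```

Every rule here takes something away from the underlying string space (a literal key ending in "!bool" can no longer be expressed), which is the limited room to work with described above.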

@ioquatix
Author

ioquatix commented Jan 4, 2023

@domenic I'm personally fine with that model, but the reality is that interpreting x itself as {x: null} is a fairly predominant pattern. As a maintainer of a popular library which previously followed that pattern and now follows the "string-to-string(s) multimap" model with extensions (layers, as you suggest, for handling arrays, etc.), we have now introduced understandable pain.

Since I think we prefer to follow the standard, my only hope for those users was to propose some kind of change to the standard.

From my point of view, introducing layered interpretations doesn't help that much when everyone has their own bespoke concept of what a layer should be like - the entire point of standards is to standardise the approach so we don't end up with people building N different ways of representing null values etc. Again, I don't have a strong opinion about it, but this is the pain point and I'm just bringing it up and starting the discussion on behalf of the affected users/developers.

@domenic
Member

domenic commented Jan 4, 2023

Well, there's already a layer you're imposing on top---whatever layer is doing the "[]" stuff, and is translating lists-of-single-string-values into just string values. That isn't in this standard. My suggestion is to work with whatever community is responsible for that layer, to extend it to support additional semantics that you'd like. That community isn't really related to this repo though, so here is probably not the right place. This repo is fully about the model I outlined above, since that model is the one browsers use.

@ioquatix
Author

ioquatix commented Jan 4, 2023

Fair enough. I think what people are asking for is some level of standardisation of some of those layers. Otherwise as a maintainer of a shared implementation, we can't really introduce bespoke layering without being opinionated (and as you said, not following a standard).

Is there any case where browsers actually do need to specify null values? e.g. <input type="hidden" name="x"> followed by x.value = null?

@domenic
Member

domenic commented Jan 4, 2023

No, x.value = null is the same as x.value = "null" for browsers.

@annevk
Member

annevk commented Jan 4, 2023

I mean, if servers got together and agree on additional semantics I could see compatible extensions to URLSearchParams. Web developers could benefit from typed data as well. (Similar to how we're considering extensions to Headers for typed header values.)

I haven't seen sufficient interest in that though and I suspect most existing usage is fairly entrenched and unable to change. And yeah, given that browsers would not be the primary stakeholders but more a beneficiary this doesn't seem like the right place to drive that, but I'd be open to it if we got a group of people together that's somewhat representative of that space.

@ioquatix
Author

ioquatix commented Jan 4, 2023

I think the original semantics (x itself) I described are implemented in a number of different frameworks/libraries/languages already, but whether that's enough to constitute a standard, I don't know. Does any browser generate x instead of x= for an empty value?

@dblock

dblock commented Jan 5, 2023

Thanks @ioquatix for bringing this up from Grape and @domenic and @annevk for your comments!

Y'all make a ton of sense, even though I think you are too focused on browsers, which are like people, a bit unpredictable at times. In contrast, I think most API developers that were not forced into using application/x-www-form-urlencoded by a form filled in a browser abandoned it in favor of application/json for lack of clarity for things like nulls in URLs and different server-side interpretations. With the change introduced by Rack strictly interpreting the standard they should not be relying on query string parameters either. We can stay true to the spec.

My conclusion is that it was nice when the implementation allowed nulls in a non-ambiguous way in Grape! What @ioquatix was suggesting is to restore that, but I agree that we're extending the spec by doing so. We plan to cleanly deprecate that behavior, tell our users that it's now undefined, and call it a day.

@ioquatix
Author

ioquatix commented Jan 5, 2023

> No, x.value = null is the same as x.value = "null" for browsers.

Is this why some systems get confused when someone actually has a name "null"? i.e. there is no way to differentiate between "null" and null. My guess is there is code that writes if params[name] == "null" return null. Maybe a motivation towards fixing this issue. Do HTML forms ever do this?

Some funny examples:

image

image

@dblock

dblock commented Jan 5, 2023

I found a lot of NULLs in URLs, https://www.wired.com/2015/11/null/ 🤷

@karwa
Contributor

karwa commented Jan 28, 2023

If we treated pairs with no key-value delimiter as having a null value, I think we would also have to accept that in the URL http://example/?foo&&&baz&&another, the key-value pairs are:

key: "foo", value: null
key: "", value: null
key: "", value: null
key: "baz", value: null
key: "", value: null
key: "another", value: null

In other words, we would not be able to skip strings of empty pair delimiters (&&&&). AFAICT, that skipping is widely agreed upon by other URL/querystring libraries, so this would be a significant departure.
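
A small Python sketch of the two readings of that query string. The `parse_pairs` function and its `skip_empty` flag are hypothetical, and percent-decoding is left out to keep the focus on delimiters.

```python
def parse_pairs(query, skip_empty=True):
    """Split a query into (key, value) pairs; a missing "=" yields None."""
    pairs = []
    for chunk in query.split("&"):
        if chunk == "" and skip_empty:
            continue  # the widely shared behaviour: drop empty chunks
        key, sep, value = chunk.partition("=")
        pairs.append((key, value if sep else None))
    return pairs


query = "foo&&&baz&&another"

print(parse_pairs(query))
# [('foo', None), ('baz', None), ('another', None)]

print(parse_pairs(query, skip_empty=False))
# [('foo', None), ('', None), ('', None), ('baz', None), ('', None), ('another', None)]
```

With `skip_empty=False` the result matches the six pairs listed above; with skipping enabled only three pairs survive.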

To illustrate why we would need to do that:

I have an API which allows identifying each key-value pair by its position (i.e. an index) and supports index-based operations such as inserting pairs at a particular location, replacing a region of pairs, removing a particular pair, or changing the key/value of a particular pair.

Now consider the URL http://example/?foo&baz&another#frag. The API allows replacing the key component of the first pair:

url.queryParams.replaceKey(at: 0, with: "new_key")
// result: http://example/?new_key&baz&another#frag
//                         ^^^^^^^

But if you replace the key with the empty string, something interesting happens:

url.queryParams.replaceKey(at: 0, with: "")
// result: http://example/?=&baz&another#frag
//                         ^

Since empty strings of delimiters are usually skipped entirely, we must insert an = sign in order to preserve the fact that the first pair exists and its key is the empty string.

Currently this is fine, because the presence or absence of the = sign does not change the value component. If we started to say that the presence of that delimiter is meaningful, then inserting this = would also change the pair's value. The only way around it would be to allow the following result:

url.queryParams.replaceKey(at: 0, with: "")
// result: http://example/?&baz&another#frag
//                         ^

And to say that this initial & is actually a pair with (key: "", value: null).

I'm not opposed to that (in fact, the &&&-skipping can be problematic in lots of cases), but I think the two changes would need to happen together, and I think this aspect is likely to cause even more compatibility issues.
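
The coupling between the two changes can be sketched in Python. Both functions below are hypothetical and omit percent-encoding; they only model the delimiter handling discussed in this comment, with None standing for a null value.

```python
def serialize(pairs):
    """Write a bare key for None and key=value otherwise."""
    return "&".join(k if v is None else k + "=" + v for k, v in pairs)


def parse(query, skip_empty):
    """Inverse of serialize; optionally drops empty chunks between "&"."""
    pairs = []
    for chunk in query.split("&"):
        if chunk == "" and skip_empty:
            continue
        key, sep, value = chunk.partition("=")
        pairs.append((key, value if sep else None))
    return pairs


original = [("", None), ("baz", None), ("another", None)]
query = serialize(original)
print(query)  # &baz&another

# With the usual skipping, the empty-key pair is lost on the way back.
print(parse(query, skip_empty=True) == original)   # False

# Only a parser that keeps empty chunks round-trips it.
print(parse(query, skip_empty=False) == original)  # True

# Inserting "=" keeps the pair but turns its null value into "".
print(parse("=&baz&another", skip_empty=True)[0])  # ('', '')
```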

@ioquatix
Author

Does any framework in existence actually do that?

@karwa
Contributor

karwa commented Jan 29, 2023

Do what? Interpret a plain & as (key: "", value: null)?

Not that I know of - that's why I said it would be a significant departure, but it is also a logical consequence of saying that a missing key-value delimiter means a null value.

@ioquatix
Author

While I understand that the order of elements has semantic meaning, the degenerate case of && seems unimportant to me, and my question was which frameworks, if any, depend on those semantics.

5 participants