Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are key values collected from headers case insensitive? #549

Open
Josh-Marshall-FactSet opened this issue Oct 5, 2020 · 15 comments
Open
Labels

Comments

@Josh-Marshall-FactSet
Copy link

The documentation on passed values (https://flask-smorest.readthedocs.io/en/latest/arguments.html) for form values, json, headers, query, etc. do not have case sensitivity documented. I would expect locations like json would be case sensitive, but locations like headers to be case insensitive. Looking at https://github.com/marshmallow-code/flask-smorest/blob/e06325d7a201af6d07caa2e8e103fba15eb4b9b7/flask_smorest/arguments.py#L105 I'd guess everything is case sensitive.

@lafrech lafrech transferred this issue from marshmallow-code/flask-smorest Oct 5, 2020
@lafrech
Copy link
Member

lafrech commented Oct 5, 2020

Indeed, case sensitive everywhere.

Transfering to apispec, as this is where arguments are parsed.

This might be challenging to address. It all happens in the marshmallow schema.

Perhaps when dealing with headers, one could

  • lowercase all schema field names
  • add a pre-load methods to lowercase the input

I don't see an easy and nice way to do that.

@Josh-Marshall-FactSet
Copy link
Author

It is an important use case for me. Trying to tame some legacy stuff, and it could be going more smoothly. Another thing which I'm not sure is relevant is allowing args to come from different location, like query and json.

@lafrech lafrech transferred this issue from marshmallow-code/apispec Oct 5, 2020
@lafrech
Copy link
Member

lafrech commented Oct 5, 2020

My bad. Got apispec and webargs mixed up.

@lafrech lafrech closed this as completed Oct 5, 2020
@lafrech
Copy link
Member

lafrech commented Oct 5, 2020

My bad. Got apispec and webargs mixed up.

@lafrech lafrech reopened this Oct 5, 2020
@lafrech
Copy link
Member

lafrech commented Oct 5, 2020

Sorry about the transfer mess.

The point about query or json is addressed in the docs: https://webargs.readthedocs.io/en/latest/advanced.html#meta-locations.

@sirosen
Copy link
Collaborator

sirosen commented Nov 5, 2020

Because webargs is re-exposing framework data, this seems like a framework-specific detail. Some frameworks like Django expose a case-insensitive dict, and some do not.

Do we want to treat this as a doc issue, or do we actually want to wrap headers in our own case-insensitive dict implementation?
I'm somewhat averse to wrapping the data just because I like keeping webargs small where we can.

@lafrech, are you considering such a wrapper in flask-smorest ? If you are, that might change my attitude here -- if you're going to write that code anyway, maybe it's easy enough to just bake it into webargs.

@sirosen
Copy link
Collaborator

sirosen commented Feb 1, 2021

@lafrech, as I look at this, might this be something we could solve with #583 ?

flask-smorest could subclass FlaskParser to provide a pre_load method which checks for location="header" and lowercases the keys by default? Then this becomes a feature of flask-smorest which is supported by webargs.
What do you think?

@lafrech
Copy link
Member

lafrech commented Mar 16, 2021

Indeed we could add a pre_load method to do that.

Since HTTP headers are case-insensitive everywhere, not just in OpenAPI spec, this might be better suited here than in flask-smorest.

I don't think anyone would mind having it here.

Besides if it is a parser method, someone who really doesn't want that can always override it with a noop. We can even add an opt-out config attribute.

@sirosen
Copy link
Collaborator

sirosen commented Mar 17, 2021

I haven't done the 8.0 release in case we want to include a change related to this; but I've been swamped and haven't had time to think about it much.

The fact that a header can be provided multiple times (so headers are a multidict) could make this messy. We might need our own multidict implementation if we wanted to support that.

I think the easiest thing to do would be to release 8.0 with the current pre_load support, and, like stripping whitespace from values, create a documentation example for this.

# in advanced docs...

# warning: this will not preserve headers which are provided multiple times!
def _lowercase_keys(value):
    return {k.lower(): v for k, v in value.items()}

class LowercaseHeadersParser(FlaskParser):
    def pre_load(self, location_data, *, schema, req, location):
        if location == "header":
            return _lowercase_keys(location_data)
        return location_data

Does that sound good?

I could contribute something like the above to flask-smorest if that impacts the decision making here. I'm more comfortable doing it there for a couple of reasons: there's only one framework to support and we haven't done v1.0 there yet.

@lafrech
Copy link
Member

lafrech commented Mar 17, 2021

No rush for the release. I could do it as well, but like you, I didn't get the time.

If this reveals more complicated than I thought, we can postpone it.

It would be nice to come up with an implementation that doesn't lose the multiple values.

I thought we'd just mutate the dict, but it looks like the headers multidict is immutable, so we need to create a new one.

IIUC, creating a new multidict is trickier because it depends on the initial multidict class (which depends on the web framework).

I'm not familiar with MultiDicts and I only use Flask so my vision is rather limited.

@Josh-Marshall-FactSet
Copy link
Author

Would this be relevant and helpful for a related group's solution to the problem? https://requests.readthedocs.io/en/master/_modules/requests/structures/

@lafrech
Copy link
Member

lafrech commented Mar 18, 2021

Looks like it is not a MultiDict. It is case insensitive, but not multi.

Still, it is indeed a good idea to check how requests or other libs deal with this, if they do.

Also relevant : https://stackoverflow.com/a/4371395

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded

So multiple & case insensitive should be supported. And order preserved.

@sirosen
Copy link
Collaborator

sirosen commented Mar 18, 2021

This thread is reminding me of a past issue in which a user had a schema pre_load hook which modified data. Because the underlying multi-dict was immutable, the user had to run data = dict(data) as the first step in the hook.

One direction we could go is to apply that more generally and "unwrap" MultiDictProxy into other types.
dict(MultiDictProxy(...)) works. If we construct our own type, CaseInsensitiveDict, then we could do CaseInsensitiveDict(MultiDictProxy(...)) for headers. (And, logical next step: I'd replace MultiDictProxy with a function which returns a dict or CaseInsensitiveDict.)

The place that runs into trouble is how we detect multi-fields. Because it would put case insensitivity "around" MultiDictProxy, the inner workings of MultiDictProxy don't get case insensitivity.

But, following from that thought, maybe there's a way we can get all the way to providing a dict to user schemas always. Like dict(CaseInsensitiveMultiDictProxy(...))? I think that would be pretty nice if we can manage it.

@sirosen
Copy link
Collaborator

sirosen commented Apr 8, 2021

I've been thinking about this on and off for a few weeks, and I think we should ship 8.0 without doing anything on this front.

Today, if a framework provides headers as case-insensitive, then the proxy object works and behaves case-insensitively. If a framework provides headers as case sensitive, then the proxy object works and behaves case-sensitively. That seems pretty reasonable -- you get behavior based on what your framework provides.

I started looking at what the frameworks give us this morning, to see how much variation there is on this front. To save time I just looked at flask/werkzeug, pyramid/webob, and django.
Django and Pyramid/webob both implement headers as case-insensitive non-multidict types. When I opened up werkzeug, it looks to me like flask/werkzeug is already case insensitive.

So the answer to the OP's question is twofold: (1) it depends on your framework and (2) if you're using flask/flask-smorest, then yes, headers are case-insensitive.

@lafrech, if this sounds good, let me know, and I'll do the 8.0 release. Since we've now let it sit for a bit, I want to re-confirm.

@lafrech
Copy link
Member

lafrech commented Apr 8, 2021

Thanks for this research.

You may ship 8.0.

No need to wait for marshmallow-code/marshmallow#1742. We should be able to support that change without dropping support for old names. It's only tests and docstring changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants