RFC: Allow a single location for each field #419

lafrech · 2019-08-27T09:20:51Z

See discussion in #267.

The idea is that the Schema would be passed the whole data from a location, so there couldn't be fields with different location in the same Schema.

Drop location argument to Field
Only accept a single location in use_args

This is an important breaking change.

It breaks the default use case, which is to iterate on all fields, and for each field search in DEFAULT_LOCATIONS (by default: query string, form, then json).

We could keep a somewhat similar behaviour by trying to load the whole data from query string, then form, then json. The only added restriction is that Schema couldn't contain fields from mixed locations.

I'm not convinced with the multi-location feature, so we could even

Replace DEFAULT_LOCATIONS with DEFAULT_LOCATION

The user could subclass the parser to change the default location (in our framework, we use json by default), and he would need to specify the location explicitly when it is not the default.

I think all this changes allow for better documentation (e.g. in apispec).

The text was updated successfully, but these errors were encountered:

sirosen · 2019-08-27T14:41:41Z

I'm not convinced with the multi-location feature

Neither am I. I suggested the wrapper example in #267 in part to make the case that "the user can do this him/herself if it's really wanted".

lafrech · 2019-08-27T21:19:02Z

Great. @sirosen Would you be willing to work on this?

sirosen · 2019-08-27T21:28:05Z

I'm definitely willing to, but I'm not sure when I'll get to it. I can try to carve out time for it in the next 2 weeks.

sloria · 2019-08-27T23:47:48Z

I think this is the way to go, so long as there's a decent story for users who need the legacy behavior. I've seen APIs that accept either form-encoded or JSON request payloads. We should document how to achieve this in the narrative docs and changelog.

lafrech · 2019-08-28T07:49:09Z

Actually, we could accept multiple locations. What we need to break here is the "mixed locations" use case (data from single Schema spread amongst several locations).

We could implement "try whole Schema on first location, then second if first fails" (basically what's in #267 (comment)). That would be documentable. And that would be compatible with the "pass all location data to the Schema at once" direction we want to take.

I'm just wondering about the error message content.

If first location fails because a single field is required and missing, the second location is tried. It is empty. We should return the error message of the first attempt, I guess.

I've seen APIs that accept either form-encoded or JSON request payloads.

But it wouldn't accept half of the payload in form-encoded and the other half in JSON, right?

What error message would you expect when sending wrong data in form-encoded and no data in JSON? In other words, how does the parser know, from the two locations (one of them can be empty), which to use to return the error message.

If one of the two is empty, it can be an easy case. But there could be cases where it is not, because it contains data related to another argument/Schema.

sloria · 2019-08-28T13:42:20Z

What we need to break here is the "mixed locations" use case (data from single Schema spread amongst several locations).

But it wouldn't accept half of the payload in form-encoded and the other half in JSON, right?

I don't think it's even possible to send both JSON and form-encoded payloads in a single request (?), so we need not support the 'single schema, mixed locations' case.

We could implement "try whole Schema on first location, then second if first fails" ... I'm just wondering about the error message content.

Yeah, the error handling would be left up to the user; I imagine the desired behavior will vary. We could exemplify how to merge error messages across locations in the docs.

lafrech · 2019-08-28T14:02:31Z

I don't think it's even possible to send both JSON and form-encoded payloads in a single request (?)

Hmmm... I guess that would be multipart. OK, there could be more plausible examples of mixed-locations, like query args and path perhaps, but none of those I could think of were convincingly realistic.

Let's drop single schema - mixed locations. Definitely a corner case.

so we need not support the 'single schema, multiple locations' case.

Single schema - multiple locations is not such a corner case. It is the case you present in #267 (comment):

APIs that accept either form-encoded or JSON request payloads

We do not have to support this but it would suck if the refactor prevented the user from supporting this. And it's nice if we can document how to do it.

Fine for me if we leave it to user code, with an example in the docs.

the error handling would be left up to the user; I imagine the desired behavior will vary. We could exemplify how to merge error messages across locations in the docs.

As long as there's not mixed locations, there's no need to merge error messages. Just pick the one that corresponds to the location the faulty data was sent. It's not trivial to tell at first sight which one it is because both locations return an error.

Worse, in the (non-mixed) multi-location case, using the workaround in #267 (comment), if the data is sent in one location with a validation error, the validation fails, the second location is tried. It is empty. If no field is required, it doesn't error at all. But input data is silently ignored.

Perhaps the user code should probe first which of the two locations actually has content and call webargs only once with this.

sloria · 2019-08-28T14:15:42Z

Single schema - multiple locations is not such a corner case

Woops, I meant 'single schema, mixed locations'--fixed my comment.

Won't we need to merge error messages for the stacked case?

@use_args({"related_id": fields.Str(required=True))}, location="query"})
@use_args({"name": fields.Str(required=True))}, location="json")

{"related_id": ['Missing...'], "name": ["Missing..."]}

lafrech · 2019-08-28T14:28:12Z

Oh, this.

I didn't think of it. Well, if we don't do anything, the request will abort in the external wrapper, so it will return only query args validation errors first. And when those are fixed, it'll return the errors from the json part.

I don't see an easy way to provide all the errors at once. But it's not a big deal if we don't IMHO.

It's like sometimes input data passes webargs validation and then triggers an error in the view func (e.g. unique index issue in DB), so a new error message appears.

BTW I gave it a quick look and this is gonna be a hell of a rework. Close to a total rewrite. Hopefully, not too many tests will be broken, so it won't be like starting from scratch.

sloria · 2019-08-28T16:03:45Z

Yeah, I also think it's fine if we don't merge errors; it's not necessarily more intuitive from the user's POV. Anyway, it can be done in userland.

I'm happy to work on this, but I'm not using webargs at work currently, so I'm concerned I might overlook corner cases. I think we should put this in a pre-release so it can get some early testing.

@sirosen Thanks for offering to tackle this. If you put up an initial proof-of-concept/WIP PR, I can work with you on the rest.

Stated goals in the form of CHANGELOG update Gut significant chunks of webargs.core - replace "parse_{location}" with "location_load_{location}" "locations=<collection>" with "location=<string>" - remove "parse_arg" "get_value" - add webargs.multidictproxy which uses a Schema to handle MultiDict types and behave similarly to webargs.core.get_value Some random work on tests and flaskparser as a proving ground for this implementation/approach, still really not working.

sirosen · 2019-08-31T13:02:24Z

I've finally got a few hours today to try to get something real going. This branch in my fork is where I'll be hacking away if anyone wants to take a peek, but this is just what I could scratch together during the work-week and it's quite broken right now (almost all tests fail).

Initial thoughts and questions based on starting on an implementation:

I'm marking this as v6.0.0 in the changelog and noting differences from webargs v5. Does that sound right?
- Maybe webargs wants to do a final 5.x release before this merges, so that it can be the start of a 6.x branch with any other changes we might want
To try to keep things clear in meaning, I'm doing a rename parse_{location} -> location_load_{location}, e.g. location_load_json. I'm going with this somewhat verbose option in the hopes that it's clear that this is just a hook for reading data but doesn't do any "parsing"
Parser.parse_arg is going away because it's not clear what it should do
Going to a single DEFAULT_LOCATION = "json" and a single location per call is a big API improvement, IMO -- I've been pursuing that.
- "json" is arguable as a choice. But flask-rest-api defaults to json and I came across flask-restful as well, which uses ('json', 'values').
get_value is being replaced with a schema-aware proxy for result dicts -- otherwise, MultiDicts won't work right

I've been thinking about a couple of sticky cases in this thread and I have two ideas I'd like to put forth.

I've seen APIs that accept either form-encoded or JSON request payloads.

This seems very legitimate and should be supported out-of-the-box if possible. Is this the only "normal" case we're aware of in which a single schema is applied to multiple locations?

If so, I'd suggest adding location="json_or_form" with the following meaning:

try to load json, then form body, without passing the result to schema
(aside: ensure that all json parse methods return missing when there's no content, not {} -- I noticed that not all do)
pass whatever the result of the above is -- missing if neither is populated, json data if both are, form post only if json is missing and form body is present -- to the schema
no retry, fallback, or other nesting of errors

The result would be write-able in terms of a location_load_json_or_form method on the base parser, I think.

I like this because it doesn't (re)introduce locations=... or a rename like try_locations=... just to handle a special case that we want to support.

If users have more exotic APIs -- e.g. "takes query parameters or headers for X" -- they can implement it following this same model on a custom Parser subclass.
They can even implement strange (read: unwise) things like merging locations with this kind of "meta-location" pattern.

Won't we need to merge error messages for the stacked case?

@use_args({"related_id": fields.Str(required=True))}, location="query"})
@use_args({"name": fields.Str(required=True))}, location="json")

{"related_id": ['Missing...'], "name": ["Missing..."]}

In exchange for putting a slight burden on users in this case, I think we can support this really easily with this kind of usage:

@webargs.collect(
    parser.use_args({"related_id": fields.Str(required=True))}, location="query"}),
    parser.use_args({"name": fields.Str(required=True))}, location="json"),
)
def viewfunc(name, related_id):
    ...

which might raise a

WebargsCollectedError([
  {"related_id": ['Missing...']},
  {"name": ["Missing..."]}
])

The result will be that use_args stays very simple and easy to maintain.

If we want use_args to do this internally, I think it's still a good idea to keep the errors separate, wrapped in a list, and not try to merge their contents together. The user can decide what to do with that information.

The best way I can think to support this "magically" (i.e. without the explicit collect wrapper) is the approach used in pallets/click in which you create a special attribute to store data. You then need to replace the function with a singular callable (click uses objects, but maybe just a function is fine) that knows how to handle that attribute -- so it's slightly tricker than the typical nested decorators case.

Fixes as the location loading code undergoes refactoring for the flaskparser: - Restore missing mimetype check for JSON payloads - Treat form data as a multidict without extraneous AttributeError check (after testing, flask will return '{}' for form data when it is missing or empty -- just respect/allow this) - req.headers is a multidict type as well - fix some mistakes with new argument ordering Changes to the base test app at src/webargs/testing.py of note: - Because each route can only specify that it loads data from a single location (unless you pull extra tricks), several testing cases have had their routes modified, e.g. "/echo" -> "/echo_json" - New test, test_parse_json_list_error_malformed_data added to check that webargs does not incorrectly treat non-list data as singleton lists when passing data to a schema with list fields. The previous behavior was questionable before, but with full data going to schemas it definitely becomes incorrect. - Many tests which previously expected success with malformed data ignored now expect 422 errors. For example, sending a non-dict JSON value (e.g. `1`) should fail because it cannot be loaded by the schema. It should *not* be silently ignored because data was provided and failed to parse. This is a major behavior change for webargs. - Reverse the decision on marshmallow-code#297 . After reading the referent material, I agree with the aiohttp maintainers in aio-libs/aiohttp#3302 . "" and "{}" should not be treated equivalently -- they are different inputs. Additionally, I foresee that supporting "" as "{}" will pose an issue if the proposal for "json_or_form" in marshmallow-code#419 is accepted. More detailed explanation is provided in the inline comment, and the test for this case is preserved with its expectations reversed Supporting modifications were made in tests/apps/flask_app.py , including the addition of `ma.EXCLUDE` when the detected marshmallow version is >= 3. Minor fix in core: restore the cache clearing step. This is apparently used by the flaskparser. Possibly this only shows up under test usage? The main parse function should always be cloning the parser before its cache could be populated.

Stated goals in the form of CHANGELOG update Gut significant chunks of webargs.core - replace "parse_{location}" with "location_load_{location}" "locations=<collection>" with "location=<string>" - remove "parse_arg" "get_value" - add webargs.multidictproxy which uses a Schema to handle MultiDict types and behave similarly to webargs.core.get_value Some random work on tests and flaskparser as a proving ground for this implementation/approach, still really not working.

Fixes as the location loading code undergoes refactoring for the flaskparser: - Restore missing mimetype check for JSON payloads - Treat form data as a multidict without extraneous AttributeError check (after testing, flask will return '{}' for form data when it is missing or empty -- just respect/allow this) - req.headers is a multidict type as well - fix some mistakes with new argument ordering Changes to the base test app at src/webargs/testing.py of note: - Because each route can only specify that it loads data from a single location (unless you pull extra tricks), several testing cases have had their routes modified, e.g. "/echo" -> "/echo_json" - New test, test_parse_json_list_error_malformed_data added to check that webargs does not incorrectly treat non-list data as singleton lists when passing data to a schema with list fields. The previous behavior was questionable before, but with full data going to schemas it definitely becomes incorrect. - Many tests which previously expected success with malformed data ignored now expect 422 errors. For example, sending a non-dict JSON value (e.g. `1`) should fail because it cannot be loaded by the schema. It should *not* be silently ignored because data was provided and failed to parse. This is a major behavior change for webargs. - Reverse the decision on marshmallow-code#297 . After reading the referent material, I agree with the aiohttp maintainers in aio-libs/aiohttp#3302 . "" and "{}" should not be treated equivalently -- they are different inputs. Additionally, I foresee that supporting "" as "{}" will pose an issue if the proposal for "json_or_form" in marshmallow-code#419 is accepted. More detailed explanation is provided in the inline comment, and the test for this case is preserved with its expectations reversed Supporting modifications were made in tests/apps/flask_app.py , including the addition of `ma.EXCLUDE` when the detected marshmallow version is >= 3.

Stated goals in the form of CHANGELOG update Gut significant chunks of webargs.core - replace "parse_{location}" with "load_{location}" "locations=<collection>" with "location=<string>" "location_handler" with "location_loader" - remove "parse_arg" "get_value" - add webargs.multidictproxy which uses a Schema to handle MultiDict types and behave similarly to webargs.core.get_value As a proving ground for this implementation/approach, get flaskparser tests to pass under ma3 Fixes as the location loading code undergoes refactoring for the flaskparser: - Restore missing mimetype check for JSON payloads - Treat form data as a multidict without extraneous AttributeError check (after testing, flask will return '{}' for form data when it is missing or empty -- just respect/allow this) - req.headers is a multidict type as well - fix some mistakes with new argument ordering Changes to the base test app at src/webargs/testing.py of note: - Because each route can only specify that it loads data from a single location (unless you pull extra tricks), several testing cases have had their routes modified, e.g. "/echo" -> "/echo_json" - New test, test_parse_json_list_error_malformed_data added to check that webargs does not incorrectly treat non-list data as singleton lists when passing data to a schema with list fields. The previous behavior was questionable before, but with full data going to schemas it definitely becomes incorrect. - Many tests which previously expected success with malformed data ignored now expect 422 errors. For example, sending a non-dict JSON value (e.g. `1`) should fail because it cannot be loaded by the schema. It should *not* be silently ignored because data was provided and failed to parse. This is a major behavior change for webargs. Supporting modifications were made in tests/apps/flask_app.py , including the addition of `ma.EXCLUDE` when the detected marshmallow version is >= 3. After work on the relevant PR and trying things both ways, the current handling of empty JSON bodies (interpreting them as {}) is being kept externally, but changed internally. This relates to the decision made in marshmallow-code#297 . Originally in the course of work on marshmallow-code#419, it seemed like it might be necessary to change the behavior to make missing JSON data behave as `missing` externally. However, making the change internal and wrapping it to present the same external interface is best. load_* methods are expected to return `missing` or {} when data is missing for the given location. e.g. When a JSON body is "" it should be `missing`. This is considered correct. However, in terms of the external behavior of webargs, no data should be interpreted as `{}`. In order to achieve this, the wrapper which calls the various "loader_func"s to load location data explicitly checks the result for `missing` and converts it to `{}` if it is missing. This makes it easy to use webargs to implement an API with optional JSON data, but internally preserves the ability to detect a missing JSON payload and distinguish it from a payload containing `{}`.

lafrech added the backwards incompatible label Aug 27, 2019

lafrech mentioned this issue Aug 27, 2019

Leave logic of deciding on extra args to schema #267

Closed

lafrech changed the title ~~RFC: Enforce a single location for each field~~ RFC: Allow a single location for each field Aug 27, 2019

sloria pinned this issue Aug 27, 2019

sirosen mentioned this issue Aug 28, 2019

RFC: Change the way we store metadata? marshmallow-code/marshmallow#1350

Closed

sirosen mentioned this issue Sep 4, 2019

Only allow users to specify a single location per parse call & pass full location data to schema.load #420

Merged

lafrech closed this as completed in #420 Jan 3, 2020

sirosen mentioned this issue Jan 3, 2020

Combine errors from nested use_args calls so that mixed locations still provide full errors in webargs 6.x #455

Closed

lafrech unpinned this issue Jan 25, 2020

lafrech mentioned this issue Mar 31, 2020

Webargs 6.0.0 has broken Flaskparser @use_kwargs: Never parses query string or formdata #483

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Allow a single location for each field #419

RFC: Allow a single location for each field #419

lafrech commented Aug 27, 2019

sirosen commented Aug 27, 2019

lafrech commented Aug 27, 2019

sirosen commented Aug 27, 2019

sloria commented Aug 27, 2019

lafrech commented Aug 28, 2019

sloria commented Aug 28, 2019 •

edited

lafrech commented Aug 28, 2019

sloria commented Aug 28, 2019

lafrech commented Aug 28, 2019

sloria commented Aug 28, 2019

sirosen commented Aug 31, 2019

RFC: Allow a single location for each field #419

RFC: Allow a single location for each field #419

Comments

lafrech commented Aug 27, 2019

sirosen commented Aug 27, 2019

lafrech commented Aug 27, 2019

sirosen commented Aug 27, 2019

sloria commented Aug 27, 2019

lafrech commented Aug 28, 2019

sloria commented Aug 28, 2019 • edited

lafrech commented Aug 28, 2019

sloria commented Aug 28, 2019

lafrech commented Aug 28, 2019

sloria commented Aug 28, 2019

sirosen commented Aug 31, 2019

sloria commented Aug 28, 2019 •

edited