Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Allow a single location for each field #419

Closed
lafrech opened this issue Aug 27, 2019 · 11 comments · Fixed by #420
Closed

RFC: Allow a single location for each field #419

lafrech opened this issue Aug 27, 2019 · 11 comments · Fixed by #420

Comments

@lafrech
Copy link
Member

lafrech commented Aug 27, 2019

See discussion in #267.

The idea is that the Schema would be passed the whole data from a location, so there couldn't be fields with different location in the same Schema.

  • Drop location argument to Field
  • Only accept a single location in use_args

This is an important breaking change.

It breaks the default use case, which is to iterate on all fields, and for each field search in DEFAULT_LOCATIONS (by default: query string, form, then json).

We could keep a somewhat similar behaviour by trying to load the whole data from query string, then form, then json. The only added restriction is that Schema couldn't contain fields from mixed locations.

I'm not convinced with the multi-location feature, so we could even

  • Replace DEFAULT_LOCATIONS with DEFAULT_LOCATION

The user could subclass the parser to change the default location (in our framework, we use json by default), and he would need to specify the location explicitly when it is not the default.

I think all this changes allow for better documentation (e.g. in apispec).

@sirosen
Copy link
Collaborator

sirosen commented Aug 27, 2019

I'm not convinced with the multi-location feature

Neither am I. I suggested the wrapper example in #267 in part to make the case that "the user can do this him/herself if it's really wanted".

@lafrech lafrech changed the title RFC: Enforce a single location for each field RFC: Allow a single location for each field Aug 27, 2019
@lafrech
Copy link
Member Author

lafrech commented Aug 27, 2019

Great. @sirosen Would you be willing to work on this?

@sirosen
Copy link
Collaborator

sirosen commented Aug 27, 2019

I'm definitely willing to, but I'm not sure when I'll get to it. I can try to carve out time for it in the next 2 weeks.

@sloria sloria pinned this issue Aug 27, 2019
@sloria
Copy link
Member

sloria commented Aug 27, 2019

I think this is the way to go, so long as there's a decent story for users who need the legacy behavior. I've seen APIs that accept either form-encoded or JSON request payloads. We should document how to achieve this in the narrative docs and changelog.

@lafrech
Copy link
Member Author

lafrech commented Aug 28, 2019

Actually, we could accept multiple locations. What we need to break here is the "mixed locations" use case (data from single Schema spread amongst several locations).

We could implement "try whole Schema on first location, then second if first fails" (basically what's in #267 (comment)). That would be documentable. And that would be compatible with the "pass all location data to the Schema at once" direction we want to take.

I'm just wondering about the error message content.

If first location fails because a single field is required and missing, the second location is tried. It is empty. We should return the error message of the first attempt, I guess.

I've seen APIs that accept either form-encoded or JSON request payloads.

But it wouldn't accept half of the payload in form-encoded and the other half in JSON, right?

What error message would you expect when sending wrong data in form-encoded and no data in JSON? In other words, how does the parser know, from the two locations (one of them can be empty), which to use to return the error message.

If one of the two is empty, it can be an easy case. But there could be cases where it is not, because it contains data related to another argument/Schema.

@sloria
Copy link
Member

sloria commented Aug 28, 2019

What we need to break here is the "mixed locations" use case (data from single Schema spread amongst several locations).

But it wouldn't accept half of the payload in form-encoded and the other half in JSON, right?

I don't think it's even possible to send both JSON and form-encoded payloads in a single request (?), so we need not support the 'single schema, mixed locations' case.

We could implement "try whole Schema on first location, then second if first fails" ... I'm just wondering about the error message content.

Yeah, the error handling would be left up to the user; I imagine the desired behavior will vary. We could exemplify how to merge error messages across locations in the docs.

@lafrech
Copy link
Member Author

lafrech commented Aug 28, 2019

I don't think it's even possible to send both JSON and form-encoded payloads in a single request (?)

Hmmm... I guess that would be multipart. OK, there could be more plausible examples of mixed-locations, like query args and path perhaps, but none of those I could think of were convincingly realistic.

Let's drop single schema - mixed locations. Definitely a corner case.

so we need not support the 'single schema, multiple locations' case.

Single schema - multiple locations is not such a corner case. It is the case you present in #267 (comment):

APIs that accept either form-encoded or JSON request payloads

We do not have to support this but it would suck if the refactor prevented the user from supporting this. And it's nice if we can document how to do it.

Fine for me if we leave it to user code, with an example in the docs.

the error handling would be left up to the user; I imagine the desired behavior will vary. We could exemplify how to merge error messages across locations in the docs.

As long as there's not mixed locations, there's no need to merge error messages. Just pick the one that corresponds to the location the faulty data was sent. It's not trivial to tell at first sight which one it is because both locations return an error.

Worse, in the (non-mixed) multi-location case, using the workaround in #267 (comment), if the data is sent in one location with a validation error, the validation fails, the second location is tried. It is empty. If no field is required, it doesn't error at all. But input data is silently ignored.

Perhaps the user code should probe first which of the two locations actually has content and call webargs only once with this.

@sloria
Copy link
Member

sloria commented Aug 28, 2019

Single schema - multiple locations is not such a corner case

Woops, I meant 'single schema, mixed locations'--fixed my comment.


Won't we need to merge error messages for the stacked case?

@use_args({"related_id": fields.Str(required=True))}, location="query"})
@use_args({"name": fields.Str(required=True))}, location="json")
{"related_id": ['Missing...'], "name": ["Missing..."]}

@lafrech
Copy link
Member Author

lafrech commented Aug 28, 2019

Oh, this.

I didn't think of it. Well, if we don't do anything, the request will abort in the external wrapper, so it will return only query args validation errors first. And when those are fixed, it'll return the errors from the json part.

I don't see an easy way to provide all the errors at once. But it's not a big deal if we don't IMHO.

It's like sometimes input data passes webargs validation and then triggers an error in the view func (e.g. unique index issue in DB), so a new error message appears.

BTW I gave it a quick look and this is gonna be a hell of a rework. Close to a total rewrite. Hopefully, not too many tests will be broken, so it won't be like starting from scratch.

@sloria
Copy link
Member

sloria commented Aug 28, 2019

Yeah, I also think it's fine if we don't merge errors; it's not necessarily more intuitive from the user's POV. Anyway, it can be done in userland.

I'm happy to work on this, but I'm not using webargs at work currently, so I'm concerned I might overlook corner cases. I think we should put this in a pre-release so it can get some early testing.

@sirosen Thanks for offering to tackle this. If you put up an initial proof-of-concept/WIP PR, I can work with you on the rest.

sirosen added a commit to sirosen/webargs that referenced this issue Aug 28, 2019
Stated goals in the form of CHANGELOG update

Gut significant chunks of webargs.core
- replace
    "parse_{location}" with "location_load_{location}"
    "locations=<collection>" with "location=<string>"
- remove
    "parse_arg"
    "get_value"
- add
    webargs.multidictproxy which uses a Schema to handle MultiDict types
    and behave similarly to webargs.core.get_value

Some random work on tests and flaskparser as a proving ground for this
implementation/approach, still really not working.
@sirosen
Copy link
Collaborator

sirosen commented Aug 31, 2019

I've finally got a few hours today to try to get something real going. This branch in my fork is where I'll be hacking away if anyone wants to take a peek, but this is just what I could scratch together during the work-week and it's quite broken right now (almost all tests fail).

Initial thoughts and questions based on starting on an implementation:

  • I'm marking this as v6.0.0 in the changelog and noting differences from webargs v5. Does that sound right?
    • Maybe webargs wants to do a final 5.x release before this merges, so that it can be the start of a 6.x branch with any other changes we might want
  • To try to keep things clear in meaning, I'm doing a rename parse_{location} -> location_load_{location}, e.g. location_load_json. I'm going with this somewhat verbose option in the hopes that it's clear that this is just a hook for reading data but doesn't do any "parsing"
  • Parser.parse_arg is going away because it's not clear what it should do
  • Going to a single DEFAULT_LOCATION = "json" and a single location per call is a big API improvement, IMO -- I've been pursuing that.
    • "json" is arguable as a choice. But flask-rest-api defaults to json and I came across flask-restful as well, which uses ('json', 'values').
  • get_value is being replaced with a schema-aware proxy for result dicts -- otherwise, MultiDicts won't work right

I've been thinking about a couple of sticky cases in this thread and I have two ideas I'd like to put forth.

I've seen APIs that accept either form-encoded or JSON request payloads.

This seems very legitimate and should be supported out-of-the-box if possible. Is this the only "normal" case we're aware of in which a single schema is applied to multiple locations?

If so, I'd suggest adding location="json_or_form" with the following meaning:

  • try to load json, then form body, without passing the result to schema
    (aside: ensure that all json parse methods return missing when there's no content, not {} -- I noticed that not all do)
  • pass whatever the result of the above is -- missing if neither is populated, json data if both are, form post only if json is missing and form body is present -- to the schema
  • no retry, fallback, or other nesting of errors

The result would be write-able in terms of a location_load_json_or_form method on the base parser, I think.

I like this because it doesn't (re)introduce locations=... or a rename like try_locations=... just to handle a special case that we want to support.

If users have more exotic APIs -- e.g. "takes query parameters or headers for X" -- they can implement it following this same model on a custom Parser subclass.
They can even implement strange (read: unwise) things like merging locations with this kind of "meta-location" pattern.

Won't we need to merge error messages for the stacked case?

@use_args({"related_id": fields.Str(required=True))}, location="query"})
@use_args({"name": fields.Str(required=True))}, location="json")
{"related_id": ['Missing...'], "name": ["Missing..."]}

In exchange for putting a slight burden on users in this case, I think we can support this really easily with this kind of usage:

@webargs.collect(
    parser.use_args({"related_id": fields.Str(required=True))}, location="query"}),
    parser.use_args({"name": fields.Str(required=True))}, location="json"),
)
def viewfunc(name, related_id):
    ...

which might raise a

WebargsCollectedError([
  {"related_id": ['Missing...']},
  {"name": ["Missing..."]}
])

The result will be that use_args stays very simple and easy to maintain.

If we want use_args to do this internally, I think it's still a good idea to keep the errors separate, wrapped in a list, and not try to merge their contents together. The user can decide what to do with that information.

The best way I can think to support this "magically" (i.e. without the explicit collect wrapper) is the approach used in pallets/click in which you create a special attribute to store data. You then need to replace the function with a singular callable (click uses objects, but maybe just a function is fine) that knows how to handle that attribute -- so it's slightly tricker than the typical nested decorators case.

sirosen added a commit to sirosen/webargs that referenced this issue Sep 4, 2019
Fixes as the location loading code undergoes refactoring for the
flaskparser:
- Restore missing mimetype check for JSON payloads
- Treat form data as a multidict without extraneous AttributeError check
  (after testing, flask will return '{}' for form data when it is
  missing or empty -- just respect/allow this)
- req.headers is a multidict type as well
- fix some mistakes with new argument ordering

Changes to the base test app at src/webargs/testing.py of note:
- Because each route can only specify that it loads data from a single
  location (unless you pull extra tricks), several testing cases have
  had their routes modified, e.g. "/echo" -> "/echo_json"
- New test, test_parse_json_list_error_malformed_data added to check
  that webargs does not incorrectly treat non-list data as singleton
  lists when passing data to a schema with list fields. The previous
  behavior was questionable before, but with full data going to schemas
  it definitely becomes incorrect.
- Many tests which previously expected success with malformed data
  ignored now expect 422 errors. For example, sending a non-dict JSON
  value (e.g. `1`) should fail because it cannot be loaded by the
  schema. It should *not* be silently ignored because data was provided
  and failed to parse. This is a major behavior change for webargs.
- Reverse the decision on marshmallow-code#297 . After reading the referent material, I
  agree with the aiohttp maintainers in aio-libs/aiohttp#3302 . "" and
  "{}" should not be treated equivalently -- they are different inputs.
  Additionally, I foresee that supporting "" as "{}" will pose an issue
  if the proposal for "json_or_form" in marshmallow-code#419 is accepted.
  More detailed explanation is provided in the inline comment, and the
  test for this case is preserved with its expectations reversed

Supporting modifications were made in tests/apps/flask_app.py ,
including the addition of `ma.EXCLUDE` when the detected marshmallow
version is >= 3.

Minor fix in core: restore the cache clearing step. This is apparently
used by the flaskparser.
Possibly this only shows up under test usage? The main parse function
should always be cloning the parser before its cache could be
populated.
sirosen added a commit to sirosen/webargs that referenced this issue Sep 4, 2019
Stated goals in the form of CHANGELOG update

Gut significant chunks of webargs.core
- replace
    "parse_{location}" with "location_load_{location}"
    "locations=<collection>" with "location=<string>"
- remove
    "parse_arg"
    "get_value"
- add
    webargs.multidictproxy which uses a Schema to handle MultiDict types
    and behave similarly to webargs.core.get_value

Some random work on tests and flaskparser as a proving ground for this
implementation/approach, still really not working.
sirosen added a commit to sirosen/webargs that referenced this issue Sep 4, 2019
Fixes as the location loading code undergoes refactoring for the
flaskparser:
- Restore missing mimetype check for JSON payloads
- Treat form data as a multidict without extraneous AttributeError check
  (after testing, flask will return '{}' for form data when it is
  missing or empty -- just respect/allow this)
- req.headers is a multidict type as well
- fix some mistakes with new argument ordering

Changes to the base test app at src/webargs/testing.py of note:
- Because each route can only specify that it loads data from a single
  location (unless you pull extra tricks), several testing cases have
  had their routes modified, e.g. "/echo" -> "/echo_json"
- New test, test_parse_json_list_error_malformed_data added to check
  that webargs does not incorrectly treat non-list data as singleton
  lists when passing data to a schema with list fields. The previous
  behavior was questionable before, but with full data going to schemas
  it definitely becomes incorrect.
- Many tests which previously expected success with malformed data
  ignored now expect 422 errors. For example, sending a non-dict JSON
  value (e.g. `1`) should fail because it cannot be loaded by the
  schema. It should *not* be silently ignored because data was provided
  and failed to parse. This is a major behavior change for webargs.
- Reverse the decision on marshmallow-code#297 . After reading the referent material, I
  agree with the aiohttp maintainers in aio-libs/aiohttp#3302 . "" and
  "{}" should not be treated equivalently -- they are different inputs.
  Additionally, I foresee that supporting "" as "{}" will pose an issue
  if the proposal for "json_or_form" in marshmallow-code#419 is accepted.
  More detailed explanation is provided in the inline comment, and the
  test for this case is preserved with its expectations reversed

Supporting modifications were made in tests/apps/flask_app.py ,
including the addition of `ma.EXCLUDE` when the detected marshmallow
version is >= 3.
sirosen added a commit to sirosen/webargs that referenced this issue Nov 24, 2019
Stated goals in the form of CHANGELOG update

Gut significant chunks of webargs.core
- replace
    "parse_{location}" with "load_{location}"
    "locations=<collection>" with "location=<string>"
    "location_handler" with "location_loader"
- remove
    "parse_arg"
    "get_value"
- add
    webargs.multidictproxy which uses a Schema to handle MultiDict types
    and behave similarly to webargs.core.get_value

As a proving ground for this implementation/approach, get flaskparser
tests to pass under ma3

Fixes as the location loading code undergoes refactoring for the
flaskparser:
- Restore missing mimetype check for JSON payloads
- Treat form data as a multidict without extraneous AttributeError check
  (after testing, flask will return '{}' for form data when it is
  missing or empty -- just respect/allow this)
- req.headers is a multidict type as well
- fix some mistakes with new argument ordering

Changes to the base test app at src/webargs/testing.py of note:
- Because each route can only specify that it loads data from a single
  location (unless you pull extra tricks), several testing cases have
  had their routes modified, e.g. "/echo" -> "/echo_json"
- New test, test_parse_json_list_error_malformed_data added to check
  that webargs does not incorrectly treat non-list data as singleton
  lists when passing data to a schema with list fields. The previous
  behavior was questionable before, but with full data going to schemas
  it definitely becomes incorrect.
- Many tests which previously expected success with malformed data
  ignored now expect 422 errors. For example, sending a non-dict JSON
  value (e.g. `1`) should fail because it cannot be loaded by the
  schema. It should *not* be silently ignored because data was provided
  and failed to parse. This is a major behavior change for webargs.

Supporting modifications were made in tests/apps/flask_app.py ,
including the addition of `ma.EXCLUDE` when the detected marshmallow
version is >= 3.

After work on the relevant PR and trying things both ways, the
current handling of empty JSON bodies (interpreting them as {}) is being
kept externally, but changed internally. This relates to the decision
made in marshmallow-code#297 . Originally in the course of work on marshmallow-code#419, it seemed like
it might be necessary to change the behavior to make missing JSON data
behave as `missing` externally. However, making the change internal and
wrapping it to present the same external interface is best.

load_* methods are expected to return `missing` or {} when data is
missing for the given location. e.g. When a JSON body is "" it should be
`missing`. This is considered correct.
However, in terms of the external behavior of webargs, no data should be
interpreted as `{}`.

In order to achieve this, the wrapper which calls the various
"loader_func"s to load location data explicitly checks the result for
`missing` and converts it to `{}` if it is missing.

This makes it easy to use webargs to implement an API with optional JSON
data, but internally preserves the ability to detect a missing JSON
payload and distinguish it from a payload containing `{}`.
@lafrech lafrech unpinned this issue Jan 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants