Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only allow users to specify a single location per parse call & pass full location data to schema.load #420

Merged
merged 13 commits into from Jan 3, 2020

Conversation

sirosen
Copy link
Collaborator

@sirosen sirosen commented Sep 4, 2019

This is still incomplete, but core and flaskparser tests are now passing. No work is (yet) included on other concrete parsers, but I fixed the base asyncparser to make mypy linting pass.

Resolves #419, #267, #164, #268
Obsoletes, and therefore closes, #410

I've updated the changelog with a basic "statement of intent". As I work on this, I will try to maintain it as an accurate record of how this change should look to the outside world. Obviously, narrative docs also need to be updated with relevant info.

Highlights / greatest hits:

  • Marks this as an unreleased v6.0.0 in the changelog
  • Replace DEFAULT_LOCATIONS with DEFAULT_LOCATION="json"
  • Replace parse_{location} with location_load_{location}
  • Replace locations=<iterable> with location=<str>
  • Remove locations=... from Fields
  • Remove Parser.parse_arg
  • Remove webargs.core.get_value
  • Add webargs.multidictproxy.MultiDictProxy which has get_value-like behavior and proxies various dict-like objects
  • Reverse the decision made in aiohttpparser: Fix 500 error with JSON content-type and empty body #297 with a comment in tests explaining this change and a bit of similar explanation in the commit message
  • Numerous fixes to tests to be more explicit about which location is being used (since you can't trivially have a single location /echo which loads data from a bunch of locations)
  • As a side-effect of the above changes, certain inputs which were previously ignored are treated as errors (e.g. 1 for a JSON body will now parse and be passed literally to the schema, which will fail because 1 is not a valid input for schema.load). Changes to the tests serve as a somewhat-readable list of behavioral changes.

To try to keep this from sprawling even more, my intent is to address some items mentioned in #419 either in follow-up PRs or as part of this only if we're happy with this direction (and agree on those items). In particular: the json_or_form location I proposed and nested use_args usage and error collection.

@sloria, @lafrech, my biggest question is: do you like where this is going? Should I just hammer out all of the other parsers?

Copy link
Member

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just drive-by's for now. Good work! Thanks for being so thorough with the docs.

I'll give this a proper lookover when I have more time to review.

CHANGELOG.rst Outdated Show resolved Hide resolved
CHANGELOG.rst Outdated Show resolved Hide resolved
src/webargs/asyncparser.py Show resolved Hide resolved
src/webargs/flaskparser.py Outdated Show resolved Hide resolved
src/webargs/multidictproxy.py Outdated Show resolved Hide resolved
src/webargs/multidictproxy.py Outdated Show resolved Hide resolved
src/webargs/testing.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
src/webargs/flaskparser.py Outdated Show resolved Hide resolved
src/webargs/multidictproxy.py Outdated Show resolved Hide resolved
src/webargs/multidictproxy.py Outdated Show resolved Hide resolved
src/webargs/asyncparser.py Show resolved Hide resolved
CHANGELOG.rst Outdated Show resolved Hide resolved
Copy link
Member

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed this a bit more. I'm basically good with the approach here.

I cut 5.5.0, so you'll need to update with dev. Feel free to merge instead of rebase; I intend to squash and merge when this is all done, if you don't mind.

I also made a 5.x-line branch so that we can maintain support for 5.x for some time after we release 6.0.

src/webargs/flaskparser.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
src/webargs/asyncparser.py Show resolved Hide resolved
src/webargs/core.py Show resolved Hide resolved
src/webargs/multidictproxy.py Show resolved Hide resolved
tests/apps/flask_app.py Outdated Show resolved Hide resolved
tests/test_flaskparser.py Outdated Show resolved Hide resolved
@sloria
Copy link
Member

sloria commented Sep 9, 2019

Should we rename @location_handler to @location_loader for consistency with the load_* naming?

Just a thought: doesn't need to happen in this PR.

@sirosen
Copy link
Collaborator Author

sirosen commented Sep 10, 2019

Should we rename @location_handler to @location_loader for consistency with the load_* naming?

I don't mind changing it or keeping it the same. There isn't a huge difference between these names.

One advantage of renaming is that it would immediately fail for webargs v5 users who are using @location_handler today and upgrade. That might be reason enough to make the change.

@sloria
Copy link
Member

sloria commented Sep 10, 2019

Up to you. Feel free to do the location_loader rename, or we can do it in a follow-up PR.

@sirosen
Copy link
Collaborator Author

sirosen commented Sep 10, 2019

Up to you. Feel free to do the location_loader rename, or we can do it in a follow-up PR.

I'll probably do it in here, since it's already a large, breaky change.


Something that I've started to realize poses an issue: the way that I've been writing things, such that webargs treats an empty JSON payload as missing, actually makes the behavior more confusing and non-uniform.

The issue is that json is the only location which can behave this way. No query params or an empty form post needs to be interpreted as {}.

I'm working on a refactor now in which load_json will still return missing, but _load_location_data (the wrapper which gets the loader func and calls it) will check if the result is missing and return {} if it is.
This lets webargs can still pass around information about the presence or absence of data, but when data is being handed off to Schema.load it needs to be "normalized" for consumption.

Part of the goal is to ensure it's easy to add something like

def load_json_or_form(self, req, schema):
    data = self.load_json(req, schema)
    if data is missing:
        data = self.load_form(req, schema)
    return data

and I think this achieves that.

This will have a public-facing impact on how webargs v6 behaves, but I think generally for the better. It will make it easier to use webargs in APIs with optional request bodies.
So expect it to come with numerous changes to tests' expectations around posting empty data to /echo_json endpoints.

@sirosen
Copy link
Collaborator Author

sirosen commented Sep 10, 2019

Er, to clarify, I think the above is important to be very clear and open about because it constitutes another reversal on #297 (back to the current behavior)

The first version of this PR changed the decision on that saying that "" is invalid JSON.
I thought this change would be necessary for implementing json_or_form, and it kind of appeals as more pedantically correct. But as I work on things I realize that this has negative impact on webargs' usability, and it's not as necessary as I thought if we distinguish between the value returned by load_json and the value seen by Schema.load.

One of the big cases that changes my mind is the idea of an endpoint with an optional JSON body.
There are a number of behaviors currently accepted for a missing request body, and it's not really obvious what should be allowed on this front.

@sirosen
Copy link
Collaborator Author

sirosen commented Sep 14, 2019

As of f005c40 , this is finally be passing tests, as all of the parser conversions are finished. And I've just merged dev back in again to get this up to date.

This is what I'd consider the very minimum to be safe to merge. But I still plan to do two more things before removing WIP from the PR title:

  • Update narrative docs with new usage
  • Add some explicit testing of various unknown=... behaviors

Copy link
Member

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. Let me know when you're confident in merging this.

src/webargs/aiohttpparser.py Outdated Show resolved Hide resolved
@sirosen sirosen changed the title [WIP] Only allow users to specify a single location per parse call & pass full location data to schema.load Only allow users to specify a single location per parse call & pass full location data to schema.load Sep 24, 2019
@sirosen
Copy link
Collaborator Author

sirosen commented Sep 24, 2019

I've just removed [WIP] from the title because I think this is ready.
I've made changes to the narrative docs (more content was removed than added; I am sorry), and added some more tests.

As an important final change, I was trying to add json_or_form as a location -- to validate that the approach I have for that will work -- and my testing revealed some non-uniform treatment of empty JSON payloads between different frameworks.

I've added b8d3fee , which makes the decision to check content-type on the frameworks where this was not done (django and pyramid) and return missing if it is non-JSON.

docs/quickstart.rst Outdated Show resolved Hide resolved
@sloria
Copy link
Member

sloria commented Sep 30, 2019

Great. I probably won't be able to review this until this weekend at the earliest. Thanks for all your work on this.

@sirosen
Copy link
Collaborator Author

sirosen commented Oct 2, 2019

Just as a short FYI: I'm going to be taking a couple of weeks off, and likely won't be able to respond to any feedback for the next two weeks or so anyway. So please don't worry if you write some comments and don't see a reply -- this isn't abandoned, and I'm still invested in seeing it through.

@sirosen
Copy link
Collaborator Author

sirosen commented Oct 2, 2019

In light of the fact that it motivates b8d3fee , I've added the json_or_form location to this branch. The former commit doesn't make as much sense without the latter.

@lafrech
Copy link
Member

lafrech commented Nov 13, 2019

Thank you so much @sirosen for tackling this and sorry for the so long delay.

I've tried to port flask-smorest to your webargs branch and I only had two adaptations to make:

  • replace locations=[location] with location=location in a call to use_args
  • add location='query' to a call to parser.parse

Those are known breaking changes. My point is that I didn't discover any issue.

Then I tried to port the application we're working on. I had to

  • fix NestedQueryArgsParser. We copied it from webargs docs and since you also fixed the docs we just had to copy your new version. Great.
  • add missing content_type='application/json' in some test functions

Again, nothing surprising.

I shall now actually review the code, but at least it proved functional on our real-life cases.

Thanks again.

@lafrech lafrech added this to the 6.0.0 milestone Nov 13, 2019
@lafrech lafrech mentioned this pull request Nov 13, 2019
2 tasks
@sirosen
Copy link
Collaborator Author

sirosen commented Nov 13, 2019

I'll take some time soon (hopefully next few days) to revisit the changelog and make sure it's accurate. In particular

add missing content_type='application/json' in some test functions

made me realize that two parsers have changed (the DjangoParser and PyramidParser) to require Content-Type where they did not before. That's currently missing from the changelog.


Anyway, please don't worry about taking time with this! I more than understand how hard it can be to get time to work on open source projects and properly review big PRs. 🙂

@lafrech
Copy link
Member

lafrech commented Nov 13, 2019

Thanks.

I'm using Flask, BTW, so I guess this also changed on FlaskParser.

@lafrech
Copy link
Member

lafrech commented Nov 22, 2019

I could be wrong but here's my understanding:

  • In the Flask client, if I forget content_type='application/json', the payload is sent as form data.
  • webargs used to check several locations by default and now it only checks json.

So the breaking change would only be the default location checked for input.

@sirosen
Copy link
Collaborator Author

sirosen commented Nov 23, 2019

I just amended the changelog for 6.0 based on a reread of the changes. Listed as a bugfix (because it makes webargs internally consistent), several parsers now require Content-Type to be set for JSON payloads. Since anything matching application/(.*\+)?json will do, I didn't specify exactly what it must be set to.

FlaskParser was one of the parsers with changes. Even if you send it JSON data, if you don't sent the Content-Type header it will not be parsed as JSON.
That jives with what you're saying you observed, so I think all is well.


There are new changes in dev, but they didn't merge back into this branch smoothly (quickly "fixing" the conflicts produced broken tests). I'll have to look more closely at the changes to understand what went wrong.

This is early documentation of planned changes, for this branch of work.

Note the change from `locations=...` to `location` in
Parser.use_[kw]args methods, and that fields may no longer specify a
location. Additionally, detail that schemas now load all data from a
location, not only those for which there are matching fields.

The changelog explains how to rewrite a schema which loads from multiple
locations into multiple `use_[kw]args` calls, and how to use `unknown`
to get webargs v5-like behavior.
Stated goals in the form of CHANGELOG update

Gut significant chunks of webargs.core
- replace
    "parse_{location}" with "load_{location}"
    "locations=<collection>" with "location=<string>"
    "location_handler" with "location_loader"
- remove
    "parse_arg"
    "get_value"
- add
    webargs.multidictproxy which uses a Schema to handle MultiDict types
    and behave similarly to webargs.core.get_value

As a proving ground for this implementation/approach, get flaskparser
tests to pass under ma3

Fixes as the location loading code undergoes refactoring for the
flaskparser:
- Restore missing mimetype check for JSON payloads
- Treat form data as a multidict without extraneous AttributeError check
  (after testing, flask will return '{}' for form data when it is
  missing or empty -- just respect/allow this)
- req.headers is a multidict type as well
- fix some mistakes with new argument ordering

Changes to the base test app at src/webargs/testing.py of note:
- Because each route can only specify that it loads data from a single
  location (unless you pull extra tricks), several testing cases have
  had their routes modified, e.g. "/echo" -> "/echo_json"
- New test, test_parse_json_list_error_malformed_data added to check
  that webargs does not incorrectly treat non-list data as singleton
  lists when passing data to a schema with list fields. The previous
  behavior was questionable before, but with full data going to schemas
  it definitely becomes incorrect.
- Many tests which previously expected success with malformed data
  ignored now expect 422 errors. For example, sending a non-dict JSON
  value (e.g. `1`) should fail because it cannot be loaded by the
  schema. It should *not* be silently ignored because data was provided
  and failed to parse. This is a major behavior change for webargs.

Supporting modifications were made in tests/apps/flask_app.py ,
including the addition of `ma.EXCLUDE` when the detected marshmallow
version is >= 3.

After work on the relevant PR and trying things both ways, the
current handling of empty JSON bodies (interpreting them as {}) is being
kept externally, but changed internally. This relates to the decision
made in marshmallow-code#297 . Originally in the course of work on marshmallow-code#419, it seemed like
it might be necessary to change the behavior to make missing JSON data
behave as `missing` externally. However, making the change internal and
wrapping it to present the same external interface is best.

load_* methods are expected to return `missing` or {} when data is
missing for the given location. e.g. When a JSON body is "" it should be
`missing`. This is considered correct.
However, in terms of the external behavior of webargs, no data should be
interpreted as `{}`.

In order to achieve this, the wrapper which calls the various
"loader_func"s to load location data explicitly checks the result for
`missing` and converts it to `{}` if it is missing.

This makes it easy to use webargs to implement an API with optional JSON
data, but internally preserves the ability to detect a missing JSON
payload and distinguish it from a payload containing `{}`.
If an empty body is submitted to a webargs parser, it should always be
checked for content-type. Otherwise, the behavior framework-to-framework
for webargs will not be consistent.

Namely: djangoparser and pyramidparser now check content-type for JSON
and return missing when an emtpy/missing payload is used. This makes the
detection/use of JSON payloads more consistent and is necessary for
`json_or_form`.
'json_or_form' is defined as first trying to load the JSON data, then
falling back to form data.

Adds a test to the testsuite for this which tries sending JSON, tries
sending Form, and tries sending no data, verifying that we get the
correct result in each case.

This does *not* try to support more elaborate behaviors, e.g. multipart
bodies, which probably will work variably well or poorly based on the
framework in use. In these cases, the content type will not detect as
JSON (because it is 'multipart'), and we'll failover to using form data.
This is acceptable, since anyone using `json_or_form` is expecting to
allow form posts anyway.

Docs cover not only the new json_or_form location, but also the *idea*
of a "meta location" which combines multiple locations. This is worth
documenting in the advanced docs since it is the way to get back certain
behaviors from webargs v5.x
This is not a deep and comprehensive rewrite which aims to
discuss/explain the functionality which is now available. Rather, this
change merely takes everything in the docs which has become inaccurate
and trims or modifies it to be accurate again.

Add notes to changelog about making parsers more consistent about
checking Content-Type for JSON payloads.
@sirosen
Copy link
Collaborator Author

sirosen commented Nov 24, 2019

I've just rebased and pushed a cleaned-up version of this branch which works against dev. (I tried doing this merge-back thing, but it's just not how I'm used to using git. The commit history becomes harder to navigate.)

My attempt to handle this by merging things yesterday failed because I did not properly incorporate #428 . Because this PR introduces the idea of a shared load_json handler in the core parser, that needed to be updated as well.

@lafrech
Copy link
Member

lafrech commented Nov 24, 2019

Thanks.

Rebase is better. Sometimes it is harder to achieve (too many changes and too many commits in the branch to rebase), so a merge is acceptable. If rebase is easier for you, then that's just great.

I'd really like to merge this. I'll try to review this as soon as I can. Can't promise when.

Copy link
Member

@lafrech lafrech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I made a few comments.

I didn't review the async parsers.

I lean towards releasing as 6.0 rather than as a 6.0b pre-release.

docs/quickstart.rst Outdated Show resolved Hide resolved
src/webargs/core.py Show resolved Hide resolved
src/webargs/core.py Show resolved Hide resolved
src/webargs/multidictproxy.py Show resolved Hide resolved
src/webargs/multidictproxy.py Show resolved Hide resolved
src/webargs/falconparser.py Show resolved Hide resolved
Copy link
Collaborator Author

@sirosen sirosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to get to the review feedback this past weekend, but it didn't happen. I'll try to find a good time for it soon. There's one item where I had a question to make sure I correctly understand what's wanted.

docs/quickstart.rst Outdated Show resolved Hide resolved
src/webargs/falconparser.py Show resolved Hide resolved
src/webargs/multidictproxy.py Show resolved Hide resolved
src/webargs/multidictproxy.py Show resolved Hide resolved
1. Fix a typo in the quickstart doc

Rewrite "content_type" to "user_type". Just a slip-up.

2. Flatten out the nesting in MultiDictProxy.__getitem__

Although this was flagged as an `else-return` which could flatten
out to just a return, Jérôme also suggested a version which treats the
outer condition as an `if ...: return` so that we can dedent the rest
of the logic.
The end-result is easier to read, so let's go with that.

3. Add a test for passing a single value to a List field

This applies to all parsers, but it was required by FalconParser in
particular. If you remove MultiDictProxy from the presentation of
query params, headers, etc. from FalconParser, it works in most cases.
However, because field values only get wrapped in lists by that parser
if there are multiple values supplied, we need some intermediary to
understand that lookups for List fields always need a `list`-type
value. MultiDictProxy serves that purpose, as a schema-aware dict type.
Copy link
Collaborator Author

@sirosen sirosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a commit with fixes for the several issues found in the last review.
I'm going to do a final double-check on everything and (hopefully) mark all of the outstanding comments as resolved.

docs/quickstart.rst Outdated Show resolved Hide resolved
src/webargs/multidictproxy.py Show resolved Hide resolved
src/webargs/falconparser.py Show resolved Hide resolved
Previously, `load_json` carefully avoided putting `missing` into the
`Parser._cache["json"]` storage location. However, we're not clear (at
least, in the context of marshmallow-code#420 ) on why we should avoid caching
`missing` values. No tests fail, so change this to be simpler logic in
which any "successful" JSON parsing, including those which failover to
`missing`, will be cached.
Refer to the `__location_map__` attribute, and expand the list of
supported locations to be complete (add `query` and `json_or_form`).
@sirosen
Copy link
Collaborator Author

sirosen commented Dec 16, 2019

Okay, I missed a couple of small things, so I handled those in follow-up commits.

I'm excited about merging this in part because I want to get cracking on the changes to @use_args I pitched in #419 for us to review. 😄

@lafrech
Copy link
Member

lafrech commented Dec 20, 2019

Great.

@sloria, should we release a beta to give frameworks time to adapt to the changes?

Then we'd release a final 6.0.0 with Python 2 dropped.

@sloria
Copy link
Member

sloria commented Jan 2, 2020

Haven't had time to review the latest changes, so I'll defer to @lafrech on merging this. Releasing a 6.0.0b1 makes sense to me

@lafrech lafrech merged commit df66fe8 into marshmallow-code:dev Jan 3, 2020
@lafrech
Copy link
Member

lafrech commented Jan 3, 2020

Alright. Merging. I'll try to release a beta version today.

@lafrech
Copy link
Member

lafrech commented Jan 3, 2020

@sloria, I just got this error again.

remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
remote: error: GH006: Protected branch update failed for refs/heads/dev.
remote: error: Required status check "marshmallow-code.webargs" is expected.
To ssh://github.com/marshmallow-code/webargs
 ! [remote rejected] dev -> dev (protected branch hook declined)
error: failed to push some refs to 'ssh://git@github.com/marshmallow-code/webargs'

Am I missing privileges on this repo?

@sloria
Copy link
Member

sloria commented Jan 5, 2020

@lafrech I just gave you admin privs. Can you try again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC: Allow a single location for each field
3 participants