Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leave logic of deciding on extra args to schema #267

Closed
tuukkamustonen opened this issue Aug 8, 2018 · 23 comments
Closed

Leave logic of deciding on extra args to schema #267

tuukkamustonen opened this issue Aug 8, 2018 · 23 comments

Comments

@tuukkamustonen
Copy link
Contributor

tuukkamustonen commented Aug 8, 2018

Now that marshmallow 3.x is in beta and comes with configurable behavior on extra args, how about just including all fields in Parser._parse_request() and passing them on to Schema.load()? It would be up to passed Schema to determine what to do with them.

The effect would be that:

  • With RAISE, extra fields would be denied.
  • With INCLUDE, extra fields would be allowed and passed on.
  • With EXCLUDE, extra fields would be ignored.

This would allow something like:

@use_args(HeaderSchema(unknown=EXCLUDE), locations=('headers', ))
@use_args({'action': fields.Str()}, locations=('json',))  # default is RAISE

Actually, I don't think this even needs marshmallow 3.x. Why not just always pass all fields from _parse_request() to Schema.load()? On 2.x you would need custom @validates_schema implementation in addition, but on 3.x you would not.

Maybe I'm missing something...

@sloria
Copy link
Member

sloria commented Aug 8, 2018

Thanks for the suggestion @tuukkamustonen ; I think it's a good one. I don't think there's a [good] reason for _parse_request to filter on declared fields.

Would you be up for sending a PR?

@tuukkamustonen
Copy link
Contributor Author

tuukkamustonen commented Aug 8, 2018

I can try to come up with something (hopefully within a few days or so).

Edit: Turns out I bumped into other tickets/issues and this got delayed. Still hoping I (or anyone) can dig up time for it!

@lafrech
Copy link
Member

lafrech commented Nov 22, 2018

Related to #173.

@toonalbers
Copy link

toonalbers commented Jul 22, 2019

Hi all,

Finding all fields seems less trivial than I had hoped, but I have tried to think up a solution with minimal changes to the existing code. My idea was to have each parser provide a list of fields it knows at each location. Then _parse_request would still do what it currently does, but afterwards also parses values in the other discovered fields and adds them to the result.

Benefits of this approach:

  • No need to change large parts of the webargs codebase
  • Backwards compatible if we enable it through some option

Downsides to this:

  • I'm not sure if/how undeclared nested arguments would work since I use ma.fields.Raw as a default.
  • Using unknown=RAISE would break when using locations=('headers',) as you would get all included HTTP headers.

Would something like this be interesting to pursue? (I did not finish or test it.)

@sirosen
Copy link
Collaborator

sirosen commented Jul 31, 2019

From my perspective, this was very confusing. I read about marshmallow's support for unknown=RAISE only to find when I tried to use it that webargs isn't passing the full original data to the schema for validation.

I was looking at doing something for this based on the above, but I struggled a bit with it. I might try again, but I'd like to share the failing test I wrote in case it helps someone.

failing test with strict schema
diff --git a/tests/test_core.py b/tests/test_core.py
index 0df18bc..5767cc8 100644
--- a/tests/test_core.py
+++ b/tests/test_core.py
@@ -4,7 +4,8 @@ import mock
 import datetime

 import pytest
-from marshmallow import Schema, post_load, pre_load, class_registry, validates_schema
+from marshmallow import (Schema, post_load, pre_load, class_registry, validates_schema,
+                         RAISE)
 from werkzeug.datastructures import MultiDict as WerkMultiDict
 from django.utils.datastructures import MultiValueDict as DjMultiDict
 from bottle import MultiDict as BotMultiDict
@@ -944,6 +945,17 @@ def test_parse_basic(web_request, parser):
     assert result == {"foo": 42}


+def test_parse_with_strict_schema(web_request, parser):
+    class StrictSchema(Schema):
+        foo = fields.Int()
+
+    strict_schema = StrictSchema(unknown=RAISE)
+
+    web_request.json = {"foo": "42", "bar": "baz"}
+    with pytest.raises(ValidationError):
+        parser.parse(strict_schema, web_request)
+
+
 def test_parse_raises_validation_error_if_data_invalid(web_request, parser):
     args = {"email": fields.Email()}
     web_request.json = {"email": "invalid"}

@lafrech
Copy link
Member

lafrech commented Aug 23, 2019

Using latest webargs and marshmallow 3, I just got surprised by the fact that unknown parameters are ignored in top-level schema but trigger an error in nested schemas.

@sirosen
Copy link
Collaborator

sirosen commented Aug 23, 2019

@lafrech, can you give me an example to work with as a test case for #410 ?
I basically didn't touch the nested schema handling as it's a feature I'm not really familiar/comfortable with, but it seems logical that if webargs is going to have the option to set unknown=EXCLUDE (with that as the default), it should apply to nested schemas as well.

I suspect that the current code in #410 does not handle this correctly.

@lafrech
Copy link
Member

lafrech commented Aug 26, 2019

@sirosen here's a piece of code that illustrates what I reported. Just a quick draft, not a test I'd integrate as is. Hope it helps.

Put this somewhere in test_core.py:

def test_parser_nested_unknown(web_request):
    class MyNestedSchema(Schema):
        i = fields.Int()

    class MySchema(Schema):
        n = fields.Nested(MyNestedSchema)

    parser = MockRequestParser()
    web_request.json = {
        'n': {
            'i': 12,
            #'j': 12,
        },
        'o': 12,
    }
    parsed = parser.parse(MySchema, web_request)
    print(parsed)
    # {'n': {'i': 12}}

Uncomment the 'j': 12 line and you get

marshmallow.exceptions.ValidationError: {'n': {'j': ['Unknown field.']}}

while the 'o': 12 line, top-level, triggers no error.

Indeed, at the top level, webargs is responsible for picking the attributes, so it doesn't notice the unknown ones. Nested schemas are managed by marshmallow so things go as expected.

I didn't try the other way around (having unknown=EXCLUDE in the Nested fields).

It would be nice to let the marshmallow Schema do everything by passing it the request object, even proxified, rather than calling its top-level fields one by one, which duplicates the logic. I don't know if this is feasible. Might be a great refactor.

@lafrech
Copy link
Member

lafrech commented Aug 26, 2019

It would be nice to let the marshmallow Schema do everything by passing it the request object, even proxified, rather than calling its top-level fields one by one, which duplicates the logic. I don't know if this is feasible. Might be a great refactor.

At a quick glance, this is pretty hard to do right now. I think the main issue is that the location can be specified in each field so a schema can have different locations depending on the field.

I don't know if there is widespread use of this feature. In practice, if several locations are used, one could call use_args twice with separate arg maps, each with a single location.

How many times do we actually do this?

class InputSchema:
    q1 = ma.fields.Int(location="query")
    q2 = ma.fields.Int(location="query")
    h1 = ma.fields.Int(location="headers")
    c1 = ma.fields.Int(location="cookies")

Perhaps removing this feature would allow webargs to parse the request first and then feed each Schema the whole data it expects at once.

Or maybe I'm totally mistaken and this feature is a must have for a reason I'm missing. Apparently, apispec also uses the location field metadata.

In any case, it would be a huge rewrite.

@sloria
Copy link
Member

sloria commented Aug 26, 2019

It would be nice to let the marshmallow Schema do everything by passing it the request object, even proxified, rather than calling its top-level fields one by one, which duplicates the logic.

This seems like the right approach. I'm not entirely opposed to getting rid of the location argument on fields. It's a big breaking change, but might be worth the cleanup.

@sirosen
Copy link
Collaborator

sirosen commented Aug 26, 2019

I don't know if there is widespread use of this [location on fields] feature.

In my two months of using webargs, I haven't touched it. So clearly nobody uses it. 😜

Huge 👍 for the idea of webargs dropping this feature. I think it would be a net reduction in complexity, not just for webargs but also for users.

I think this might also require other changes, however. At a glance, Parser.parse(locations=...) doesn't seem very compatible with the idea of passing through request objects to the schemas.
I think the more desirable paradigm -- better than either the parser OR the fields specifying locations -- is to have a location-to-schema map somewhere. So that I can say something along the lines of

class FooSchema:
   foo_id = ma.fields.UUID()

class BarSchema:
    bar_id = ma.fields.Integer()

@app.route("/get_foo/<foo_id>")
@parser.use_args({"view_args": FooSchema()})
def get_foo(foo_id):
    ...

@app.route("/add_foo/<bar_id>")
@parser.use_args({"query": FooSchema(), "view_args": BarSchema()})
def add_foo_to_bar(bar_id, foo_id):
    ...

right now I'm doing this (and maybe there's a better way today) with things like

foo_id_path = parser.use_args(FooSchema(), locations=("view_args",))

@app.route("/get_foo/<foo_id>")
@foo_id_path
def get_foo(foo_id):
    ...

@sloria
Copy link
Member

sloria commented Aug 26, 2019

You could just have multiple use_args calls to meet the same use case , right?

@app.route("/add_foo/<bar_id>")
@parser.use_args(BarSchema(), locations=('view_args', ))
@parser.use_args(FooSchema(), locations=('query', ))
def add_foo_to_bar(bar_id, foo_id):

@lafrech
Copy link
Member

lafrech commented Aug 26, 2019

What about calling the decorator twice?

@app.route("/add_foo/<bar_id>")
@parser.use_args(FooSchema, locations=('query', ))
@parser.use_args(BarSchema, locations=('view_args', ))
def add_foo_to_bar(bar_id, foo_id):
    ...

Do you use the same schema so often it is worth declaring foo_id_path earlier in the file?

Edit: @sloria, great spirits meet.

@sirosen
Copy link
Collaborator

sirosen commented Aug 26, 2019

Sorry, I wrote a bad example. I bring up the locations argument because it's not clear how this usage should resolve:

@parser.use_args(FooSchema, locations=("query", "view_args"))
def get_foo(foo_id):
    ...

Does that mean that FooSchema must match all of query and all of view_args params? Which one actually gets passed as foo_id? Even if webargs handles it today, it's ambiguous to read.
I don't think this case, which exists today, needs to exist at all if the schemas are associated with locations.

That's why the notion of

@parser.use_args({"view_args": FooSchema})
def get_foo(foo_id):
    ...
# or
@parser.use_args(view_args_schema=FooSchema)
...

appeals as an interface. You specify "the schema for query params" and "the schema for view args" etc.

Cases like the above, with FooSchema and BarSchema, illustrate which I would expect is the norm -- separate schemas are used for separate locations in a given call, even if a single schema may be applied to different locations in different contexts.


Aside:

Do you use the same schema so often it is worth declaring foo_id_path earlier in the file?

Actually yes, but that's somewhat OT. 😄
I like having the schema + location association as part of my parsing tools, rather than demanding that the usage sites know about webargs usage at all.

@sloria
Copy link
Member

sloria commented Aug 26, 2019

Does that mean that FooSchema must match all of query and all of view_args params?

Yes, that's what that would do.

Even if webargs handles it today, it's ambiguous to read.
I don't think this case, which exists today, needs to exist at all if the schemas are associated with locations.

True. If multiple use_[kw]args calls are the way forward, then we should deprecate support for passing a collection (locations) in favor of a single location.

Can't say whether @parser.use_args(view_args_schema=FooSchema) is better or worse than multiple use_args calls. Would have to give it more thought.

@lafrech
Copy link
Member

lafrech commented Aug 26, 2019

I always use a single location. In fact, our framework (flask-rest-api) only takes a single location as input and make it a list to pass it to webargs' use_args.

I wouldn't mind removing the multi-locations feature and only taking a single location as input.

It could read:

    @parser.use_args(FooSchema, location="query")

with location defaulting to json, for instance.

I like the multiple use_args call syntax better but it could be because I'm used to it. I don't really mind. This is rather cosmetic.

However, if we went this way, I don't know what workaround we could propose to people who actually rely on the multi-locations feature to allow a parameter to be passed in different locations.

To put it another way, do we want to support this and what would it do?

    @parser.use_args(FooSchema, location="query")
    @parser.use_args(FooSchema, location="view_args")

Because as is

  • the inner would override the outer
  • the outer would fail on required fields that could be found in the inner

@sirosen
Copy link
Collaborator

sirosen commented Aug 26, 2019

location="query" and the like is fine by me; just to be clear that I'm not advocating strongly for something different. It's more similar to the current interface, so maybe that's a winning argument for it all on its own.

do we want to support this and what would it do?

    @parser.use_args(FooSchema, location="query")
    @parser.use_args(FooSchema, location="view_args")

Because as is

  • the inner would override the outer
  • the outer would fail on required fields that could be found in the inner

I think the fact that you need to ask that question -- "what would it do?" -- means it should not be supported.

If you want to try to load arguments from multiple locations, you could do this:

def myparse_wrapper(func):
    def decorated(*args, **kwargs):
        try:
            return parser.use_args(FooSchema, location="query")(func)(*args, **kwargs)
        except ValidationError:  # slightly imprecise, since this would capture `func` raising ValidationError
            return parser.use_args(FooSchema, location="view_args")(func)(*args, **kwargs)
    return decorated

Can (should?) webargs facilitate such usage by allowing you to specify try_locations=... with the meaning "try to parse each in order, catching validation errors, and raising the final exception if none work"?

@lafrech
Copy link
Member

lafrech commented Aug 27, 2019

I think the fact that you need to ask that question -- "what would it do?" -- means it should not be supported.

We're on the same line. All I'm saying is that not supporting this removes a feature with no real workaround.

Currently, using a Schema with multiple locations means for each field, try to fetch the data from one location then from the other if the first fail. Each field may come from any location.

The workaround you propose tries the whole Schema on a location then the whole Schema again on the second if the first fails validation. This is not equivalent. To propose something equivalent, you'd need to load partial from the first location, then the second, then merge both results. Then check required fields. It already sounds awful.

I can't really think of an API that would expect some inputs from several locations. I mean an API may accept a payload in json and query parameters in the same request, but I'd find it weird to build a loose API with a POST /users/ resource where user.age could be specified either in json or in query param, etc. Mixed location is fine, as long as each field can only come from one location.

In other words, I wouldn't mind dropping this feature and enforce a single location for a field. On one hand, people may be relying on it, but we don't know that. On the other hand, if someone wanted to add this feature, we'd probably reject that because it adds complexity for what is arguably a smelly edge case.

Overall, the proposal is

  • drop location argument to Field
  • only accept a single location in use_args

Whether we use nested calls to use_args for different locations or a single call with a {location: schema} mapping is just an interface choice.

@lafrech
Copy link
Member

lafrech commented Aug 27, 2019

Let's discuss this refactor in #419.

@lafrech
Copy link
Member

lafrech commented Jan 7, 2020

Fixed in #420.

@pzhang65
Copy link

pzhang65 commented Aug 27, 2021

Using latest webargs and marshmallow 3, I just got surprised by the fact that unknown parameters are ignored in top-level schema but trigger an error in nested schemas.

Is there a workaround for this issue? I'm using aiohttp-apispec which depends on webargs==5.x but I'm using marshmallow 3.x for all my argmaps (schemas). I want the same unknown param behavior as what I set in my schemas, but it only applies for nested fields.

@sirosen
Copy link
Collaborator

sirosen commented Aug 27, 2021

It's been 1.5 years since we did the 6.0 release. If a tool doesn't support 6.0 or later, it might no longer be maintained.

You could try subclassing the webargs 5.x parser and making the _parse_request method do a Schema.load. I'm sure that leaves all kinds of sharp corners, but it's the nearest thing I can think of after taking a quick look back at the 5.x code.

@pzhang65
Copy link

pzhang65 commented Aug 27, 2021

Wow wasn't expecting such a fast response. Thanks for the tip, I'll take a quick look at writing my custom parser but maybe it'll make more sense I try build on aiohttp-apispec to support webargs >=6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants