Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Marshmallow’s Nested fields with a restricted schema steal the full schema’s registry spot #591

Open
flying-sheep opened this issue Aug 10, 2020 · 13 comments

Comments

@flying-sheep
Copy link

flying-sheep commented Aug 10, 2020

If we e.g. have a marshmallow Schema like this:

from marshmallow_sqlalchemy import fields, SQLAlchemyAutoSchema

class FooSchema(SQLAlchemyAutoSchema):
	parent = fields.Nested("BarSchema", only=("id", "name"))

and call spec.components.schema('Foo', schema=FooSchema), then “BarSchema” gets registered as having only the fields id and name, and subsequent tries to register the real thing fail.

The problem lies here:

name = get_unique_schema_name(self.spec.components, name)

As that function somehow doesn’t give us something useful. Also it should first modify names for schemas where schema.only is not None before it modifies schemas where schema.only is None.

@Bangertm
Copy link
Collaborator

In most cases you should not rely on get_unique_schema_name to provide names for schemas using modifiers. This function is provided as a fall back to ensure that the generated spec is valid - even if it is not particularly readable. As the warning indicates the recommended solution is either:

  1. add the modified and unmodified versions of BarSchema prior to adding FooSchema
  2. provide a custom schema_name_resolver function to provide appropriate names for the modified and unmodified versions of BarSchema - this is where logic for modifying the names based on schema.only could reside

@flying-sheep
Copy link
Author

As I explained, option 1 isn’t possible, as spec.components.schema automatically adds all referred-to schemas transitively.

A custom schema_name_resolver does work, but should one have to rely on one for basic functionality? I’d think that stuff should just work without crashing with a inscrutable error message leading to a lot of debugging.

@lafrech
Copy link
Member

lafrech commented Jul 8, 2021

Does it work if you register mini bar schema like this first?

spec.components.schema("MiniBarSchema", BarSchema(only=("id", "name")))

@lafrech
Copy link
Member

lafrech commented Jul 8, 2021

Also, if you register BarSchema (real one) first, you should only get a warning, not an error, when the mini one is auto-registered.

@flying-sheep
Copy link
Author

flying-sheep commented Jul 8, 2021

Even if it does, that means figuring out for which schemas I need the mini one beforehand, instead of being able to rely on auto registering:

My suggestion still stands:

  1. The message should be clearer
  2. There should be a way to easily configure name generation to avoid things like this

@lafrech
Copy link
Member

lafrech commented Jul 8, 2021

There should be a way to easily configure name generation

That is schema_name_resolver. It is a somewhat advanced feature, indeed. But apispec can't reasonably define automatic names for modified schemas.

@flying-sheep
Copy link
Author

OK, so how about we

  1. remember which schema-configuration combination is registered for which name
  2. if a collision happens we show a nice error message describing the problem and mentioning that solving it involves either pre-registering all schemas before any auto registering happens or schema_name_resolver

Helpful error messages are always better than having to jump into a debugger and googling some inscrutable message

@Sinclert
Copy link

Sinclert commented Nov 17, 2021

Hey there 👋🏻

Context:

Just stumbled into this issue as I am encountering the same problem that @flying-sheep described. I think he explained it perfectly in his original message, but just to reiterate:

Problem:

When users use the spec.components.schema function to register schemas into an APISpec object, the MarshmallowPlugin (provided by this package) automatically registers nested schemas, making the user-level schema registration to fail when manually registering a schema that was defined as "nested" on a previous registered one.

One could say: "Ok then, do not register a schema that has already been register, duh". Sadly, there are use cases where schemas are defined within a list, and the short-and-nice registration loop fails... 😞

Example:

from apispec.core import APISpec
from apispec.ext.marshmallow import MarshmallowPlugin
from marshmallow import Schema
from marshmallow import fields


class Children(Schema):
    children_id = fields.String(required=True)


class Parent(Schema):
    parent_id = fields.String(required=True)
    children = fields.Nested(
        required=True,
        nested=Children,
        many=True,
    )


# NOTE:
# This structure could be handy when trying to register all project-defined schemas,
# probably to be defined on a folder's __init__.py module, for instance.
all_schemas = [
    Parent,
    Children,
    # ...
]

spec = APISpec(
    title="Example",
    version="0.1.0",
    openapi_version="3.0.2",
    plugins=[MarshmallowPlugin()],
)

# Fails...
for schema in all_schemas:
    spec.components.schema(schema.__name__, schema=schema)

Proposal:

I can think of two solutions, either:

  • To expose a parameter to avoid the registration of nested schemas (within MarshmallowPlugin?).
  • To remove this DuplicateComponentNameError raise, and just return when the schema is already registered.

@Sinclert
Copy link

Just to clarify (given the age of the issue creation): this is still a problem on the current version (5.1.1).

@Bangertm
Copy link
Collaborator

@Sinclert To avoid registering nested schemas, you can pass a schema_name_resolver that always returns None. If you do this you cannot have circular references in your schemas. My opinion is that defining nested schemas in line is more confusing for your users than defining them as references so I would simply catch the errors.

@Sinclert
Copy link

Thanks for the quick reply!

I see... that seems to avoid the automatic registration of nested schemas ✅

Not sure how to avoid people stepping into this confusion again though. I reviewed both the Nested schemas documentation, and the MarshmallowPlugin parameters section and I could not find any reference to "in order to avoid the automatic registering of nested schemas, please provide a None-returning schema_name_resolver function". Indeed, documentation specifies exactly the opposite:

  • Here in the Nested schemas documentation.
  • Here in the MarshmallowPlugin parameters section.

@Bangertm could you include what you said on your previous message somewhere in the docs? I would also consider closing this issue so people see that there is actually an official way to solve it.

@Bangertm
Copy link
Collaborator

The documentation under the Nested Schema documentation you reference includes this sentence:

If the schema_name_resolver function returns a value that evaluates to False in a boolean context the nested schema will not be added to the spec and instead defined in-line.

Can you suggest a way to clarify further?

@Sinclert
Copy link

Maybe another "❕ NOTE" block 🤷🏻‍♂️

In order to deactivate the automatic registration of nested schemas, and avoid registration duplicates, consider providing a None returning function to the MarshmallowPlugin schema_name_resolver initialization argument.

Example:

MarshmallowPlugin(schema_name_resolver=lambda schema_class: None)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants