Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple schemas for same nested type in data class with marshmallow-dataclass > 8.5.3 #762

Open
DanielPettersson opened this issue Apr 28, 2022 · 1 comment

Comments

@DanielPettersson
Copy link

When generating a schema for a single data class referencing same nested type in multiple fields each field get a separate schema. Even though it is the same type. Though, the trick is that this only happens if you first get the schema for the type in a different call context.

Code to reproduce the bug:

from dataclasses import dataclass

import marshmallow_dataclass
from apispec import APISpec
from apispec.ext.marshmallow import MarshmallowPlugin


@dataclass
class Foo:
    s: str

@dataclass
class Bar:
    a: Foo
    b: Foo


def get_marshmallow_schema():
    marshmallow_dataclass.class_schema(Bar)()

# When this function is called the bug is triggered. Commenting out this line makes the bug not happen.
get_marshmallow_schema()


spec = APISpec(
    title="Test schema",
    version="0.0.1",
    openapi_version="3.0.3",
    plugins=[MarshmallowPlugin()],
)

spec.components.schema("bar", schema=marshmallow_dataclass.class_schema(Bar))

print(spec.to_yaml())

Running the above code will output the warning from marshmallow plugin:
UserWarning: Multiple schemas resolved to the name Foo. The name has been modified. Either manually add each of the schemas with a different name or provide a custom schema_name_resolver.

And the schema for Bar will look like:

properties:
        a:
          $ref: '#/components/schemas/Foo1'
        b:
          $ref: '#/components/schemas/Foo'
      required:
      - a
      - b
      type: object
@gabor-akeero
Copy link

Found the problem, but the reasoning is long, so first the solution 😁

In short:
Calling class_schema multiple times on the same dataclass is re-defining the schema class again and again, and some caching is involved, so OpenAPIConverter sometimes gets the same schema (eg. marshmallow.schema.Foo) and it's OK, other times it gets a re-created one and doesn't recognise it, treats it as a new class with the same name, and generates a new unique name for it.

Solution:
Never call class_schema twice on the same dataclass, preferably use the dataclass decorator of marshmallow-dataclass which generates the schema class once and stores it as Foo.Schema and Bar.Schema:

from typing import ClassVar, Type
from marshmallow import Schema
# NOTE: we import `dataclass` from marshmallow_dataclass and NOT from dataclasses
from marshmallow_dataclass import dataclass

from apispec import APISpec
from apispec.ext.marshmallow import MarshmallowPlugin

@dataclass
class Foo:
    Schema: ClassVar[Type[Schema]] = Schema
    s: str

@dataclass
class Bar:
    Schema: ClassVar[Type[Schema]] = Schema
    a: Foo
    b: Foo

@dataclass
class Baz:
    Schema: ClassVar[Type[Schema]] = Schema
    c: Foo
    d: Foo


def get_marshmallow_schema():
    return Bar.Schema

# When this function is called the bug is no longer triggered.
get_marshmallow_schema()
# even if we instantiate the schema
get_marshmallow_schema()()

spec = APISpec(
    title="Test schema",
    version="0.0.1",
    openapi_version="3.0.3",
    plugins=[MarshmallowPlugin()],
)

spec.components.schema("bar", schema=Bar.Schema)

print(spec.to_yaml())

And now the long story for those who are interested:

The immediate reason for duplicating the output schemas is than whenever you call marshmallow_dataclass.class_schema(Foo)(), marshmallow-dataclass will re-define the class marshmallow.schema.Foo, so although these are identical, they are new instances of marshmallow.schema.SchemaMeta, so their hash(...) result is different, so technically they are different.

When apispec's OpenAPIConverter is converting dataclasses to OpenAPI schema names, it maintains a lookup table whose key is a tuple of (schema class, modifier1, mod2, etc.) and the corresponding value is the OpenAPI output schema name.

So when the schema marshmallow.schema.Foo changes, and then it's passed to resolve_nested_schema again, it won't be found it in the lookup table, but it's assumed to be a new class (with the same name), so a new unique name is generated, and so on.

Exactly this is happening when you're doing class_schema(Bar), because that creates the two sub-schemas for the two Foo members, and this is why you see a different schema ref for a and b.

This explains why repeated use of class_schema doesn't work with the OpenAPI plugin. (In fact, one piece is still missing, but I'll get to that soon.)

Most probably you're asking why it didn't fail without that function call 😁.

Well, marshmallow-dataclass memoizes the class creation, meaning that if you try to invoke that _internal_class_schema(...) function with the same arguments as previously, then it won't be executed, but its previous return value is just dug up from a cache, and returned immediately.

So if you try to re-create the same class in the same context, then you're not actually re-creating it but get the same result, which is precisely what the OpenAPI plugin (and anyone else) expects.

So the problem would happen every time you call class_schema(...) more than once for the same dataclass, just that cache prevents it in certain circumstances.

Now that final missing piece: What are these circumstances? When hashes of the arguments of _internal_class_schema(...) are the same as before, and these arguments are:

  • the name of the dataclass
  • the base schema from which the new schema should be derived from
  • ... and the calling frame, that is, from where class_schema(...) is being executed

The hash of the string and the base schema is straightforward: same values yield the same hash, different values usually yield different hashes.

Calling frames, on the other hand, are implemented in CPython, their hash is their memory address, so whenever a new traceback is created, it'll yield a new hash, and whenever one is re-used, it'll yield the same 😞.

But one function call has one traceback, so when at the end of your script you're executing

spec.components.schema("bar", schema=marshmallow_dataclass.class_schema(Bar))

then the calling frame is that exact expression schema=..., so when creating the sub-schemas for the two embedded Foo-s, all three abovementioned args will be the same: the dataclass name is "Foo", the base schema is None, and the calling frame is the same.

So the first sub-schema is created normally, then at the second the LRU cache kicks in, and returns the same sub-schema again.

One nasty surprise: Try replacing that get_marshmallow_schema() call with directly executing marshmallow_dataclass.class_schema(Bar)(), most probably it will work.
The reason is that this call:

  • Starts creating the schema for Bar
  • Creates the sub-schema for the 1st Foo
  • When creating the sub-schema for the 2nd Foo, the call frame is the same, so it re-uses the 1st one from the cache
  • And when at the end you again create a schema for Bar (in that specs.components.schema(...)), the call frame will probably be re-used, so it doesn't even re-creates Bar, but uses it from the cache
  • So OpenAPIConverter will recognise it. But that's just by coincidence 😞 ...

What we can and need to do is avoid re-creating the schema classes, so NEVER call class_schema twice on the same dataclass.

You may either call it once, like right after the dataclass declaration:

FooSchema = class_schema(Foo)
BarSchema = class_schema(Bar)
# and later in that function instantiate the schema if you need to:
    ...
    BarSchema()

Or you may use the marshmallow_dataclass.dataclass decorator, which is the same as dataclasses.dataclass, just defines the schema class once and injects it as Foo.Schema and Baz.Schema.

And that's how we got the example at the top 😁.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants