New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate ancillary classes to Pydantic Models #60
Conversation
Codecov Report
@@ Coverage Diff @@
## golden #60 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 15 16 +1
Lines 567 583 +16
Branches 71 72 +1
=========================================
+ Hits 567 583 +16
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks neat so far! I think my main comment here is the one about abc
.
One other syntax thing - I think I recall there being some way to make it possible to instantiate pydantic models using positional args instead of kwargs, I think if you used the pydantic.dataclass
syntax. But then I believe there was a tradeoff about how you had to specify config options (like the extra
thing you did). Try to play around with it! Not a necessity but might be nice if it starts to feel like a headache to kwarg everything
src/arti/internal/models.py
Outdated
MODELS_REQUIRING_SUBCLASS = [] | ||
|
||
|
||
def requires_subclass(cls: type) -> type: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took me a little while to realize this was being used as a decorator, a comment would be nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know what, now that I'm seeing about what this mechanism is trying to accomplish, I think it'd be possible to do with abc
's abstract classes. It's a bit more of a standard mechanism, have you gotten to look at it? I'm 95% sure that I have used it in combination with pydantic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good idea w/ abc
! Two questions (may experiment in a bit):
- the
requires_subclass
ones are intended to be "dynamic" (setting arbitrary attributes) so won't have anything to mark abstract here. 😅 Does subclassing ABC alone prevent direct instantiation or what do you think there? - do you remember getting a metaclass conflict w/ pydantic+ABC? Think I could resolve w/ a custom metaclass if so, but might be slightly messy
If ABCs don't work cleanly, another option I thought about was an _abstract
class attr flagging whether to allow/disallow instantiation and adding an __init_subclass__
that sets to "allow" if not set explicitly on each subclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, they already thought of ABCMeta of course (their ModelMetaclass
subclasses ABCMeta
)! Now to figure out the abstract part 🔍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so as I'm looking at this stuff again, I'm reminded that the only way to define an "abstract class" is for it to have methods (or properties) that are annotated as abstract. And if I understand your question, you're saying that you're not sure what it is that should be marked abstract in some of these cases.
So in the case of Version
, it's fairly clear that you can make fingerprint
an abstract method and you're good to go. In some of the other cases, it's not as clear. When I'm fighting with a library like that, it is sometimes a sign that I could stand to rethink my strategy. Like, why does Annotation
need to be abstract? Or what about Type
? I wonder if it might be useful to use a Union
instead of a dummy base class, so something like Type = Union[String, Int, Null]
, something like that.
Happy to talk in more detail, this stuff is really fun to play around with!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if I understand your question, you're saying that you're not sure what it is that should be marked abstract in some of these cases.
Yeah, some tension is the ambiguous semantics for "abstract". With the current approach, one could trivially subclass any @requires_subclass
class with no additional properties and instantiate - is that ok or should that still be "abstract"?
For Type
or Annotation
specifically, yes, subclassing alone may be enough: there is additional semantic meaning intrinsic in the subclass. For example, Geo(Type)
or PII(Annotation)
alone is meaningful (though of course, expanding with eg: srid: int
or sensitive: bool
wouldn't hurt).
So in the case of Version, it's fairly clear that you can make fingerprint an abstract method
Ah, good call. 👍
why does
Annotation
need to be abstract? Or what aboutType
? I wonder if it might be useful to use aUnion
instead of a dummy base class, so something likeType = Union[String, Int, Null]
, something like that.
Good question. 🤔 The intent on these two (and a lot of the other classes in the core.py
modules) is that external modules (plugins, user code, etc) will define additional subclasses known only at runtime.
I think you're right on Union
style behavior but because it's 1. dynamic (precluding Union
) and 2. arbitrary data rather than abstract behavior (precluding ABC
and Protocol
), we may be forced out of the "happy path"...? The only other thing I can think of is (ab)using inheritance to create "dynamic" discriminated Union
s (I want to check both isinstance(x, Type)
and isinstance(x, Geo)
, depending on the context) similar to now.
--
Seems pydantic has an open issue about (static) discriminated unions (which would kinda be like discriminator='__class_key__'
in this PR) and another one with a desire for dynamic unions (they went with a registry too, but using __init_subclass__
and missing the actual validator part).
Unless you can think of anything else, I will probably:
- replace
@register_subclass
with an__abstract__: ClassVar[bool]
, add an__init_subclass__
defaulting__abstract__ = False
, and change the validator to check forcls.__abstract__
(changes not necessary, I just like it a bit more) - add some good comments summarizing a handful of these things above the
__abstract__
flag (hopefully a bit more concisely, now that you've helped clarify the intent)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is all veering into pure discussion territory; I can tell these are early strokes that will be revised, so don't consider any of this blocking)
OK, the bit about user-defined types is an interesting twist, I can see how that gets in the way implementation-wise of Unions. I still think there's a question here worth considering: what are the actual requirements for an Annotation
? Or for a Type
? Like, what does code that interacts with those objects expect them to be able to do? I think that trying to implement some functionality will likely be illuminating. If it's just that they need to be serializable, for example, then perhaps requiring that they be pydantic models is sufficient. Serialization of user-defined types presents its own set of challenges, which you can see succinctly described in the second issue that you linked. I worked on my own library that tried to deal with this sort of thing, happy to discuss further if/when that comes up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Serializing arbitrary data is the main need within the framework (which pydantic covers, as you say), but they will be instantiated by users and most consuming code will be "purpose" built. Eg:
Type
s are used by theTypeSystems
to convert artiType
s <-> third party typesAnnotations
is largely for "data ops", but may influence (user) code "responding" to different subclasses specifically
Guess adding comments to Annotation
, Type
, etc would be useful sooner than later. 😁
Does this feel a bit tidier than the registry?
|
||
@validator("dt", always=True) | ||
@classmethod | ||
def _requires_timezone(cls, dt: datetime) -> datetime: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, now we're seeing where pydantic comes in handy!
Gonna go ahead and merge, but will circle back for |
As an alternative to the serialization work in #39, we decided to port many of the core classes to Pydantic models. In addition to ~automatic serialization (at least to
dict
, each Backend will need to handle from there), Pydantic simplifies class definitions (akin to dataclasses), provides a standard validation mechanism, and improves error reporting.I haven't yet handled
Artifact
,Producer
, norGraph
, but I think those can be handled simply enough:@validator
methodsbuild
andmap
arguments referencing a (non-proper) subset of those.ArtifactBox
(not sure how that'll serialize - may not be an issue)Additionally for Artifact and Producer, we'll probably want separate "ArtifactPartition" and "ProducerRun" style sub-models respectively to track more granular details (partition keys, runtime metadata, etc) - but we'll punt on these for now.