Writing Producer output to Literal storage was recently added, with tests confirming non-partitioned use. Most of the logic should work for partitioned Producers (i.e., those implementing map), but it will probably need a couple of small fixes plus a test.
The first likely source of errors is that the Producer's autogenerated input and output Artifacts will have type=Int(...) or type=List(...) instead of type=Collection(...). We'll have to see whether there's a good way to know (or infer) whether a given input/output literal should be partitioned.
Perhaps we can determine whether a Producer is mapped and, if so, change the generated Literal output Artifacts to have type=Collection(...).
To determine whether it is mapped, we should mostly be safe just checking whether map is implemented (i.e., hasattr(cls, "map")). This may be a bit circular (we define the output artifacts first, then use that metadata to validate/generate map), but it should be OK since we only auto-generate a map method for non-partitioned cases. We may want to set an attribute on the generated map method that we can check for, to handle Producer subclassing (that way we don't see a base class's generated map and mistake the subclass for partitioned).
--
One general type system caveat: the Collection logic currently assumes each "partition" of the collection is itself "list-like" or "concatenatable" (e.g., pd.DataFrame or a database table). For example, Collection(element=Int64()) actually corresponds to a Python type hint of -> list[int] (def build(...): return [1]), but these Literal uses are more likely -> int (def build(...): return 1). Perhaps we add an extra flag to Collection, like scalar_partitions, to disambiguate?
This makes sense for the output (because we run build multiple times), but any input artifacts with scalar_partitions might need to accept either list[int] or int, depending on whether map defines a single partition or multiple partitions as the dependency (which, unfortunately, is not known or possible to validate until "runtime"). Perhaps we can limit this to "inputs are always lists" in the short term.
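To make the proposal concrete, here is a rough sketch of how a scalar_partitions flag could change the expected Python hints. `Int64` and `Collection` below are simplified stand-ins (assumptions), not the real type classes, and `output_hint` / `input_hint` are hypothetical helper names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int64:
    """Stand-in for the real Int64 type (assumption)."""


@dataclass(frozen=True)
class Collection:
    """Stand-in Collection with the proposed scalar_partitions flag."""

    element: object
    scalar_partitions: bool = False


# Element type -> Python hint for a single value (only Int64 in this sketch).
_ELEMENT_HINTS = {Int64: "int"}


def output_hint(type_: Collection) -> str:
    """Hint for what build returns for ONE partition."""
    base = _ELEMENT_HINTS[type(type_.element)]
    return base if type_.scalar_partitions else f"list[{base}]"


def input_hint(type_: Collection) -> str:
    """Short-term rule from the comment above: inputs are always lists."""
    return f"list[{_ELEMENT_HINTS[type(type_.element)]}]"
```

Under these assumptions, `output_hint(Collection(element=Int64()))` gives `"list[int]"`, while setting `scalar_partitions=True` gives `"int"`. `input_hint` stays `"list[int]"` either way, which sidesteps the runtime-only question of how many partitions map assigns to a dependency.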