Remove get_event_records calls from caching instance queryer #21780
Conversation
return record
has_more = True
cursor = None
while has_more:
When implementing similar code on top of fetch_materializations, I've noticed that needing to add this while loop is a bit of an ergonomic regression, and easy to mess up. Thoughts on including some sort of wrapper?
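For illustration, one shape such a wrapper could take. This is a hedged sketch, not the actual Dagster API: `fetch_page` is a hypothetical stand-in for a method like `instance.fetch_materializations`, and the `(records, next_cursor, has_more)` return shape is an assumption.

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple

# Hypothetical page-fetching callable: given (cursor, limit), returns
# (records, next_cursor, has_more). Stands in for something like
# instance.fetch_materializations -- an assumption for this sketch.
FetchPage = Callable[[Optional[str], int], Tuple[Sequence, Optional[str], bool]]


def iterate_records(fetch_page: FetchPage, batch_size: int = 1000) -> Iterator:
    """Yield records one at a time, hiding the cursor/has_more loop."""
    cursor: Optional[str] = None
    has_more = True
    while has_more:
        records, cursor, has_more = fetch_page(cursor, batch_size)
        yield from records
```

A caller would then write `for record in iterate_records(...)` instead of hand-rolling the cursor loop each time.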
I've been thinking about explicitly not doing that. I think it feels bad to write the loop, and it probably should.
I guess the way that I would do it, just to avoid bugs, is to name the helper fn something like:
def fetch_materializations_in_a_loop_which_is_probably_bad_design(instance, asset_key, ...):
    ...
What in particular should feel bad? Looking at older records than the most recent record that matches a set of constraints?
One case I'm thinking about here is asset checks that do anomaly detection and need to compare present values to multiple historical values.
I think fetching an unknown number of records in a single call is dangerous. It hides the fact that this single function call is creating a lot of load on the DB.
What exactly do you mean by "unknown"? In the anomaly detection case, the user would specify a number of records they want. But they would still need to write this loop, right?
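To make the anomaly-detection case concrete, here is a sketch of the loop a user would write today to collect the `n` most recent records. As above, `fetch_page` is a hypothetical stand-in (assumed to return `(records, next_cursor, has_more)` with records newest first), not the actual Dagster API.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical page-fetching callable, newest records first -- an
# assumption for this sketch, not Dagster's real signature.
FetchPage = Callable[[Optional[str], int], Tuple[Sequence, Optional[str], bool]]


def fetch_n_records(fetch_page: FetchPage, n: int, batch_size: int = 100) -> List:
    """Collect the `n` most recent records, paging until we have enough."""
    out: List = []
    cursor: Optional[str] = None
    has_more = True
    while has_more and len(out) < n:
        # Never ask for more than we still need.
        records, cursor, has_more = fetch_page(cursor, min(batch_size, n - len(out)))
        out.extend(records)
    return out[:n]
```

Here the user does specify a fixed number of records, but the cursor loop is still written by hand, which is the ergonomic point being debated.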
while has_more:
    result = self.instance.fetch_observations(
        AssetRecordsFilter(asset_key=asset_key, after_storage_id=after_cursor),
        limit=RECORD_BATCH_SIZE,
Are we "double batching" here? I.e., should we not trust the fetch_observations implementation to determine the right batch size?
I'm not sure what you mean by this.
The implementation of fetch_observations internally sets a limit on how many records it fetches from the DB at once, right? I.e., this is what has_more is for, so that the implementation doesn't need to return all the records at once? And then RECORD_BATCH_SIZE is presumably also setting a limit on how many records to fetch at once.
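The cursor semantics at issue can be modeled with a toy event store. This is an assumption-level sketch of the contract as resolved later in the thread (each fetch returns exactly `limit` records unless the log is exhausted, and `has_more` reflects whether records remain), not Dagster's actual storage layer.

```python
from typing import List, Optional, Tuple


class ToyEventLog:
    """Toy model of a paginated event log with cursor semantics."""

    def __init__(self, events: List[int]) -> None:
        self._events = events

    def fetch(self, cursor: Optional[int], limit: int) -> Tuple[List[int], int, bool]:
        # Returns up to `limit` records starting at `cursor`; the caller's
        # limit is the only batch size in play, so there is no hidden
        # second layer of batching.
        start = cursor or 0
        page = self._events[start : start + limit]
        new_cursor = start + len(page)
        return page, new_cursor, new_cursor < len(self._events)
```

With a 7-event log and `limit=3`, successive calls return 3, 3, and 1 records, with `has_more` turning false only on the last call.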
@prha and I chatted about this in person and he cleared up my concerns. My reaction was based on a misunderstanding: I thought that fetch_materializations would sometimes return fewer events than the limit, even when the event log had more events than that.
## Summary & Motivation
We want to deprecate get_event_records calls in favor of narrower APIs.
## How I Tested These Changes
BK