
remove get_event_records calls - from caching instance queryer #21780

Merged
merged 2 commits into master from prha/rm_get_event_records_caching_queryer_2 on May 16, 2024

Conversation

@prha (Member) commented May 10, 2024

Summary & Motivation

We want to deprecate get_event_records calls in favor of narrower APIs

How I Tested These Changes

BK

On the diff:

    return record
    has_more = True
    cursor = None
    while has_more:

@sryza (Contributor):

When implementing similar code on top of fetch_materializations, I've noticed that needing to add this while loop is a bit of an ergonomic regression, and easy to mess up. Thoughts on including some sort of wrapper?
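
(For reference, the caller-side pattern under discussion looks roughly like the following. This is a minimal sketch, not code from this PR: it assumes `fetch_materializations` returns a result object exposing `records`, `cursor`, and `has_more`, as the diff above suggests, and that `AssetRecordsFilter` is importable from the top-level `dagster` package; `BATCH_SIZE` and the function name are illustrative.)

```python
from typing import List, Optional

from dagster import AssetKey, AssetRecordsFilter, DagsterInstance, EventLogRecord

BATCH_SIZE = 1000  # illustrative page size, not a value from this PR


def all_materializations(
    instance: DagsterInstance, asset_key: AssetKey
) -> List[EventLogRecord]:
    """Page through every materialization record for a single asset."""
    records: List[EventLogRecord] = []
    cursor: Optional[str] = None
    has_more = True
    while has_more:
        result = instance.fetch_materializations(
            AssetRecordsFilter(asset_key=asset_key),
            limit=BATCH_SIZE,
            cursor=cursor,
        )
        records.extend(result.records)
        cursor = result.cursor
        has_more = result.has_more
    return records
```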

@prha (Member Author):

I've been thinking about explicitly not doing that. I think it feels bad to write the loop, and it probably should.

@prha (Member Author):

I guess the way that I would do it, just to avoid bugs, is to name the helper fn something like:

def fetch_materializations_in_a_loop_which_is_probably_bad_design(instance, asset_key, ...):
    ...
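
(Name aside, such a wrapper would presumably just hide the same loop behind a generator. A hypothetical sketch, not something this PR adds, under the same assumptions about the `fetch_materializations` result shape as above:)

```python
from typing import Iterator, Optional

from dagster import AssetKey, AssetRecordsFilter, DagsterInstance, EventLogRecord


def iterate_materializations(  # hypothetical helper name, not part of this PR
    instance: DagsterInstance, asset_key: AssetKey, batch_size: int = 1000
) -> Iterator[EventLogRecord]:
    """Yield materialization records one at a time, paging under the hood."""
    cursor: Optional[str] = None
    while True:
        result = instance.fetch_materializations(
            AssetRecordsFilter(asset_key=asset_key),
            limit=batch_size,
            cursor=cursor,
        )
        yield from result.records
        if not result.has_more:
            break
        cursor = result.cursor
```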

@sryza (Contributor) commented May 13, 2024:

What in particular should feel bad? Looking at older records than the most recent record that matches a set of constraints?

@sryza (Contributor):

One case I'm thinking about here is asset checks that do anomaly detection and need to compare present values to multiple historical values.
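
(A hedged sketch of that use case: when the number of historical values is known up front, the fetch itself is a single bounded call, and the comparison happens on the caller's side. The value extraction and threshold below are placeholders, not an API from this PR, and this assumes records come back newest-first by default.)

```python
from typing import Sequence

from dagster import AssetKey, AssetRecordsFilter, DagsterInstance, EventLogRecord


def recent_materializations(
    instance: DagsterInstance, asset_key: AssetKey, n: int
) -> Sequence[EventLogRecord]:
    # One call with limit=n is enough when n fits in a single page.
    return instance.fetch_materializations(
        AssetRecordsFilter(asset_key=asset_key), limit=n
    ).records


def looks_anomalous(latest: float, history: Sequence[float]) -> bool:
    # Crude illustrative threshold: flag values more than 50% away from the
    # historical mean. A real check would choose its own statistic.
    if not history:
        return False
    mean = sum(history) / len(history)
    return abs(latest - mean) > 0.5 * mean
```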

@prha (Member Author) commented May 13, 2024:

I think fetching an unknown number of records in a single call is dangerous. It hides the fact that this single function call is creating a lot of load on the DB.

@sryza (Contributor):

What exactly do you mean by "unknown"? In the anomaly detection case, the user would specify a number of records they want. But they would still need to write this loop, right?

On the diff:

    while has_more:
        result = self.instance.fetch_observations(
            AssetRecordsFilter(asset_key=asset_key, after_storage_id=after_cursor),
            limit=RECORD_BATCH_SIZE,

@sryza (Contributor):

Are we "double batching" here? I.e. should we not trust the fetch_observations implementation to determine the right batch size?

@prha (Member Author):

I'm not sure what you mean by this.

@sryza (Contributor):

The implementation of fetch_observations internally sets a limit on how many records it fetches from the DB at once, right? I.e. this is what has_more is for, so that the implementation doesn't need to return all the records at once?

And then RECORD_BATCH_SIZE is presumably also setting a limit on how many records to fetch at once.

@sryza (Contributor) left a comment:

@prha and I chatted about this in person and he cleared up my concerns. My reaction was based on a misunderstanding – I thought that fetch_materializations would sometimes return fewer events than the limit, even when the event log had more events than that.
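
(Spelling out the contract as resolved in this thread, under the same assumptions about the result shape as in the sketches above: `limit` is the caller-chosen page size, each call returns up to that many records, and `has_more` only signals that more records remain for a follow-up call with the returned cursor. `RECORD_BATCH_SIZE` and the asset key here are placeholders.)

```python
from dagster import AssetKey, AssetRecordsFilter, DagsterInstance

RECORD_BATCH_SIZE = 1000  # illustrative page size
asset_key = AssetKey("my_asset")  # placeholder asset

instance = DagsterInstance.get()  # assumes DAGSTER_HOME points at a configured instance

first_page = instance.fetch_materializations(
    AssetRecordsFilter(asset_key=asset_key), limit=RECORD_BATCH_SIZE
)
assert len(first_page.records) <= RECORD_BATCH_SIZE

if first_page.has_more:
    # Older records remain in the event log; page with the returned cursor.
    second_page = instance.fetch_materializations(
        AssetRecordsFilter(asset_key=asset_key),
        limit=RECORD_BATCH_SIZE,
        cursor=first_page.cursor,
    )
```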

Base automatically changed from prha/rm_get_event_records_graphql_1 to master May 16, 2024 17:16
@prha merged commit 1de6324 into master May 16, 2024
1 check passed
@prha deleted the prha/rm_get_event_records_caching_queryer_2 branch May 16, 2024 17:18
nikomancy pushed a commit that referenced this pull request May 22, 2024