Remove get_event_records calls from caching instance queryer #21780
Conversation
return record
has_more = True
cursor = None
while has_more:
When implementing similar code on top of fetch_materializations, I've noticed that needing to add this while loop is a bit of an ergonomic regression, and easy to mess up. Thoughts on including some sort of wrapper?
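For illustration, one shape such a wrapper could take. This is a hedged sketch, not the actual Dagster API: `fetch_page` is a hypothetical stand-in for a method like `instance.fetch_materializations`, and the `(records, next_cursor, has_more)` return shape is an assumption.

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple

# Hypothetical page-fetching callable: given (cursor, limit), returns
# (records, next_cursor, has_more). Stands in for something like
# instance.fetch_materializations -- an assumption for this sketch.
FetchPage = Callable[[Optional[str], int], Tuple[Sequence, Optional[str], bool]]


def iterate_records(fetch_page: FetchPage, batch_size: int = 1000) -> Iterator:
    """Yield records one at a time, hiding the cursor/has_more loop."""
    cursor: Optional[str] = None
    has_more = True
    while has_more:
        records, cursor, has_more = fetch_page(cursor, batch_size)
        yield from records
```

A caller would then write `for record in iterate_records(...)` instead of hand-rolling the cursor loop each time.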
I've been thinking about explicitly not doing that. I think it feels bad to write the loop, and it probably should.
I guess the way that I would do it, just to avoid bugs, is to name the helper fn something like:
def fetch_materializations_in_a_loop_which_is_probably_bad_design(instance, asset_key, ...):
    ...
What in particular should feel bad? Looking at older records than the most recent record that matches a set of constraints?
One case I'm thinking about here is asset checks that do anomaly detection and need to compare present values to multiple historical values.
I think fetching an unknown number of records in a single call is dangerous. It hides the fact that this single function call is creating a lot of load on the DB.
What exactly do you mean by "unknown"? In the anomaly detection case, the user would specify a number of records they want. But they would still need to write this loop, right?
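To make the anomaly-detection case concrete, here is a sketch of the loop a user would write today to collect the `n` most recent records. As above, `fetch_page` is a hypothetical stand-in (assumed to return `(records, next_cursor, has_more)` with records newest first), not the actual Dagster API.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical page-fetching callable, newest records first -- an
# assumption for this sketch, not Dagster's real signature.
FetchPage = Callable[[Optional[str], int], Tuple[Sequence, Optional[str], bool]]


def fetch_n_records(fetch_page: FetchPage, n: int, batch_size: int = 100) -> List:
    """Collect the `n` most recent records, paging until we have enough."""
    out: List = []
    cursor: Optional[str] = None
    has_more = True
    while has_more and len(out) < n:
        # Never ask for more than we still need.
        records, cursor, has_more = fetch_page(cursor, min(batch_size, n - len(out)))
        out.extend(records)
    return out[:n]
```

Here the user does specify a fixed number of records, but the cursor loop is still written by hand, which is the ergonomic point being debated.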
while has_more:
    result = self.instance.fetch_observations(
        AssetRecordsFilter(asset_key=asset_key, after_storage_id=after_cursor),
        limit=RECORD_BATCH_SIZE,
Are we "double batching" here? I.e., should we not trust the fetch_observations implementation to determine the right batch size?
I'm not sure what you mean by this.
The implementation of fetch_observations internally sets a limit on how many records it fetches from the DB at once, right? I.e., this is what has_more is for, so that the implementation doesn't need to return all the records at once? And then RECORD_BATCH_SIZE is presumably also setting a limit on how many records to fetch at once.
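The cursor semantics at issue can be modeled with a toy event store. This is an assumption-level sketch of the contract as resolved later in the thread (each fetch returns exactly `limit` records unless the log is exhausted, and `has_more` reflects whether records remain), not Dagster's actual storage layer.

```python
from typing import List, Optional, Tuple


class ToyEventLog:
    """Toy model of a paginated event log with cursor semantics."""

    def __init__(self, events: List[int]) -> None:
        self._events = events

    def fetch(self, cursor: Optional[int], limit: int) -> Tuple[List[int], int, bool]:
        # Returns up to `limit` records starting at `cursor`; the caller's
        # limit is the only batch size in play, so there is no hidden
        # second layer of batching.
        start = cursor or 0
        page = self._events[start : start + limit]
        new_cursor = start + len(page)
        return page, new_cursor, new_cursor < len(self._events)
```

With a 7-event log and `limit=3`, successive calls return 3, 3, and 1 records, with `has_more` turning false only on the last call.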
@prha and I chatted about this in person and he cleared up my concerns. My reaction was based on a misunderstanding: I thought that fetch_materializations would sometimes return fewer events than the limit, even when the event log had more events than that.
## Summary & Motivation
We want to deprecate get_event_records calls in favor of narrower APIs.
## How I Tested These Changes
BK