check for main module in reducer override #8455

fjetter · 2024-01-12T09:23:04Z

This rips out the entire "check in pickled result byte string" logic in favor of piggy backing on the reducer_override which we're already using anyhow.

For those who aren't familiar with this override: This is called for every object and allows us to override the pickle dispatch table.
So far, we've been only using this for objects that failed to pickle without custom logic so there may be a performance penalty involved since we're now doing it for every object.

On the flip side, this removes the need to check the entire byte string for b"__main__" and the cloudpickle fallback is much more precise now. For instance, if one serializes a large graph where a single function would fail to pickle we'd now use cloudpickle only on this single function instead of the entire graph which in theory should be faster, if only because we're not doing everything twice.

I'll run some tests. If this doesn't work out because it's slower than main or causes other issues I will revert #8443 (since it turns out we have to check for main in every byte string after all...)

github-actions · 2024-01-12T10:29:06Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

27 files ± 0 27 suites ±0 9h 49m 33s ⏱️ + 19m 46s
3 960 tests + 3 3 835 ✅ - 12 109 💤 ±0 16 ❌ + 15
49 809 runs +53 47 345 ✅ - 114 2 289 💤 - 7 175 ❌ +174

For more details on these failures, see this check.

Results for commit 1978c5b. ± Comparison against base commit 9fb41e3.

♻️ This comment has been updated with latest results.

fjetter · 2024-01-12T12:45:14Z

Funny story... almost all graphs I tested run either into the custom dask pickler + main in result or into the catch-all cloudpickle check so almost all graphs are serialized at least two times. If the dask pickler thing fails for whatever reason, this is even executed three times...

So, for graph serialization, the main in result check is typically not that relevant.

fjetter · 2024-01-12T13:14:12Z

I updated the pickler logic that it works for the cases I tested and is faster when the worst case is triggered. it is slightly slower for the happy path compared to main. Overall, that's probably still net positive.

I don't really like messing with the pickling logic that much and am not sure if I want to move forward with this. Therefore, I propose to revert the change for the release #8456 and if we decide to move forward with this we can let it sit on main for a bit before the release

fjetter · 2024-01-12T13:15:38Z

This all really only matters for very large graphs. I tested this on a graph with millions of tasks and the timing of this was in the range of 10s of seconds. For moderate graph sizes this is in the ms range or below. Therefore, this isn't a critical thing

hendrikmakait · 2024-01-22T08:56:10Z

CI is very red, should we move this into draft mode?

fjetter added 5 commits January 12, 2024 10:14

check for main module in reducer override

4b918f1

add the CLOUDPICKLE_GE_20 back again

e92dca8

remove obsolete buffers.clear

2362acd

test without pandas

40814d7

improve pickler override

7e7b7c7

this is going nowhere

e0dd5e8

fjetter mentioned this pull request Jan 12, 2024

Revert pickle change #8456

Merged

cleanup

1978c5b

fjetter marked this pull request as draft January 22, 2024 09:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

check for main module in reducer override #8455

check for main module in reducer override #8455

fjetter commented Jan 12, 2024 •

edited

github-actions bot commented Jan 12, 2024 •

edited

fjetter commented Jan 12, 2024

fjetter commented Jan 12, 2024

fjetter commented Jan 12, 2024

hendrikmakait commented Jan 22, 2024

check for main module in reducer override #8455

Are you sure you want to change the base?

check for main module in reducer override #8455

Conversation

fjetter commented Jan 12, 2024 • edited

github-actions bot commented Jan 12, 2024 • edited

Unit Test Results

fjetter commented Jan 12, 2024

fjetter commented Jan 12, 2024

fjetter commented Jan 12, 2024

hendrikmakait commented Jan 22, 2024

fjetter commented Jan 12, 2024 •

edited

github-actions bot commented Jan 12, 2024 •

edited