New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
check for main module in reducer override #8455
base: main
Are you sure you want to change the base?
Conversation
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 27 files ± 0 27 suites ±0 9h 49m 33s ⏱️ + 19m 46s For more details on these failures, see this check. Results for commit 1978c5b. ± Comparison against base commit 9fb41e3. ♻️ This comment has been updated with latest results. |
Funny story... almost all graphs I tested run either into the custom So, for graph serialization, the main in result check is typically not that relevant. |
I updated the pickler logic that it works for the cases I tested and is faster when the worst case is triggered. it is slightly slower for the happy path compared to main. Overall, that's probably still net positive. I don't really like messing with the pickling logic that much and am not sure if I want to move forward with this. Therefore, I propose to revert the change for the release #8456 and if we decide to move forward with this we can let it sit on main for a bit before the release |
This all really only matters for very large graphs. I tested this on a graph with millions of tasks and the timing of this was in the range of 10s of seconds. For moderate graph sizes this is in the ms range or below. Therefore, this isn't a critical thing |
CI is very red, should we move this into |
Closes #8454
This rips out the entire "check in pickled result byte string" logic in favor of piggy backing on the
reducer_override
which we're already using anyhow.For those who aren't familiar with this override: This is called for every object and allows us to override the pickle dispatch table.
So far, we've been only using this for objects that failed to pickle without custom logic so there may be a performance penalty involved since we're now doing it for every object.
On the flip side, this removes the need to check the entire byte string for
b"__main__"
and the cloudpickle fallback is much more precise now. For instance, if one serializes a large graph where a single function would fail to pickle we'd now use cloudpickle only on this single function instead of the entire graph which in theory should be faster, if only because we're not doing everything twice.I'll run some tests. If this doesn't work out because it's slower than main or causes other issues I will revert #8443 (since it turns out we have to check for main in every byte string after all...)