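I mentioned this in #11067, but maybe this deserves its own issue: I find it difficult to turn off query planning using the Python API. Using `dask.config.set` only works if `dask.dataframe` hasn't been imported up until that point. It works (query planner is turned off) when `import dask.dataframe as dd` happens inside the `with dask.config.set(...)` block, but not when `dask.dataframe` is imported first (query planner is turned on):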
```python
import dask
import pandas as pd
import dask.dataframe as dd  # I moved this import up

with dask.config.set({"dataframe.query-planning": False}):
    ddf = dd.from_pandas(pd.DataFrame({"x": [1, 2, 3]}), chunksize=1)
    out = ddf.mean()
    print(hasattr(out, "_expr"))  # True
```
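As far as I can tell, this happens because `dask.dataframe` picks between the legacy and the query-planning implementation once, when the module is first imported, so a config change made afterwards never gets consulted. The toy model below illustrates that pattern using only the standard library. All names in it (`CONFIG`, `config_set`, `import_dataframe_module`, `LegacyFrame`, `ExprFrame`) are hypothetical stand-ins, not Dask's actual code.

```python
import contextlib
import types

# Hypothetical global config standing in for dask.config
CONFIG = {"dataframe.query-planning": True}


@contextlib.contextmanager
def config_set(updates):
    """Temporarily override config values (a simplified stand-in for dask.config.set)."""
    old = {key: CONFIG[key] for key in updates}
    CONFIG.update(updates)
    try:
        yield
    finally:
        CONFIG.update(old)


class LegacyFrame:
    """Stand-in for a legacy (non-query-planning) collection."""


class ExprFrame:
    """Stand-in for an expression-backed collection."""

    _expr = "expression graph"  # marker attribute, like the `_expr` check above


_module_cache = {}  # plays the role of sys.modules


def import_dataframe_module():
    """Mimic `import`: the module body runs once, later calls return the cached module."""
    if "dataframe" not in _module_cache:
        # The implementation is picked from the config *at first import*
        frame_cls = ExprFrame if CONFIG["dataframe.query-planning"] else LegacyFrame
        _module_cache["dataframe"] = types.SimpleNamespace(from_pandas=lambda: frame_cls())
    return _module_cache["dataframe"]


dd = import_dataframe_module()  # imported before the config change

with config_set({"dataframe.query-planning": False}):
    frame = import_dataframe_module().from_pandas()  # cached module, old choice
    print(hasattr(frame, "_expr"))  # True

    # Re-running the module body (roughly what importlib.reload does) sees the new config
    _module_cache.clear()
    reloaded = import_dataframe_module()
    print(hasattr(reloaded.from_pandas(), "_expr"))  # False
```

In this model the context manager itself works fine; it just runs after the only point where the setting is read.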
When I write library code, I typically can't control what users have already imported.
Yes, adding `dd = importlib.reload(dd)` inside the context also fixes the issue in this example, but that doesn't work in all settings.
For example, imagine that I write a library with a function that a user can call with a Dask DataFrame:
```python
# my library code
import dask


def my_func(ddf):
    with dask.config.set({"dataframe.query-planning": False}):
        # reloading dask.dataframe here of course doesn't make a difference
        out = ddf.mean()
        print(hasattr(out, "_expr"))  # True
```

```python
# user code
import dask.dataframe as dd
import pandas as pd
from my_library import my_func

ddf = dd.from_pandas(pd.DataFrame({"x": [1, 2, 3]}), chunksize=1)
my_func(ddf)
```
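In the library case, the `ddf` passed to `my_func` was already built by whichever implementation was active when the user imported `dask.dataframe`, so the library can at best detect the situation rather than switch it retroactively. A stdlib-only sketch of that detection follows. The helper names are hypothetical, and `uses_query_planning` just reuses the `hasattr(..., "_expr")` heuristic from the examples above; it is not an official Dask API.

```python
import sys
import warnings


def dask_dataframe_already_imported() -> bool:
    """True if `dask.dataframe` is in the import cache, i.e. its implementation is already chosen."""
    return "dask.dataframe" in sys.modules


def uses_query_planning(collection) -> bool:
    """Heuristic from the examples above: expression-backed collections expose `_expr`."""
    return hasattr(collection, "_expr")


def my_func(ddf):
    # Hypothetical library function: it cannot change how `ddf` was built,
    # so it only warns when the collection is expression-backed.
    if uses_query_planning(ddf):
        warnings.warn(
            "ddf was created with query planning enabled; "
            "dask.config.set inside my_func will not change that",
            stacklevel=2,
        )
    return ddf  # real library work would go here
```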
Is there a clever way around this? I have drastic ideas like building a conda package that sets the `DASK_DATAFRAME__QUERY_PLANNING` environment variable, but that might be a bit much. I would much rather turn off the query planner selectively.
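A lighter variant of the environment-variable idea, without a separate conda package, is to set `DASK_DATAFRAME__QUERY_PLANNING` from Python before `dask` is imported at all, e.g. at the very top of an application entry point. As far as I know, Dask reads `DASK_*` variables when its config is first loaded, so this has the same limitation as `dask.config.set`: it is too late once `dask` is in `sys.modules`. The helper name below is hypothetical.

```python
import os
import sys


def disable_query_planning_via_env() -> bool:
    """Set DASK_DATAFRAME__QUERY_PLANNING=False if Dask hasn't been imported yet.

    Returns True if the variable was set early enough to be picked up,
    False if `dask` is already imported (its config has already been read).
    """
    if "dask" in sys.modules:
        return False
    # Dask's env-var naming: DASK_ prefix, "__" separates nested config keys
    os.environ["DASK_DATAFRAME__QUERY_PLANNING"] = "False"
    return True
```

This only helps code that controls the start of the process; it cannot help in the library scenario where the user's imports run first.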
As an aside: maybe all of this is moot once #11067 is fixed.