[DISCUSSION] What is the timeline for dask.dataframe deprecation #10934
Comments
Thanks for opening the issue. First of all, this is all up for debate. Nothing here has been definitively decided. Our intention is currently to enable the query planning as soon as possible. We feel good about the current performance and stability. However, we won't be able to release it with full API coverage and are rather focusing on the most important APIs (e.g. currently There are two missing features that possibly lock out a larger number of users than we'd feel comfortable with. These are
Edit: I got mixed up in my calendar. I suspect the most realistic release date will be
There hasn't been any decision about this, yet. My current assumption is that we'll hold on to this for a while until we're certain that we won't cut out larger user groups. Please let us know if anything here sounds concerning or problematic. We're also interested if this all sounds too careful or too reckless :)
The conversation about annotations is happening over in #10937
In conversation @fjetter mentioned to me that we should probably try things out with xgboost.dask and make sure that that project is ok post-transition.
For context with xgboost, they specify workers, but only after they've already converted to futures, which seems pretty safe for dask-xgboost.
Linking dask/community#361 (sorry - just saw that issue now)
my fault. I only opened that one now 😅
We're currently seeing a couple of weird recursive import errors when using dask-expr in our coiled benchmarks test suite; see coiled/benchmarks#1419. This is something we definitely want to fix, or at least understand better, before moving forward. Therefore, I suggest not blocking on any of the above issues, i.e. neither on the annotations (#10937) nor on the scheduler integration (dask/dask-expr#14). This would mean that the next release would have This leaves the question about what to do with pandas 1.X support. I opened another issue for this: #10962
This is fixed now on main
Many users and down-stream libraries were a bit surprised to see a loud deprecation warning when importing dask.dataframe after the 2024.2.0 release. The dask-expr migration was certainly obvious for anyone watching github. However, the discussion/decision over the specific timeline was largely internal to Coiled.
Could we use this issue to establish a basic timeline for users and down-stream libraries to use as a reference? Note that I am not asking that we try to reach a consensus on these kinds of decisions. It would just be very useful to know what the plan is (so it can be communicated easily to others).
Critical Questions:
"dataframe.query-planning" default will change from "False" to "True"? For example, will it be 2024.2.1, or is the plan to do this in 2024.3.0 or later?
"dataframe.query-planning": "False" will be disabled entirely?
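For down-stream libraries that want to pin the behavior explicitly while the default is still in flux, here is a minimal sketch. It assumes a dask release that recognizes the "dataframe.query-planning" key (2024.2.0 or later) and that the option is read when dask.dataframe is first imported, so it has to be set beforehand. The helper name pin_query_planning is hypothetical and only there for illustration.

```python
import dask


def pin_query_planning(enabled: bool) -> bool:
    """Set the "dataframe.query-planning" option and return the stored value.

    Hypothetical helper for illustration. Assumption: the option is read when
    dask.dataframe is first imported, so call this before that import.
    """
    dask.config.set({"dataframe.query-planning": enabled})
    return dask.config.get("dataframe.query-planning")


# Explicitly opt out of query planning (keep the legacy implementation).
pin_query_planning(False)

# Import dask.dataframe only after the option has been set:
# import dask.dataframe as dd
```

Setting the option explicitly means a library does not depend on whichever default a given release ships with, which is the uncertainty the questions above are about.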