Caching when using a distributed scheduler #7175
Let's say we have a large Dask DataFrame And then we have a bunch of operations on Here, calling Opportunistic caching also doesn't seem to work because we are using a distributed scheduler. In this case, what would be the best caching mechanism?
Replies: 1 comment 3 replies
@xinrong-databricks,
have you tried using `.persist()`, i.e. `df[filter_expr].persist()`, to persist your results in your cluster's distributed memory?