
Caching when using a distributed scheduler #7175

Answered by andersy005
xinrong-meng asked this question in Q&A

@xinrong-databricks,

> Here, calling df[filter_expr].compute() to cache doesn't work because df[filter_expr] doesn't fit into memory.

Have you tried using .persist(), i.e. df[filter_expr].persist(), to persist your results in your cluster's distributed memory?

Replies: 1 comment 3 replies

Answer selected by jrbourbeau