[BUG] cudf.pandas dataframe.__repr__ slow in jupyterlab for large datasets #15747

Open
AjayThorve opened this issue May 14, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@AjayThorve
Member

Describe the bug
Rendering a DataFrame's repr in a notebook cell either takes a very long time or crashes the kernel for large datasets.
Steps/Code to reproduce bug
In a JupyterLab environment, run the following cells:

# [cell 1]
%load_ext cudf.pandas

# [cell 2]
import pandas as pd
import numpy as np

# Define the number of rows and columns
num_rows = 25_000_000
num_columns = 12

# Create a DataFrame with random data
df = pd.DataFrame(np.random.randint(0, 100, size=(num_rows, num_columns)),
                  columns=[f'Column_{i}' for i in range(1, num_columns + 1)])


# [cell 3]
df


Expected behavior
The DataFrame should render quickly, as it does when working directly with cudf or pandas.

Note
This works as expected in an interactive Python shell, and when calling print(df) in a notebook.
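
Since print(df) is fast but bare cell output is not, it may help to time the two rendering paths separately. As far as I understand, Jupyter's rich display typically asks the object for `_repr_html_()` in addition to the plain-text `repr()`, so the sketch below times both. The helper name `time_repr_paths` is hypothetical, and the row count is much smaller than the 25M in the report so that it finishes quickly. To compare against the reported setup, run `%load_ext cudf.pandas` in an earlier cell and raise `num_rows`.

```python
import time

import numpy as np
import pandas as pd


def time_repr_paths(df):
    """Time the plain-text and HTML representations of a DataFrame.

    Hypothetical helper for diagnosis only; returns seconds per path.
    """
    timings = {}
    for name, render in (("repr", lambda: repr(df)),
                         ("_repr_html_", df._repr_html_)):
        start = time.perf_counter()
        render()
        timings[name] = time.perf_counter() - start
    return timings


# Assumption: a reduced row count so the sketch runs quickly anywhere.
num_rows = 100_000
num_columns = 12
df = pd.DataFrame(
    np.random.randint(0, 100, size=(num_rows, num_columns)),
    columns=[f"Column_{i}" for i in range(1, num_columns + 1)],
)

timings = time_repr_paths(df)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

If only one of the two paths is slow under cudf.pandas, that would narrow down where the time is going; this is a diagnostic sketch, not a confirmed root cause.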

@AjayThorve added the bug (Something isn't working) label May 14, 2024
@AjayThorve changed the title from "[BUG] cudf.pandas dataframe.__repr__ fails in jupyterlab for large datasets" to "[BUG] cudf.pandas dataframe.__repr__ slow in jupyterlab for large datasets" May 14, 2024
@vyasr
Contributor

vyasr commented May 15, 2024

cf #13297

Projects
Status: In Progress
Development

No branches or pull requests

3 participants
@vyasr @AjayThorve and others