Handling large number of files #538

Open
cnydw opened this issue Jun 8, 2021 · 3 comments


cnydw commented Jun 8, 2021

At the moment, when a user opens a folder from Notebook or JupyterLab, jupyter_server calls os.lstat on every file inside the folder, which is very costly when the folder contains a large number of files.

for name in os.listdir(os_dir):
    try:
        os_path = os.path.join(os_dir, name)
    except UnicodeDecodeError as e:
        self.log.warning(
            "failed to decode filename '%s': %s", name, e)
        continue
    try:
        st = os.lstat(os_path)

This makes it practically impossible to open a folder with a large number of files: the backend freezes for a long time before becoming responsive again. Even when the backend does return the data, the frontend crashes while rendering all the files. See jupyterlab/jupyterlab#8700.
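A rough way to see where the time goes is to time the name listing and the per-file os.lstat calls separately. This is only a minimal sketch for measurement; time_listing is a hypothetical helper, not something that exists in jupyter_server.

```python
import os
import time


def time_listing(os_dir):
    """Return (entry_count, seconds_to_list, seconds_to_lstat) for os_dir.

    Hypothetical helper for illustration only; not part of jupyter_server.
    """
    start = time.perf_counter()
    names = os.listdir(os_dir)  # a single pass over the directory
    listed = time.perf_counter()

    for name in names:
        # One additional system call per entry, as in the loop quoted above.
        os.lstat(os.path.join(os_dir, name))
    statted = time.perf_counter()

    return len(names), listed - start, statted - listed
```

Calling time_listing on a large folder gives the number of entries and the two durations; the actual numbers will depend heavily on the filesystem and on caching.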

It would be nice to improve this architecture, for example by using paging or another method that reads only part of the file list at a time.
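To illustrate the paging idea, here is a minimal sketch that lists the names once and only calls os.lstat for the entries on the requested page. The function name list_page, the offset and limit parameters, and the shape of the returned dict are all assumptions made for this example; they are not the API of jupyter_server or of any pull request.

```python
import os


def list_page(os_dir, offset=0, limit=100):
    """Return one page of directory entries plus the total entry count.

    Hypothetical sketch: the name, parameters, and return shape are
    illustrative assumptions, not an existing jupyter_server API.
    """
    # Names only; no per-file stat yet. Sorting keeps pages stable.
    names = sorted(os.listdir(os_dir))

    entries = []
    for name in names[offset:offset + limit]:
        try:
            st = os.lstat(os.path.join(os_dir, name))
        except OSError:
            # The entry vanished or cannot be read; skip it.
            continue
        entries.append({"name": name, "size": st.st_size, "mtime": st.st_mtime})

    return {"total": len(names), "offset": offset, "entries": entries}
```

Listing and sorting the names is still proportional to the number of files, but the number of os.lstat calls per request is bounded by limit. A frontend could use the total to decide how many pages to request.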


welcome bot commented Jun 8, 2021

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉


cnydw commented Jun 8, 2021

I created a draft pull request, #539.

Together with my other commit, cnydw/jupyterlab@6e615c0, on the JupyterLab frontend, it can open a folder with 100000 files without problems.

[Animated GIF attachment: ezgif-1-2c77e653fc85]

The two commits I made are just proofs of concept; the API changes can certainly be improved. I think it makes sense to first make the backend API changes in jupyter_server, then propagate the corresponding frontend changes to JupyterLab and Jupyter Notebook.

@fcollonval @telamonian


kzhang2 commented Jun 8, 2022

Hi, any updates on getting this merged?
