Proposal: Restore Notebook execution progress when a browser page is reloaded #1274

Open
skukhtichev opened this issue May 12, 2023 · 25 comments

Comments

@skukhtichev

Restore Notebook execution progress when a browser page is reloaded

Problem

Jupyter Notebook/Lab does not restore execution progress after the page is reloaded. As a result, there is no way to monitor execution progress or retrieve execution output for long-running notebooks.

This happens for the following reasons:

  1. The Notebook/Lab UI generates a new session id every time a notebook/lab page is reloaded.

    Jupyter Server supports replaying kernel messages to the client after the kernel session is reconnected. Replay is based on the session id that the client sets when connecting to the kernel. The Jupyter Notebook/Lab UI generates a new session id every time the notebook page is loaded, so the client cannot reconnect to the existing kernel session and Jupyter Server cannot replay the buffered messages after the notebook page is re-opened.

  2. Message IDs for submitted code are not persisted in cell metadata.

    Kernel message ids are not persisted in the notebook metadata and are lost when the notebook window or tab is reloaded. This means the Jupyter frontend code cannot link kernel messages to cells. As a result, there is no way to show the output and the execution count.

  3. Unsaved cell output is missing after a notebook page is reloaded.

    If the kernel message with the execution result was sent to the browser but the notebook was not saved, the output is not displayed after the page is reloaded.
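
To illustrate reason 1, here is a minimal sketch of a replay buffer keyed by session id. The `ReplayBuffer` class and its method names are hypothetical and only loosely modelled on the behaviour described above; they are not Jupyter Server's actual API.

```python
from collections import defaultdict


class ReplayBuffer:
    """Toy stand-in for server-side buffering of kernel messages (hypothetical)."""

    def __init__(self):
        # (kernel_id, session_id) -> messages buffered while the client was away
        self._buffers = defaultdict(list)

    def buffer(self, kernel_id, session_id, msg):
        self._buffers[(kernel_id, session_id)].append(msg)

    def replay(self, kernel_id, session_id):
        # Only a client that reconnects with the same session id gets the backlog.
        return self._buffers.pop((kernel_id, session_id), [])


buf = ReplayBuffer()
buf.buffer("kernel-1", "session-A", {"msg_type": "stream", "text": "step 1\n"})

# Page reload: the UI generates a fresh session id, so nothing is replayed.
missed = buf.replay("kernel-1", "session-B")
# Reconnecting with the original session id would have recovered the backlog.
recovered = buf.replay("kernel-1", "session-A")
```

In this sketch the backlog is still on the server after the reload, but the reloaded page has no way to ask for it because it no longer knows `session-A`.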

Proposed Solution

The proposal is to enable restoring sessions for kernel connections, to move the Notebook model (cells metadata) to Jupyter Server, and to synchronize changes triggered by the Notebook/Lab UI and by the kernel.

Restore kernel sessions

Make the kernel connection independent of the session id provided in the web socket session_id URL argument. Today a new session id is generated every time the notebook is reloaded, so buffered kernel messages are not replayed after the Notebook page/tab is re-opened. JupyterLab and Notebook 7 support a collaboration mode that differentiates between notebook users. The user info and the notebook path could be used as a session identifier that is mapped to the kernel id.
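
A minimal sketch of that mapping, assuming the user is identified by a username string. `stable_session_key` and `KernelSessionRegistry` are hypothetical names introduced for illustration, and hashing the pair is just one possible way to build a reload-stable key.

```python
import hashlib


def stable_session_key(username: str, notebook_path: str) -> str:
    """Derive a key that survives page reloads (hypothetical helper)."""
    raw = f"{username}\x00{notebook_path}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class KernelSessionRegistry:
    """Maps (user, notebook path) to a kernel id instead of a per-load session id."""

    def __init__(self):
        self._kernel_by_key = {}

    def bind(self, username, notebook_path, kernel_id):
        self._kernel_by_key[stable_session_key(username, notebook_path)] = kernel_id

    def lookup(self, username, notebook_path):
        return self._kernel_by_key.get(stable_session_key(username, notebook_path))


registry = KernelSessionRegistry()
registry.bind("alice", "work/analysis.ipynb", "kernel-1")

# After a reload the same user and path resolve to the same kernel.
same_user = registry.lookup("alice", "work/analysis.ipynb")
# A different user on the same notebook gets a separate key.
other_user = registry.lookup("bob", "work/analysis.ipynb")
```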

[Image: kernel_session_restore]

Move a Notebook model (cells metadata) to Jupyter Server and synchronize it with a kernel and a notebook opened in UI

Storing a notebook model on Jupyter Server will make it possible to:

  • restore message ids for all code cells with submitted code after the page is reloaded;
  • restore execution progress and output for unsaved changes.

Implementation notes for enabling the Notebook model on Jupyter Server:

  • The JupyterLab/Notebook UI sends a message on each of the following cell state changes:
    • On Cell Changed
    • On Cell Inserted
    • On Cell Deleted
    • On Cell Cleared
    • On Cell Executed
      The messages are sent via the kernel's web socket connection with a new message type (e.g. nb_state). ZMQChannelsHandler then parses incoming messages and forwards notebook-state messages to NotebooksStatesManager. NotebooksStatesManager is responsible for synchronizing the notebook state between the UI and Jupyter Server. It will also handle kernel messages.
  • When code is submitted for execution, a message id (msg_id) is returned to the client (browser). The client tracks execution progress based on the message id. Currently the message id is stored only in the browser; it should not be persisted in the notebook ipynb file because it is relevant only at runtime. Since each cell has a unique id, it is possible to map the message id to the cell id, store a message id for each submitted cell on Jupyter Server, and return it to the client when the notebook is reloaded. The client will then be able to handle incoming kernel messages and display execution progress.
  • When the notebook is reloaded, the cells metadata (including output) from the ipynb file will be merged with the cells metadata from the notebook model saved on Jupyter Server.
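
The notes above could be sketched roughly as follows. This is an in-memory illustration only: the class name comes from the proposal, but the event names (`inserted`, `changed`, `cleared`, `deleted`, `executed`), the method names, and the cell dictionary layout are assumptions. The kernel message fields used (`header.msg_type`, `parent_header.msg_id`, `content`) follow the Jupyter messaging protocol.

```python
class NotebooksStatesManager:
    """Hypothetical in-memory sketch; not an existing Jupyter Server class."""

    def __init__(self):
        self.cells = {}        # cell_id -> source, outputs, execution_count
        self.msg_to_cell = {}  # msg_id -> cell_id (runtime only, never saved)
        self.running = set()   # cell ids still waiting for an execute_reply

    def on_nb_state(self, event, cell_id, source="", msg_id=None):
        """Apply one nb_state event sent by the UI."""
        if event == "deleted":
            self.cells.pop(cell_id, None)
            self.running.discard(cell_id)
            return
        cell = self.cells.setdefault(
            cell_id, {"source": "", "outputs": [], "execution_count": None}
        )
        if event in ("inserted", "changed"):
            cell["source"] = source
        elif event == "cleared":
            cell["outputs"] = []
            cell["execution_count"] = None
        elif event == "executed":
            cell["outputs"] = []
            self.msg_to_cell[msg_id] = cell_id
            self.running.add(cell_id)

    def on_kernel_message(self, msg):
        """Route a kernel message to its cell via the parent msg_id."""
        cell_id = self.msg_to_cell.get(msg["parent_header"].get("msg_id"))
        cell = self.cells.get(cell_id)
        if cell is None:
            return
        msg_type = msg["header"]["msg_type"]
        if msg_type == "stream":
            cell["outputs"].append({"output_type": "stream", **msg["content"]})
        elif msg_type == "execute_reply":
            cell["execution_count"] = msg["content"].get("execution_count")
            self.running.discard(cell_id)


mgr = NotebooksStatesManager()
mgr.on_nb_state("inserted", "cell-1", source="print('hi')")
mgr.on_nb_state("executed", "cell-1", msg_id="m1")
mgr.on_kernel_message({
    "header": {"msg_type": "stream"},
    "parent_header": {"msg_id": "m1"},
    "content": {"name": "stdout", "text": "hi\n"},
})
still_running = set(mgr.running)
mgr.on_kernel_message({
    "header": {"msg_type": "execute_reply"},
    "parent_header": {"msg_id": "m1"},
    "content": {"status": "ok", "execution_count": 1},
})
```

A reloaded client could ask such a manager which cells are in `running` and which `msg_id` belongs to each, then resume handling incoming kernel messages for them.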

Additional context

The image below shows components and data flow for execution restore logic:

  1. When the notebook is loaded for the first time, ContentsManager creates a copy of the Notebook model on Jupyter Server and sends it to the client (Notebook/Lab UI).
  2. When a user edits the notebook, the changes are sent to Jupyter Server via the kernel's web socket connection. ZMQChannelsHandler parses messages by type (e.g. the nb_state type) and forwards messages related to state changes to NotebooksStatesManager, which updates the notebook model stored on the server.
  3. When the kernel sends a message, ZMQChannelsHandler forwards it to NotebooksStatesManager and to the Jupyter Notebook/Lab UI.
  4. When the notebook page is reloaded, ContentsManager loads the notebook file from storage (file system, cloud storage, etc.) and merges it with the notebook model stored on Jupyter Server (including the message ids for submitted code cell executions). This makes it possible to identify which cells are executing and to restore execution progress.
  5. When the user saves the notebook, ContentsManager removes the message ids from the copy of the notebook model that is written to the file and saves the ipynb file to storage (the msg_id parameter still exists in the Notebook model stored on Jupyter Server).

[Image: restore_execution_progress_components]
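
Steps 4 and 5 could look roughly like the sketch below. It assumes cells carry the nbformat `id` field and that the runtime message id is kept under a hypothetical `msg_id` key in each cell's metadata; `merge_on_reload` and `strip_for_save` are illustrative names, not existing ContentsManager methods.

```python
import copy


def merge_on_reload(file_model, server_model):
    """Overlay runtime state from the server-side model onto the saved file."""
    merged = copy.deepcopy(file_model)
    live_cells = {cell["id"]: cell for cell in server_model["cells"]}
    for cell in merged["cells"]:
        live = live_cells.get(cell["id"])
        if live is None:
            continue
        # Unsaved outputs and counts on the server win over the file contents.
        if live.get("outputs"):
            cell["outputs"] = copy.deepcopy(live["outputs"])
        if live.get("execution_count") is not None:
            cell["execution_count"] = live["execution_count"]
        msg_id = live.get("metadata", {}).get("msg_id")  # hypothetical key
        if msg_id is not None:
            cell.setdefault("metadata", {})["msg_id"] = msg_id
    return merged


def strip_for_save(model):
    """Return a copy without runtime-only message ids, ready to write to disk."""
    saved = copy.deepcopy(model)
    for cell in saved["cells"]:
        cell.get("metadata", {}).pop("msg_id", None)
    return saved


file_model = {"cells": [{"id": "c1", "metadata": {}, "outputs": [], "execution_count": None}]}
server_model = {"cells": [{
    "id": "c1",
    "metadata": {"msg_id": "m1"},
    "outputs": [{"output_type": "stream", "name": "stdout", "text": "42\n"}],
    "execution_count": None,
}]}

merged = merge_on_reload(file_model, server_model)
on_disk = strip_for_save(merged)
```

Both helpers work on deep copies, so the server-side model keeps its `msg_id` values after a save.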

@echarles
Member

echarles commented May 12, 2023

Thank you @skukhtichev for the great demo during JupyterCon and for opening the discussion to upstream the work you have been doing.

It would be great to have a chat during one of our dev calls. When would you be able to join?

cc/ @Zsailer @kevin-bates

@Mako-L

Mako-L commented May 13, 2023

Haules

@skukhtichev
Author

@echarles I will be happy to discuss the proposal during the dev call. After @davidbrochart's presentation about Jupyter Server at JupyterCon, I realized that the proposal needs to be adapted to the latest Jupyter Server changes (authorization, updated kernel web socket handler, etc.). I want to dive deeper into the server's code and update the proposal. Will the May 25th Jupyter Server call be appropriate to discuss the proposal?

@echarles
Member

I realized that the proposal needs to be adapted to the latest Jupyter Server changes (authorization, updated kernel web socket handler, etc.)

Doesn't the general principle remain the same? Authentication and authorization should not come into the picture, should they? The new kernel websocket handler is meant to be more easily extensible, so I guess it should not impact the validity of your proposal.

Will the May 25th Jupyter Server call be appropriate to discuss the proposal?

Sounds good. I will join and we will chat with the people online.

@kevin-bates
Member

@skukhtichev, @echarles - I've gone ahead and added this to Thursday's agenda. See you there!

@echarles
Member

@kevin-bates I think @skukhtichev was mentioning 25th (next week), not 18th (this week)

@skukhtichev
Author

@echarles yes. @kevin-bates, could we reschedule the discussion to next week (May 25th)?

@skukhtichev
Author

Doesn't the general principle remain the same? Authentication and authorization should not come into the picture, should they? The new kernel websocket handler is meant to be more easily extensible, so I guess it should not impact the validity of your proposal.

Yes, the principle remains the same. There is a limitation in establishing web socket sessions between the browser and Jupyter Server: only one session can be established per user. If the user opens the same notebook in a new browser tab, the previous web socket session is closed. Currently the same applies to other users who connect to the same kernel. The authorization logic sets user-specific cookies, so it should be possible to distinguish between users connecting to the same kernel. Implementing this approach will allow support for multiple users and will not interfere with the collaboration feature.

@kevin-bates
Member

@kevin-bates I think @skukhtichev was mentioning 25th (next week), not 18th (this week)

I'm sorry I missed that. I've updated the agenda such that this discussion is slated for next week (May 25).

@parul100495

I would also like to join this conversation (I will attend the discussion next week). I have worked on a similar concept and would like to collaborate on this effort.

@kevin-bates
Member

Hi @parul100495 - thank you for sharing your interest in helping! See you next Thursday.

(For the sake of others, the Server/Kernels team meeting is open to anyone - all are welcome - no participation required.)

@echarles
Member

PS: unfortunately I won't be able to join this week's call. Excited to see any progress on this feature.

@matthewwiese
Collaborator

Hi all, I recently discovered this proposal and am interested in learning more about how I could help.

I work in a plant breeding lab that collaborates frequently with international researchers. These individuals don't always have the most reliable internet connection, and so oftentimes face the frustration of losing their work when in-progress cells are "canceled" due to a network interruption. The way I read @skukhtichev's proposal, it ought to also cover situations like these (i.e. a user reconnects to their notebook after being disconnected for some time, and is able to see their code cells continuing to run and not lose data).

I've talked briefly with @davidbrochart on this issue related to the new kernels REST API, as I believe it would also be a potential solution to this problem.

What would be the best avenue for a volunteer to dedicate time to assisting with this? I saw that this proposal was discussed at last week's server meeting.

@Zsailer
Member

Zsailer commented Jun 7, 2023

Hey folks,

Since there appears to be interest from multiple people representing multiple organizations, I propose we try to meet regularly over the next couple of months to discuss this topic to 1) collect implementation ideas 2) develop a plan towards an open-source solution 3) coordinate who might be able to work on this.

At last week's server meeting, I proposed we reserve the last 20 minutes of the Jupyter Server/Kernels Meeting (8am Pacific) to discuss this topic. Would folks on this thread be able to stop by regularly for those 20 minutes? Give this comment a 👍 to signal that you'd like to join.

We'll post notes from those discussions here. Thanks!

@matthewwiese
Collaborator

@Zsailer Just to double-check, will those 20 minute blocks begin at tomorrow's meeting, or next week's?

@Zsailer
Member

Zsailer commented Jun 15, 2023

In today's Jupyter Server call, we discussed this issue extensively. I'll do my best to summarize the discussion below based on some notes I took.

User story: As a user, I'd like to run a cell (or series of cells) in a notebook and close JupyterLab before the cells finish. When I come back, I'd expect to see

  1. the output of any completed cells,
  2. the execution count on those cells,
  3. if any cells are still running (with "*" in the corner),
  4. and the notebook's execution state accurately showing whether it is "idle"/"busy".

Today's UX: When the user closes/refreshes JupyterLab,

  1. no completed outputs
  2. no execution counts
  3. empty brackets where the execution count or "*" should be
  4. "idle" is shown even if it's busy (due to this issue)

Multiple solutions have been proposed over the past few weeks. Here's my attempt at summarizing below.

There are three layers to the problem:

  1. Message replay from the kernel.
    • keep a history/log of all reply messages from the kernel that might have been missed
  2. Kernel state reconstruction from kernel message replay. We need a way to take a replay of messages and resolve:
    • execution_state (busy/idle)
    • execution_count
    • all outputs
  3. Notebook model reconstruction from kernel message replay. We need a way to take a replay of messages and resolve:
    • which cells are finished
    • which cells still need executing
    • outputs for finished cells
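
As a rough illustration of layer 2, the sketch below folds a replayed message stream into a kernel state. The function name and the shape of the returned dictionary are assumptions; the message types and content fields (`status`, `execute_input`, `stream`, `display_data`, `execute_result`, `error`) are the ones defined by the Jupyter messaging protocol.

```python
OUTPUT_TYPES = ("stream", "display_data", "execute_result", "error")


def reconstruct_kernel_state(messages):
    """Fold replayed kernel messages into state (hypothetical helper)."""
    state = {"execution_state": "unknown", "execution_count": None, "outputs": {}}
    for msg in messages:
        msg_type = msg["header"]["msg_type"]
        content = msg["content"]
        parent_id = msg.get("parent_header", {}).get("msg_id")
        if msg_type == "status":
            # The last status message seen wins.
            state["execution_state"] = content["execution_state"]
        elif msg_type == "execute_input":
            state["execution_count"] = content["execution_count"]
        elif msg_type in OUTPUT_TYPES:
            # Group outputs by the execute_request that produced them.
            state["outputs"].setdefault(parent_id, []).append(
                {"output_type": msg_type, **content}
            )
    return state


replayed = [
    {"header": {"msg_type": "status"}, "parent_header": {"msg_id": "m1"},
     "content": {"execution_state": "busy"}},
    {"header": {"msg_type": "execute_input"}, "parent_header": {"msg_id": "m1"},
     "content": {"code": "print('a')", "execution_count": 3}},
    {"header": {"msg_type": "stream"}, "parent_header": {"msg_id": "m1"},
     "content": {"name": "stdout", "text": "a\n"}},
]
state = reconstruct_kernel_state(replayed)
```

Layer 3 would additionally need a mapping from each parent `msg_id` to a cell, which the kernel messages alone do not carry.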

To begin building a solution, we proposed using a new, separate repo in jupyter-server org that

  • Will provide a kernel message replay mechanism.
  • Will provide an API to reconstruct kernel state from a replayed message stream.
  • Will provide an API to reconstruct notebook model from a replayed message stream.
  • Will provide a server extension that automatically configures Jupyter Server's kernel REST API to replay.

Question: Why can't we solve this problem with just the kernel replay, replaying all messages to JupyterLab when it reconnects?

Answer: I (Zach) believe you can. If the messages are timestamped, you should be able to just replay all messages from the last time the user was connected and the client should collapse this to the current state of the notebook. There are advantages to making the server more notebook document aware though. We can elaborate on this more in a separate thread.
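
A minimal sketch of the timestamp-based replay described in this answer, assuming the server keeps a log of messages whose headers carry the ISO 8601 `date` field from the Jupyter messaging protocol. `messages_since` and the way the client reports its last-seen time are hypothetical.

```python
from datetime import datetime


def messages_since(log, last_seen_iso):
    """Return logged messages newer than the client's last-seen timestamp."""
    cutoff = datetime.fromisoformat(last_seen_iso)
    return [
        msg for msg in log
        if datetime.fromisoformat(msg["header"]["date"]) > cutoff
    ]


log = [
    {"header": {"msg_id": "a", "date": "2023-06-15T15:00:00+00:00"}},
    {"header": {"msg_id": "b", "date": "2023-06-15T15:05:00+00:00"}},
    {"header": {"msg_id": "c", "date": "2023-06-15T15:10:00+00:00"}},
]

# The client was last connected at 15:02, so it missed messages "b" and "c".
missed = messages_since(log, "2023-06-15T15:02:00+00:00")
missed_ids = [msg["header"]["msg_id"] for msg in missed]
```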

Aside comment to keep in mind: kernel gateway / enterprise gateway add an additional passthrough layer where kernel messages can be missed and we need replay options.

How do Jupyter's RTC efforts play into this?

Y-docs offer a solution for rebuilding a notebook model server-side from a log of patches/diffs/messages. Maybe we can leverage this machinery to store and resolve (2) and (3) once we have a message replay system in place?

@fcollonval
Member

fcollonval commented Jun 16, 2023

I would like to relay here a comment appearing in the meeting notes about an important point that is missing in the above summary (thanks for it Zach): the state of the kernel waiting for user input (e.g. Python code using the input built-in function).

@davidbrochart
Contributor

I think that the issue with input is not only that we don't handle it in the YNotebook, but also that the frontend doesn't treat it as collaborative text. But it is indeed a small text editor, so it should be treated just like a cell if we want it to be collaborative, i.e. seen and/or editable by other users.

@Zsailer
Member

Zsailer commented Jun 23, 2023

Hey folks, I'm going to be out-of-the-office next week and will miss the Jupyter Server meeting (6/29).

If folks want to still meet and discuss, please feel free to do so!

Otherwise, I'll be back for the next meeting on 7/6. Until then, I'll work on setting up a new repo and drafting a loose roadmap of the work ahead of us. Cheers everyone!

@Wh1isper
Contributor

Wh1isper commented Jul 31, 2023

Very glad to hear that!

State retention of the JupyterLab front-end and the ipynb file itself is a major issue due to messaging delays and network effects, and will significantly impact the user experience if Jupyter Server is running in the cloud (This is the main problem I had before: JupyterLab always prompts whether to overwrite the file or not).

Based on this, I'm in favor of @davidbrochart's comment and jupyverse's solution: use Y-CRDT to establish consistency, which only requires code execution and result writes to be put into the backend.

I've put jupyter_kernel_executor on hold for now due to a change in focus at work. In this plugin, the user can execute the code through the HTTP interface, and the Jupyter server's handler parses the zmq results and writes them to the file, using a form that converts http to a localhost websocket connection (we can even connect to another Jupyter server, i.e., a remote kernel).

I regard this feature as a port of JupyterLab's ability to execute code and write to a file to Jupyter Server, triggered by a button. The input is file+kernel+cellid and the process is that Jupyter Server establishes a websocket connection and writes the result of the code run to the file continuously. JupyterLab will get the updates through Jupyter's RTC feature.

EDIT:

This picture simply illustrates my thoughts. Hope this helps. Thank you all.

image

@davidbrochart
Contributor

I made some progress towards restoring notebook state, using jupyverse/jpterm:

Peek.2023-11-09.11-29.mp4

@Wh1isper
Contributor

Wh1isper commented Nov 9, 2023

I made some progress towards restoring notebook state, using jupyverse/jpterm:

Peek.2023-11-09.11-29.mp4

Forgive me for not following the development of the project for too long. This is exciting. 👏

Is it because you open another client for writing (as collaborative)? Or have we implemented Jupyter Server to write directly to files or replay messages? (And replaying the message doesn't solve the problem of the output not being saved; it just offers the possibility of a delayed save.)

UPDATE:

There is a YDoc representing the notebook but not in the kernel. This YDoc lives in the (jupyverse) server and in the (jpterm) client.
I'm currently working on doing the same for widgets, and in this case yes, there will be a YDoc representing the widget in the kernel, in the server and in the client. The widget will synchronize between the kernel and the server using the Comm protocol, and between the server and the client using a WebSocket. It will allow widget state restore as well.
jupyterlab/jupyterlab#2833 (comment)

It seems to go further with my comment above that, using Y-CRDT (YDoc) to establish consistency, we can develop a wide variety of applications 👍

@davidbrochart
Contributor

And here is a demo showing full notebook state recovery, including widgets:

Peek.2023-11-10.11-29.mp4

@sa-

sa- commented Nov 10, 2023

Having a ydoc in the kernel is great, because this will work with kernel gateway and enterprise gateway too

@ivanov ivanov changed the title Proposal: Restore Notebook execution progress when a browser page is reload Proposal: Restore Notebook execution progress when a browser page is reloaded Nov 17, 2023
@3f6a

3f6a commented Apr 12, 2024

Curious to know what the current status of this is, especially now that jupyterlab/jupyter-collaboration#279 is merged. Thanks!
