Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add isConnected to ServiceManager, use it in hub-extension #10156

Merged

Conversation

vkaidalov-rft
Copy link
Member

@vkaidalov-rft vkaidalov-rft commented Apr 26, 2021

References

Resolves #7756.

Code changes

Currently, the periodic HTTP requests are sent by KernelManager, SessionManager and the other specific managers created within the common ServiceManager instance. Each specific manager creates an instance of the Poll class from the lumino library. The instances are configured in such a way that the frequency of the polling is getting gradually reduced in case of unsuccessful results of the requests.

The Poll instance used by the FileBrowserModel is created in the filebrowser package and is not connected to the ServiceManager. An instance of the ServiceManager is still available there.

The SaveHandler class of the docmanager package also tries to save the file that it handles if there are any unsaved changes. It does it periodically using the setTimeout function. In case of a server shutdown, the unsaved files cause a lot of logs due to the retries.

As a solution, the bandwidthSaveMode boolean property was added to the ServiceManager class, so that the extensions could switch it to inform the other extensions whether to pause sending periodical requests. This property can also be used by third-party extensions, e.g. to pause polling the /clusters endpoint in the Dask extension for JupyterLab that also sends periodic requests.

Given the fact that there had already been the "decorrelated jitter" strategy implemented in lumino and used, I only added switching the bandwidthSaveMode property in the hub-extension, so that the bandwidthSaveMode is turned on if there is a Dialog widget showing a disconnection error message created by the hub-extension in particular. I suppose that the issue is not related to the scenario when JupyterLab is launched locally, so this PR doesn't modify the current behaviour in any other cases. I'm looking forward to your reviews!

User-facing changes

For those who deploy and maintain JupyterLab under JupyterHub, the changes will significantly reduce the number of logs produced by JupyterHub in case of both intended (e.g. server maintanence or the cull-idle-servers service of JupyterHub enabled) and unintended shutdowns of the individual notebook servers.

Before

Despite the disconnection error, the periodic requests are still sent creating a lot of 302 and 503 log records. To reproduce this, one can manually shut down the notebook server via the Hub Control Panel.
how-it-currently-works-browser
how-it-currently-works-hub-logs

After

The polling gets paused if there is a disconnection error message created by the hub-extension.
solution-demo-browser
solution-demo-hub-logs

Backwards-incompatible changes

No backwards-incompatible changes have been found by me so far.

@jupyterlab-dev-mode
Copy link

Thanks for making a pull request to JupyterLab!

To try out this branch on binder, follow this link: Binder

@afshin
Copy link
Member

afshin commented Apr 26, 2021

Hi @vkaidalov-rft! Thank you for your contribution!

I think this approach makes sense and is a sensible addition. The one misgiving I have is about where the bandwidthSaveMode should reside. I don't believe it should be in the @jupyterlab/services package because that package isn't really respecting its value anyway.

To me, this sounds more like an attribute of the specific application instance at runtime, so I would propose creating an attribute in JupyterLab.IInfo and then it is automatically exposed by the @jupyterlab/application-extension:info plugin that anyone can require in their extension.

The downside of this is that you'll need to thread the info through, but semantically it seems like it is a more natural location for this information.

What do you think?

@vkaidalov-rft vkaidalov-rft force-pushed the no-polling-if-disconnected-from-hub branch from 7240863 to e68b01a Compare April 28, 2021 13:38
@vkaidalov-rft
Copy link
Member Author

Hello @afshin. Thank you for the review!
I think you are right about moving the bandwidthSaveMode to JupyterLab.IInfo. I moved it there and refactored the code in such a way that the extensions would use the @jupyterlab/application-extension:info as an optional plugin. Generally, I tried to make these changes to be the least backwards-incompatible as possible. Moreover, only the @jupyterlab/hub-extension switches the bandwidthSaveMode for now, so that the JupyterLab-under-JupyterHub deployments will only be affected by the changes, please review.

@jasongrout
Copy link
Contributor

jasongrout commented May 17, 2021

Thanks for looking into this. We're definitely not being very smart about server down conditions.

A couple of thoughts:

  • I think each poller should be responding intelligently to 503 errors, perhaps looking at the retry-after header to see how long to pause for, and perhaps doing an exponential backoff as a fallback. Do you think this might solve the issue without having a central switch? It doesn't communicate intent or outside knowledge of the server situation, and may make it difficult to restart polling (how do you tell everyone to restart?), but it does help take care of the errors and handles the generic case better.
  • I think the name "bandwidthSave" communicates more intent than needed here. I would imagine bandwidth saving is for low-bandwidth situations (perhaps a mobile network?), and should limit, but not stop, all requests to any server (not just the Jupyter server). In other words, to me, saving bandwidth is a constraint imposed by the user, not a reaction to a server condition.
  • It sounds like we're really trying to communicate that we know from outside sources that we cannot reach the Jupyter server, and we will be able to tell everyone at some future time when we can reach the Jupyter server again. Is that an accurate statement?

@afshin
Copy link
Member

afshin commented May 18, 2021

@jasongrout Thanks for a second look at @vkaidalov-rft's PR. Since this will affect the API surface area, more deliberation is good.

I think the name "bandwidthSave" communicates more intent than needed here. I would imagine bandwidth saving is for low-bandwidth situations (perhaps a mobile network?), and should limit, but not stop, all requests to any server (not just the Jupyter server). In other words, to me, saving bandwidth is a constraint imposed by the user, not a reaction to a server condition.

I didn't know how to think about this, but your description makes me wonder if what we actually wanted is something like a connected signal and an isConnected attribute or something in that neighborhood. What do you think?

@jasongrout
Copy link
Contributor

I didn't know how to think about this, but your description makes me wonder if what we actually wanted is something like a connected signal and an isConnected attribute or something in that neighborhood. What do you think?

Yes, I think that conveys the intent better, and is similar to the terminology we use in the services package for kernel connections.

@goanpeca
Copy link
Member

Hi @vkaidalov-rft could you update the PR and rebase/merge with master?

Thanks!

@vkaidalov-rft
Copy link
Member Author

Hi @goanpeca! Sure, I will be glad to finish the PR. Thank you for reminding me.

Hi @jasongrout! Thank you a lot for your feedback.

I think each poller should be responding intelligently to 503 errors, perhaps looking at the retry-after header to see how long to pause for, and perhaps doing an exponential backoff as a fallback.

As far as I know, the retry-after header is indeed not taken into account by the current implementation. However, I don't think that JupyterHub may set this header to a useful value in the case of the server maintenance described in the linked issue by @n-a-sz or in the case of the cull-idle-servers service used in my JupyterHub deployment. The single-user servers are being shut down in both cases, so the users have to restart their servers explicitly by pressing the "Restart" button in the error Dialog showed by the hub-extension or by refreshing the page.

The exponential backoff strategy has already been used by the pollers in the case of failed requests. The number of the periodical requests sent to the cluster showed itself to stay huge in practice even though the frequency was getting reduced. The goal of this PR is to stop polling the cluster unless the error Dialog showed by the hub-extension has been resolved by the user.

I think the name "bandwidthSave" communicates more intent than needed here.

I totally agree with you, let's rename it. As @afshin proposed, it might be named isConnected or something like that. If we stick to the current idea, then the name should imply the possibility for the UI elements, such as the error Dialog, to prevent JupyterLab from making periodical requests unless certain actions have been performed by the user in reply. Therefore, I can still only see this as a central switch so far. What do you think?

@fcollonval
Copy link
Member

Hey @vkaidalov-rft

This will be a great addition. I would support getting in it. Could you do the rename to isConnected? For the signal, I would let it aside for now as there is no use case for it at the moment.

@vkaidalov-rft vkaidalov-rft force-pushed the no-polling-if-disconnected-from-hub branch from 250ac1d to 2775683 Compare August 10, 2021 19:49
@vkaidalov-rft
Copy link
Member Author

Hi @fcollonval. Thank you for your feedback!

I've done the rename to isConnected, please review.

Copy link
Member

@fcollonval fcollonval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviving your PR @vkaidalov-rft.

@fcollonval fcollonval merged commit e671140 into jupyterlab:master Aug 13, 2021
@fcollonval fcollonval added this to the 4.0 milestone Aug 13, 2021
@fcollonval fcollonval changed the title Add bandwidthSaveMode to ServiceManager, use it in hub-extension Add isConnected to ServiceManager, use it in hub-extension Aug 13, 2021
@github-actions github-actions bot added the status:resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion. label Feb 10, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement pkg:application pkg:docmanager pkg:filebrowser pkg:hub pkg:services status:resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JupyterLab makes too many requests resulting in 503 while the backend is not reachable
5 participants