Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise AppStopped in function calls of stopping apps #1452

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

freider
Copy link
Contributor

@freider freider commented Mar 1, 2024

Refactor of how we "cancel" ongoing function runs when an app shuts down.

Previously we had a _mute_cancellations which just made us mute some errors that were spit out on interpreter shutdown when ongoing function polling got cancelled.

Now we instead add a Future that gets resolved with an Exception object when the local stub gets a termination call, which we in turn trigger from both KeyboardInterrupt (which was the old case of muted cancellations) and now also from the log loop when it receieves app_done=True, indicating that no results will ever come in on the polled functions.

We could extend this with more failure states in other parts of the code (e.g. the app heartbeat loop), but to be truly rigorous about this we should add support in our longpolling backend calls (like FunctionGetOutput) for catching app terminations and similar and return immediately with an error indicating it so no long-pollers sit around unecessarily/forever.

Gets rid of the ugly cancellation error muting we did before, in favor of
using an asyncio.Event that is fired when the local app stops either for
a sigint/KeyboardInterrupt *or* due to an app_done signal being received
in the logs.

This fixes the massive tracebacks that were previously spit out when someone
stopped the app in the web interface while having a `modal run` CLI session running.
else:
output_mgr.hide_status_spinner()
output_mgr._visible_progress = False
pty_shell_finish_event = asyncio.Event()
asyncio.create_task(stream_pty_shell_input(client, log_batch.pty_exec_id, pty_shell_finish_event))
task_context.create_task(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed we had this dangling asyncio task which could randomly get cancelled when it gets garbage collected, so added a TaskContext to give it an explicit life time 😬

@@ -563,7 +569,7 @@ class _Function(_Object, type_prefix="fu"):
# TODO: more type annotations
_info: Optional[FunctionInfo]
_all_mounts: Collection[_Mount]
_stub: "modal.stub._Stub"
_stub: Optional["modal.stub._Stub"] = None # not set in case of external lookups
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _stub could actually be unset before, now it's more explicitly set to None in those cases (mainly Functions received from Function.lookup())

yield item
)

while 1:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should roughly be equivalent to async for item in _map_invocation(...)? Let me know if I missed something here?

@@ -257,10 +257,11 @@ async def test_volume_upload_large_file(client, tmp_path, servicer, blob_server,
async def test_volume_upload_file_timeout(client, tmp_path, servicer, blob_server, *args):
call_count = 0

def mount_put_file(_request):
async def mount_put_file(self, stream):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated fallout from my refactor of grpc_testing above...

@freider freider changed the title Freider/stop on app done Raise AppStopped in function calls of stopping apps Mar 1, 2024
@freider freider requested a review from mwaskom March 1, 2024 15:55
@freider
Copy link
Contributor Author

freider commented Mar 1, 2024

A bit more messy than I wanted to make it, but it's due to the fact that functions are not necessarily tied to an active app/stub in the user's current scope (in case of lookups). For that reason we should probably add backend support for terminating ongoing function calls as well (sending out special termination error outputs for all ongoing function calls when an app stops).

In general it should be a bit better than simply ignoring CancellationErrors like we did before though.

@@ -88,6 +88,12 @@ class DeprecationError(UserWarning):
# Overloading it to evade the default filter, which excludes __main__.


class AppStopped(Error):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should inherit from BaseException? Similar to other "exit" exceptions: https://docs.python.org/3/library/exceptions.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But on the other hand, it's not always meant to exit the interpreter as it's only an indicator that the remote app has stopped. Like in a script you wouldn't necessarily want that to exit your script (and if not a script, then it's our CLI so we can handle it as we like?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh got it, this is not in the app that gets stopped itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants