fix: log runner errors explicitly #2952

Merged
2 commits merged into bentoml:main from the runner-error-trace branch on Aug 30, 2022

Conversation

@ssheng (Collaborator) commented Aug 30, 2022

What does this PR address?

Catch and log errors raised inside runners explicitly. If they are not logged explicitly, Uvicorn logs the error, but the trace context is lost, so the exception log contains no trace or span IDs.

2022-08-30T02:32:24-0700 [ERROR] [runner:iris_clf:1] Exception in ASGI application
Traceback (most recent call last):
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 404, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/uvicorn/middleware/message_logger.py", line 86, in __call__
    raise exc from None
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/uvicorn/middleware/message_logger.py", line 82, in __call__
    await self.app(scope, inner_receive, inner_send)
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 482, in __call__
    await self.app(scope, otel_receive, otel_send)
  File "/Users/ssheng/github/BentoML/bentoml/_internal/server/access.py", line 124, in __call__
    await self.app(scope, receive, wrapped_send)
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 75, in __call__
    raise exc
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 64, in __call__
    await self.app(scope, receive, sender)
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/starlette/routing.py", line 680, in __call__
    await route.handle(scope, receive, send)
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/starlette/routing.py", line 275, in handle
    await self.app(scope, receive, send)
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/starlette/routing.py", line 65, in app
    response = await func(request)
  File "/Users/ssheng/github/BentoML/bentoml/_internal/server/runner_app.py", line 216, in _run
    ret = await runner_method.async_run(*params.args, **params.kwargs)
  File "/Users/ssheng/github/BentoML/bentoml/_internal/runner/runner.py", line 51, in async_run
    return await self.runner._runner_handle.async_run_method(  # type: ignore
  File "/Users/ssheng/github/BentoML/bentoml/_internal/runner/runner_handle/local.py", line 57, in async_run_method
    return await anyio.to_thread.run_sync(
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/ssheng/github/BentoML/bentoml/_internal/runner/runnable.py", line 139, in method
    return self.func(obj, *args, **kwargs)
  File "/Users/ssheng/github/BentoML/bentoml/_internal/frameworks/sklearn.py", line 199, in _run
    raise ValueError("Test Error in runner")

With this fix, errors are explicitly logged in runner_app and an HTTP 500 response is explicitly returned. Returning the HTTP 500 response from BentoML is preferable because Uvicorn would otherwise close the underlying connection with a "connection: close" header (see https://github.com/encode/uvicorn/blob/master/uvicorn/protocols/http/h11_impl.py#L436). Returning the HTTP 500 response ourselves preserves the connection for re-use. See the new error message below.

2022-08-30T02:24:47-0700 [ERROR] [runner:iris_clf:1] Exception on runner 'iris_clf' method 'predict' (trace=131995672636859149405386882178741931655,span=5671561614823693742,sampled=0)
Traceback (most recent call last):
  File "/Users/ssheng/github/BentoML/bentoml/_internal/server/runner_app.py", line 220, in _run
    ret = await runner_method.async_run(*params.args, **params.kwargs)
  File "/Users/ssheng/github/BentoML/bentoml/_internal/runner/runner.py", line 51, in async_run
    return await self.runner._runner_handle.async_run_method(  # type: ignore
  File "/Users/ssheng/github/BentoML/bentoml/_internal/runner/runner_handle/local.py", line 57, in async_run_method
    return await anyio.to_thread.run_sync(
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/opt/miniconda3/envs/bentoml/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/ssheng/github/BentoML/bentoml/_internal/runner/runnable.py", line 139, in method
    return self.func(obj, *args, **kwargs)
  File "/Users/ssheng/github/BentoML/bentoml/_internal/frameworks/sklearn.py", line 199, in _run
    raise ValueError("Test error in runner")
ValueError: Test error in runner
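
For reference, the shape of the change in runner_app.py is roughly the following. This is a minimal sketch, not the literal diff; make_run_handler, deserialize, serialize, and runner_method.name are illustrative stand-ins.

import logging

from starlette.requests import Request
from starlette.responses import Response

logger = logging.getLogger("bentoml")

def make_run_handler(runner_name, runner_method, deserialize, serialize):
    # deserialize/serialize stand in for BentoML's payload (de)serialization.
    async def _run(request: Request) -> Response:
        params = await deserialize(request)
        try:
            ret = await runner_method.async_run(*params.args, **params.kwargs)
        except Exception:
            # Logging here, inside the runner app, keeps the active trace/span
            # IDs attached to the log record instead of letting Uvicorn log
            # the error later without them.
            logger.exception(
                "Exception on runner '%s' method '%s'",
                runner_name,
                runner_method.name,
            )
            # Returning the 500 ourselves keeps the connection open for
            # re-use; an unhandled exception would make Uvicorn respond with
            # "connection: close".
            return Response("Internal Server Error", status_code=500)
        return serialize(ret)

    return _run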

Thanks for reporting the error, @jiewpeng.

@ssheng ssheng requested a review from a team as a code owner August 30, 2022 09:38
@ssheng ssheng requested review from parano and bojiang and removed request for a team August 30, 2022 09:38

codecov bot commented Aug 30, 2022

Codecov Report

Merging #2952 (343d8b5) into main (3de0c9a) will decrease coverage by 0.01%.
The diff coverage is 57.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2952      +/-   ##
==========================================
- Coverage   70.62%   70.60%   -0.02%     
==========================================
  Files         104      104              
  Lines        9534     9538       +4     
==========================================
+ Hits         6733     6734       +1     
- Misses       2801     2804       +3     
Impacted Files                             Coverage Δ
bentoml/_internal/server/runner_app.py    95.09% <57.14%> (-2.87%) ⬇️

@bojiang (Member) commented Aug 30, 2022

@ssheng I think a more common way is to use https://www.starlette.io/exceptions/. But I wonder whether that approach would break the connection re-use benefit that the current solution provides?

With that, we would not need to catch exceptions everywhere.

@bojiang (Member) commented Aug 30, 2022

Discussed with @ssheng. For the sake of a quick release, we will accept this now and leave the alternative for future discussion.

@ssheng (Collaborator, Author) commented Aug 30, 2022

> @ssheng I think a more common way is to use https://www.starlette.io/exceptions/. But I wonder whether that approach would break the connection re-use benefit that the current solution provides?
>
> With that, we would not need to catch exceptions everywhere.

Besides connection re-use, https://www.starlette.io/exceptions/ would also cause two exceptions to be logged, one from the runner and one from the Starlette server, which could be confusing.
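
For context, the alternative bojiang mentions would look roughly like this. This is a sketch of Starlette's exception_handlers mechanism, not code from this PR; the route and handler names are illustrative.

from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import Response
from starlette.routing import Route

async def predict(request: Request) -> Response:
    # Stand-in for a runner endpoint that may raise.
    raise ValueError("Test error in runner")

async def runner_error_handler(request: Request, exc: Exception) -> Response:
    # One app-level handler instead of a try/except around each runner call.
    return Response("Internal Server Error", status_code=500)

app = Starlette(
    routes=[Route("/predict", predict, methods=["POST"])],
    exception_handlers={Exception: runner_error_handler},
)

As far as I can tell, a handler keyed on the bare Exception type is applied by Starlette's server-error middleware, which re-raises the exception after sending the response, so the double-logging concern above would apply.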

@aarnphm aarnphm merged commit 9e05c51 into bentoml:main Aug 30, 2022
@ssheng ssheng added this to the 1.0.6 milestone Aug 30, 2022
@ssheng ssheng deleted the runner-error-trace branch August 31, 2022 20:42