Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CancelledError #6719

Closed
1 task done
H--o-l opened this issue Apr 26, 2022 · 25 comments · Fixed by #6727
Closed
1 task done

CancelledError #6719

H--o-l opened this issue Apr 26, 2022 · 25 comments · Fixed by #6727
Labels

Comments

@H--o-l
Copy link
Contributor

H--o-l commented Apr 26, 2022

Describe the bug

Using the aiohttp web server, I struggle with asyncio.CancelledError() when a client disconnect for some time.
I shielded what needed to be but it has its limits.
Today I found this old PR:
#4080

It looks like my issue.
If I understand correctly asyncio.CancelledError() was fixed at the beginning of v3.7.0 (1th October 2019).
But I still have frequent asyncio.CancelledError() in my server.

I have investigated a bit more and I find another PR, at the end of v3.7.0 (just before v3.7.0b0 I think) that for me re-introduce the issue:
#4415

For me the wrong code is the following:

        if self._task_handler is not None:
            self._task_handler.cancel()

It was removed by #4080 (and with all the documentation regarding this issue) but re-introduced silently by #4415.

I'm clearly not an expert so I'm not sure of what I'm saying, but does it make sense for you?

To Reproduce

NA

Expected behavior

NA

Logs/tracebacks

NA

Python Version

3.10.4

aiohttp Version

3.8.1

multidict Version

5.1.0

yarl Version

1.6.3

OS

Fedora 35

Related component

Server

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@H--o-l H--o-l added the bug label Apr 26, 2022
@Dreamsorcerer
Copy link
Member

I think the first one caused a regression (as mentioned at #4408 (comment)), so the second one has reverted that.

Not sure if the documentation should be reverted or if the plan is to improve this later. Will need some feedback from @asvetlov. Personally, I like the fact that the handler can be aborted as soon as the connection has dropped.

@H--o-l
Copy link
Contributor Author

H--o-l commented Apr 27, 2022

Hey @Dreamsorcerer thanks a lot for the quick answer!

In the following comment that proposed a fix to the regression it was only about self._waiter.cancel() not self._task_handler.cancel():
#4415 (comment)

The final fix of the regression does both self._waiter.cancel() and self._task_handler.cancel(), maybe self._task_handler.cancel() was a mistake?

Not sure if the documentation should be reverted or if the plan is to improve this later.

OK!

Personally, I like the fact that the handler can be aborted as soon as the connection has dropped.

I find a lot of all issues on the subject, most closed because the subject was thought to be close.
Personally, the solution I prefer would be one discussed in the below issue, with an allow_cancel parameter per route:
#2492 (comment)

The solution of #4080 is not ok for me, I need some part of my code to keep executing even when the client disconnect.
But at least the situation would be clearer, and easier to hack in a future proof way, if the code and doc were aligned.

@Dreamsorcerer
Copy link
Member

Personally, the solution I prefer would be one discussed in the below issue, with an allow_cancel parameter per route: #2492 (comment)

Well spotted. Appears there's already a solution in master which closes the issue, even if it's not allowing cancellation throughout the app.

I think you're probably right, and that line can be reverted. Please make a PR.
Have you already tested the change fixes it for you? Ideally, with -Wall -X dev, so you can check those warnings don't return.

The solution of #4080 is not ok for me, I need some part of my code to keep executing even when the client disconnect.

Well, that's what asyncio.shield() is for, to avoid cancellation. So, you can still use that for now, until the revert is released.

@H--o-l
Copy link
Contributor Author

H--o-l commented Apr 28, 2022

Hey @Dreamsorcerer !

Have you already tested the change fixes it for you? Ideally, with -Wall -X dev, so you can check those warnings don't return.

I'm sorry, I don't understand the -Wall -X dev you are talking about, I think I don't know the project well enough.

I still tried a few things.

With the simplest server example from here https://github.com/aio-libs/aiohttp/tree/0215cdd4275ea0150f169545b25b66e1f5dc69f0#server, I reverted #4415 with the following diff:

--- a/aiohttp/web_protocol.py
+++ b/aiohttp/web_protocol.py
@@ -331,10 +331,10 @@ class RequestHandler(BaseProtocol):
                 exc = ConnectionResetError("Connection lost")
             self._current_request._cancel(exc)
 
-        if self._task_handler is not None:
-            self._task_handler.cancel()
-        if self._waiter is not None:
-            self._waiter.cancel()
 
         self._task_handler = None

However, I'm not able to reproduce the issue described in #4408, so I can't test if I will re-create the regression or not.
I tried both on master and on v3.8.0.

I can't try on v3.7.0 because my server doesn't start:

$ python test.py                                                                               16:52:00
Traceback (most recent call last):
  File "/home/hoel/Tolteck/test.py", line 2, in <module>
    from aiohttp import web
  File "/home/hoel/.local/lib/python3.10/site-packages/aiohttp/__init__.py", line 6, in <module>
    from .client import BaseConnector as BaseConnector
  File "/home/hoel/.local/lib/python3.10/site-packages/aiohttp/client.py", line 35, in <module>
    from . import hdrs, http, payload
  File "/home/hoel/.local/lib/python3.10/site-packages/aiohttp/http.py", line 7, in <module>
    from .http_parser import HeadersParser as HeadersParser
  File "/home/hoel/.local/lib/python3.10/site-packages/aiohttp/http_parser.py", line 15, in <module>
    from .helpers import NO_EXTENSIONS, BaseTimerContext
  File "/home/hoel/.local/lib/python3.10/site-packages/aiohttp/helpers.py", line 612, in <module>
    class CeilTimeout(async_timeout.timeout):
TypeError: function() argument 'code' must be code, not str

I did another test with the following smaller diff:

--- a/aiohttp/web_protocol.py
+++ b/aiohttp/web_protocol.py
@@ -331,8 +331,8 @@ class RequestHandler(BaseProtocol):
                 exc = ConnectionResetError("Connection lost")
             self._current_request._cancel(exc)
 
-        if self._task_handler is not None:
-            self._task_handler.cancel()
         if self._waiter is not None:
             self._waiter.cancel()

With the above diff, on client disconnection, I don't have CancelledError exception anymore, and in fact I don't have any exception, no OSError.
I think the doc say we should have OSError https://docs.aiohttp.org/en/stable/web_advanced.html?highlight=disconnection#peer-disconnection so my fix doesn't do what the documentation expects.

For info I'm using Python 3.10.4

What do you think?

@Dreamsorcerer
Copy link
Member

I'm sorry, I don't understand the -Wall -X dev you are talking about, I think I don't know the project well enough.

python -Wall -X dev
It enables all warnings and developer mode features in python.

I can't try on v3.7.0 because my server doesn't start:

That error is caused by an incompatible version of async-timeout. You would need to reinstall the correct dependencies when switching between versions.

With the above diff, on client disconnection, I don't have CancelledError exception anymore, and in fact I don't have any exception, no OSError.

It says you'll get OSError when reading/writing. It also says that aiohttp will handle it, so I'd expect no exception to appear in your logs. If you want to test it, you'd probably have to add something to the end of your handler (i.e. after the disconnection happens) like:

try:
    await request.text()
except OSError:
    print("OSError occurred")

H--o-l added a commit to H--o-l/aiohttp that referenced this issue Apr 29, 2022
`asyncio.CancelledError()` on peer disconnection have been removed by aio-libs#4080,
but aio-libs#4415 re-introduced it silently.

`self._waiter.cancel()` and `self._task_handler.cancel()` were added by aio-libs#4415,
but aio-libs#4415 in fact only needed `self._waiter.cancel()` (proof below).

So I propose to remove `self._task_handler.cancel()`, both aio-libs#4080 and aio-libs#4415 will
be fixed.

To test that I re-resolved aio-libs#4080 I used:
```py
async def handle(request):
    try:
        await asyncio.sleep(5)
        await request.text()
    except BaseException as e:
        print(f'base exception {type(e).__name__}')
    return web.Response(text='toto')
```
```console
curl -X POST -H "Content-Type: text/plain" --data "bouh" localhost:8080
```
I kill the curl request before the 5 seconds sleep.
Before this PR I have the following error right after killing the curl:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception CancelledError
```

After this commit I have the following error, but only after the 5 seconds
sleep:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception ConnectionResetError
```

To test that I didn't re-introduce aio-libs#4415 I use a basic `handle` and 30
`curl localhost:8080`:

- Before this commit no issue
- If I remove `self._task_handler.cancel()` no issue
- If I remove both `self._task_handler.cancel()` and `self._waiter.cancel()`:

  ```console
  $ python -Wall -X dev test.py
  ======== Running on http://0.0.0.0:8080 ========
  (Press CTRL+C to quit)
  Task was destroyed but it is pending!
  task: <Task pending name='Task-4' coro=<RequestHandler.start() done, defined at ...
  Task was destroyed but it is pending!
  ...
  ```

So it's OK to remove only `self._task_handler.cancel()`.
H--o-l added a commit to H--o-l/aiohttp that referenced this issue Apr 29, 2022
`asyncio.CancelledError()` on peer disconnection have been removed by aio-libs#4080,
but aio-libs#4415 re-introduced it silently.

`self._waiter.cancel()` and `self._task_handler.cancel()` were added by aio-libs#4415,
but aio-libs#4415 in fact only needed `self._waiter.cancel()` (proof below).

So I propose to remove `self._task_handler.cancel()`, both aio-libs#4080 and aio-libs#4415 will
be fixed.

To test that I re-resolved aio-libs#4080 I used:
```py
async def handle(request):
    try:
        await asyncio.sleep(5)
        await request.text()
    except BaseException as e:
        print(f'base exception {type(e).__name__}')
    return web.Response(text='toto')
```
```console
curl -X POST -H "Content-Type: text/plain" --data "bouh" localhost:8080
```
I kill the curl request before the 5 seconds of sleep.
Before this PR I have the following error right after killing the curl:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception CancelledError
```

After this commit I have the following error, but only after the 5 seconds
sleep:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception ConnectionResetError
```

To test that I didn't re-introduce aio-libs#4415 I use a basic `handle` and 30
`curl localhost:8080`:

- Before this commit no issue
- If I remove `self._task_handler.cancel()` no issue
- If I remove both `self._task_handler.cancel()` and `self._waiter.cancel()`:

  ```console
  $ python -Wall -X dev test.py
  ======== Running on http://0.0.0.0:8080 ========
  (Press CTRL+C to quit)
  Task was destroyed but it is pending!
  task: <Task pending name='Task-4' coro=<RequestHandler.start() done, defined at ...
  Task was destroyed but it is pending!
  ...
  ```

So it's OK to remove only `self._task_handler.cancel()`.

There is no documentation or tests to be updated because aio-libs#4080 already did that
part.

However I guess that for a few people it might be seen as a breaking change,
I'm not sure.

fixes aio-libs#6719
H--o-l added a commit to H--o-l/aiohttp that referenced this issue Apr 29, 2022
`asyncio.CancelledError()` on peer disconnection have been removed by aio-libs#4080,
but aio-libs#4415 re-introduced it silently.

`self._waiter.cancel()` and `self._task_handler.cancel()` were added by aio-libs#4415,
but aio-libs#4415 in fact only needed `self._waiter.cancel()` (proof below).

So I propose to remove `self._task_handler.cancel()`, both aio-libs#4080 and aio-libs#4415 will
be fixed.

To test that I re-resolved aio-libs#4080 I used:
```py
async def handle(request):
    try:
        await asyncio.sleep(5)
        await request.text()
    except BaseException as e:
        print(f'base exception {type(e).__name__}')
    return web.Response(text='toto')
```
```console
curl -X POST -H "Content-Type: text/plain" --data "bouh" localhost:8080
```
I kill the curl request before the 5 seconds of sleep.
Before this PR I have the following error right after killing the curl:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception CancelledError
```

After this commit I have the following error, but only after the 5 seconds
sleep:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception ConnectionResetError
```

To test that I didn't re-introduce aio-libs#4415 I use a basic `handle` and 30
`curl localhost:8080`:

- Before this commit no issue
- If I remove `self._task_handler.cancel()` no issue
- If I remove both `self._task_handler.cancel()` and `self._waiter.cancel()`:

  ```console
  $ python -Wall -X dev test.py
  ======== Running on http://0.0.0.0:8080 ========
  (Press CTRL+C to quit)
  Task was destroyed but it is pending!
  task: <Task pending name='Task-4' coro=<RequestHandler.start() done, defined at ...
  Task was destroyed but it is pending!
  ...
  ```

So it's OK to remove only `self._task_handler.cancel()`.

There is no documentation or tests to be updated because aio-libs#4080 already did that
part.

However I guess that for a few people it might be seen as a breaking change,
I'm not sure.

Tested on master and on v3.8.1.

fixes aio-libs#6719
H--o-l added a commit to H--o-l/aiohttp that referenced this issue Apr 29, 2022
`asyncio.CancelledError()` on peer disconnection have been removed by aio-libs#4080,
but aio-libs#4415 re-introduced it silently.

`self._waiter.cancel()` and `self._task_handler.cancel()` were added by aio-libs#4415,
but aio-libs#4415 in fact only needed `self._waiter.cancel()` (proof below).

So I propose to remove `self._task_handler.cancel()`, both aio-libs#4080 and aio-libs#4415 will
be fixed.

To test that I re-resolved aio-libs#4080 I used:
```py
async def handle(request):
    try:
        await asyncio.sleep(5)
        await request.text()
    except BaseException as e:
        print(f'base exception {type(e).__name__}')
    return web.Response(text='toto')
```
```console
curl -X POST -H "Content-Type: text/plain" --data "bouh" localhost:8080
```
I kill the curl request before the 5 seconds of sleep.
Before this PR I have the following error right after killing the curl:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception CancelledError
```

After this commit I have the following error, but only after the 5 seconds
sleep:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception ConnectionResetError
```

To test that I didn't re-introduce aio-libs#4415 I use a basic `handle` and 30
`curl localhost:8080`:

- Before this commit no issue
- If I remove `self._task_handler.cancel()` no issue
- If I remove both `self._task_handler.cancel()` and `self._waiter.cancel()`:

  ```console
  $ python -Wall -X dev test.py
  ======== Running on http://0.0.0.0:8080 ========
  (Press CTRL+C to quit)
  Task was destroyed but it is pending!
  task: <Task pending name='Task-4' coro=<RequestHandler.start() done, defined at ...
  Task was destroyed but it is pending!
  ...
  ```

So it's OK to remove only `self._task_handler.cancel()`.

There is no documentation or tests to be updated because aio-libs#4080 already did that
part.

However I guess that for a few people it might be seen as a breaking change,
I'm not sure.

Tested on master and on v3.8.1.

fixes aio-libs#6719
@H--o-l
Copy link
Contributor Author

H--o-l commented Apr 29, 2022

@Dreamsorcerer Thanks for your help! PR opened #6727

@Dreamsorcerer
Copy link
Member

Any chance you managed to reproduce the warnings from #4408 with python -Wall -X dev?

@H--o-l
Copy link
Contributor Author

H--o-l commented Apr 29, 2022

Yes I did! You can see in the PR I talk about it in the end of the commit&PR message.

@Dreamsorcerer
Copy link
Member

Ah, perfect, haven't got to the PR just yet. Thanks.

Dreamsorcerer added a commit that referenced this issue May 8, 2022
* Fix asyncio.CancelledError again (#6719)

`asyncio.CancelledError()` on peer disconnection have been removed by #4080,
but #4415 re-introduced it silently.

`self._waiter.cancel()` and `self._task_handler.cancel()` were added by #4415,
but #4415 in fact only needed `self._waiter.cancel()` (proof below).

So I propose to remove `self._task_handler.cancel()`, both #4080 and #4415 will
be fixed.

To test that I re-resolved #4080 I used:
```py
async def handle(request):
    try:
        await asyncio.sleep(5)
        await request.text()
    except BaseException as e:
        print(f'base exception {type(e).__name__}')
    return web.Response(text='toto')
```
```console
curl -X POST -H "Content-Type: text/plain" --data "bouh" localhost:8080
```
I kill the curl request before the 5 seconds of sleep.
Before this PR I have the following error right after killing the curl:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception CancelledError
```

After this commit I have the following error, but only after the 5 seconds
sleep:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception ConnectionResetError
```

To test that I didn't re-introduce #4415 I use a basic `handle` and 30
`curl localhost:8080`:

- Before this commit no issue
- If I remove `self._task_handler.cancel()` no issue
- If I remove both `self._task_handler.cancel()` and `self._waiter.cancel()`:

  ```console
  $ python -Wall -X dev test.py
  ======== Running on http://0.0.0.0:8080 ========
  (Press CTRL+C to quit)
  Task was destroyed but it is pending!
  task: <Task pending name='Task-4' coro=<RequestHandler.start() done, defined at ...
  Task was destroyed but it is pending!
  ...
  ```

So it's OK to remove only `self._task_handler.cancel()`.

There is no documentation or tests to be updated because #4080 already did that
part.

However I guess that for a few people it might be seen as a breaking change,
I'm not sure.

Tested on master and on v3.8.1.

fixes #6719

* Update 6719.bugfix

Co-authored-by: Sam Bull <aa6bs0@sambull.org>
patchback bot pushed a commit that referenced this issue May 8, 2022
* Fix asyncio.CancelledError again (#6719)

`asyncio.CancelledError()` on peer disconnection have been removed by #4080,
but #4415 re-introduced it silently.

`self._waiter.cancel()` and `self._task_handler.cancel()` were added by #4415,
but #4415 in fact only needed `self._waiter.cancel()` (proof below).

So I propose to remove `self._task_handler.cancel()`, both #4080 and #4415 will
be fixed.

To test that I re-resolved #4080 I used:
```py
async def handle(request):
    try:
        await asyncio.sleep(5)
        await request.text()
    except BaseException as e:
        print(f'base exception {type(e).__name__}')
    return web.Response(text='toto')
```
```console
curl -X POST -H "Content-Type: text/plain" --data "bouh" localhost:8080
```
I kill the curl request before the 5 seconds of sleep.
Before this PR I have the following error right after killing the curl:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception CancelledError
```

After this commit I have the following error, but only after the 5 seconds
sleep:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception ConnectionResetError
```

To test that I didn't re-introduce #4415 I use a basic `handle` and 30
`curl localhost:8080`:

- Before this commit no issue
- If I remove `self._task_handler.cancel()` no issue
- If I remove both `self._task_handler.cancel()` and `self._waiter.cancel()`:

  ```console
  $ python -Wall -X dev test.py
  ======== Running on http://0.0.0.0:8080 ========
  (Press CTRL+C to quit)
  Task was destroyed but it is pending!
  task: <Task pending name='Task-4' coro=<RequestHandler.start() done, defined at ...
  Task was destroyed but it is pending!
  ...
  ```

So it's OK to remove only `self._task_handler.cancel()`.

There is no documentation or tests to be updated because #4080 already did that
part.

However I guess that for a few people it might be seen as a breaking change,
I'm not sure.

Tested on master and on v3.8.1.

fixes #6719

* Update 6719.bugfix

Co-authored-by: Sam Bull <aa6bs0@sambull.org>
(cherry picked from commit adeece3)
patchback bot pushed a commit that referenced this issue May 8, 2022
* Fix asyncio.CancelledError again (#6719)

`asyncio.CancelledError()` on peer disconnection have been removed by #4080,
but #4415 re-introduced it silently.

`self._waiter.cancel()` and `self._task_handler.cancel()` were added by #4415,
but #4415 in fact only needed `self._waiter.cancel()` (proof below).

So I propose to remove `self._task_handler.cancel()`, both #4080 and #4415 will
be fixed.

To test that I re-resolved #4080 I used:
```py
async def handle(request):
    try:
        await asyncio.sleep(5)
        await request.text()
    except BaseException as e:
        print(f'base exception {type(e).__name__}')
    return web.Response(text='toto')
```
```console
curl -X POST -H "Content-Type: text/plain" --data "bouh" localhost:8080
```
I kill the curl request before the 5 seconds of sleep.
Before this PR I have the following error right after killing the curl:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception CancelledError
```

After this commit I have the following error, but only after the 5 seconds
sleep:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception ConnectionResetError
```

To test that I didn't re-introduce #4415 I use a basic `handle` and 30
`curl localhost:8080`:

- Before this commit no issue
- If I remove `self._task_handler.cancel()` no issue
- If I remove both `self._task_handler.cancel()` and `self._waiter.cancel()`:

  ```console
  $ python -Wall -X dev test.py
  ======== Running on http://0.0.0.0:8080 ========
  (Press CTRL+C to quit)
  Task was destroyed but it is pending!
  task: <Task pending name='Task-4' coro=<RequestHandler.start() done, defined at ...
  Task was destroyed but it is pending!
  ...
  ```

So it's OK to remove only `self._task_handler.cancel()`.

There is no documentation or tests to be updated because #4080 already did that
part.

However I guess that for a few people it might be seen as a breaking change,
I'm not sure.

Tested on master and on v3.8.1.

fixes #6719

* Update 6719.bugfix

Co-authored-by: Sam Bull <aa6bs0@sambull.org>
(cherry picked from commit adeece3)
Dreamsorcerer pushed a commit that referenced this issue May 8, 2022
* Fix asyncio.CancelledError again (#6719)

`asyncio.CancelledError()` on peer disconnection have been removed by #4080,
but #4415 re-introduced it silently.

`self._waiter.cancel()` and `self._task_handler.cancel()` were added by #4415,
but #4415 in fact only needed `self._waiter.cancel()` (proof below).

So I propose to remove `self._task_handler.cancel()`, both #4080 and #4415 will
be fixed.

To test that I re-resolved #4080 I used:
```py
async def handle(request):
    try:
        await asyncio.sleep(5)
        await request.text()
    except BaseException as e:
        print(f'base exception {type(e).__name__}')
    return web.Response(text='toto')
```
```console
curl -X POST -H "Content-Type: text/plain" --data "bouh" localhost:8080
```
I kill the curl request before the 5 seconds of sleep.
Before this PR I have the following error right after killing the curl:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception CancelledError
```

After this commit I have the following error, but only after the 5 seconds
sleep:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception ConnectionResetError
```

To test that I didn't re-introduce #4415 I use a basic `handle` and 30
`curl localhost:8080`:

- Before this commit no issue
- If I remove `self._task_handler.cancel()` no issue
- If I remove both `self._task_handler.cancel()` and `self._waiter.cancel()`:

  ```console
  $ python -Wall -X dev test.py
  ======== Running on http://0.0.0.0:8080 ========
  (Press CTRL+C to quit)
  Task was destroyed but it is pending!
  task: <Task pending name='Task-4' coro=<RequestHandler.start() done, defined at ...
  Task was destroyed but it is pending!
  ...
  ```

So it's OK to remove only `self._task_handler.cancel()`.

There is no documentation or tests to be updated because #4080 already did that
part.

However I guess that for a few people it might be seen as a breaking change,
I'm not sure.

Tested on master and on v3.8.1.

fixes #6719

* Update 6719.bugfix

Co-authored-by: Sam Bull <aa6bs0@sambull.org>
(cherry picked from commit adeece3)

Co-authored-by: Hoel IRIS <hoel.iris@gmail.com>
Dreamsorcerer pushed a commit that referenced this issue May 8, 2022
* Fix asyncio.CancelledError again (#6719)

`asyncio.CancelledError()` on peer disconnection have been removed by #4080,
but #4415 re-introduced it silently.

`self._waiter.cancel()` and `self._task_handler.cancel()` were added by #4415,
but #4415 in fact only needed `self._waiter.cancel()` (proof below).

So I propose to remove `self._task_handler.cancel()`, both #4080 and #4415 will
be fixed.

To test that I re-resolved #4080 I used:
```py
async def handle(request):
    try:
        await asyncio.sleep(5)
        await request.text()
    except BaseException as e:
        print(f'base exception {type(e).__name__}')
    return web.Response(text='toto')
```
```console
curl -X POST -H "Content-Type: text/plain" --data "bouh" localhost:8080
```
I kill the curl request before the 5 seconds of sleep.
Before this PR I have the following error right after killing the curl:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception CancelledError
```

After this commit I have the following error, but only after the 5 seconds
sleep:
```console
$ python -Wall -X dev test.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
base exception ConnectionResetError
```

To test that I didn't re-introduce #4415 I use a basic `handle` and 30
`curl localhost:8080`:

- Before this commit no issue
- If I remove `self._task_handler.cancel()` no issue
- If I remove both `self._task_handler.cancel()` and `self._waiter.cancel()`:

  ```console
  $ python -Wall -X dev test.py
  ======== Running on http://0.0.0.0:8080 ========
  (Press CTRL+C to quit)
  Task was destroyed but it is pending!
  task: <Task pending name='Task-4' coro=<RequestHandler.start() done, defined at ...
  Task was destroyed but it is pending!
  ...
  ```

So it's OK to remove only `self._task_handler.cancel()`.

There is no documentation or tests to be updated because #4080 already did that
part.

However I guess that for a few people it might be seen as a breaking change,
I'm not sure.

Tested on master and on v3.8.1.

fixes #6719

* Update 6719.bugfix

Co-authored-by: Sam Bull <aa6bs0@sambull.org>
(cherry picked from commit adeece3)

Co-authored-by: Hoel IRIS <hoel.iris@gmail.com>
@Tinche
Copy link

Tinche commented Sep 21, 2022

Sorry for necroposting, I'm just now going through the aiohttp changelog.

There's no way to restore the old behavior, right? Changing this was big mistake in my mind. Task cancellation is one of asyncio superpowers, and the ability to clean up server resources when a client gives up is invaluable.

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Sep 22, 2022

My understanding is that we received enough issues from users who did not expect this behaviour (and possibly didn't have the asyncio knowledge to handle it well), that the decision was made to remove it (back in the 3.7 release).

I kind of liked the cancellation feature, so I'd be open to accepting a PR which adds it back in as an opt-in feature.

@mosquito
Copy link
Contributor

@Dreamsorcerer in #7056 I'm trying to fix it

@tzoiker
Copy link

tzoiker commented Oct 31, 2022

To be honest, not cancelling by default on disconnections seems to be unintuitive to me. It seems that most of the time the client either wants some data or sends some to initiate changes. For the former there is no much sense processing any further on disconnect. For the latter, it can be done with background jobs while still responding to the client as quickly as possible.

Background jobs can always be initiated with asyncio.create_task. Otherwise, you can shield it if you definitely need to wait for its completion before responding.

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Oct 31, 2022

It seems that most of the time the client either wants some data or sends some to initiate changes. For the former there is no much sense processing any further on disconnect. For the latter, it can be done with background jobs while still responding to the client as quickly as possible.

Yes, that's the ideal flow. But, most users don't realise that the disconnection may happen, so they don't even think to create a background task and then are confused why an update didn't happen or only partially happened.

Background jobs can always be initiated with asyncio.create_task. Otherwise, you can shield it if you definitely need to wait for its completion before responding.

And this is somewhere we need to help educate users to do the right thing. You can't just use asyncio.create_task(), as you'd then need to keep track of the returned Task and remember to await it at some point. Failing to do so will result in unawaited warnings and missing exceptions in your code.

I think the best tool for these kind of fire-and-forget tasks we have currently is aiojobs, which will handle all that complexity for you.

@mosquito
Copy link
Contributor

Yes, that's the ideal flow. But, most users don't realise that the disconnection may happen, so they don't even think to create a background task and then are confused why an update didn't happen or only partially happened.

Safe code and simple code it's a tradeoff. It seems that there is a misunderstanding here, if somewhere you can create resources faster than the service can release resources, the server is always vulnerable to attacks of this kind.

It seems that this should be very detailed description in the documentation, and a safe default should be set.

I think the best tool for these kind of fire-and-forget tasks we have currently is aiojobs, which will handle all that complexity for you.

Great, let's give this recommendation to anyone who wants their tasks to run after the client connection is terminated.

We have a lot of services running on aiohttp and expect the client to not be able to spawn hundreds of thousands of tasks just by disconnecting from the server.

@H--o-l
Copy link
Contributor Author

H--o-l commented Nov 1, 2022

@mosquito
As long as your suggestion is opt-in I have nothing against it.
Other developers know what they are doing too, don't think you are the only one. You have a use case, we have other.
Shield was a headache for us, as catching exceptions in third-party middleware when the corresponding user had disconnected was really hard to do.
As for aiojobs: add dependencies and complexity in your project if you want, but don't force others to do it, by thinking your use-case is the only one that deserves to be the default.

Last thing, this PR only re-aligned aiohttp with its documentation, as the documentation made the change in v3.7. please discuss this change in the original v3.7 PR otherwise you are spamming me.

Dreamsorcerer added a commit that referenced this issue Dec 11, 2022
… when client connection closing. (#7056)

<!-- Thank you for your contribution! -->

## Related to #6719 #6727. Added a configuration flag for enable request
task handler cancelling when client connection closing.

After changes in version 3.8.3, there is no longer any way to enable
this behaviour. In our services, we want to handle protocol-level
errors, for example for canceling the execution of a heavy query in the
DBMS if the user's connection is broken.

Now I created this PR in order to discuss my solution, of course if I
did everything well I will add tests changelog, etc.

<!-- Please give a short brief about these changes. -->

## I guess this breakdown can be solved using the configuration flag
that is passed to the Server instance.

Of course `AppRunner` and `SiteRunner` can pass this through `**kwargs`
too.

<!-- Outline any notable behaviour for the end users. -->

## Related issue number #6719

<!-- Are there any issues opened that will be resolved by merging this
change? -->

## Checklist

- [ ] I think the code is well written
- [ ] Unit tests for the changes exist
- [ ] Documentation reflects the changes
- [ ] If you provide code modification, please add yourself to
`CONTRIBUTORS.txt`
  * The format is &lt;Name&gt; &lt;Surname&gt;.
  * Please keep alphabetical order, the file is sorted by names.
- [ ] Add a new news fragment into the `CHANGES` folder
  * name it `<issue_id>.<type>` for example (588.bugfix)
* if you don't have an `issue_id` change it to the pr id after creating
the pr
  * ensure type is one of the following:
    * `.feature`: Signifying a new feature.
    * `.bugfix`: Signifying a bug fix.
    * `.doc`: Signifying a documentation improvement.
    * `.removal`: Signifying a deprecation or removal of public API.
* `.misc`: A ticket has been closed, but it is not of interest to users.
* Make sure to use full sentences with correct case and punctuation, for
example: "Fix issue with non-ascii contents in doctest text files."

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sam Bull <aa6bs0@sambull.org>
@balloob
Copy link
Contributor

balloob commented Dec 12, 2022

How come this was released in a patch release? This is a major behavioral and breaking change and should not have been put into 3.8.3. Instead, it should have been kept for 3.9.0, or even better 4.0.0.

@Dreamsorcerer
Copy link
Member

How come this was released in a patch release? This is a major behavioral and breaking change and should not have been put into 3.8.3. Instead, it should have been kept for 3.9.0, or even better 4.0.0.

It was released in 3.7. It was then accidentally reverted in 3.8, causing a regression, so was changed again in a patch release.

The memory issue you are experiencing sounds like a separate bug that should be looked at though. It doesn't seem like that should happen regardless of the cancellation method.

@H--o-l
Copy link
Contributor Author

H--o-l commented Dec 12, 2022

I think it was first released in a v3.7 beta or alpha, or RC or something like that, and reverted (only the code, not the doc, and inside an un-related commit) just before the final v3.7, so the code might never have reached an official final version before the v3.8.3 (but it was still listed in the changelog of the V3.7, and the documentation had changed).

@balloob
Copy link
Contributor

balloob commented Dec 12, 2022

@Dreamsorcerer the memory issue went away for Home Assistant users after downgrading to aiohttp 3.8.1.

It seems like there is a bug in 3.8.3 with StreamResponse which doesn't seem to raise on write when the client is gone. With 3.8.1 it gets cancelled when the client gets disconnected. I will try to make a small reproducing script for a new issue later today.

(update: unable to create a reproduce script but users issues went away, will dig further)

@Dreamsorcerer
Copy link
Member

Any chance you are using ssl_context?
Seems we have a reproducer that only occurs if the server is handling TLS: #7172

@jsbronder
Copy link

jsbronder commented May 22, 2023

How come this was released in a patch release? This is a major behavioral and breaking change and should not have been put into 3.8.3. Instead, it should have been kept for 3.9.0, or even better 4.0.0.

It was released in 3.7. It was then accidentally reverted in 3.8, causing a regression, so was changed again in a patch release.

The memory issue you are experiencing sounds like a separate bug that should be looked at though. It doesn't seem like that should happen regardless of the cancellation method.

I've been extensively relying on this behavior for at least 5 years, since
3.5.x, but probably for much longer. While I haven't used every stable
release, it was definitely the behavior in 3.7.4.post0 [1] and 3.6.2 [2].
Browsing further, it looks like it was the behavior in 3.7.0 [3] and 3.8.0 as
well [4]. Despite what documentation was changed, I really think this should
be reversed and only changed in a new major release, it's definitely a backward
incompatible change.

  1. if self._task_handler is not None:
  2. if self._task_handler is not None:
  3. if self._task_handler is not None:
  4. if self._task_handler is not None:

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented May 22, 2023

It's opt-in for 3.9 and we have tests covering it to catch regressions now.
Should be able to push a beta release soon.

@H--o-l
Copy link
Contributor Author

H--o-l commented May 23, 2023

I've been extensively relying on this behavior for at least 5 years, since
3.5.x, but probably for much longer. While I haven't used every stable
release, it was definitely the behavior in 3.7.4.post0 [1] and 3.6.2 [2].
Browsing further, it looks like it was the behavior in 3.7.0 [3] and 3.8.0 as
well [4].

Just to help the next person finding this issue, I think the change was first introduced 22 commits before v3.7.0b0:

$ git name-rev --tags --name-only b8ba3d0cd63db671454cbfa832d58d8477850ece
v3.7.0b0~22

Then wrongly reverted (only a tiny bit of code was reverted, and the documentation changes weren't reverted) 4 commits before v3.7.0b0:

$ git name-rev --tags --name-only 69de6fe9f098ca51637d1269783e734dca0f9a34
v3.7.0b0~4

Maybe for better history and follow-up, discussions about this change should be posted on #4080 and not here, as my PR just fixed the revert. Indeed the revert did too much compared to what was expected (see 69de6fe#diff-e979a7e0fdf2258350295395380bf4b52346031bc1da69e81982040c20c31662R1).

It's opt-in for 3.9 and we have tests covering it to catch regressions now.
Should be able to push a beta release soon.

Nice!
Just for ref, here is the link to the PR #7056

@jsbronder
Copy link

It's opt-in for 3.9 and we have tests covering it to catch regressions now. Should be able to push a beta release soon.

This is great and I'm happy to see the improvement. However my point is that this is a backward incompatible change made in a patch release and should be reverted in 3.8.5. That's assuming that the project is still following semver as stated here, https://github.com/aio-libs/aiohttp/blob/a91a8aeebc9823ebd04c1516b3465922190e97e2/docs/faq.rst#what-is-the-api-stability-and-deprecation-policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants