Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aiohttp does not try connecting to multiple IPs if first connect times out #7342

Open
1 task done
brettdh opened this issue Jul 7, 2023 · 6 comments · May be fixed by #7368
Open
1 task done

aiohttp does not try connecting to multiple IPs if first connect times out #7342

brettdh opened this issue Jul 7, 2023 · 6 comments · May be fixed by #7368
Labels

Comments

@brettdh
Copy link

brettdh commented Jul 7, 2023

Describe the bug

The ability to try connecting all addresses in a DNS record was added in #2447, some time ago. However, when using aiohttp in a static IP setup with entries in /etc/hosts, if the connection to the first address for a hostname times out, aiohttp gives up and does not try connecting to other hosts. I confirmed this by adding a print statement here, in the loop over hosts, before the connection attempt; only the first address was printed before the ServerTimeoutError exception was raised.

This appears to be because of a too-narrow except block, which catches only ClientConnectorError - but the error raised on connection timeout is ServerTimeoutError. It's possible that this except block should use ClientConnectionError instead, which is ServerTimeoutError's grandparent class (via ServerConnectionError). However, I don't know whether that's correct or not in the larger context.

To Reproduce

Note: I did my testing with https, since I already had a certificate for a static IP hostname set up; the steps and script could probably be adjusted to use http instead, which would make the repro simpler (and would not affect its validity).

  1. Prepare a Linux environment with these installed (I used a docker container built from the stock python:3.11.4 image):
    • Python 3
    • aiohttp
    • iptables
  2. Copy the script below into the environment
  3. Change the ips array to contain two real IP addresses:
    • The first can be any routable IP address; it doesn't have to be a real server
    • The second should be a server you control. serving https on port 443
  4. Change the host string to the hostname matching a certificate that your https server above is using
  5. Run the script
#!/usr/bin/env python3

import asyncio
from subprocess import run, check_call

import aiohttp

# Update these to be valid IPs
ips = ["a.b.c.d", "a.b.c.e"] 
host = "my-static-hostname.example.com"

etc_hosts = "\n".join([f"{ip} {host}" for ip in ips])

url = f"https://{host}"
timeout = 3


async def main():
    # 1. Set up static IP
    with open("/etc/hosts") as f:
        content = f.read()

    if etc_hosts not in content:
        with open("/etc/hosts", "a") as f:
            print(etc_hosts, file=f)

    # 2. Drop all traffic to first IP
    ssl_opts = ["-p", "tcp", "-m", "tcp", "--dport", "443"]
    opts = ["OUTPUT", "-d", ips[0], *ssl_opts, "-j", "DROP"]
    run(["iptables", "-D", *opts])  # remove from previous run if present
    check_call(["iptables", "-A", *opts])

    # 3. Try to connect
    async with aiohttp.ClientSession(conn_timeout=timeout) as client:
        async with client.get(url) as r:
            print(r.status)


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())


### Expected behavior

Whether via `/etc/hosts` or a DNS server response with multiple IPs, aiohttp should try connecting to all IPs in the resolved list, regardless of the reason for connection failure, until it has successfully connected to one IP or failed to connect to them all.

### Logs/tracebacks

```python-traceback
Note: actual IPs and hostnames replaced with placeholders.


(venv) root@39d71fd1e451:/src# python ./docker/static_ip_failover.py
iptables: Bad rule (does a matching rule exist in that chain?).
/src/./docker/static_ip_failover.py:31: DeprecationWarning: conn_timeout is deprecated, use timeout argument instead
  async with aiohttp.ClientSession(conn_timeout=timeout) as client:
Trying to connect to a.b.c.d
Traceback (most recent call last):
  File "/src/build_output/venv/lib/python3.11/site-packages/aiohttp/client.py", line 536, in _request
    conn = await self._connector.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/build_output/venv/lib/python3.11/site-packages/aiohttp/connector.py", line 540, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/build_output/venv/lib/python3.11/site-packages/aiohttp/connector.py", line 901, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/build_output/venv/lib/python3.11/site-packages/aiohttp/connector.py", line 1176, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/build_output/venv/lib/python3.11/site-packages/aiohttp/connector.py", line 980, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1069, in create_connection
    sock = await self._connect_sock(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 973, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 634, in sock_connect
    return await fut
           ^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/build_output/venv/lib/python3.11/site-packages/aiohttp/client.py", line 534, in _request
    async with ceil_timeout(real_timeout.connect):
  File "/src/build_output/venv/lib/python3.11/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/src/build_output/venv/lib/python3.11/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/src/./docker/static_ip_failover.py", line 38, in <module>
    loop.run_until_complete(main())
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/src/./docker/static_ip_failover.py", line 32, in main
    async with client.get(url) as r:
  File "/src/build_output/venv/lib/python3.11/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
                 ^^^^^^^^^^^^^^^^
  File "/src/build_output/venv/lib/python3.11/site-packages/aiohttp/client.py", line 540, in _request
    raise ServerTimeoutError(
aiohttp.client_exceptions.ServerTimeoutError: Connection timeout to host https://my-static-hostname.example.com


### Python Version

```console
$ python --version
Python 3.11.4

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.8.4
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author:
Author-email:
License: Apache 2
Location: /src/build_output/venv/lib/python3.11/site-packages
Requires: aiosignal, async-timeout, attrs, charset-normalizer, frozenlist, multidict, yarl
Required-by: mypkg

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.0.4
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /src/build_output/venv/lib/python3.11/site-packages
Requires:
Required-by: aiohttp, yarl

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.9.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache-2.0
Location: /src/build_output/venv/lib/python3.11/site-packages
Requires: idna, multidict
Required-by: aiohttp

OS

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux 39d71fd1e451 5.15.49-linuxkit #1 SMP PREEMPT Tue Sep 13 07:51:32 UTC 2022 x86_64 GNU/Linux

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@brettdh brettdh added the bug label Jul 7, 2023
@brettdh
Copy link
Author

brettdh commented Jul 7, 2023

Just noticed that it's an asyncio.exceptions.CancelledError that occurs on connection timeout, which of course doesn't get caught by the except ClientConnectorError block. Not sure why that's the case.

@Dreamsorcerer
Copy link
Member

Would be great if you could create a regression test in a PR for this. Fix looks pretty simple, just need to catch asyncio.TimeoutError.

We probably also need to adjust the default timeout to make sure this works by default (currently, only total timeout is specified, which means it will wait until the end of the timeout on the first attempt and then cancel the whole client request; sock_connect needs to be set to something smaller).

@brettdh
Copy link
Author

brettdh commented Jul 7, 2023

currently, only total timeout is specified, which means it will wait until the end of the timeout on the first attempt and then cancel the whole client request

Ah, that would explain why my test saw an uncaught asyncio.CancelledError rather than asyncio.TimeoutError. Presumably we should catch both in the loop, though? Maybe others as well?

I can look at creating a PR; I'm just still unsure what the desired behavior is - i.e. what various errors should fail over to the next host in the list, and which should be raised to the caller.

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Jul 8, 2023

Ah, that would explain why my test saw an uncaught asyncio.CancelledError rather than asyncio.TimeoutError. Presumably we should catch both in the loop, though? Maybe others as well?

I think you just misread the traceback. The CancelledError is triggered by the timeout code, which changes it to a TimeoutError, then at a later point back in the Client code that becomes a ServerTimeoutError. At the point of that loop, it is a TimeoutError that we need to catch (if there's a different CancelledError, then the task is being asked to cancel, so we certainly shouldn't ignore it).

@brettdh
Copy link
Author

brettdh commented Jul 10, 2023

In a previous test, I was using the connect kwarg for ClientTimeout rather than sock_connect, and I was seeing an uncaught CancelledException. When I changed the client code to use sock_connect instead, the test raised TimeoutError as expected.

I guess it's a bit surprising that changing client code with an innocent-looking change like this:

-        timeout = aiohttp.ClientTimeout(total=30, connect=5)
+        timeout = aiohttp.ClientTimeout(total=30, sock_connect=5)

changes the type of the exception raised on connection timeout, but I can confirm that the loop works as expected when modified to catch TimeoutError and the client uses sock_connect like this. I'd guess that other users might find this as surprising as I did - particularly if aiohttp tries connecting to all IPs only if the caller sets sock_connect.

@Dreamsorcerer
Copy link
Member

Hmm, no, I still see ServerTimeoutError if using connect. This is because the timer used inside that loop is using the sock_connect timeout. The connect timeout is used at a higher level in the client (https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client.py#L507).

I'm not sure it makes sense to retry here, and not too sure what the best approach is, maybe we just need to tweak the documentation to highlight the difference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants