Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Server disconnected" on HTTP/2 connection pool with big keepalive_expiry. #2983

Open
jonathanslenders opened this issue Dec 5, 2023 · 4 comments

Comments

@jonathanslenders
Copy link

When httpx is used to connect to an http/2 server, and the server disconnects, then httpx will attempt to reuse this connection for the following request. It happens when keepalive_expiry in the httpx limits is bigger than the keep_alive_timeout of the HTTP/2 web server.

To reproduce, we can run a Hypercorn http/2 server using a dummy Starlette application like this (TLS is required because of http/2):

import asyncio
from hypercorn.asyncio import serve
from hypercorn.config import Config
from starlette.applications import Starlette

app = Starlette(routes=[])

config = Config.from_mapping({})
config.bind = ["127.0.0.1:8001"]
config.certfile = ".../webserver.pem"
config.keyfile = ".../webserver.key"
config.keyfile_password = "...."
config.keep_alive_timeout = 1  # Second (is 5 by default).

asyncio.run(serve(app, config))

Then run this client code:

import asyncio
import httpx

async def main() -> None:
      async with httpx.AsyncClient(
          verify=False,
          http2=True,
          limits=httpx.Limits(
              keepalive_expiry=100,
          ),
      ) as client:
          while True:
              await client.get("https://localhost:8001/")
              await asyncio.sleep(6)  # Should be bigger than the `keep_alive_timeout` Hypercorn config.

asyncio.run(main())

Important is to ensure http2=True is set and keepalive_expiry is None or a value bigger than the corresponding option of the Hypercorn server.

For the second request, httpx tries to reuse the connection from the connection pool although it has been closed by the other side in the meantime and fails, immediately producing the following Server disconnected error:

 Traceback (most recent call last):
    File ".../lib/python3.9/site-packages/httpx/_transports/default.py", line 66, in map_httpcore_exceptions
      yield
    File ".../lib/python3.9/site-packages/httpx/_transports/default.py", line 366, in handle_async_request
      resp = await self._pool.handle_async_request(req)
    File ".../lib/python3.9/site-packages/httpcore/_async/connection_pool.py", line 268, in handle_async_request
      raise exc
    File ".../lib/python3.9/site-packages/httpcore/_async/connection_pool.py", line 251, in handle_async_request
      response = await connection.handle_async_request(request)
    File ".../lib/python3.9/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
      return await self._connection.handle_async_request(request)
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 185, in handle_async_request
      raise exc
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 148, in handle_async_request
      status, headers = await self._receive_response(
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 292, in _receive_response
      event = await self._receive_stream_event(request, stream_id)
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 333, in _receive_stream_event
      await self._receive_events(request, stream_id)
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 361, in _receive_events
      events = await self._read_incoming_data(request)
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 452, in _read_incoming_data
      raise exc
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 440, in _read_incoming_data
      raise RemoteProtocolError("Server disconnected")
  httpcore.RemoteProtocolError: Server disconnected

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    ...
    File ".../our-code.py", line 191, in make_stream_request
      async with client.stream(
    File ".../lib/python3.9/contextlib.py", line 181, in __aenter__
      return await self.gen.__anext__()
    File ".../lib/python3.9/site-packages/httpx/_client.py", line 1573, in stream
      response = await self.send(
    File ".../lib/python3.9/site-packages/httpx/_client.py", line 1617, in send
      response = await self._send_handling_auth(
    File ".../lib/python3.9/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
      response = await self._send_handling_redirects(
    File ".../lib/python3.9/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
      response = await self._send_single_request(request)
    File ".../lib/python3.9/site-packages/httpx/_client.py", line 1719, in _send_single_request
      response = await transport.handle_async_request(request)
    File ".../lib/python3.9/site-packages/httpx/_transports/default.py", line 366, in handle_async_request
      resp = await self._pool.handle_async_request(req)
    File ".../lib/python3.9/contextlib.py", line 137, in __exit__
      self.gen.throw(typ, value, traceback)
    File ".../lib/python3.9/site-packages/httpx/_transports/default.py", line 83, in map_httpcore_exceptions
      raise mapped_exc(message) from exc
  httpx.RemoteProtocolError: Server disconnected

I started by commenting in this discussion: #2056 , but turned it in an issue after I had a clear reproducible case. Feel free to close if it's not appropriate.

@tomchristie
Copy link
Member

Thanks @jonathanslenders.

First places I'd start narrowing this down are...

Different HTTP versions.

  • Does this reproduce with HTTP/1.1 at all? (From my understanding you've suggested not, but am clarifying.)

Different clients.

  • Does this reproduce with httpx using a sync client?
  • Does this reproduce with httpx async running under trio?
  • Does this reproduce with other HTTP clients? (In particular requests, aiohttp, curl.)

Different servers.

  • Does this reproduce with both hypercorn and uvicorn?

@moshego189
Copy link

Hi @jonathanslenders
it seems to me that the problem is hypercorn specific
it just happened to me on a similar setup,
i think hypercorn is closing the connection in some unexpected way when using http2

@jonathanslenders
Copy link
Author

Thank you @tomchristie for looking into it!

I've been diving deeper. Maybe I have a fix (see below).

I think the problem is due to the complexity of terminating TLS connections. It's hard to half-close a TLS connection, and by doing so, the underlying TCP stream is not necessarily closed, which makes it very hard for the other end to observe a half closed TLS connection.

From httpx's point of view, when performing the second request, the HTTPConnectionState is IDLE, and _network_stream.get_extra_info('is_readable') is True. The TCP connection is still full open in both directions, and the data that is readable at this point could very well be TLS unwrap data.

Our is_socket_readable check works fine for non-TLS connections, but for a TLS connection, we have to decrypt any pending traffic to see whether the TLS stream was not terminated. Basically, that means passing any buffered data through the ssl.MemoryBio and see whether an EOF/ssl.SSLZeroReturnError comes out at the other end, indicating that the other party unwrapped the connection.

From Hypercorn's point of view, I can see that Hypercorn does call asyncio.StreamWriter.close() after the idle timeout. However, the transport is of type asyncio.sslproto._SSLProtocolTransport, which means there's no guarantee that the actual socket is closed. Looking at netstat, the TCP socket remains ESTABLISHED even after Hypercorn tries to close the writer. (@pgjones, you might want to look into this.)

Diving into httpcore, and looking at httpcore/_async/http2, in handle_async_request, I don't see is_socket_readable being called at any point in time. Where do we check when the connection is still alive for http/2+TLS connections? I don't think we do that.

The following httpcore patch does detect whether the connection is alive. I don't know which exception to raise, but this could be a starting point:

diff --git a/httpcore/_async/http2.py b/httpcore/_async/http2.py
index 8dc776f..e08e018 100644
--- a/httpcore/_async/http2.py
+++ b/httpcore/_async/http2.py
@@ -136,6 +136,10 @@ class AsyncHTTP2Connection(AsyncConnectionInterface):
             self._request_count -= 1
             raise ConnectionNotAvailable()

+        alive = await self._network_stream.is_alive()
+        if not alive:
+            raise Exception('Not alive!')
+
         try:
             kwargs = {"request": request, "stream_id": stream_id}
             async with Trace("send_request_headers", logger, request, kwargs):
diff --git a/httpcore/_backends/anyio.py b/httpcore/_backends/anyio.py
index 1ed5228..d18217f 100644
--- a/httpcore/_backends/anyio.py
+++ b/httpcore/_backends/anyio.py
@@ -1,5 +1,6 @@
 import ssl
 import typing
+from collections import deque

 import anyio

@@ -19,10 +20,13 @@ from .base import SOCKET_OPTION, AsyncNetworkBackend, AsyncNetworkStream
 class AnyIOStream(AsyncNetworkStream):
     def __init__(self, stream: anyio.abc.ByteStream) -> None:
         self._stream = stream
+        self._buffer = deque()

     async def read(
         self, max_bytes: int, timeout: typing.Optional[float] = None
     ) -> bytes:
+        if len(self._buffer) > 0:
+            return self._buffer.popleft()
         exc_map = {
             TimeoutError: ReadTimeout,
             anyio.BrokenResourceError: ReadError,
@@ -92,6 +96,25 @@ class AnyIOStream(AsyncNetworkStream):
             return is_socket_readable(sock)
         return None

+    async def is_alive(self) -> bool:
+        """
+        Test whether the connection is still alive. If this is a TLS stream,
+        then that is different from testing whether the socket is still alive.
+        It's possible that the other end did a half-close of the TLS stream,
+        but not a half-close of the underlying TCP stream.
+
+        We test whether it's alive by trying to read data. When we get b"", we
+        know EOF was reached.
+        """
+        try:
+            data = await self.read(max_bytes=1, timeout=0.01)
+        except ReadTimeout:
+            return True
+        if data == b'':
+            return False
+        self._buffer.append(data)
+        return True
+

 class AnyIOBackend(AsyncNetworkBackend):
     async def connect_tcp(

@zanieb
Copy link
Contributor

zanieb commented Dec 6, 2023

Looks similar to the previously resolved encode/httpcore#679

Thanks for all the detailed commentary!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants