
Sanic drops part of HTTP response data #2921

Open
xbeastx opened this issue Feb 25, 2024 · 8 comments


xbeastx commented Feb 25, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

The bug is that Sanic closes the connection before all of the response data has been transferred (see the example below).

From my point of view this is a very critical bug. For reasons unknown to me, it reproduces only when the Sanic server is started inside a Docker container (perhaps it is somehow related to the network mode or to delays introduced by Docker).

After bisecting the commits, we determined that the bug was introduced in 1310684 and does not reproduce on d1fc867.

Code snippet

So we have a really simple server returning some JSON, demo.py:

from sanic import Sanic
from sanic.response import json

app = Sanic("Demo")

@app.route("/")
async def handle(request):
    return json({'some_val': 'A'*743000})

if __name__ == "__main__":
    app.run(host="0.0.0.0")

Running in a Docker container:

FROM python:3.11

COPY demo.py demo.py
RUN pip install sanic==23.12.1
ENTRYPOINT python3 demo.py
$ docker build -t demo .
$ docker run -p 127.0.0.1:8000:8000 --rm -it demo

and the client.py:

import socket
import sys


def getsock(ip, port, req):
    # socket.socket(2, 1, 6) is AF_INET, SOCK_STREAM, IPPROTO_TCP
    s = socket.socket(2, 1, 6)
    s.connect((ip, port))
    s.send(req)
    return s


REQ = (
    b'GET / HTTP/1.1\r\n'
    b'User-Agent: Mozilla/5.0\r\n'
    b'Connection: close\r\n'
    b'\r\n'
)

s = getsock('127.0.0.1', 8000, REQ)

headers = s.recv(94).decode()  # the status line plus response headers are exactly 94 bytes

print(f'Headers: {headers!r}')

total = 0
by_len = int(sys.argv[1])
while (data := s.recv(by_len)):
    total += len(data)

print(f'Total length: {total}')

s.close()

I was not able to reproduce it with curl; maybe it reads too fast. But in a real deployment it does reproduce with Nginx as a proxy and Sanic as the upstream.

So now, if you run it a hundred times, you will get something like this:

$ for i in {1..100}; do python3 client.py 4096; done
...
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 586346
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 586346
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 652954
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 586346
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 650058
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 586346
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 652954
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 586346
...

So the length should be 743015, but Sanic returns only 586346-652954 bytes.

client.py must be run outside the Docker container, e.g. on the host. If you run it inside the container, the bug does not reproduce.

Expected Behavior

Return all the data from response.

How do you run Sanic?

Sanic CLI

Operating System

Linux

Sanic Version

v23.12.1

Additional context

No response

xbeastx added the bug label Feb 25, 2024

makeev commented Feb 26, 2024

I can confirm this problem and this is a very critical bug.


makeev commented Feb 27, 2024

I believe the issue lies within the close() method. Instead of directly calling self.abort(), it should be replaced with:

                timeout = self.app.config.GRACEFUL_SHUTDOWN_TIMEOUT
                self.loop.call_later(timeout, self.abort)

By making this adjustment, the connection will function correctly. Otherwise, nginx reports the error 'upstream prematurely closed connection while reading upstream'.
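
As an illustration of the idea only (a bare asyncio sketch, not Sanic's actual HttpProtocol; GRACE_PERIOD is a made-up placeholder for whatever timeout would be chosen), the pattern amounts to closing the transport gracefully and only scheduling the abort as a later safety net, so bytes already queued in the write buffer can still drain to a slow peer:

import asyncio

GRACE_PERIOD = 15.0  # placeholder for a configurable timeout


class SketchProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        self.loop = asyncio.get_running_loop()

    def close(self):
        # close() lets data already queued in the write buffer drain first;
        # abort() would discard it immediately.
        self.transport.close()
        # Safety net for peers that never finish reading: force-close later
        # instead of aborting the connection right now.
        self.loop.call_later(GRACE_PERIOD, self.transport.abort)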

@jhonsonlaid

I encountered the same issue. When utilizing the Python packages requests or aiohttp to send requests to a Sanic server, everything works fine. However, when using Nginx as a reverse proxy, Nginx reports an error: 'upstream prematurely closed connection while reading upstream.' Downgrading to sanic 23.6.0 resolves this error.

robd003 (Contributor) commented Mar 26, 2024

@ahopkins any chance you can confirm that the fix from @makeev is all that is needed?

If so, can we please do a point release for the 23.12 LTS branch?

Tronic (Member) commented Mar 30, 2024

I believe the issue lies within the close() method. Instead of directly calling self.abort(), it should be replaced with:

                timeout = self.app.config.GRACEFUL_SHUTDOWN_TIMEOUT
                self.loop.call_later(timeout, self.abort)

By making this adjustment, the connection will function correctly. Otherwise, nginx reports the error 'upstream prematurely closed connection while reading upstream'.

This code in sanic/server/protocols/http_protocol.py:242 appears to have been introduced in #2831. Ping @gluhar2006, can you have a look? This asyncio transport code is very complicated and easily broken by small details; also, the docstring of that function is not quite understandable.

The graceful shutdown timeout normally has a different meaning in Sanic (roughly speaking: how long to wait for the handler to finish), not that of just closing a TCP connection. This would need a deeper look at what exactly is being fixed here and what the proper approach is, plus tests for those cases.

Possibly related also to #2531 (Nginx failures).
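
For background on the transport semantics in question, the snippet below is a stand-alone asyncio example that is independent of Sanic (the port, the bare handler, and the payload size are arbitrary, with the payload roughly matching the report above). It shows why an immediate abort truncates the body for a slow reader: close() flushes the write buffer before shutting the socket down, whereas abort() discards whatever the peer has not yet read.

import asyncio

PAYLOAD = b"A" * 743_000  # roughly the body size from the report


async def handle(reader, writer):
    writer.write(PAYLOAD)
    writer.close()              # graceful: buffered bytes still reach a slow reader
    # writer.transport.abort()  # immediate: anything the peer has not read yet is lost
    await writer.wait_closed()


async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8001)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())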

@gluhar2006 (Contributor)

It appears that the incorrect behavior of my changes was due to my misunderstanding of the meaning of the graceful shutdown timeout. I tried to look deeper, but so far without success. The main problem here is that I use keep-alive 99% of the time, so I did not encounter this problem when testing my changes.
I'll keep researching and come back if I find something.

For now, I still cannot reproduce this problem when using keep-alive.
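
For anyone comparing the two cases, a keep-alive variant of client.py might look like the hypothetical sketch below (host, port, and the slow 4096-byte reads mirror the original script). The assumption, consistent with the observation above, is that with keep-alive the server does not tear the connection down right after the response, so the close()/abort() path is not hit and the slow reader still receives the full body; the client therefore has to stop at content-length rather than reading until the connection closes.

import socket

REQ = (
    b'GET / HTTP/1.1\r\n'
    b'Host: 127.0.0.1\r\n'
    b'User-Agent: Mozilla/5.0\r\n'
    b'Connection: keep-alive\r\n'
    b'\r\n'
)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 8000))
s.sendall(REQ)

# Read until the end of the headers, then pull the body length out of them.
buf = b''
while b'\r\n\r\n' not in buf:
    buf += s.recv(4096)
head, _, body = buf.partition(b'\r\n\r\n')
headers = dict(line.decode().split(': ', 1) for line in head.split(b'\r\n')[1:])
length = int(headers['content-length'])

# The connection stays open, so stop at content-length bytes.
while len(body) < length:
    body += s.recv(4096)

print(f'Total length: {len(body)}')
s.close()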

ahopkins (Member) commented Apr 1, 2024

The graceful shutdown timeout normally has a different meaning in Sanic (roughly speaking: how long to wait for the handler to finish), not that of just closing a TCP connection. This would need a deeper look at what exactly is being fixed here and what the proper approach is, plus tests for those cases.

@Tronic I think they are just using the shutdown timer here as a further delay of the response timeout.


@ahopkins any chance you can confirm that the fix from @makeev is all that is needed?

If so, can we please do a point release for the 23.12 LTS branch?

@robd003 No. That is an incorrect use of graceful timeout. But also, I am not sure why we would want to further delay abort. Needs investigation.


@xbeastx Shouldn't this just be easily solvable by increasing the response timeout?

lrnselfreliance added a commit to lrnselfreliance/wrolpi that referenced this issue Apr 30, 2024
…ged how multiprocessing works.

Python 3.11 is now required, old versions were causing unpredictability in tests.  (Sanic does not yet
support 3.12)

Sanic has been upgraded to 23.6.0, which is the latest version that avoids this bug:
 sanic-org/sanic#2921

New strategy for multiprocessing is to create all multiprocessing tools in one process, then fork
to other processes.  The previous strategy was to declare multiprocessing tools at the top of every
file, or wherever they were needed at import/creation.  Now all multiprocessing tools are attached to the
app.shared_ctx.  This means `api_app` is imported in many, many places.

This forced a change in how the DownloadManager works.  Previously, it would continually run
download workers which would pull downloads from a multiprocessing.Queue.  Now, a single worker checks
for new downloads and sends a Sanic signal.

Flags have been reworked to use the `api_app`.  I removed the `which` flag functionality because the
`which` are called at import and needed their own multiprocessing.Event.
@gregflynn

@xbeastx Shouldn't this just be easily solvable by increasing the response timeout?

@ahopkins I just came across this issue upgrading from 23.6.0 to 23.12.1, and my response timeout is configured to 60 seconds, which is not reached before the response data is truncated (in my case always at 109 kB). So to your question about increasing response_timeout: I don't think so.

There's also a thread on Discord that seems to be the same issue: https://discord.com/channels/812221182594121728/1209575840203939880
