
Sanic drops part of HTTP response data #2921

Open
xbeastx opened this issue Feb 25, 2024 · 8 comments


xbeastx commented Feb 25, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

The bug is that Sanic closes the connection before all of the response data has been transferred (see the example below).

From my point of view this is a very critical bug. For reasons unknown to me, it reproduces only when the Sanic server is started inside a Docker container (perhaps it is somehow related to the network mode or to delays introduced by Docker).

After bisecting the commits, we determined that the bug was introduced in 1310684 and does not reproduce on d1fc867.

Code snippet

So we have a really simple server returning some JSON, demo.py:

from sanic import Sanic
from sanic.response import json

app = Sanic("Demo")

@app.route("/")
async def handle(request):
    return json({'some_val': 'A'*743000})

if __name__ == "__main__":
    app.run(host="0.0.0.0")

Running in a Docker container:

FROM python:3.11

COPY demo.py demo.py
RUN pip install sanic==23.12.1
ENTRYPOINT python3 demo.py
$ docker build -t demo .
$ docker run -p 127.0.0.1:8000:8000 --rm -it demo

and the client.py:

import socket
import sys


def getsock(ip, port, req):
    # socket.socket(2, 1, 6) is AF_INET, SOCK_STREAM, IPPROTO_TCP
    s = socket.socket(2, 1, 6)
    s.connect((ip, port))
    s.send(req)
    return s


REQ = (
    b'GET / HTTP/1.1\r\n'
    b'User-Agent: Mozilla/5.0\r\n'
    b'Connection: close\r\n'
    b'\r\n'
)

s = getsock('127.0.0.1', 8000, REQ)

headers = s.recv(94).decode()  # the status line plus response headers are exactly 94 bytes

print(f'Headers: {headers!r}')

total = 0
by_len = int(sys.argv[1])
while (data := s.recv(by_len)):
    total += len(data)

print(f'Total length: {total}')

s.close()

I was not able to reproduce it with curl; maybe it reads too fast. But in a real deployment it does reproduce with Nginx as a proxy and Sanic as the upstream.

So now, if you run it a hundred times, you will get something like this:

$ for i in {1..100}; do python3 client.py 4096; done
...
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 586346
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 586346
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 652954
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 586346
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 650058
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 586346
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 652954
Headers: 'HTTP/1.1 200 OK\r\ncontent-length: 743015\r\nconnection: close\r\ncontent-type: application/json\r\n\r\n'
Total length: 586346
...

So the length should be 743015, but Sanic returns only 586346-652954 bytes.

client.py must be run outside the Docker container, e.g. on the host. If you run it inside the container, the bug does not reproduce.

Expected Behavior

Return all the data from response.

How do you run Sanic?

Sanic CLI

Operating System

Linux

Sanic Version

v23.12.1

Additional context

No response

xbeastx added the bug label Feb 25, 2024

makeev commented Feb 26, 2024

I can confirm this problem and this is a very critical bug.


makeev commented Feb 27, 2024

I believe the issue lies within the close() method. Instead of directly calling self.abort(), it should be replaced with:

                timeout = self.app.config.GRACEFUL_SHUTDOWN_TIMEOUT
                self.loop.call_later(timeout, self.abort)

By making this adjustment, the connection will function correctly. Otherwise, nginx reports the error 'upstream prematurely closed connection while reading upstream'.
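
As an illustration of the idea only (a bare asyncio sketch, not Sanic's actual HttpProtocol; GRACE_PERIOD is a made-up placeholder for whatever timeout would be chosen), the pattern amounts to closing the transport gracefully and only scheduling the abort as a later safety net, so bytes already queued in the write buffer can still drain to a slow peer:

import asyncio

GRACE_PERIOD = 15.0  # placeholder for a configurable timeout


class SketchProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        self.loop = asyncio.get_running_loop()

    def close(self):
        # close() lets data already queued in the write buffer drain first;
        # abort() would discard it immediately.
        self.transport.close()
        # Safety net for peers that never finish reading: force-close later
        # instead of aborting the connection right now.
        self.loop.call_later(GRACE_PERIOD, self.transport.abort)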

@jhonsonlaid

I encountered the same issue. When utilizing the Python packages requests or aiohttp to send requests to a Sanic server, everything works fine. However, when using Nginx as a reverse proxy, Nginx reports an error: 'upstream prematurely closed connection while reading upstream.' Downgrading to sanic 23.6.0 resolves this error.

robd003 (Contributor) commented Mar 26, 2024

@ahopkins any chance you can confirm that the fix from @makeev is all that is needed?

If so, can we please do a point release for the 23.12 LTS branch?

Tronic (Member) commented Mar 30, 2024

I believe the issue lies within the close() method. Instead of directly calling self.abort(), it should be replaced with:

                timeout = self.app.config.GRACEFUL_SHUTDOWN_TIMEOUT
                self.loop.call_later(timeout, self.abort)

By making this adjustment, the connection will function correctly. Otherwise, nginx reports the error 'upstream prematurely closed connection while reading upstream'.

This code in sanic/server/protocols/http_protocol.py:242 appears to have been introduced in #2831. Ping @gluhar2006, can you have a look? This asyncio transport code is very complicated and easily broken by small details; also, the docstring of that function is not quite understandable.

The graceful shutdown timeout normally has a different meaning in Sanic (roughly speaking: how long to wait for the handler to finish), not that of just closing a TCP connection. This would need a deeper look at what exactly is being fixed here and what the proper approach is, plus tests for those cases.

Possibly related also to #2531 (Nginx failures).
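
For background on the transport semantics in question, the snippet below is a stand-alone asyncio example that is independent of Sanic (the port, the bare handler, and the payload size are arbitrary, with the payload roughly matching the report above). It shows why an immediate abort truncates the body for a slow reader: close() flushes the write buffer before shutting the socket down, whereas abort() discards whatever the peer has not yet read.

import asyncio

PAYLOAD = b"A" * 743_000  # roughly the body size from the report


async def handle(reader, writer):
    writer.write(PAYLOAD)
    writer.close()              # graceful: buffered bytes still reach a slow reader
    # writer.transport.abort()  # immediate: anything the peer has not read yet is lost
    await writer.wait_closed()


async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8001)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())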

@gluhar2006 (Contributor)

It appears that the incorrect behavior of my changes was due to my misunderstanding of the meaning of the graceful shutdown timeout. I tried to look deeper, but so far without success. The main problem here is that I use keep-alive 99% of the time, so I did not encounter this problem when testing my changes.
I'll keep researching and come back if I find something.

For now, I still cannot reproduce this problem when using keep-alive.
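
For anyone comparing the two cases, a keep-alive variant of client.py might look like the hypothetical sketch below (host, port, and the slow 4096-byte reads mirror the original script). The assumption, consistent with the observation above, is that with keep-alive the server does not tear the connection down right after the response, so the close()/abort() path is not hit and the slow reader still receives the full body; the client therefore has to stop at content-length rather than reading until the connection closes.

import socket

REQ = (
    b'GET / HTTP/1.1\r\n'
    b'Host: 127.0.0.1\r\n'
    b'User-Agent: Mozilla/5.0\r\n'
    b'Connection: keep-alive\r\n'
    b'\r\n'
)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 8000))
s.sendall(REQ)

# Read until the end of the headers, then pull the body length out of them.
buf = b''
while b'\r\n\r\n' not in buf:
    buf += s.recv(4096)
head, _, body = buf.partition(b'\r\n\r\n')
headers = dict(line.decode().split(': ', 1) for line in head.split(b'\r\n')[1:])
length = int(headers['content-length'])

# The connection stays open, so stop at content-length bytes.
while len(body) < length:
    body += s.recv(4096)

print(f'Total length: {len(body)}')
s.close()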

ahopkins (Member) commented Apr 1, 2024

The graceful shutdown timeout normally has a different meaning in Sanic (roughly speaking: how long to wait for the handler to finish), not that of just closing a TCP connection. This would need a deeper look at what exactly is being fixed here and what the proper approach is, plus tests for those cases.

@Tronic I think they are just using the shutdown timer here as a further delay of the response timeout.


@ahopkins any chance you can confirm that the fix from @makeev is all that is needed?

If so, can we please do a point release for the 23.12 LTS branch?

@robd003 No. That is an incorrect use of graceful timeout. But also, I am not sure why we would want to further delay abort. Needs investigation.


@xbeastx Shouldn't this just be easily solvable by increasing the response timeout?

lrnselfreliance added a commit to lrnselfreliance/wrolpi that referenced this issue Apr 30, 2024
…ged how multiprocessing works.

Python 3.11 is now required, old versions were causing unpredictability in tests.  (Sanic does not yet
support 3.12)

Sanic has been upgraded to 23.6.0, which is the latest version that avoids this bug:
 sanic-org/sanic#2921

New strategy for multiprocessing is to create all multiprocessing tools in one process, then fork
to other processes.  The previous strategy was to declare multiprocessing tools at the top of every
file, or wherever they were needed at import/creation.  Now all multiprocessing tools are attached to the
app.shared_ctx.  This means `api_app` is imported in many, many places.

This forced a change in how the DownloadManager works.  Previously, it would continually run
download workers which would pull downloads from a multiprocessing.Queue.  Now, a single worker checks
for new downloads and sends a Sanic signal.

Flags have been reworked to use the `api_app`.  I removed the `which` flag functionality because the
`which` are called at import and needed their own multiprocessing.Event.
@gregflynn

@xbeastx Shouldn't this just be easily solvable by increasing the response timeout?

@ahopkins I just came across this issue upgrading from 23.6.0 to 23.12.1, and my response timeout is configured to 60 seconds, which is not reached before the response data is truncated (in my case always at 109 kB). So to your question about increasing response_timeout: I don't think so.

There's also a thread on Discord that seems to be the same issue: https://discord.com/channels/812221182594121728/1209575840203939880
