Timeout during long file upload #2574

darkhelmet · 2021-03-12T00:22:11Z

Not really a bug, more clarification

We use a little file upload server that is a simple rack application and uses Puma as the server. I upgraded it from 4.3.3 to 5.2.2 and now we see timeouts after 30 seconds (first_data_timeout) when uploading files to it where the upload takes longer than 30 seconds. I can increase first_data_timeout and the timeout happens at the appropriate time.

An example using curl of an upload is pretty simple curl --data-binary @file.zip server:5000/upload?some=params

The application runs in our Kubernetes cluster behind Traefik which sits behind an AWS ALB/WAF. All that doesn't seem to matter though as using port-forward to talk directly to the pod gives the same result.

Looking in some code it looks like first_data_timeout is deprecated in v6? Or is that just the attr_writer for it?

Just wanting some guidance as to what I should be looking at to solve this problem. I could just revert back to 4.3.3 (which I'll probably do for now), but I'd like to keep upgrading and running the newer version for security, performance, etc., reasons.

I'm slightly confused as to why it's happening in the first place, since I don't know where the data is buffering for it to be waiting and causing puma to timeout. Or is the name first_data_timeout just not accurate in this case and it is getting data, but it's timing out for another reason?

Do I need some other puma configuration? Should I be running nginx in a sidebar to sit in front of puma?

Puma config

workers 1
threads 1, 16

preload_app!

rackup      DefaultRackup
port        5000
environment 'production'

The text was updated successfully, but these errors were encountered:

MSP-Greg · 2021-03-12T02:20:21Z

Looking in some code it looks like first_data_timeout is deprecated in v6? Or is that just the attr_writer for it?

Just the attr_writer in Puma::Server. It will continue to be in Puma::DSL, which defines the methods used in a Puma config file.

You can add it to config via:

first_data_timeout <int seconds>

Or is the name first_data_timeout just not accurate in this case and it is getting data, but it's timing out for another reason?

Not sure. The DSL comment is "Define how long the tcp socket stays open, if no data has been received.", which certainly implies 'inactivity'. Could it have accidentally changed to what one might call first_request_timeout, not sure.

I can increase first_data_timeout and the timeout happens at the appropriate time.

If you do that in config, are you okay? A user may want to configure Puma so it sets a limit on a request body, whether that be by time or size. And, there maybe a need for an 'inactive' timeout also. There's several timeout settings for clients, I haven't reviewed for a while, as I've been looking at tests and response time...

Off-topic
Re the v6 deprecations, this is an example of what may have originally been a constant setting in Puma::Server, and was later added to Puma::DSL so it could be user configured. I really don't like an API that expose needless items, as it leads to the question of why it exists.

Also, there are a few classes where there's a pattern similar to:

obj = PumaSomething.new(opts)
# use a few attr_writers with opts
obj.attr1 = f(opts[:key1])
obj.attr2 = g(opts[:key2])
obj.meth3 obj3
obj.run(args)

We may change those patterns to simply PumaSomething.run(args, obj3, opts)

dentarg · 2021-03-12T07:28:46Z

The application runs in our Kubernetes cluster behind Traefik which sits behind an AWS ALB/WAF. All that doesn't seem to matter though as using port-forward to talk directly to the pod gives the same result.

@darkhelmet does the problem also reproduce locally? I.e. running puma and curl on your laptop/work station

darkhelmet · 2021-03-12T17:30:54Z

@MSP-Greg Yes, if I configure first_data_timeout in my puma.rb file, say first_data_timeout 60 it will timeout after 1 minute instead of 30 seconds. I could try cranking that number way up, but I haven't looked at what else that may affect.

@dentarg It was a long day and I tried to do that but of course it uploads the file so quickly that it works fine. So I did that thing where you look at yourself in the reflection in your monitor and are a little disappointed and moved on. I would have to figure out how to cause the upload to be slow locally I guess, which I haven't looked into how to do (since I'm trying to go directly to the application). I don't think the kubectl port-forward is buffering anything

The more I think about it the correct answer is I should use a little nginx sidecar container that does proxy_pass and can buffer the contents for Puma. When we first deployed this a long time ago, I thought of that, but it seemed to perform well without it. We put Cloudfront in front of it for serving files, so it's not like it's getting beat up by requests, and I was like "well, it deals with these uploads fine". Just some days it has to deal with big file uploads (1GB+) and they take a while. But still...if puma doesn't think there is data for it...where is that data buffering.

cjlarose · 2021-03-12T20:07:36Z

The more I think about it the correct answer is I should use a little nginx sidecar container that does proxy_pass and can buffer the contents for Puma.

While this is possible and might help in your use case, what you're describing is something that puma should handle correctly, so it does seem like you're running into legitimate bug. Since you were on 4.x before and that working for you, I suspect the change that might be affecting you was our rewrite of the Reactor, which is responsible for client buffering in #2279

If you're able to test different versions of puma, it would help us if you figure out which version is the one that starts breaking for you. My guess is that it's version 5.0.3. But if you can identify the first failing version, that'll help us debug the issue. Alternatively, if you're able to put together a minimal application that reproduces the problem, that would work, too!

cjlarose · 2021-03-12T23:56:40Z

Also I figure I should probably loop in @wjordan. Is this something you've seen before?

wjordan · 2021-03-13T00:41:12Z

I haven't seen this before, but it does look possible that #2279 introduced a regression where first_data_timeout now unintentionally works as a 'total request timeout' instead of an first/in-between-bytes timeout which is how it behaved previously (the 'first data' name and comment description had already become obsolete at some point). In most cases the default timeout is large enough to not introduce an issue and we probably don't have tests distinguishing these two cases which is perhaps why it slipped by until now.

I think this could be fixed by adding an else clause to the conditional in reactor_wakeup, so the timeout is reset whenever any bytes are received on the connection:

puma/lib/puma/server.rb

Lines 294 to 298 in c68e20d

    
           if client.try_to_finish || (shutdown && !client.can_close?) 
        
             @thread_pool << client 
        
           elsif shutdown || client.timeout == 0 
        
             client.timeout! 
        
           end

Something like:

else
  client.set_timeout(first_data_timeout)
  false
end

That said, perhaps a fix for this would be a good time to introduce a new between_bytes_timeout config (that defaults to the value of first_data_timeout) for the timeout in between bytes, so that the name/description for first_data_timeout is once again accurate?

For reference, Varnish has a separate first_byte_timeout and between_bytes_timeout config like this, while nginx has a separate timeout for the header (client_header_timeout) and body (client_body_timeout).

darkhelmet · 2021-03-13T03:49:53Z

Okay, local testing was a success!

(Replace cr12.zip with any "large" file that would take more than 30 seconds to upload at 1MB/s)

Using puma 5.0.2 and curl -v -X PUT --data-binary @cr12.zip http://localhost:5000/upload?file=cr12.zip --limit-rate 1m works
Using puma 5.0.3 and curl -v -X PUT --data-binary @cr12.zip http://localhost:5000/upload?file=cr12.zip --limit-rate 1m fails

With latest puma 5.2.2 and config.ru of run lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello World"]] } using curl -v -X PUT --data-binary @cr12.zip http://localhost:9292 --limit-rate 1m fails at the 30 second mark.

So it appears 5.0.3 is the breaking version.

@wjordan I also pulled the master branch, made that suggested change, built/installed the gem, ran the simple repro case, and it works!

darkhelmet · 2021-03-13T04:41:49Z

I started a PR, not sure on the tests I wrote though: #2575

(Unrelated, it's showing me as a contributor, I was like wtf, and found my random PR from 2013 lol #305)

nateberkopec · 2021-03-15T14:09:56Z

Welcome back, then @darkhelmet! 😆

bmclean · 2021-03-29T20:06:37Z

I like the between_bytes_timeout concept @darkhelmet. Since 5.0.3 we have been running with a large value set for first_data_timeout to handle slow clients (see #2512).

darkhelmet · 2021-03-31T21:35:39Z

I'll need some help on my PR, I think I'm close, but I'm missing some stuff.

Fixes puma#2574 (regression in v5.0.3). Co-authored-by: Daniel Huckstep <daniel@huckstep.ca> Co-authored-by: Will Jordan <will@code.org>

* Fix timing out when we get a slow request Fixes #2574 (regression in v5.0.3). Co-authored-by: Daniel Huckstep <daniel@huckstep.ca> Co-authored-by: Will Jordan <will@code.org> * Update test_timeout_in_data_phase Complete request to ensure short timeout is used properly. Co-authored-by: Daniel Huckstep <daniel@huckstep.ca>

* Fix timing out when we get a slow request Fixes puma#2574 (regression in v5.0.3). Co-authored-by: Daniel Huckstep <daniel@huckstep.ca> Co-authored-by: Will Jordan <will@code.org> * Update test_timeout_in_data_phase Complete request to ensure short timeout is used properly. Co-authored-by: Daniel Huckstep <daniel@huckstep.ca>

darkhelmet mentioned this issue Mar 13, 2021

Add between_bytes_timeout to fix timing out when we get a slow request #2575

Closed

8 tasks

cjlarose added the bug label Mar 18, 2021

wjordan mentioned this issue Apr 19, 2021

Setting first_data_timeout triggers an H18 error on Heroku #2539

Closed

wjordan added a commit to wjordan/puma that referenced this issue Apr 20, 2021

Fix timing out when we get a slow request

7e11ad1

Fixes puma#2574 (regression in v5.0.3). Co-authored-by: Daniel Huckstep <daniel@huckstep.ca> Co-authored-by: Will Jordan <will@code.org>

wjordan mentioned this issue Apr 20, 2021

Fix timing out requests too early #2606

Merged

7 tasks

MSP-Greg closed this as completed in #2606 Apr 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Timeout during long file upload #2574

Timeout during long file upload #2574

darkhelmet commented Mar 12, 2021

MSP-Greg commented Mar 12, 2021 •

edited

dentarg commented Mar 12, 2021

darkhelmet commented Mar 12, 2021

cjlarose commented Mar 12, 2021

cjlarose commented Mar 12, 2021

wjordan commented Mar 13, 2021 •

edited

darkhelmet commented Mar 13, 2021

darkhelmet commented Mar 13, 2021 •

edited

nateberkopec commented Mar 15, 2021

bmclean commented Mar 29, 2021

darkhelmet commented Mar 31, 2021

Navigation Menu

Timeout during long file upload #2574

Timeout during long file upload #2574

Comments

darkhelmet commented Mar 12, 2021

MSP-Greg commented Mar 12, 2021 • edited

dentarg commented Mar 12, 2021

darkhelmet commented Mar 12, 2021

cjlarose commented Mar 12, 2021

cjlarose commented Mar 12, 2021

wjordan commented Mar 13, 2021 • edited

darkhelmet commented Mar 13, 2021

darkhelmet commented Mar 13, 2021 • edited

nateberkopec commented Mar 15, 2021

bmclean commented Mar 29, 2021

darkhelmet commented Mar 31, 2021

MSP-Greg commented Mar 12, 2021 •

edited

wjordan commented Mar 13, 2021 •

edited

darkhelmet commented Mar 13, 2021 •

edited