Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Immediately unlink temporary files #2613

Merged
merged 1 commit into from Apr 27, 2021

Conversation

smcgivern
Copy link
Contributor

Description

Puma has a limit (Puma::Const::MAX_BODY - around 110 KiB) over which
it will write request bodies to disk for handing off to the
application. When it does this, the request body can be left on disk
if the Puma process receives SIGKILL. Consider an extremely minimal
config.ru:

run(proc { [204, {}, []] })

If we then:

  1. Start puma, noting the process ID (to kill it later).
  2. Start a slow file transfer, using curl --limit-rate 100k (for
    example) and -T $PATH_TO_LARGE_FILE.
  3. Watch $TMPDIR/puma*.

We will see Puma start to write this temporary file. If we then send
SIGKILL to Puma, the file won't be cleaned up. With this patch, it
will - at least on POSIX systems. On Windows it may still be available.

This is suggested in the Ruby Tempfile documentation, and even uses this
specific example:
https://ruby-doc.org/stdlib-2.7.2/libdoc/tempfile/rdoc/Tempfile.html#class-Tempfile-label-Unlink+after+creation

On POSIX systems, it's possible to unlink a file right after creating
it, and before closing it. This removes the filesystem entry without
closing the file handle, so it ensures that only the processes that
already had the file handle open can access the file’s contents. It's
strongly recommended that you do this if you do not want any other
processes to be able to read from or write to the Tempfile, and you do
not need to know the Tempfile's filename either.

For example, a practical use case for unlink-after-creation would be
this: you need a large byte buffer that's too large to comfortably fit
in RAM, e.g. when you're writing a web server and you want to buffer
the client's file upload data.

This was possibly the root cause of #1187, although the issue did not contain enough detail to determine whether or not that's the case.

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

Puma has a limit (`Puma::Const::MAX_BODY` - around 110 KiB) over which
it will write request bodies to disk for handing off to the
application. When it does this, the request body can be left on disk
if the Puma process receives SIGKILL. Consider an extremely minimal
`config.ru`:

    run(proc { [204, {}, []] })

If we then:

1. Start `puma`, noting the process ID.
2. Start a slow file transfer, using `curl --limit-rate 100k` (for
   example) and `-T $PATH_TO_LARGE_FILE`.
3. Watch `$TMPDIR/puma*`.

We will see Puma start to write this temporary file. If we then send
SIGKILL to Puma, the file won't be cleaned up. With this patch, it
will - at least on POSIX systems. On Windows it may still be available.

This is suggested in the Ruby Tempfile documentation, and even uses this
specific example:
https://ruby-doc.org/stdlib-2.7.2/libdoc/tempfile/rdoc/Tempfile.html#class-Tempfile-label-Unlink+after+creation

> On POSIX systems, it's possible to unlink a file right after creating
> it, and before closing it. This removes the filesystem entry without
> closing the file handle, so it ensures that only the processes that
> already had the file handle open can access the file’s contents. It's
> strongly recommended that you do this if you do not want any other
> processes to be able to read from or write to the Tempfile, and you do
> not need to know the Tempfile's filename either.
>
> For example, a practical use case for unlink-after-creation would be
> this: you need a large byte buffer that's too large to comfortably fit
> in RAM, e.g. when you're writing a web server and you want to buffer
> the client's file upload data.
@smcgivern
Copy link
Contributor Author

smcgivern commented Apr 26, 2021

I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.

It's very easy (well, in my opinion) to manually test this. I'm not really sure how to write an automated test, other than to check that #unlink is called at the right point - I'd appreciate any guidance here.


We could also remove

client.tempfile.unlink if client.tempfile

But I don't know if Windows would still need that.

@nateberkopec
Copy link
Member

nateberkopec commented Apr 26, 2021

TIL about curl --limit-rate, that's neat

@smcgivern
Copy link
Contributor Author

The two Windows failures are getting a 403 on installing MSYS. I can't retry them to see if that's transient, though: https://github.com/puma/puma/pull/2613/checks?check_run_id=2439382485 and https://github.com/puma/puma/pull/2613/checks?check_run_id=2439382549

I ran the Ruby 3.0.0 tests on macOS 11.2.3 - so not quite the same as the Actions version - and they all passed:

$ TESTOPTS='--seed=33161' bundle exec rake test:all
install -c tmp/x86_64-darwin19/puma_http11/3.0.0/puma_http11.bundle lib/puma/puma_http11.bundle
cp tmp/x86_64-darwin19/puma_http11/3.0.0/puma_http11.bundle tmp/x86_64-darwin19/stage/lib/puma/puma_http11.bundle

ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
RUBYOPT: -r/Users/smcgivern/.asdf/installs/ruby/3.0.0/lib/ruby/3.0.0/bundler/setup
                         Puma::MiniSSL                   OpenSSL
OPENSSL_LIBRARY_VERSION: OpenSSL 1.1.1k  25 Mar 2021     OpenSSL 1.1.1k  25 Mar 2021
        OPENSSL_VERSION: OpenSSL 1.1.1k  25 Mar 2021     OpenSSL 1.1.1i  8 Dec 2020

Run options: --seed=33161

# Running:

*****************************************S***************************************************SS**************************S*************************************************************************************S*****************************************************************************************************************************************************************************************************S**********************************************

Fabulous run in 209.657705s, 2.1559 runs/s, 6.3437 assertions/s.

452 runs, 1330 assertions, 0 failures, 0 errors, 6 skips

You have skipped tests. Run with --verbose for details.

------------------------------------------------------------ Debugging Info
TestIntegrationCluster#test_hot_restart_does_not_drop_connections_threads
   restart_count 3, reset 0, success after restart 2680
TestIntegrationCluster#test_term_closes_listeners_tcp
    11 successes, 1 resets, 28 refused, failures 0
TestIntegrationCluster#test_hot_restart_does_not_drop_connections
   restart_count 4, reset 0, success after restart 939
TestIntegrationCluster#test_term_closes_listeners_unix
    10 successes, 0 resets, 30 refused, failures 0
TestIntegrationSingle#test_hot_restart_does_not_drop_connections
   restart_count 2, reset 0, success after restart 429
TestIntegrationSingle#test_hot_restart_does_not_drop_connections_threads
   restart_count 2, reset 0, success after restart 841
---------------------------------------------------------------------------

@nateberkopec
Copy link
Member

Don't sweat the windows failures, that's @MSP-Greg's department. He'll be around to fix it in a jiffy.

@MSP-Greg
Copy link
Member

First, not a Puma issue. It's an issue of whether Windows Ruby 2.1 thru 2.3 can be supported as in the past for GitHub Actions, especially for extension gems.

He'll be around to fix it in a jiffy.

Not so sure. It's a BinTray issue, and we may have one file missing - ragel. In theory, I can build it, but... Anyway, three Ruby guys who have other things to work on are trying to fix this.

@nateberkopec
Copy link
Member

No rush, Greg! Only meant it as a compliment to your usual swiftness in Windows support, not as an expectation of any rapidity to fix a particular thing.

@MSP-Greg
Copy link
Member

Nate,

Didn't take it personally, just frustrated. It's a BinTray 'reminder' brownout, so we should be able to get the files needed and transfer them to GitHub releases.

I also didn't want any contributors going down the rabbit hole that this issue involves. The few lines in the workflow files that sets up Ruby is two repos of node.js code, downloading files from a few places, etc, etc...

@MSP-Greg
Copy link
Member

@smcgivern

The Actions issues are resolved, I ran the failing CI again, it passed, so this is green...

@MSP-Greg
Copy link
Member

@smcgivern

Thanks. This is good, with the client.tempfile.unlink if client.tempfile being left as is.

On Windows, normal files cannot be deleted/unlinked if a file handle is open. The source code for Tempfile#unlink allows Windows to silently fail on Errno::EACCES. Note that it catches the error from the File.unlink call.

@MSP-Greg MSP-Greg removed the waiting-for-review Waiting on review from anyone label Apr 27, 2021
@MSP-Greg MSP-Greg merged commit e870ab6 into puma:master Apr 27, 2021
@smcgivern
Copy link
Contributor Author

Great, thanks for the explanation @MSP-Greg ❤️

@renchap
Copy link

renchap commented May 10, 2021

This change can cause issues if you are using Tempfile#path in your request to perform some operations on the uploaded content.
For example in my Rails app I was using:

   content = request.body

    digest = if content.is_a?(File) || content.is_a?(Tempfile)
               Digest::MD5.file(content)
             else
               Digest::MD5.new.update(content.string)
             end

And it broke after upgrading to Puma 5.3 with #<Errno::ENOENT: No such file or directory @ rb_sysopen - /tmp/puma20210510-7869-d4ua1i>

This is caused because when you unlink a file, existing FDs to this file are kept alive and the file is kept on disk until all FDs are closed, but any new attempt to open it will fail and the file will no longer be accessible from the FS.

I patched my code by using request.body.read and computing the MD5 sum from this string, but I wanted to mention the potential breakage here.

@smcgivern
Copy link
Contributor Author

That's interesting @renchap, thanks for reporting that.

I see that #tempfile is public so that does seem like a valid use case. I was only considering the case where this is used by Puma itself, which is safe. Unfortunately I think this use case and mine are basically incompatible: if we defer calling #unlink on the temporary file, we introduce the possibility that it could hang around on disk after the process exits. But if we call it immediately, then as you say, you can't get a new file descriptor for it.

@smcgivern smcgivern deleted the immediately-unlink-temporary-files branch May 10, 2021 18:07
@nateberkopec
Copy link
Member

@renchap I don't think what you were doing previously is supported by Rack. read is always guaranteed to work because it is part of the spec.

@MSP-Greg
Copy link
Member

I agree with @nateberkopec.

You code did break, but your code was assuming that if the body is a file, that it can be opened. The Rack spec makes very clear what can be done with a response body (if it responds to to_path), but no allowance is made for that behavior in the request body.

Conversely, I can see the reason for your code, since Digest can be computed on a String or a filename, but not an IO.

Maybe consider using the code in Digest::Instance#file? That way you're using the IO provided, rather than creating another one?

@renchap
Copy link

renchap commented May 10, 2021

I am not saying that my previous code was correct and compliant with Rack's spec, but I wanted to report a possible cause for breakage in case it happens to others.

I do not remember the reason of the initial implementation, but I have been able to successfully update it to use read in every case:

content = request.body.read
hexdigest = Digest::MD5.hexdigest(content)

@MSP-Greg
Copy link
Member

I wanted to report a possible cause for breakage

Thank you for that. What constitutes a 'breaking change' is not always a simple yes/no issue. I mentioned the code in Digest::Instance#file in case you had large request bodies...

@jdelStrother
Copy link
Contributor

jdelStrother commented May 13, 2021

Just as another datapoint, our attachment-handling code is based on the venerable attachment_fu, which really wants to be able to get a path from the Tempfile to work with. This stopped working with puma 5.3.

(I've always thought their temp_data/temp_path code looked a bit sketchy, this might be good excuse to rip it all out...)

JuanitoFatas pushed a commit to JuanitoFatas/puma that referenced this pull request Sep 9, 2022
Puma has a limit (`Puma::Const::MAX_BODY` - around 110 KiB) over which
it will write request bodies to disk for handing off to the
application. When it does this, the request body can be left on disk
if the Puma process receives SIGKILL. Consider an extremely minimal
`config.ru`:

    run(proc { [204, {}, []] })

If we then:

1. Start `puma`, noting the process ID.
2. Start a slow file transfer, using `curl --limit-rate 100k` (for
   example) and `-T $PATH_TO_LARGE_FILE`.
3. Watch `$TMPDIR/puma*`.

We will see Puma start to write this temporary file. If we then send
SIGKILL to Puma, the file won't be cleaned up. With this patch, it
will - at least on POSIX systems. On Windows it may still be available.

This is suggested in the Ruby Tempfile documentation, and even uses this
specific example:
https://ruby-doc.org/stdlib-2.7.2/libdoc/tempfile/rdoc/Tempfile.html#class-Tempfile-label-Unlink+after+creation

> On POSIX systems, it's possible to unlink a file right after creating
> it, and before closing it. This removes the filesystem entry without
> closing the file handle, so it ensures that only the processes that
> already had the file handle open can access the file’s contents. It's
> strongly recommended that you do this if you do not want any other
> processes to be able to read from or write to the Tempfile, and you do
> not need to know the Tempfile's filename either.
>
> For example, a practical use case for unlink-after-creation would be
> this: you need a large byte buffer that's too large to comfortably fit
> in RAM, e.g. when you're writing a web server and you want to buffer
> the client's file upload data.
@rafaelmagu
Copy link

rafaelmagu commented Aug 1, 2023

Dropping a note here to say I've been bitten by this issue where the uploaded file is immediately unlinked before the controller has a chance to read it, on a Rails 4 monolith trying to move from Passenger to Puma. Had to revert to Puma 5.2.2 to get the code working.

A relevant section of the code where this behaviour noticeable is below:

  def create
    @attachment = current_user.account.attachments.build
    file = params[:qqfile].is_a?(ActionDispatch::Http::UploadedFile) ? params[:qqfile] : params[:file]
    if file.content_type == 'application/octet-stream'
      real_type = Mime::Type.lookup_by_extension(file.original_filename.split('.').last.downcase)
      file.content_type = real_type if real_type.present?
    end

    @attachment.data = file # <-- This fails with a "TypeError (no implicit conversion of nil into String)" error

    retries = 0
    begin
      saved = @attachment.save
    rescue => e
      retries += 1
      raise if retries > 3
      retry
    end

    if saved
      render json: attachment_hash(@attachment), content_type: 'text/plain' # IE doesn't like JSON
    else
      error = @attachment.errors.messages.try(:first).try(:last).try(:first)
      Rails.logger.warn "Unable to save attachmend: #{error}"
      render json: {error: error}, status: :unprocessable_entity
    end
  end

With the version revert out, I can spend more time looking into bringing the above code into working order. I do, however, appreciate any suggestions.

@dentarg
Copy link
Member

dentarg commented Aug 1, 2023

@rafaelmagu request.body.read? Like #2613 (comment)

@rafaelmagu
Copy link

@rafaelmagu request.body.read? Like #2613 (comment)

@dentarg wouldn't that include headers and everything else? In this case, the relevant code is trying to save the uploaded image as an attachment (using Paperclip), from the request's params. So I'm not sure if that counts as request.body. Does it?

@dentarg
Copy link
Member

dentarg commented Aug 1, 2023

There are no headers in request.body, your image is in request.body. See https://github.com/rack/rack/blob/main/SPEC.rdoc#label-The+Input+Stream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants