Immediately unlink temporary files #2613
Puma has a limit (`Puma::Const::MAX_BODY` - around 110 KiB) over which it will write request bodies to disk for handing off to the application. When it does this, the request body can be left on disk if the Puma process receives SIGKILL.

Consider an extremely minimal `config.ru`:

    run(proc { [204, {}, []] })

If we then:

1. Start `puma`, noting the process ID.
2. Start a slow file transfer, using `curl --limit-rate 100k` (for example) and `-T $PATH_TO_LARGE_FILE`.
3. Watch `$TMPDIR/puma*`.

We will see Puma start to write this temporary file. If we then send SIGKILL to Puma, the file won't be cleaned up. With this patch, it will - at least on POSIX systems. On Windows it may still be available.

This is suggested in the Ruby Tempfile documentation, and even uses this specific example: https://ruby-doc.org/stdlib-2.7.2/libdoc/tempfile/rdoc/Tempfile.html#class-Tempfile-label-Unlink+after+creation

> On POSIX systems, it's possible to unlink a file right after creating
> it, and before closing it. This removes the filesystem entry without
> closing the file handle, so it ensures that only the processes that
> already had the file handle open can access the file’s contents. It's
> strongly recommended that you do this if you do not want any other
> processes to be able to read from or write to the Tempfile, and you do
> not need to know the Tempfile's filename either.
>
> For example, a practical use case for unlink-after-creation would be
> this: you need a large byte buffer that's too large to comfortably fit
> in RAM, e.g. when you're writing a web server and you want to buffer
> the client's file upload data.
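
The failure mode described above can be reproduced without Puma. The following is only a sketch, not Puma's actual code: the helper name `leftovers_after_kill` and the `demo-body` prefix are made up for illustration, and it relies on `fork`, so it only runs on POSIX systems. A child process buffers data in a `Tempfile`, the parent sends it SIGKILL, and the helper reports which files are left in the temporary directory.

```ruby
require "tempfile"
require "tmpdir"

# Hypothetical helper (not part of Puma): fork a child that buffers data in a
# Tempfile, SIGKILL the child, and return the names of the files left behind.
def leftovers_after_kill(unlink_immediately:)
  Dir.mktmpdir do |dir|
    reader, writer = IO.pipe

    pid = fork do
      reader.close
      file = Tempfile.new("demo-body", dir)
      # Remove the directory entry right away; the open handle stays usable.
      file.unlink if unlink_immediately
      file.write("x" * 1024)
      file.flush
      writer.puts "ready"
      writer.close # flushes the pipe so the parent can proceed
      sleep        # wait here until the parent kills us
    end

    writer.close
    reader.gets                 # block until the child has written its data
    Process.kill(:KILL, pid)    # no finalizers or at_exit hooks get to run
    Process.wait(pid)
    Dir.children(dir)
  end
end

p leftovers_after_kill(unlink_immediately: false) # one "demo-body..." entry remains
p leftovers_after_kill(unlink_immediately: true)  # nothing is left behind
```

Because SIGKILL cannot be trapped, the only cleanup that survives it is cleanup that has already happened, which is why unlinking immediately after creation helps here.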
It's very easy (well, in my opinion) to manually test this. I'm not really sure how to write an automated test, other than to check that We could also remove Line 172 in d97688f
But I don't know if Windows would still need that. |
TIL about |
The two Windows failures are getting a 403 on installing MSYS. I can't retry them to see if that's transient, though: https://github.com/puma/puma/pull/2613/checks?check_run_id=2439382485 and https://github.com/puma/puma/pull/2613/checks?check_run_id=2439382549 I ran the Ruby 3.0.0 tests on macOS 11.2.3 - so not quite the same as the Actions version - and they all passed:
|
Don't sweat the Windows failures, that's @MSP-Greg's department. He'll be around to fix it in a jiffy. |
First, not a Puma issue. It's an issue of whether Windows Ruby 2.1 thru 2.3 can be supported as in the past for GitHub Actions, especially for extension gems.
Not so sure. It's a BinTray issue, and we may have one file missing - ragel. In theory, I can build it, but... Anyway, three Ruby guys who have other things to work on are trying to fix this. |
No rush, Greg! Only meant it as a compliment to your usual swiftness in Windows support, not as an expectation of any rapidity to fix a particular thing. |
Nate, I didn't take it personally, I'm just frustrated. It's a BinTray 'reminder' brownout, so we should be able to get the files needed and transfer them to GitHub releases. I also didn't want any contributors going down the rabbit hole that this issue involves. The few lines in the workflow files that set up Ruby involve two repos of node.js code, downloading files from a few places, etc, etc... |
The Actions issues are resolved, I ran the failing CI again, it passed, so this is green... |
Thanks. This is good, with the On Windows, normal files cannot be deleted/unlinked if a file handle is open. The source code for |
Great, thanks for the explanation @MSP-Greg ❤️ |
This change can cause issues if you are using

    content = request.body
    digest = if content.is_a?(File) || content.is_a?(Tempfile)
      Digest::MD5.file(content)
    else
      Digest::MD5.new.update(content.string)
    end

And it broke after upgrading to Puma 5.3 with

This happens because when you unlink a file, existing FDs to it are kept alive and its data stays on disk until all FDs are closed, but any new attempt to open it by path will fail, as the file is no longer accessible from the filesystem. I patched my code by using |
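
The semantics described in this comment can be checked in isolation. This is a minimal sketch using only Ruby's standard `Tempfile`, assuming a POSIX system; the `demo-body` prefix and the variable names are illustrative and do not come from Puma.

```ruby
require "tempfile"

body = Tempfile.new("demo-body")
old_path = body.path # remember where the file used to live
body.write("hello")
body.unlink          # remove the directory entry, keep the handle open

path_after_unlink = body.path          # Tempfile reports nil once unlinked
exists_on_disk = File.exist?(old_path) # the name is gone from the filesystem

# The already-open handle still works.
body.rewind
contents = body.read

# A fresh open by the old path fails, because the name no longer exists.
reopen_error =
  begin
    File.open(old_path) { |f| f.read }
    nil
  rescue Errno::ENOENT => e
    e.class
  end

body.close
```

Code that only reads from the object it was handed keeps working; code that asks for a path and opens the file again does not.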
That's interesting @renchap, thanks for reporting that. I see that |
@renchap I don't think what you were doing previously is supported by Rack. |
I agree with @nateberkopec. Your code did break, but your code was assuming that if the body is a file, that it can be opened. The Rack spec makes very clear what can be done with a response body (if it responds to Conversely, I can see the reason for your code, since Digest can be computed on a String or a filename, but not an IO. Maybe consider using the code in |
I am not saying that my previous code was correct and compliant with Rack's spec, but I wanted to report a possible cause of breakage in case it happens to others. I do not remember the reason for the initial implementation, but I have been able to successfully update it to use

    content = request.body.read
    hexdigest = Digest::MD5.hexdigest(content)
|
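
Reading the whole body into a String works, but for large uploads it may be preferable to feed the digest in chunks. The following is one possible sketch, not something proposed in this thread: `md5_hexdigest_of` is a hypothetical helper, and it assumes only that the body responds to `read(length, buffer)`, which both `StringIO` and file-backed bodies do.

```ruby
require "digest"
require "stringio"

# Hypothetical helper: compute an MD5 hex digest from any IO-like object by
# reading fixed-size chunks, so the body never has to fit in one String.
def md5_hexdigest_of(io, chunk_size: 16 * 1024)
  digest = Digest::MD5.new
  buffer = +""
  # read(length, buffer) returns nil at EOF, which ends the loop.
  while io.read(chunk_size, buffer)
    digest.update(buffer)
  end
  digest.hexdigest
end

small = StringIO.new("hello")
puts md5_hexdigest_of(small) == Digest::MD5.hexdigest("hello")
```

The helper never asks for a path, so it behaves the same whether the body is held in memory or in an unlinked temporary file. If something else has already read the body, it would need to be rewound first.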
Thank you for that. What constitutes a 'breaking change' is not always a simple yes/no issue. I mentioned the code in |
Just as another datapoint, our attachment-handling code is based on the venerable attachment_fu, which really wants to be able to get a path from the Tempfile to work with. This stopped working with Puma 5.3. (I've always thought their temp_data/temp_path code looked a bit sketchy, this might be a good excuse to rip it all out...) |
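
Where a library genuinely needs a filesystem path, one workaround is to copy the body into a fresh, named temporary file for the duration of the work. This is only a sketch under that assumption, not a fix suggested by the Puma maintainers or by attachment_fu; `with_named_copy` and the `upload-copy` prefix are hypothetical names.

```ruby
require "tempfile"
require "stringio"

# Hypothetical helper: copy an IO-like body into a named temporary file, yield
# its path, and remove the copy when the block returns.
def with_named_copy(io, basename: "upload-copy")
  Tempfile.create(basename) do |copy|
    copy.binmode
    IO.copy_stream(io, copy) # streams the data without loading it all at once
    copy.flush               # make sure the data is visible to path-based readers
    yield copy.path
  end
end

body = StringIO.new("uploaded bytes")
size = with_named_copy(body) { |path| File.size(path) }
puts size
```

The cost is a second copy of the upload on disk, and that copy is again exposed to being left behind if the process is killed while the block is running.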
Dropping a note here to say I've been bitten by this issue where the uploaded file is immediately unlinked before the controller has a chance to read it, on a Rails 4 monolith trying to move from Passenger to Puma. Had to revert to Puma 5.2.2 to get the code working. A relevant section of the code where this behaviour is noticeable is below:

    def create
      @attachment = current_user.account.attachments.build
      file = params[:qqfile].is_a?(ActionDispatch::Http::UploadedFile) ? params[:qqfile] : params[:file]
      if file.content_type == 'application/octet-stream'
        real_type = Mime::Type.lookup_by_extension(file.original_filename.split('.').last.downcase)
        file.content_type = real_type if real_type.present?
      end
      @attachment.data = file # <-- This fails with a "TypeError (no implicit conversion of nil into String)" error
      retries = 0
      begin
        saved = @attachment.save
      rescue => e
        retries += 1
        raise if retries > 3
        retry
      end
      if saved
        render json: attachment_hash(@attachment), content_type: 'text/plain' # IE doesn't like JSON
      else
        error = @attachment.errors.messages.try(:first).try(:last).try(:first)
        Rails.logger.warn "Unable to save attachment: #{error}"
        render json: {error: error}, status: :unprocessable_entity
      end
    end

With the version revert out, I can spend more time looking into bringing the above code into working order. I would, however, appreciate any suggestions. |
@rafaelmagu |
@dentarg wouldn't that include headers and everything else? In this case, the relevant code is trying to save the uploaded image as an attachment (using Paperclip), from the request's params. So I'm not sure if that counts as |
There are no headers in |
This was possibly the root cause of #1187, although the issue did not contain enough detail to determine whether or not that's the case.