OOM when using S3TransferManager.downloadDirectory() #4987
Hi @zzz8307 I'll try to repro the issue. In the meantime, can you try limiting the memory used by specifying a lower value for `maxNativeMemoryLimitInBytes`? Check the S3CrtAsyncClientBuilder javadoc for more info.
Hi @debora-ito, I added the limitation:

```java
private final S3AsyncClient s3Client = S3AsyncClient.crtBuilder()
        .region(Region.AP_EAST_1)
        .targetThroughputInGbps(1.0)
        .maxNativeMemoryLimitInBytes(1L * 1024 * 1024 * 1024)
        .build();
```

You can repro the issue by duplicating 100k small files in the same directory on S3, then downloading the folder.
We are facing the same problem. We need to split a 10 GB XML into small pieces (~2 million small files). The files are stored temporarily on the local disk. Afterwards we tried to use transferManager.uploadDirectory to upload them. In my opinion the problem is in UploadDirectoryHelper.java: it creates a huge list of CompletableFutures (2 million entries), which results in an OOM after a while. Since you do not care about the result of the individual file uploads (only failed ones are reported), you can skip the list altogether. That seems to work in our testing.
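The idea of reporting only failures instead of retaining every future can be illustrated with plain java.util.concurrent primitives. This is a minimal sketch of the pattern, not the SDK's actual code; the simulateUpload method and its failure criterion are invented for illustration:

```java
import java.util.Queue;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class FailureOnlyCollector {

    // Stand-in for a real per-file S3 upload; fails for every 1000th file.
    static CompletableFuture<Void> simulateUpload(int fileIndex, Executor executor) {
        return CompletableFuture.runAsync(() -> {
            if (fileIndex % 1000 == 0) {
                throw new RuntimeException("upload failed for file " + fileIndex);
            }
        }, executor);
    }

    public static void main(String[] args) {
        int totalFiles = 10_000;
        ExecutorService executor = Executors.newFixedThreadPool(8);

        // Instead of a List with one CompletableFuture per file, keep only
        // a pending counter and a collection of the (few) failures.
        AtomicInteger pending = new AtomicInteger(totalFiles);
        Queue<Throwable> failures = new ConcurrentLinkedQueue<>();
        CompletableFuture<Void> allDone = new CompletableFuture<>();

        for (int i = 0; i < totalFiles; i++) {
            simulateUpload(i, executor).whenComplete((v, t) -> {
                if (t != null) {
                    failures.add(t);          // only failures are retained
                }
                if (pending.decrementAndGet() == 0) {
                    allDone.complete(null);   // last upload has finished
                }
            });
            // The per-file future goes out of scope here and can be GC'd.
        }

        allDone.join();
        executor.shutdown();
        System.out.println("failed=" + failures.size()
                + " succeeded=" + (totalFiles - failures.size()));
    }
}
```

The success count falls out as (total - failed), so memory use no longer grows with the number of files.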
@zzz8307 we are investigating the issue. @jensvogt memory issues with UploadDirectory were reported in a separate issue - #4999 (comment) - and we released a fix. Can you try the latest SDK version?
@debora-ito sure, I'll test the newest version. But actually, this is not a memory leak, it's simply a design problem. If I want to upload 2 million files, the uploadDirectory method collects 2 million CompletableFutures in a Java ArrayList, which results in a huge memory allocation. I wonder whether holding 2 million CompletableFutures in memory is really needed. Maybe there is a more clever way to collect the results of the CompletableFutures inside the directory file stream.
@debora-ito I created a new issue for the upload problem, as it is slightly different from the issue described here. See #5023.
Hi @jensvogt, @debora-ito,
@zzz8307 Yes, you're right. We did the same as a workaround. Currently we're using a "paged" solution, where pages of 10,000 files are uploaded using transferManager.uploadDirectory. It would be nicer if the AWS SDK took care of the paging. The failed uploads are collected anyway (the number of failed uploads should be much smaller than the total), so there is no need to collect the successes: the number of successes is simply (total - failed).
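The paging workaround described above can be sketched with plain java.util.concurrent. This is a hedged illustration, not the actual production code: the runAsync task stands in for a per-file upload, and only the page size of 10,000 is taken from the workaround's description:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class PagedTransfer {
    public static void main(String[] args) {
        int totalFiles = 25_000;   // ~2 million in the real workload
        int pageSize = 10_000;     // page size used in the workaround
        ExecutorService executor = Executors.newFixedThreadPool(8);
        AtomicInteger transferred = new AtomicInteger();

        for (int start = 0; start < totalFiles; start += pageSize) {
            int end = Math.min(start + pageSize, totalFiles);
            // At most pageSize futures are alive at any time, bounding memory.
            CompletableFuture<?>[] page = IntStream.range(start, end)
                    .mapToObj(i -> CompletableFuture.runAsync(
                            transferred::incrementAndGet, executor)) // stand-in for one upload
                    .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(page).join();   // drain this page before starting the next
        }

        executor.shutdown();
        System.out.println("transferred=" + transferred.get());
    }
}
```

Peak memory is proportional to the page size rather than the total file count, at the cost of a small pipeline stall between pages.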
For downloadDirectory, we don't actually store all CompletableFutures in a list; I think it's Line 141 in 09a12bf.
I'm working on the fix. @zzz8307 just wanted to double-check: what is the average size of the objects you are downloading? @jensvogt we'll make the fix for uploadDirectory as well.
@zoewangg Just for your info: we sometimes get a 10-12 GB XML containing ~2 million product XMLs. We split the 10 GB XML into small product XML files; each product XML is roughly 4-8 KB. We use transferManager.downloadFile for downloading the 10 GB XML and transferManager.uploadDirectory for the upload of the 2 million small XML files. Download takes ~2-5 min, splitting ~20-40 min, and upload ~1-2 h. Uploading each XML piece individually (using putObject instead of the transfer manager) is not an option, as it takes ~46 h. So it's mainly a performance issue.
@zoewangg in our use case we are downloading ~1 million files of 1-100 KB each.
Hi @zzz8307 we released a fix in the latest SDK version.
@zoewangg I'm experiencing an issue with uploading a single 12 GB file via TransferManager and the CRT client: the container ran out of memory. I'm trying your fix right now. Fingers crossed!
It's progressing a bit further, but still not good enough. I'm using a 6 GB container and uploading a single 12 GB file to S3, and it still OOMs :(
@thai-op this issue tracks the memory issue for the downloadDirectory method specifically.
Hi @zoewangg, I've tested the latest version and the memory issue has been fixed. Big thanks for the effort!
Awesome, thanks for verifying. Closing the issue. |
Describe the bug

When downloading a directory from S3 using `S3TransferManager.downloadDirectory()` that contains hundreds of thousands of files, it fails with `OutOfMemoryError`.

Expected Behavior

S3TransferManager should work no matter how many files there are or how big each file is.

Current Behavior

OOM after downloading some of the files.
Reproduction Steps
Possible Solution
No response
Additional Information/Context
No response
AWS Java SDK version used
2.24.13
JDK version used
java version "11.0.22" 2024-01-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.22+9-LTS-219)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.22+9-LTS-219, mixed mode)
Operating System and version
Windows 10