
feat(archives): gzip archived recordings #1112

Draft
lkonno wants to merge 17 commits into main from compress-archived-recordings

Conversation

lkonno
Contributor

@lkonno lkonno commented Oct 12, 2022

Fixes #732

  • Add the .gz extension to the compressed files.
  • Set CRYOSTAT_DISABLE_ARCHIVE_COMPRESS to true to disable the compression.
  • Uploaded files are also compressed.

@lkonno lkonno added the feat New feature or request label Oct 12, 2022
@lkonno lkonno marked this pull request as draft October 12, 2022 19:55
@lkonno
Contributor Author

lkonno commented Oct 12, 2022

@andrewazores I had to leave the compress method public in RecordingArchiveHelper because of RecordingsPostHandler, which seems to need a refactor to move saveRecording into RecordingArchiveHelper.
I was planning to move the method as-is, but wanted to check whether it would be better to open a new issue and do that work in another PR.

@andrewazores
Member

@andrewazores I had to leave the compress method public in RecordingArchiveHelper because of RecordingsPostHandler, which seems to need a refactor to move saveRecording into RecordingArchiveHelper. I was planning to move the method as-is, but wanted to check whether it would be better to open a new issue and do that work in another PR.

Thanks for checking. I think that work can be included in this PR.

@lkonno lkonno force-pushed the compress-archived-recordings branch 2 times, most recently from 13a9400 to 2538e26 Compare October 14, 2022 13:10
@lkonno
Contributor Author

lkonno commented Oct 14, 2022

@andrewazores I am unsure about what tests I can create for this feature. Should it be an integration test covering the case where the flag CRYOSTAT_DISABLE_ARCHIVE_COMPRESS is set?

@andrewazores
Member

Our itest setup doesn't handle setting env vars for the Cryostat container itself right now.

I think the updated itests that check that the recordings can be downloaded with the .gz extension and that after unzipping those files the result is a valid JFR file are good enough.

@lkonno lkonno marked this pull request as ready for review October 14, 2022 14:09
Member

@andrewazores andrewazores left a comment


Looks good and working well. The only "broken" thing is that, as it stands before this PR, I can download a recording .jfr from archives and re-upload it directly into the archives again. With this PR, the file I download will be a .jfr.gz (unless it is an old uncompressed file from before this PR), but I cannot upload a .jfr.gz: RecordingsPostHandler tries to strip the .jfr filename extension, mishandles names ending in .jfr.gz, and then fails to match the filename against the expected pattern.

Maybe we can/should just drop the filename pattern requirement altogether. It was always an arbitrary requirement that didn't really do anything except make it annoying for a user to try to use Cryostat to analyze a JFR file they already collected from elsewhere.

@lkonno
Contributor Author

lkonno commented Oct 19, 2022

Looks good and working well. The only "broken" thing is that, as it stands before this PR, I can download a recording .jfr from archives and re-upload it directly into the archives again. With this PR, the file I download will be a .jfr.gz (unless it is an old uncompressed file from before this PR), but I cannot upload a .jfr.gz: RecordingsPostHandler tries to strip the .jfr filename extension, mishandles names ending in .jfr.gz, and then fails to match the filename against the expected pattern.

There is a validation step in the upload path that checks whether the file is a valid flight recording. I will decompress the uploaded file for that validation.

try (var is = new BufferedInputStream(new FileInputStream(recordingFile))) {
    // Ask JMC's FlightRecordingLoader to enumerate the file's chunks;
    // a valid JFR file contains at least one chunk.
    var supplier = FlightRecordingLoader.createChunkSupplier(is);
    var chunks = FlightRecordingLoader.readChunkInfo(supplier);
    if (chunks.size() < 1) {
        throw new InvalidJfrFileException();
    }
}
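One way to make that validation tolerant of both compressed and uncompressed uploads is to sniff the two-byte gzip magic number (0x1f 0x8b) and wrap the stream in a GZIPInputStream only when it is present. A minimal sketch, with illustrative names rather than Cryostat's actual API:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class GzipSniff {
    // Returns a stream that transparently decompresses gzip input.
    // Peeks at the first two bytes via mark/reset to detect the
    // gzip magic number (0x1f 0x8b); plain input passes through as-is.
    public static InputStream maybeDecompress(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int b1 = in.read();
        int b2 = in.read();
        in.reset();
        boolean gzipped = b1 == 0x1f && b2 == 0x8b;
        return gzipped ? new GZIPInputStream(in) : in;
    }
}
```

The resulting stream could then feed the chunk-validation code above unchanged, whether the upload was compressed or not.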

Maybe we can/should just drop the filename pattern requirement altogether. It was always an arbitrary requirement that didn't really do anything except make it annoying for a user to try to use Cryostat to analyze a JFR file they already collected from elsewhere.

I agree with not using the extension; fewer classes and tests will be affected by this change that way.

@lkonno lkonno marked this pull request as draft October 19, 2022 20:01
@lkonno lkonno force-pushed the compress-archived-recordings branch 2 times, most recently from 7ea4d96 to 4ae7559 Compare October 21, 2022 14:16
@lkonno lkonno marked this pull request as ready for review October 21, 2022 14:54
@andrewazores
Member

Working mostly well and the code looks well-written. However, I notice that e.g. GET /api/v1/recordings responds like this:

$ https :8181/api/v1/recordings Authorization:"Basic $(echo user:pass | base64)"
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 268
content-type: text/plain

[
    {
        "downloadUrl": "https://localhost:8181/api/beta/recordings/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Fcryostat:9093%2Fjmxrmi/es-andrewazor-demo-Main_auto_ruletwo_20221021T150648Z.jfr",
        "metadata": {
            "labels": {
                "rule": "ruletwo",
                "template.name": "Continuous",
                "template.type": "TARGET"
            }
        },
        "name": "es-andrewazor-demo-Main_auto_ruletwo_20221021T150648Z.jfr",
        "reportUrl": "https://localhost:8181/api/beta/reports/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Fcryostat:9093%2Fjmxrmi/es-andrewazor-demo-Main_auto_ruletwo_20221021T150648Z.jfr",
        "size": 337537
    }
]

but, if I actually download the https://localhost:8181/api/beta/recordings/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Fcryostat:9093%2Fjmxrmi/es-andrewazor-demo-Main_auto_ruletwo_20221021T150648Z.jfr file, it is gzipped. This is expected really, but in that case the file should have a .gz extension to indicate the compression, or else the file should be decompressed before sending it to the client on download request.

To preserve compatibility I think decompressing the file first actually makes the most sense to implement right now. Later on someone can circle back around to implement content-based negotiation so that the server will serve the compressed asset if the client requests it or says they can accept it, otherwise it should fall back to serving the uncompressed asset.
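The content negotiation mentioned here would hinge on the Accept-Encoding request header. A rough sketch of the decision, with simplified q-value handling (illustrative only, not Cryostat code):

```java
public class EncodingNegotiation {
    // Decide whether the client can accept a gzip-encoded response body,
    // based on the Accept-Encoding header. Simplified: a real
    // implementation should do full RFC 9110 q-value ranking.
    public static boolean clientAcceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.split(",")) {
            String[] parts = token.trim().toLowerCase().split(";");
            if (!parts[0].trim().equals("gzip")) {
                continue;
            }
            for (int i = 1; i < parts.length; i++) {
                String p = parts[i].trim();
                if (p.startsWith("q=")) {
                    // q=0 means the client refuses this encoding
                    return Double.parseDouble(p.substring(2)) > 0;
                }
            }
            return true; // no q-value means q=1
        }
        return false;
    }
}
```

With such a check, the handler could serve the stored .gz bytes directly when the client accepts gzip and fall back to decompressing otherwise.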

@lkonno
Contributor Author

lkonno commented Nov 3, 2022

To preserve compatibility I think decompressing the file first actually makes the most sense to implement right now.

@andrewazores I was trying to find a solution that does not leave temp files behind, but I could not find one. In this implementation, I am replacing the non-compressed file with the compressed file. To decompress the file for downloading, I was thinking of creating a temp file, since it is not possible to delete it immediately: the file is served after the request and the download can take time. Any thoughts or suggestions?

@andrewazores
Member

If there is a compressed file on disk, you can return it uncompressed to the client by 1) decompressing it to a temp file and serving the temp file to the client, or 2) decompressing the file on the fly and streaming that as the response to the client.

  1. sounds like what you're trying to do now, and it will work, but as you say it means that there will be some latency before the decompressed temp file can start being served to the client. If the file is relatively large then this latency can be long and perhaps the client cancels the request. Also, the temp file should be cleaned up after the request completes.

  2. means we can't simply stream the file from disk to socket and use super-optimized I/O, but we do have at least one extant example of reading an InputStream and piping that to the client response: the code that handles streaming a JFR file from the target JVM over JMX into Cryostat's memory, then repackaging that into an HTTP response stream to the client. Swapping out the JFR-over-JMX stream for an unGZIP file stream should work, I think.
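Stripped of the JMX machinery, option 2 reduces to gunzip-and-pipe. A minimal sketch under that assumption (the class name is illustrative; InputStream.transferTo does the streaming copy without ever buffering the whole file in memory):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

public class GunzipPipe {
    // Stream the decompressed contents of a gzipped file into the given
    // sink (e.g. an HTTP response body). Returns the number of
    // decompressed bytes written.
    public static long pipe(Path gzippedFile, OutputStream sink) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzippedFile))) {
            return in.transferTo(sink);
        }
    }
}
```

No temp file is created, so there is nothing to clean up after the request completes; the trade-off is the CPU cost of decompressing inline with the response.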

@lkonno lkonno marked this pull request as draft November 3, 2022 23:57
@lkonno lkonno force-pushed the compress-archived-recordings branch from fca5360 to a3278f5 Compare November 7, 2022 13:38
@github-actions
Contributor

github-actions bot commented Nov 7, 2022

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1112-a3278f5e07c3900e85685f597db7be3d10f30f1a sh smoketest.sh

@lkonno lkonno marked this pull request as ready for review November 7, 2022 13:55
if (!env.hasEnv(Variables.DISABLE_ARCHIVE_COMPRESS) && isGzip(recordingPath)) {
return unGzip(recordingPath);
}
recordingFile = Files.readAllBytes(recordingPath);
Member


This will load the entire recording file into RAM within Cryostat's process, which is probably within a container with a relatively small memory allocation. If the requested recording file is close to that same size or larger then this operation will fail because there is not enough memory available to allocate. I think we need to find a way to return a Stream from this method and pump that Stream out to the HTTP response in the endpoint handler.
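A sketch of that suggestion: have the method hand back a lazy InputStream instead of a byte[], so memory use stays constant regardless of recording size. Names here are illustrative; isGzip mirrors the helper referenced in the diff above but is reimplemented hypothetically via the gzip magic bytes:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

public class RecordingStreams {
    // Return a stream over the recording's (decompressed) contents
    // instead of materializing the whole file with readAllBytes.
    public static InputStream open(Path recordingPath, boolean compressionDisabled)
            throws IOException {
        boolean gz = !compressionDisabled && isGzip(recordingPath);
        InputStream raw = Files.newInputStream(recordingPath);
        return gz ? new GZIPInputStream(raw) : raw;
    }

    // gzip files start with the magic bytes 0x1f 0x8b
    static boolean isGzip(Path p) throws IOException {
        try (InputStream in = Files.newInputStream(p)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }
}
```

The endpoint handler would then pump this stream into the HTTP response rather than writing a pre-loaded byte array.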

Contributor Author


I am trying to use pipeFromInput as TargetRecordingGetHandler does, but it is not returning a response and always produces an empty file. Would you mind taking a look and helping me find what is wrong?

Member


Sure, I'll see if I can find a bit of time to look into this tomorrow or Friday. If not then I am on PTO the week after, but I can take a look the week after that if you're still blocked.

Member


Ah, I think I see what might be going wrong. The TargetRecordingGetHandler implementation needs an open JMX connection to the target since it wants to stream the recording data over JMX, through Cryostat, and back into a pipe to the requesting client over HTTP. Therefore, it creates a ConnectionDescriptor object that represents the JMX Service URL of the application and any required JMX credentials. It gets the JMX Service URL from the :targetId path parameter of the request.

In this case, we want to do something similar with streaming an InputStream to the client over HTTP, but it's just backed by a simple file on disk rather than by a JMX connection to a target application. So we don't actually need or want a JMX connection to be opened. But, because the OutputToReadStream class was created with that JMX-to-HTTP piping in mind, it requires a TargetConnectionManager and ConnectionDescriptor. So in the modified handler you are also creating a ConnectionDescriptor, but this actually is an invalid descriptor because the request URL has no :targetId path parameter - instead it has a :sourceTarget that would be equivalent, but again it's unnecessary. I think what happens is that this invalid ConnectionDescriptor gets passed to the TargetConnectionManager, and then inside OutputToReadStream any time the method checkConnection is called it silently fails because the descriptor is invalid.

So in the end I think the fix will be to create another class very similar to OutputToReadStream, but without the dependency on a ConnectionDescriptor and the TargetConnectionManager. I will see if I can commit directly to your PR branch here and help out.

Member


@lkonno I think I got it working, at least the immediate bug we were just talking about. I was able to download an archived recording through the web-client UI.

Contributor Author


Thanks!

@lkonno lkonno marked this pull request as draft November 9, 2022 12:54

Successfully merging this pull request may close these issues.

[Task] Compression for Archived Recordings
3 participants