
feat(archives): gzip archived recordings #1112

Draft
lkonno wants to merge 17 commits into main from compress-archived-recordings

Conversation

lkonno
Contributor

@lkonno lkonno commented Oct 12, 2022

Fixes #732

  • Add the .gz extension to the compressed files.
  • Set CRYOSTAT_DISABLE_ARCHIVE_COMPRESS to true to disable the compression.
  • Uploaded files are also compressed.

@lkonno lkonno added the feat New feature or request label Oct 12, 2022
@lkonno lkonno marked this pull request as draft October 12, 2022 19:55
@lkonno
Contributor Author

lkonno commented Oct 12, 2022

@andrewazores I had to leave the compress method public in RecordingArchiveHelper because of RecordingsPostHandler, which seems to need a refactor to move saveRecording into RecordingArchiveHelper.
I was planning to move the method as-is, but wanted to check whether it would be better to open a new issue and do that work in another PR.

@andrewazores
Member

@andrewazores I had to leave the compress method public in RecordingArchiveHelper because of RecordingsPostHandler, which seems to need a refactor to move saveRecording into RecordingArchiveHelper. I was planning to move the method as-is, but wanted to check whether it would be better to open a new issue and do that work in another PR.

Thanks for checking. I think that work can be included in this PR.

@lkonno lkonno force-pushed the compress-archived-recordings branch 2 times, most recently from 13a9400 to 2538e26 Compare October 14, 2022 13:10
@lkonno
Contributor Author

lkonno commented Oct 14, 2022

@andrewazores I am unsure about what tests I can create for this feature. Should it be an integration test covering the case where the flag CRYOSTAT_DISABLE_ARCHIVE_COMPRESS is set?

@andrewazores
Member

Our itest setup doesn't handle setting env vars for the Cryostat container itself right now.

I think the updated itests that check that the recordings can be downloaded with the .gz extension and that after unzipping those files the result is a valid JFR file are good enough.

@lkonno lkonno marked this pull request as ready for review October 14, 2022 14:09
Member

@andrewazores andrewazores left a comment


Looks good and working well. The only "broken" thing is that, as it stands before this PR, I can download a recording .jfr from archives and re-upload it directly into the archives again. With this PR, the file I download will be a .jfr.gz (unless it is an old uncompressed file from before this PR), but I cannot upload a .jfr.gz: RecordingsPostHandler tries to strip the .jfr filename extension, mishandles names ending in .jfr.gz, and then fails to match the filename against the expected pattern.

Maybe we can/should just drop the filename pattern requirement altogether. It was always an arbitrary requirement that didn't really do anything except make it annoying for a user to try to use Cryostat to analyze a JFR file they already collected from elsewhere.

@lkonno
Contributor Author

lkonno commented Oct 19, 2022

Looks good and working well. The only "broken" thing is that, as it stands before this PR, I can download a recording .jfr from archives and re-upload it directly into the archives again. With this PR, the file I download will be a .jfr.gz (unless it is an old uncompressed file from before this PR), but I cannot upload a .jfr.gz: RecordingsPostHandler tries to strip the .jfr filename extension, mishandles names ending in .jfr.gz, and then fails to match the filename against the expected pattern.

There is a validation step in the upload path that checks whether the file is a valid flight recording. I will decompress the uploaded file for that validation.

try (var is = new BufferedInputStream(new FileInputStream(recordingFile))) {
    // Ask JMC's FlightRecordingLoader to enumerate the file's chunks;
    // a valid JFR file contains at least one chunk.
    var supplier = FlightRecordingLoader.createChunkSupplier(is);
    var chunks = FlightRecordingLoader.readChunkInfo(supplier);
    if (chunks.size() < 1) {
        throw new InvalidJfrFileException();
    }
}
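One way to make that validation tolerant of both compressed and uncompressed uploads is to sniff the two-byte gzip magic number (0x1f 0x8b) and wrap the stream in a GZIPInputStream only when it is present. A minimal sketch, with illustrative names rather than Cryostat's actual API:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class GzipSniff {
    // Returns a stream that transparently decompresses gzip input.
    // Peeks at the first two bytes via mark/reset to detect the
    // gzip magic number (0x1f 0x8b); plain input passes through as-is.
    public static InputStream maybeDecompress(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int b1 = in.read();
        int b2 = in.read();
        in.reset();
        boolean gzipped = b1 == 0x1f && b2 == 0x8b;
        return gzipped ? new GZIPInputStream(in) : in;
    }
}
```

The resulting stream could then feed the chunk-validation code above unchanged, whether the upload was compressed or not.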

Maybe we can/should just drop the filename pattern requirement altogether. It was always an arbitrary requirement that didn't really do anything except make it annoying for a user to try to use Cryostat to analyze a JFR file they already collected from elsewhere.

I agree with not using the extension; fewer classes and tests will be affected by this change that way.

@lkonno lkonno marked this pull request as draft October 19, 2022 20:01
@lkonno lkonno force-pushed the compress-archived-recordings branch 2 times, most recently from 7ea4d96 to 4ae7559 Compare October 21, 2022 14:16
@lkonno lkonno marked this pull request as ready for review October 21, 2022 14:54
@andrewazores
Member

Working mostly well and the code looks well-written. However, I notice that e.g. GET /api/v1/recordings responds like this:

$ https :8181/api/v1/recordings Authorization:"Basic $(echo user:pass | base64)"
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 268
content-type: text/plain

[
    {
        "downloadUrl": "https://localhost:8181/api/beta/recordings/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Fcryostat:9093%2Fjmxrmi/es-andrewazor-demo-Main_auto_ruletwo_20221021T150648Z.jfr",
        "metadata": {
            "labels": {
                "rule": "ruletwo",
                "template.name": "Continuous",
                "template.type": "TARGET"
            }
        },
        "name": "es-andrewazor-demo-Main_auto_ruletwo_20221021T150648Z.jfr",
        "reportUrl": "https://localhost:8181/api/beta/reports/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Fcryostat:9093%2Fjmxrmi/es-andrewazor-demo-Main_auto_ruletwo_20221021T150648Z.jfr",
        "size": 337537
    }
]

but, if I actually download the https://localhost:8181/api/beta/recordings/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Fcryostat:9093%2Fjmxrmi/es-andrewazor-demo-Main_auto_ruletwo_20221021T150648Z.jfr file, it is gzipped. This is expected really, but in that case the file should have a .gz extension to indicate the compression, or else the file should be decompressed before sending it to the client on download request.

To preserve compatibility I think decompressing the file first actually makes the most sense to implement right now. Later on someone can circle back around to implement content-based negotiation so that the server will serve the compressed asset if the client requests it or says they can accept it, otherwise it should fall back to serving the uncompressed asset.
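The content negotiation mentioned here would hinge on the Accept-Encoding request header. A rough sketch of the decision, with simplified q-value handling (illustrative only, not Cryostat code):

```java
public class EncodingNegotiation {
    // Decide whether the client can accept a gzip-encoded response body,
    // based on the Accept-Encoding header. Simplified: a real
    // implementation should do full RFC 9110 q-value ranking.
    public static boolean clientAcceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.split(",")) {
            String[] parts = token.trim().toLowerCase().split(";");
            if (!parts[0].trim().equals("gzip")) {
                continue;
            }
            for (int i = 1; i < parts.length; i++) {
                String p = parts[i].trim();
                if (p.startsWith("q=")) {
                    // q=0 means the client refuses this encoding
                    return Double.parseDouble(p.substring(2)) > 0;
                }
            }
            return true; // no q-value means q=1
        }
        return false;
    }
}
```

With such a check, the handler could serve the stored .gz bytes directly when the client accepts gzip and fall back to decompressing otherwise.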

@lkonno
Contributor Author

lkonno commented Nov 3, 2022

To preserve compatibility I think decompressing the file first actually makes the most sense to implement right now.

@andrewazores I was trying to find a solution that does not leave temp files behind, but I could not find one. In this implementation, I am replacing the non-compressed file with the compressed file. To decompress the file for downloading, I was thinking of creating a temp file, since it is not possible to delete it immediately: the file is served after the request and the download can take time. Any thoughts or suggestions?

@andrewazores
Member

If there is a compressed file on disk, you can return it uncompressed to the client by 1) decompressing it to a temp file and serving the temp file to the client, or 2) decompressing the file on the fly and streaming that as the response to the client.

  1. sounds like what you're trying to do now, and it will work, but as you say it means that there will be some latency before the decompressed temp file can start being served to the client. If the file is relatively large then this latency can be long and perhaps the client cancels the request. Also, the temp file should be cleaned up after the request completes.

  2. means we can't simply stream the file from disk to socket and use super-optimized I/O, but we do have at least one extant example of reading an InputStream and piping that to the client response: the code that handles streaming a JFR file from the target JVM over JMX into Cryostat's memory, then repackaging that into an HTTP response stream to the client. Swapping out the JFR-over-JMX stream for an unGZIP file stream should work, I think.
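Stripped of the JMX machinery, option 2 reduces to gunzip-and-pipe. A minimal sketch under that assumption (the class name is illustrative; InputStream.transferTo does the streaming copy without ever buffering the whole file in memory):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

public class GunzipPipe {
    // Stream the decompressed contents of a gzipped file into the given
    // sink (e.g. an HTTP response body). Returns the number of
    // decompressed bytes written.
    public static long pipe(Path gzippedFile, OutputStream sink) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzippedFile))) {
            return in.transferTo(sink);
        }
    }
}
```

No temp file is created, so there is nothing to clean up after the request completes; the trade-off is the CPU cost of decompressing inline with the response.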

@lkonno lkonno marked this pull request as draft November 3, 2022 23:57
@lkonno lkonno force-pushed the compress-archived-recordings branch from fca5360 to a3278f5 Compare November 7, 2022 13:38
@github-actions
Contributor

github-actions bot commented Nov 7, 2022

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1112-a3278f5e07c3900e85685f597db7be3d10f30f1a sh smoketest.sh

@lkonno lkonno marked this pull request as ready for review November 7, 2022 13:55
if (!env.hasEnv(Variables.DISABLE_ARCHIVE_COMPRESS) && isGzip(recordingPath)) {
return unGzip(recordingPath);
}
recordingFile = Files.readAllBytes(recordingPath);
Member


This will load the entire recording file into RAM within Cryostat's process, which is probably within a container with a relatively small memory allocation. If the requested recording file is close to that same size or larger then this operation will fail because there is not enough memory available to allocate. I think we need to find a way to return a Stream from this method and pump that Stream out to the HTTP response in the endpoint handler.
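A sketch of that suggestion: have the method hand back a lazy InputStream instead of a byte[], so memory use stays constant regardless of recording size. Names here are illustrative; isGzip mirrors the helper referenced in the diff above but is reimplemented hypothetically via the gzip magic bytes:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

public class RecordingStreams {
    // Return a stream over the recording's (decompressed) contents
    // instead of materializing the whole file with readAllBytes.
    public static InputStream open(Path recordingPath, boolean compressionDisabled)
            throws IOException {
        boolean gz = !compressionDisabled && isGzip(recordingPath);
        InputStream raw = Files.newInputStream(recordingPath);
        return gz ? new GZIPInputStream(raw) : raw;
    }

    // gzip files start with the magic bytes 0x1f 0x8b
    static boolean isGzip(Path p) throws IOException {
        try (InputStream in = Files.newInputStream(p)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }
}
```

The endpoint handler would then pump this stream into the HTTP response rather than writing a pre-loaded byte array.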

Contributor Author


I am trying to use pipeFromInput as TargetRecordingGetHandler does, but it is not returning a response and always produces an empty file. Would you mind taking a look and helping me find what is wrong?

Member


Sure, I'll see if I can find a bit of time to look into this tomorrow or Friday. If not then I am on PTO the week after, but I can take a look the week after that if you're still blocked.

Member


Ah, I think I see what might be going wrong. The TargetRecordingGetHandler implementation needs an open JMX connection to the target since it wants to stream the recording data over JMX, through Cryostat, and back into a pipe to the requesting client over HTTP. Therefore, it creates a ConnectionDescriptor object that represents the JMX Service URL of the application and any required JMX credentials. It gets the JMX Service URL from the :targetId path parameter of the request.

In this case, we want to do something similar with streaming an InputStream to the client over HTTP, but it's just backed by a simple file on disk rather than by a JMX connection to a target application. So we don't actually need or want a JMX connection to be opened. But, because the OutputToReadStream class was created with that JMX-to-HTTP piping in mind, it requires a TargetConnectionManager and ConnectionDescriptor. So in the modified handler you are also creating a ConnectionDescriptor, but this actually is an invalid descriptor because the request URL has no :targetId path parameter - instead it has a :sourceTarget that would be equivalent, but again it's unnecessary. I think what happens is that this invalid ConnectionDescriptor gets passed to the TargetConnectionManager, and then inside OutputToReadStream any time the method checkConnection is called it silently fails because the descriptor is invalid.

So in the end I think the fix will be to create another class very similar to OutputToReadStream, but without the dependency on a ConnectionDescriptor and the TargetConnectionManager. I will see if I can commit directly to your PR branch here and help out.

Member


@lkonno I think I got it working, at least the immediate bug we were just talking about. I was able to download an archived recording through the web-client UI.

Contributor Author


Thanks!

@lkonno lkonno marked this pull request as draft November 9, 2022 12:54

Successfully merging this pull request may close these issues.

[Task] Compression for Archived Recordings
3 participants