Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(auto-rules): PeriodicArchiver scans archives on startup #551

Merged
merged 82 commits into from Aug 11, 2021
Merged

fix(auto-rules): PeriodicArchiver scans archives on startup #551

merged 82 commits into from Aug 11, 2021

Conversation

hareetd
Copy link
Contributor

@hareetd hareetd commented Jul 6, 2021

By scanning existing archives on startup, this fix allows pruning of old recordings from previous calls to PeriodicArchiver for a given rule.

Fixes #542

Depends on #557

@hareetd
Copy link
Contributor Author

hareetd commented Jul 6, 2021

I requested a review so I know if I'm on the right track before I start updating the corresponding unit tests.

@andrewazores andrewazores changed the title Ensure Automatic Rules PeriodicArchiver scans existing archives on startup fix(auto-rules): PeriodicArchiver scans archives on startup Jul 6, 2021
@hareetd
Copy link
Contributor Author

hareetd commented Jul 7, 2021

@andrewazores After adding Janelle's fork as a remote repo (in order to get her branch), I'm assuming I'd have to wait until she's pushed the required changes before rebasing on top of her branch?

@andrewazores
Copy link
Member

@andrewazores After adding Janelle's fork as a remote repo (in order to get her branch), I'm assuming I'd have to wait until she's pushed the required changes before rebasing on top of her branch?

You'd have to rebase after each change she makes to her branch anyway, up until eventually her branch is merged into main, at which point you would rebase onto main as well. If her branch doesn't yet contain the changes you need then you can still rebase on top of it for now, and then re-rebase later after her branch is updated with the changes.

@github-actions
Copy link
Contributor

github-actions bot commented Jul 8, 2021

🎉 Great news! Looks like all the dependencies have been resolved:

💡 To add or remove a dependency please update this issue/PR description.

Brought to you by Dependent Issues (:robot: ). Happy coding!

@hareetd
Copy link
Contributor Author

hareetd commented Jul 8, 2021

Before I make more changes, I'm going to rebase on to upstream/main now that Janelle's PR has been merged. During this interactive rebase, I noticed my commit history contains Janelle's commits first (up to a certain point before the branch got merged and deleted) and then my commits (due to the rebasing I did).

Since upstream/main already contains all of Janelle's commits, should I drop her commits from my branch, keeping only my commits, thereby minimizing merge conflicts during the rebase?

@andrewazores
Copy link
Member

andrewazores commented Jul 8, 2021

Before I make more changes, I'm going to rebase on to upstream/main now that Janelle's PR has been merged. During this interactive rebase, I noticed my commit history contains Janelle's commits first (up to a certain point before the branch got merged and deleted) and then my commits (due to the rebasing I did).

Since upstream/main already contains all of Janelle's commits, should I drop her commits from my branch, keeping only my commits, thereby minimizing merge conflicts during the rebase?

Yes exactly, you should drop all of the commits in your branch that are from the old base. Those changes (roughly) are now in main, so your branch should not contain any commits that redo the same changes over again, or else you'll have some strange conflicts to try to resolve where the commits probably just end up empty. git doesn't know what happened to those commits because they got squashed into one larger commit, so when it uses the commit hashes to determine the rebase, it doesn't know that the first series of commits in your branch represent logical changes that are already in the new base.

@hareetd
Copy link
Contributor Author

hareetd commented Jul 8, 2021

Nice, that makes sense. This was a good way for me to get more comfortable with Git.

@andrewazores andrewazores added this to In progress in API v2 via automation Jul 9, 2021
@andrewazores andrewazores added this to In progress in Recording Triggers via automation Jul 9, 2021
@hareetd
Copy link
Contributor Author

hareetd commented Jul 13, 2021

Could you take a look at just the RecordingArchiveHelper and PeriodicArchiver to see if my changes make sense?

Edit: Also, for the various archived recording handlers (GET, DELETE, etc.) I'm thinking of changing them so that the target service URI is now a request parameter, allowing me to properly index searches in the restructured directory. Does that seem right?

@andrewazores
Copy link
Member

Edit: Also, for the various archived recording handlers (GET, DELETE, etc.) I'm thinking of changing them so that the target service URI is now a request parameter, allowing me to properly index searches in the restructured directory. Does that seem right?

You mean to add the target's service URI as a path parameter on the request? That would be a breaking API change, so it shouldn't be done to the existing V1 handlers.

If I'm understanding correctly, the concern is that ex. DELETE /api/v1/recordings/:recordingName doesn't necessarily correlate the request to a target's service URI, since the recording file name may not contain the service URI to be parsed out and used for reference.

This is no different from the current state of the archives really, though - right now we might have archives/targetA_foo.jfr, archives/targetB_foo.jfr, but with the subdirectory reorganization here it would be archives/targetA_hash/targetA_foo.jfr, archives/targetB_hash/targetB_foo.jfr. So, there's just one additional level of directories to search through to find the matching recording file, unless the client can tell us what targetA or targetA_hash are. But the problem with that is that the client does not know those values either, since the client can only be presumed to know about targetA_foo.jfr because they did a GET /api/v1/recordings, and got back the bare list of recording names.

I do think that along with subdividing the archives by target it would make sense to mirror that change in the API, but it would need to be done in V2 handlers. PATCH 'save' /api/v2/targets/:targetId/recordings/:recordingName could create an archived recording at archives/targetId_hash/recordingName_timestamp.jfr, for example, and then DELETE /api/v2/targets/:targetId/archives/:filename could be used for deleting that recording, or GET to download it, etc.

If we had that then the unstructured archives used by the V1 handlers could be loosened up further and not require uploads to follow any filename convention, and this could be used to allow users to upload arbitrary JFR files for analysis using jfr-datasource/grafana. They can already do this, really, but the filename convention is a minor barrier that makes that a bit annoying.

Anyway, getting on a tangent. For the scope of this issue/PR, I don't think we want any V1 API behaviour to be changed - everything done in here should be an internal implementation detail. If it sets us up for V2 (or V3) enhancements later like what I outlined above, then great.

@hareetd
Copy link
Contributor Author

hareetd commented Jul 13, 2021

Alright, thanks for the thorough explanation, makes sense.

@hareetd hareetd marked this pull request as ready for review August 3, 2021 18:04
Hareet Dhillon added 21 commits August 11, 2021 11:31
…elper; replace old, static RecordingNotFoundException class with new, separate class
@hareetd
Copy link
Contributor Author

hareetd commented Aug 11, 2021

Looks good to go, the only new changes are the two commits above, the first of which is due to messing up the rebase when bringing in your changes from #599.

Edit: On that note, during an interactive rebase is it fine if I make any changes I see fit when resolving conflicts or should I stick to only pressing the "Accept current changes", "Accept incoming changes", "Accept both changes", etc. commands? I'm worried I might mess something up regarding the commit history if I move stuff around or add/delete too much.

@andrewazores
Copy link
Member

Edit: On that note, during an interactive rebase is it fine if I make any changes I see fit when resolving conflicts or should I stick to only pressing the "Accept current changes", "Accept incoming changes", "Accept both changes", etc. commands? I'm worried I might mess something up regarding the commit history if I move stuff around or add/delete too much.

It's best to just resolve the conflicts during the rebase so that each commit in the series can re-apply. If there are any further adjustments required - in this example, updating tests because a method was added to an interface in the branch you're rebasing onto - it's usually easiest/cleanest to do as you did and add that as an additional commit on top. The more you do to adjust the commits in series during conflict resolution, the more likely those changes are to cause further conflicts with the next commits in the series.

API v2 automation moved this from In progress to Ready Aug 11, 2021
Recording Triggers automation moved this from In progress to Ready Aug 11, 2021
@andrewazores andrewazores merged commit 2a26702 into cryostatio:main Aug 11, 2021
API v2 automation moved this from Ready to Done Aug 11, 2021
Recording Triggers automation moved this from Ready to Done Aug 11, 2021
@hareetd hareetd deleted the scan-existing-archives-on-startup branch August 11, 2021 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
API v2
  
Done
Development

Successfully merging this pull request may close these issues.

Automatic Rules PeriodicArchiver should scan existing archives on startup
3 participants