Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-13105 Implement a FlowRegistryClient that use GitHub for version… #8765

Closed
wants to merge 5 commits into from

Conversation

bbende
Copy link
Contributor

@bbende bbende commented May 7, 2024

…ing flows

Summary

NIFI-13105

This PR adds a new FlowRegistryClient that uses GitHub API to version control flows. Currently the client is configured with the repo information, including a branch and optional path in the repo to use, otherwise the root of the repo is used.

Buckets are directories underneath the location in the repo. The client will check for a bucket named Default, and if it does not exist it will be created. Additional buckets can be created by adding additional directories to the repo outside of NiFi.

The branch is planned to become part of the version control screens when saving/importing which is tracked by. NIFI-13127. Until then, the branch that is configured in the client will be used for all operations, so in order to work on different branches, it currently requires creating separate registry clients pointing at the same repo, but different branches.

It is also possible to import flows from a public repo that you don't have permissions to by not providing an access token. This will allow import and change version, but attempting to save a change will result in a "not found" exception which is GitHub's behavior when you specify a branch or file that you don't have access to, even though it may be in the repo.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

UI Contributions

  • NiFi is modernizing its UI. Any contributions that update the current UI also need to be implemented in the new UI.

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together @bbende! This looks great overall, I just noted a few minor syntax-level recommendations. All of the new class files appear to have extra padding in the license header, and a couple other very minor things.

@bbende
Copy link
Contributor Author

bbende commented May 7, 2024

Thanks @exceptionfactory ! Pushed updates, let me know of anything else

Copy link
Contributor

@simonbence simonbence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the new registry client this is something I think many people waited for :) I have a couple of comments if you could look them through. I did not yet run the client but I am planning before closing the PR

final String branch = flowLocation.getBranch();
final String filePath = getSnapshotFilePath(flowLocation);

final RegisteredFlowSnapshot existingSnapshot = getSnapshot(filePath, branch);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NiFiRegistry implemnetation utilizes a NoSuchFlowException for the cases the flow could not ne find. I think it would be beneficial to have something similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look into it, but mentioned in other comments how we don't really know if it's the bucket, file, or branch that doesn't exist.

}

private void throwPathOrBranchNotFound(final String path, final String branch) throws FlowRegistryException {
throw new FlowRegistryException("Path [" + path + "] or Branch [" + branch + "] not found");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on which part of the path is not existing (bucket, flow, version), there are specific FlowRegistryException subclasses for the case. Also as the new concept "Branch" was added, it might worth to add a specific exception for that as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out the sub-classes, I wasn't aware of those. There might be a few cases where we could leverage one of them, but I don't think we can use it here because we don't actually know which part doesn't exist. It could be any one of the bucket directory, the snapshot file name, or the branch. Unless we make a lot of individual requests to check each piece, but I don't want to do that because then we end up making lots of remote calls to GitHub for one single operation, and during background synchronize of many versioned PGs, it will end up being thousands of calls to GitHub.

@@ -5357,6 +5355,8 @@ public RegisteredFlow registerVersionedFlow(final String registryId, final Regis
}

try {
final String generatedId = registry.generateFlowId(flow.getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be something I would prefer not to expose outside of the node. This should be an internal implementation detail for the registry node (or implementation). Is there any particular reason this leaves the boundaries of a registry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the framework generated a UUID in service facade before calling the registerFlow method in registry client, but for GitHub we don't want a UUID, so this just allows the framework to ask the client to generate the id, and the default behavior can return a UUID like before, and the GitHub impl can do something different.

- Populate storageLocation and implement isStorageLocationApplicable
- Determine read/write permissions for the repo based on the client credentials
- Use the permissions to populate bucket permissions
- Change logic for creating default bucket to only happen if no other buckets and if write permissions
- Verify permissions accordingly in all client methods based on read or write
- Consolidate getFilenames and getDirectoryNames to helper method
@bbende
Copy link
Contributor Author

bbende commented May 8, 2024

@simonbence I pushed up some improvements based on your review, I think the permissions part is working much better now that I figured out how get the permissions that the client has to the repo, and can use them to set the bucket permissions.

Items addressed:

  • Populate storageLocation and implement isStorageLocationApplicable
  • Determine read/write permissions for the repo based on the client credentials
  • Use the permissions to populate bucket permissions
  • Change logic for creating default bucket to only happen if no other buckets and if write permissions
  • Verify permissions accordingly in all client methods based on read or write
  • Consolidate getFilenames and getDirectoryNames to helper method

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work on this new feature @bbende, and thanks for the detailed review @simonbence. I reviewed the latest updates and tested out the changes, looks good! +1 merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants