Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation for build.spec.source.url #512

Merged
merged 3 commits into from Jan 20, 2021
Merged

Add validation for build.spec.source.url #512

merged 3 commits into from Jan 20, 2021

Conversation

xiujuan95
Copy link
Contributor

@xiujuan95 xiujuan95 commented Dec 8, 2020

Fix issue: #464

  • Add a new package git to verify if remote repository exists. Here, leverage go-git package and call List func to do the same thing with git ls-remote -p ${source_url} command to check it;

If a valid remote public repository exists, nothing will be returned when run this command. Otherwise, it will return an error like below:

xiangxiulis-mbp:build xiangxiuli$ kubectl get build
NAME                                        REGISTERED   REASON                 BUILDSTRATEGYKIND      BUILDSTRATEGYNAME   CREATIONTIME
buildpack-nodejs-build           False        repository not found   ClusterBuildStrategy   buildpacks-v3       2m27s

  • Introduce a new annotation build.build.dev/verify.repository. If the value of it is true, then build controller will go to check sourceUrl, otherwise, it won't check it;

  • Introduce two new build failed reasons repository not found and invalid source url:

xiangxiulis-mbp:build xiangxiuli$ k get build
NAME                                        REGISTERED   REASON                 BUILDSTRATEGYKIND      BUILDSTRATEGYNAME   CREATIONTIME
buildpack-nodejs-build         False        repository not found   ClusterBuildStrategy   buildpacks-v3       2m27s
buildpack-nodejs-build-invalidURL   False        invalid source url     ClusterBuildStrategy   buildpacks-v3       116s
  • Add two unit tests to verify valid remote public repositories exist or not;

  • Add several integration tests, mainly cover below scenario:

    • Check a invalid remote public repo;
    • Check source url is invalid, such as foobar;
    • Skip check private git repo;

Note: this PR mainly focus on verifying a public repo.

@xiujuan95
Copy link
Contributor Author

/test 4.4-unit

Copy link
Contributor

@HeavyWombat HeavyWombat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the validation. I think this and possible other validation can be really helpful for end-user happiness. I would like to ask you to look into an alternative solution for the actual Git interaction.

pkg/git/git.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/apis/build/v1alpha1/build_types.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
pkg/git/git_test.go Outdated Show resolved Hide resolved
test/integration/build_to_git_test.go Outdated Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
@coreydaley
Copy link
Member

It makes a pull request much easier to review if you split it into two commits, one for the code that you are writing, and one for vendoring dependencies. Otherwise we find ourselves combing through 400+ files.

@xiujuan95
Copy link
Contributor Author

It makes a pull request much easier to review if you split it into two commits, one for the code that you are writing, and one for vendoring dependencies. Otherwise we find ourselves combing through 400+ files.

@coreydaley Thanks for your suggestions! Now, I split it into two. Pls take a look again, thanks!

@coreydaley
Copy link
Member

It makes a pull request much easier to review if you split it into two commits, one for the code that you are writing, and one for vendoring dependencies. Otherwise we find ourselves combing through 400+ files.

@coreydaley Thanks for your suggestions! Now, I split it into two. Pls take a look again, thanks!

Awesome, thank you!

Copy link
Contributor

@qu1queee qu1queee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiujuan95 nice work. There is still some pieces of this PR to improve, but we are close. Sorry for my late reply. I didn´t check the integration-tests yet, but lets address the rest of my comments first.

pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
pkg/git/git.go Show resolved Hide resolved
pkg/git/git_test.go Outdated Show resolved Hide resolved
pkg/git/git_test.go Outdated Show resolved Hide resolved
test/catalog.go Show resolved Hide resolved
test/integration/build_to_git_test.go Outdated Show resolved Hide resolved
@xiujuan95
Copy link
Contributor Author

/test 4.4-unit

pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
test/build_samples.go Outdated Show resolved Hide resolved
@xiujuan95
Copy link
Contributor Author

@coreydaley @qu1queee There are many many comments. I think I have refined code according to your all comments. Pls take a look again, thanks! If I am missing anything, pls let me know!


// AnnotationBuildRefRepository is a label key that tells the Build Controller to check whether remote repository exists or not.
// If its value is true, controller checks remote repository, otherwise, it won't be checked
AnnotationBuildRefRepository = "build.build.dev/verify.repository"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constant name doesn't really signify what it is being used for, how about something like "AnnotationBuildVerifyRepository" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

pkg/git/git.go Outdated
}

// ValidateGitURLFormat validates the given URL format is valid and public or not
func ValidateGitURLFormat(url string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, this function is not really validating anything, it is parsing the url and returning whether it is a public, private, or invalid repository, so the function name seems a bit off. How about something like "IdentifyGitURLType" or "ParseGitURLType" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

//
// SPDX-License-Identifier: Apache-2.0

package integration_test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this just be "package integration"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep use integration_test package name.

Copy link
Contributor

@qu1queee qu1queee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiujuan95 thanks, we should be close to finish. There is one more concern here. The current PR does not enforce the GIT validation by default. If we want the Build controller to validate a Build spec.source.url, users need to define the annotation.

I believe we need to deal with the GIT validation in the same way as we do with the secrets and strategies validation which is by default. But, because of @gabemontero concerns, we should be flexible, therefore the annotation should be a way to allow users to disable the validation if desired. I know this contradicts my statement in #464 (comment) , but now that I see the PR I think we should opt-in by default, and opt-out if desired via the annotation. @gabemontero fyi.

@xiujuan95
Copy link
Contributor Author

xiujuan95 commented Dec 17, 2020

The current PR does not enforce the GIT validation by default. If we want the Build controller to validate a Build spec.source.url, users need to define the annotation.

Just for your info, I realized this annotation will be alway added and set to be true by UI and CLI by default, so our users don't need to define this annotation by themselves. But for shipwright build users, yes, they indeed need to add this annotation by themselves.

Besides, if we validate sourceURL by default, then most of our test cases will be fail unless we updates all sample cases to add this annotation and set it to be true.

So @qu1queee , I think current way is fine for us. Only the value of this annotation is true, Build will go to verify sourceURL.

@xiujuan95
Copy link
Contributor Author

@coreydaley @qu1queee I add a new commit to update build document, pls take a look, thanks!

@qu1queee
Copy link
Contributor

@xiujuan95 im not sure I fully understand your concerns.

Besides, if we validate sourceURL by default, then most of our test cases will be fail unless we updates all sample cases to add this annotation and set it to be true.

why the above? if we validate by default, the annotation does not need to be set it at all, we would use the annotation only to disable the validation, not to enable.

Just for your info, I realized this annotation will be alway added and set to be true by UI and CLI by default, so our users don't need to define this annotation by themselves. But for shipwright build users, yes, they indeed need to add this annotation by themselves.

I dont think is true. I think in general there is a thinking that UI/CLI can abstract complexity from users, e.g. adding an annotation in Builds or Secrets during creation , thinking which we should use only if needed. For example, for the build.build.dev/referenced.secret we really needed users to add this on their secrets(we cannot do this because of performance implications in our controllers), and there we acknowledge that UI/CLI can remove this complexity by adding the annotation on the secrets creation in behalf of users. Another example is build.build.dev/build-run-deletion annotation in Build, where we cannot enable this annotation by default, because then we will defining for users a behaviour that is only the user responsibility to define. But for build.build.dev/verify.repository, I´m not sure why we need to delegate this responsibility to UI/CLI, specially when there is no real argument to do this. We can enable this validation by default, so why we wouldn´t do this. Not doing this will force UI/CLI users to abstract this complexity(annotation definition), but as I said, there is no real argument to delegate this to them. The only argument I see is around avoiding this validation, e.g. concerns on doing a remote call or similar, therefore the flexibility to disable this validation via an annotation.

@xiujuan95
Copy link
Contributor Author

why the above? if we validate by default, the annotation does not need to be set it at all, we would use the annotation only to disable the validation, not to enable.

@qu1queee @zhangtbj @SaschaSchwarze0 I am fine about validating sourceURL by default. But I still want to say if we validate it by default, we need to update all our test samples and add this annotation and set it to be false.

This is because we don't set the annotation by default, that means b.GetAnnotations()[build.AnnotationBuildVerifyRepository] is an empty string, however, Build also will validate sourceURL by default.

Under this situation, because for all our test samples, sourceURLs only are fake, then Build will validate these failed because sourceURL is invalid or not found.

For example, when we create a build with an invalid URL: https://github.com/shipwright-io/build/blob/master/test/catalog.go#L55-L78 , Build will validate URL by default. Then build will be fail because of invalidate sourceURL. However, this sample is created for validating secret. So ideally, we expect Build will skip validate URL for this test case, so we need to update this sample and build.build.dev/verify.repository : "false" annotation to it. Otherwise, this build will be fail because of an unexpected reason. For other test cases, we need to do the same action.

Not sure if I express my concerns clearly or not?

@zhangtbj zhangtbj removed their request for review January 9, 2021 05:37
@qu1queee qu1queee removed their request for review January 14, 2021 10:47
@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 14, 2021
@qu1queee
Copy link
Contributor

@bigkevmcd @SaschaSchwarze0 @gabemontero asking for re-review on this PR, we already addressed all the required changes, thanks for this, very valuable feedback.

One comment is that we will be working on a second PR where we are enhancing the overall Status of the Build, there we will improve on the current new error messages we are putting now in Build Status.Reason field.

Copy link
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Nice work.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 14, 2021
Copy link
Member

@gabemontero gabemontero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have what I think is a minor addition request .... we've certainly seen use of http for the git protocol in some cases with openshift build v1

docs/build.md Outdated
revision: master
```

_Note_: The Build controller only validates two scenarios. The first one where the endpoint uses an `https` protocol, the second one when a `ssh` protocol (_e.g. `git@`_) is defined and none referenced secret was provided(_e.g. source.credentials.name_).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically speaking, users can still use http for accessing git repositories ... especially if they are private/local ones vs. actual github.com

I think we should allow for that as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also do not get why we would rule out http at this point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We miss this one. I added it, please take a look and resolve this conversation if possible.

pkg/git/git.go Outdated
}

switch endpoint.Protocol {
case httpsProtocol:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah correlating with my comment in the doc section, http is a valid, if not primary, use case, that should be easy to add

and this would be the spot, correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest that it should log out that it's insecurely accessing the repository when using plain HTTP.

It could be a typo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it. Thanks for the feedback. @bigkevmcd somehow I cannot re-request a review to you, not sure why.

Copy link
Contributor

@HeavyWombat HeavyWombat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider allowing http if possible.

docs/build.md Outdated Show resolved Hide resolved
docs/build.md Outdated
revision: master
```

_Note_: The Build controller only validates two scenarios. The first one where the endpoint uses an `https` protocol, the second one when a `ssh` protocol (_e.g. `git@`_) is defined and none referenced secret was provided(_e.g. source.credentials.name_).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also do not get why we would rule out http at this point.

pkg/controller/build/build_controller.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller_test.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller_test.go Outdated Show resolved Hide resolved
pkg/controller/build/build_controller_test.go Outdated Show resolved Hide resolved
test/integration/build_to_git_test.go Outdated Show resolved Hide resolved
test/integration/build_to_git_test.go Outdated Show resolved Hide resolved
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jan 18, 2021
}

switch endpoint.Protocol {
case httpsProtocol, httpProtocol:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @bigkevmcd 's suggestion at #512 (comment) is a good idea

So have a case httpProtocol: that prints that message, and then fallsthrough to the httpsProtocol block.

Othwerise I'm good @qu1queee

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bigkevmcd this can only be logged out if the validation fails. Just to be clear, you say that if an http protocol is used, we should enhance remote repository unreachable to be something like insecurely accessing the repository failed, repository unreachable ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think any time we parse an httpProtocol case, we should log this out as a possibly insecure case.

Copy link
Contributor

@qu1queee qu1queee Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bigkevmcd thanks. I see. Do you mind if I take this enhancement into the other PR we are gonna make available soon for review, which is #536 . At the moment Build have only a Status.Reason which is a message, what we are gonna do is to make Status.Reason a one-word camelCase and introduce Status.Message, see field changes .With Status.Message, we can then embed there the as a possibly insecure case whenever we validate an httpProtocol. I think we have more flexibility on that pr for this suggested change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of incorporating this into the #536 work @qu1queee - good idea

That said, I don't see why that would preclude us from still providing a simple informational log here. Catch the user's attention both in real time (via the log) and post-mortem (via the Build status).

That would be my preference.

…alidates spec.source.url

refine the main logic of build validates sourceURL

return validate error and update error

only return updateError when validate sourceURL failed

rename verify sourceURL annotation and make verifySourceURL by default

update test cases again

revert verifySourceURL

verify sourceURL by default

add unit test for validateSourceURL

return a same error when list remote repo failed

output different errors and update tests

Signed-off-by: Enrique Encalada <encalada@de.ibm.com>
Signed-off-by: Enrique Encalada <encalada@de.ibm.com>
Change verify sourceURL by default

Signed-off-by: Enrique Encalada <encalada@de.ibm.com>
@gabemontero
Copy link
Member

/approve

still have some discussion between @qu1queee @bigkevmcd and myself on the http vs. https validation

@sbose78
Copy link
Member

sbose78 commented Jan 19, 2021

Thanks for your kind reminder! As far as I know, until now, we don't have any users are using Azure. If have in the future, we can modify it again I think.

@xiujuan95 I feel this may not be a fair assumption 🤔

@xiujuan95
Copy link
Contributor Author

I feel this may not be a fair assumption 🤔

@sbose78 Aha, maybe I am missing something, please correct me about this and put your thoughts, thanks a lot!

@SaschaSchwarze0
Copy link
Member

/approve

Not sure why there is no approved tag for this comment from @gabemontero.

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 20, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabemontero, SaschaSchwarze0

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [SaschaSchwarze0,gabemontero]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 20, 2021
@openshift-merge-robot openshift-merge-robot merged commit a46cddf into shipwright-io:master Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet