Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to build&push image on PR via label #604

Closed
wants to merge 1 commit into from

Conversation

mszostok
Copy link
Contributor

@mszostok mszostok commented Jun 1, 2022

ISSUE TYPE
  • Feature Pull Request
SUMMARY

Add an option to push the BotKube image automatically on PR. Because of the security reasons, it's enabled only if the build-pr-image label is added—since external users do not have the permission to assign labels, this effectively requires repository owners to manually review the changes first.

Here is recorded demo https://www.loom.com/share/2f0fdc621a144d32861d83946bfff1d7 and repository on which I tested that change: https://github.com/mszostok/botkube/pulls

This PR will solve the problem with manual PR builds, e.g. we had that issue here:

Fixes #590

After merge

A new label build-pr-image needs to be created.

Further improvements

  1. Add build-pr-image automatically, for all trusted members (org collaborators). As a result, we don't need to remember about that.
  2. Add a comment about successfully pushed images directly under PR. It just simplifies the UX by removing one more mouse click.

Alternative solution

The alternative doesn't require manual action. However, it's more complicated.

See: https://github.com/infracloudio/botkube/compare/develop...mszostok:build-images-on-prs-v2?expand=1

To ensure that secrets won't be available for untrusted code, first we need to build the image and share it with the second job, which doesn't check out the untrusted code and can safely push an artifact to ghcr.io.

Security

This article describes it well: https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

@mszostok mszostok added enhancement New feature or request ci Related to CI pipelines labels Jun 1, 2022
@pkosiec pkosiec self-assigned this Jun 2, 2022
@pkosiec pkosiec self-requested a review June 2, 2022 07:09
Copy link
Member

@pkosiec pkosiec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR with a demo recording and nice that you prepared the alternative approach 👌

Honestly, I'm not a fan of the idea to run the job on label as it requires manual action 🙂 I know we can also automate adding such label, but if we can prevent repushing images without needing reviewer to spot some security issues, without manual action, then why wouldn't we want to go for it? (I mean the alternative approach).

My biggest concern, apart from having the manual action, is that the reviewer is responsible for making sure the PR can be safely built (he/she needs to spot any overrides in Bash scripts, Makefile etc.). With the alternative approach, there's simply no option to mess up with the Packages as pushing is isolated from user.

You wrote the alternative approach is more complex (it saves Docker image and adds another job which pushes the image), but from the projects we worked on in the past, I think this wasn't the problematic part which caused some issues. The most problematic thing was the pull_request_target event (sometimes unexpected) behavior and that's something we still have here in this approach. Also, this is quite questionable whether "saving Docker image as an artifact and pushing it in second job" is more complex than "automatically adding label for org memebers/trusted external contributors" (as we would need to have it to say the solutions are on par).

Also, you said on the demo recording, that sometimes the image build is not necessary, so with that label we can avoid builds when doing e.g. documentation. That's true, but we can also ignore this job for specific paths (e.g. all MD files) automatically in GH Actions job configuration.

So personally I would pick the alternative approach as it can be even simpler in a long run. I would like to have as many things automated as possible, and currently I don't see the advantages of the building images on label (I would say otherwise, now I see even more benefits of the alternative approach). But of course I'm open for further discussion 🙂

@mszostok
Copy link
Contributor Author

mszostok commented Jun 2, 2022

Hi!

You wrote the alternative approach is more complex

yes, not only based on code, but also based on the flow. In the label example, you just have a make release_pr_snapshot and that's all. This script is easy and doesn't require maintaining and understand more complex workflow. In the alternative solution, you need to play with the separated jobs, artifacts, and ensure that the save, load and push are in sync.

My biggest concern, apart from having the manual action, is that the reviewer is responsible for making sure the PR can be safely built

yes, and that's also a good reminder, because even if we will apply the alternative solution, we still need to pay attention to it. It's really important to do not remember about that, if you will approve a PR that where you didn't "(..)spot any overrides in Bash scripts, Makefile" then we are in the same bad position. Hiding that, and forgetting about it, is not a save option too.

we can also ignore this job for specific paths

yes, but from our experience, we also know that it's sometimes hard to keep the list of files up to date.


I will finish the alternative solution and will test that on my fork. I will keep you posted.

@mszostok
Copy link
Contributor Author

mszostok commented Jun 2, 2022

I tested the alternative solution and here is a working version: #611

Here is GitHub job execution: https://github.com/mszostok/botkube/runs/6714112689?check_suite_focus=true

I'm still voting for the first option, as described in the comment above. After testing the alternative solution, I'm also worried about the artifact size. It's 272 MB, see: https://github.com/mszostok/botkube/actions/runs/2429967388

I added if: always() to the delete step, to ensure that even if sth will fail, it will be deleted.

      - name: Delete Docker image artifact
        uses: geekyeggo/delete-artifact@v1
        if: always()
        with:
          name: ${{ matrix.APP }}-${{github.sha}}

I'm not able to check what's the limit here, but it looks like 500 MB: https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions

so it means that no more than 2 concurrent builds cannot be scheduled. It could become problematic in the long run.

@pkosiec
Copy link
Member

pkosiec commented Jun 3, 2022

yes, and that's also a good reminder, because even if we will apply the alternative solution, we still need to pay attention to it. It's really important to do not remember about that, if you will approve a PR that where you didn't "(..)spot any overrides in Bash scripts, Makefile" then we are in the same bad position. Hiding that, and forgetting about it, is not a save option too.

We cannot merge something without review, that's obvious. I was referring to one huge (at least for me) difference: In the alternative approach we can run this job safely before review (which can usually save time spent on review as there might be some build issues which contributor can fix before the review). With the label-based approach we need to review PR every time before running build, which increases time for review. Correct?

When it comes to the artifact size, I think we can think about building multiarch image only on develop, and for PRs just amd64, but we can take care of that later.

I'm sorry, but personally I still don't see any advantages in the label-based approach over the alternative solution. We can discuss it offline though. I would be happy to review the PR with alternative approach. BTW matrix.APP in your job is empty, you can see that in logs: Artifact -ae61a6323f637ee2ed7bd778fbfa7c3252c9db66 has been successfully uploaded!. You can fix that before opening PR 🙂

@mszostok
Copy link
Contributor Author

mszostok commented Jun 3, 2022

With the label-based approach we need to review PR every time before running build, which increases time for review. Correct?

Nope, it's only about image. Build is triggered anyway, see: https://github.com/infracloudio/botkube/blob/7735b836e877858e7054caf027055ff421d2c627/.github/workflows/ci.yml#L42-L56
The other part is to just copy the binary. I don't see value here in testing it.

When it comes to the artifact size, I think we can think about building multiarch image only on develop, and for PRs just amd64, but we can take care of that later.

That will be yet another difference in the setup, which personally, I don't like to introduce.

I'm sorry, but personally I still don't see any advantages in the label-based approach over the alternative solution.

as already mentioned:

  • is more complicated in the overall workflow, see the implementation, so it will be harder to maintain
  • problem with artifact upload size
  • hide still a huge security risk, sth is automatically done without any action. If you forget about that, you can forget that this workflow has a security risk included and merge that into develop. Of course, you can do that anyway, but here we at least have a "reminder" on each PR.

In the alternative approach we can run this job safely before review

yes, and you also don't want to run a docker image on your computer if you don't trust it, right? So the image is not needed before review, it's always a good practice to review the code and later execute it on your computer 🙂

and in both places you have manual action, enabling PR image build on demand via putting a label is a manual action, same as adding approve. In k8s repos, they are also based on manual action ok-to-test label, which is more or less similar. We are not able to remove this threat in both approaches.

If the second approach doesn't require build->save-> upload -> download -> load -> push chain then I would say that it's much better but it's not.

@pkosiec
Copy link
Member

pkosiec commented Jun 3, 2022

Nope, it's only about image.
yes, and you also don't want to run a docker image on your computer if you don't trust it, right? So the image is not needed before review, it's always a good practice to review the code and later execute it on your computer 🙂

Yes, it is just about building image, but as I wrote before, we can detect some image build issues before, which saves time before actual review. I didn't say anything about running such image on my local machine before review. Moreover, at later point it would be great to have Helm chart installed on some local cluster (e.g. k3d) on CI and do some true E2E tests (e.g. with Mattermost/later with RocketChat), to unload maintainers. Which would make the E2E job depending on building image, which would require manual action.

is more complicated in the overall workflow, see the implementation, so it will be harder to maintain

I already responded why your proposed solution is on par when it comes to complexity, or even more complex (if you want to do automatic labeling for org members and external contributors, and also automatic unlabeling once job is ran or canceled). And we have that already working.

BTW even if we went with such option, we would need such automation right now, as I can't imagine that every maintainer is forced to label PR and wait for image build and unlabel PR when a contributor pushes changes 😄

hide still a huge security risk,

This is totally untrue, as an automatically ran job in the alternative solution cannot override any other Docker images. In your labeled-based approach it can, so your labeled-based approach hides a true security risk. People are always the weakest points when it comes to security. That manual action repeated many times over time won't prevent anything - it will end in the opposite. So the alternative approach brings an additional layer of security, which I am voting for.

BTW you need to unlabel the PR from external contributor once the image is built, right? To avoid repushing images when a contributor pushes changes to the branch? This is even more error-prone than I thought.

problem with artifact upload size

And that's first valid point which resonates with me, however firstly you would need to confirm the limit for open source and if this is a limit per job or not. We need to see why this is so big (even if this is multiarch), and even if we split some components to multiple binaries, I don't think we will split Docker images, maybe we will do something like istiod - let's see. So I wouldn't treat that as a blocker as we need to see how BotKube evolves and how many optimizations we can do first.

I'm still voting for a better security and more automation - that is the alternative approach.

@mszostok
Copy link
Contributor Author

mszostok commented Jun 3, 2022

To sum up, I see pros and cons in both approaches, and I'm OK-ish with both of them.

As discussed f2f with Paweł, we would like to ask you @PrasadG193 to check both approaches and let us know about your thoughts 👍 Maybe you know some alternative approach?

@mszostok
Copy link
Contributor Author

Close in favor of #611

@mszostok mszostok closed this Jun 21, 2022
@mszostok mszostok deleted the build-images-on-prs branch June 21, 2022 08:26
mergify bot pushed a commit that referenced this pull request Jun 21, 2022
##### ISSUE TYPE

 - Feature Pull Request

##### SUMMARY

Add an option to push the BotKube image automatically on PR. It's alternative approach for #604.

This PR will solve the problem with manual PR builds, e.g. we had that issue here:
- #601
- #593
- #582
- #583

Example run: https://github.com/mszostok/botkube/runs/6714112689?check_suite_focus=true

Fixes #590 

To ensure that secrets won't be available for untrusted code, first we need to build the image and share it with the second job, which doesn't check out the untrusted code and can safely push an artifact to ghcr.io.

The flow is as follows:
```
Job1: image build -> image save -> artifact upload 
Job2: artifact download -> image load -> image push
```
Job1—runs untrusted code but without write repo perms
Job2—push built image with package write perms

#### Security

This article describes it well: https://securitylab.github.com/research/github-actions-preventing-pwn-requests/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Related to CI pipelines do not merge enhancement New feature or request hold
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build Docker image on PRs
2 participants