Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] ”Generate Builder“ step fails #942

Closed
jenstroeger opened this issue Sep 30, 2022 · 24 comments · Fixed by #945
Closed

[bug] ”Generate Builder“ step fails #942

jenstroeger opened this issue Sep 30, 2022 · 24 comments · Fixed by #945
Labels
area:generic Issue with the generic generator type:bug Something isn't working

Comments

@jenstroeger
Copy link

Describe the bug

It looks like the Generate Builder step fails:

Run ./.github/actions/generate-builder/generate-builder.sh
  ./.github/actions/generate-builder/generate-builder.sh
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    BUILDER_BINARY: slsa-generator-generic-linux-amd64
    BUILDER_DIR: internal/builders/generic
    BUILDER_REPOSITORY: slsa-framework/slsa-github-generator
    BUILDER_RELEASE_BINARY: slsa-generator-generic-linux-amd64
    VERIFIER_REPOSITORY: slsa-framework/slsa-verifier
    VERIFIER_RELEASE_BINARY: slsa-verifier-linux-amd64
    VERIFIER_RELEASE_BINARY_SHA256: f92fc4e571949c796d7709bb3f0814a733124b0155e484fad095b5ca68b4cb21
    VERIFIER_RELEASE: v1.1.1
    COMPILE_BUILDER: false
    BUILDER_REF: refs/tags/v1.2.0
    GH_TOKEN: ***
Fetching the builder with ref: refs/tags/v1.2.0
Builder version: v1.2.0
BUILDER_REPOSITORY: slsa-framework/slsa-github-generator
verifier hash computed is f92fc4e571949c796d7709bb3f0814a733124b0155e484fad095b5ca68b4cb21
verifier hash verification has passed
FAILED: SLSA verification failed: could not find a matching valid signature entry
Error: Process completed with exit code 6.

when invoked as a job similar to this public example but in a private repo:

  provenance:
    needs: build
    # The generator should be referenced with a semantic version.
    # The build will fail if we reference it using the commit sha.
    uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v1.2.0
    with:
      base64-subjects: ${{ needs.build.outputs.artifacts-sha256 }}
    permissions:
      actions: read # To read the workflow path.
      id-token: write # To sign the provenance.
      contents: write # To add assets to a release.

To Reproduce

Restarting the failed job continues to fail. Alas, private repo.

Expected behavior

This has worked before and during today’s run failed.

Screenshots

See above.

Additional context

@jenstroeger jenstroeger added status:triage Issue that has not been triaged type:bug Something isn't working labels Sep 30, 2022
@asraa
Copy link
Collaborator

asraa commented Sep 30, 2022

Hi! Thanks for the report: we've seen this and are trying to fix this all the way through.

This is the same issue that @stephenfuqua reported: #876 (comment)

In the meantime, enable compile-builder: true to succeed.

What we know has happened is that Rekor (the transparency log that the builders provenance is recorded on) had breaking changes that broke our verification.

Here's what happened in this particular case when we were verifying the builder provenance's inclusion in the log: we checked verified inclusion proof against the root hash, and then verified the root hash against a current signed tree head. After Rekor sharded (rotated trees) in production, the tree head no longer corresponds to the same tree that the entry was in, which caused a verification error. In particular, the tree size given in the entry's inclusion proof is LARGER than the current tree size after rotation.

This will fix this slsa-framework/slsa-verifier#277, and we'll backport the changes to the release/v1.1 branch of the verifier. I think we'll need to update all builder workflows to stop using verifier v1.1.1. @laurentsimon right?

@jenstroeger
Copy link
Author

Thanks @asraa.

In the meantime, enable compile-builder: true to succeed.

I can’t find that documented here — can you please point at the correct place? Is that an environment variable in env: or is that a parameter in with: and or a part of the job’s args or… ?

@asraa
Copy link
Collaborator

asraa commented Sep 30, 2022

Arg! Good question: I think we purposely omitted it from documentation, since we mostly rely on it to test builders at head that don't have provenance (this option builds from source; and so it will increase the length of time to run a provenance generation)

This is a reuseable workflow input argument (so with). see

@laurentsimon
Copy link
Collaborator

laurentsimon commented Sep 30, 2022

Hi! Thanks for the report: we've seen this and are trying to fix this all the way through.

This is the same issue that @stephenfuqua reported: #876 (comment)

In the meantime, enable compile-builder: true to succeed.

What we know has happened is that Rekor (the transparency log that the builders provenance is recorded on) had breaking changes that broke our verification.

Here's what happened in this particular case when we were verifying the builder provenance's inclusion in the log: we checked verified inclusion proof against the root hash, and then verified the root hash against a current signed tree head. After Rekor sharded (rotated trees) in production, the tree head no longer corresponds to the same tree that the entry was in, which caused a verification error. In particular, the tree size given in the entry's inclusion proof is LARGER than the current tree size after rotation.

This will fix this slsa-framework/slsa-verifier#277, and we'll backport the changes to the release/v1.1 branch of the verifier. I think we'll need to update all builder workflows to stop using verifier v1.1.1. @laurentsimon right?

sounds safe enough. Let's do that.

@jenstroeger
Copy link
Author

In the meantime, enable compile-builder: true to succeed.

It was compile-generator like in the linked code snippet 😉 and it worked. Thanks @asraa!

Furthermore, we’ll refactor our own release actions such that a failure in the SLSA tools won’t break our own workflows (like it did this time). Which brings me to an important question: would you consider these SLSA workflows stable, or not yet? What’s the current roadmap?

@ianlewis
Copy link
Member

ianlewis commented Oct 3, 2022

In the meantime, enable compile-builder: true to succeed.

It was compile-generator like in the linked code snippet 😉 and it worked. Thanks @asraa!

Furthermore, we’ll refactor our own release actions such that a failure in the SLSA tools won’t break our own workflows (like it did this time). Which brings me to an important question: would you consider these SLSA workflows stable, or not yet? What’s the current roadmap?

The workflows themselves are stable but we rely on the public Rekor instance as the transparency log for now and we have unintentionally been the guinea pig/canary for a few issues recently (issues with public server discovered by this project). We may support folks using their own Rekor instance as part of #34 but we haven't prioritized that on the roadmap yet.

@asraa Are there any docs on the current support level/SLA for the public Rekor and/or any docs on future plans wrt SLAs that we can point @jenstroeger to?

@ianlewis ianlewis added area:generic Issue with the generic generator and removed status:triage Issue that has not been triaged labels Oct 3, 2022
@asraa
Copy link
Collaborator

asraa commented Oct 4, 2022

Which brings me to an important question: would you consider these SLSA workflows stable, or not yet? What’s the current roadmap?

Yes -- good question: like Ian said we rely on Sigstore's public instance for the transparency log. It's projected to go fully GA in mid-October, which is why we're seeing a lot of last breaking changes as the API stabilizes.

At GA, there will be an SLA for Rekor: https://docs.google.com/document/d/1lhcnNGA9yuNSt0W72fEiCPhLN4M6frR_eUolycLIc8w/edit#heading=h.nq2r70ogoyyl (you may need to join sigstore-dev@googlegroups.com). The highlights are that there's 99.9% availability for Rekor's endpoint. With regards to the types of errors we saw here (where there were incompatible changes with client code): I expect that after GA Sigstore will not be breaking existing clients.

@behnazh-w
Copy link
Contributor

The slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v1.2.0 is still failing with the builder verification error.

Is there going to be a patch version for the generator_generic_slsa3.yml workflow?

@ianlewis
Copy link
Member

ianlewis commented Oct 4, 2022

The slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v1.2.0 is still failing with the builder verification error.

Is there going to be a patch version for the generator_generic_slsa3.yml workflow?

Yes, we'll cut a patch soon once we've fixed some e2e tests that are failing.

@behnazh-w
Copy link
Contributor

Thanks @ianlewis. So, that would also mean generator_generic_slsa3.yml@v1.2.0 should be deprecated and all the existing workflows updated, right? In terms of semantic versioning, I wonder if the generators should be bumped though.

It's an interesting case because the public Rekor instance hasn't/can't be pinned as a dependency and if there is a breaking change there, we can't really rely on semantic versioning to decide what needs to be updated. I would have expected a pinned version of generator_generic_slsa3.yml@v1.2.0 to be working and available provided that its dependencies are pinned too (that's where a complete SBOM becomes very crucial ) even if a bug is found 🤔

@di
Copy link
Member

di commented Oct 5, 2022

This also broke the sigstore-python release: https://github.com/sigstore/sigstore-python/actions/runs/3186405998/jobs/5196979993

Can I get a rough ETA on when the new version of the action will be cut? Trying to decide if I should roll back provenance generation to get the release out or just wait a bit.

@ianlewis
Copy link
Member

ianlewis commented Oct 5, 2022

Thanks @ianlewis. So, that would also mean generator_generic_slsa3.yml@v1.2.0 should be deprecated and all the existing workflows updated, right? In terms of semantic versioning, I wonder if the generators should be bumped though.

IIUC yes that's right.

It's an interesting case because the public Rekor instance hasn't/can't be pinned as a dependency and if there is a breaking change there, we can't really rely on semantic versioning to decide what needs to be updated. I would have expected a pinned version of generator_generic_slsa3.yml@v1.2.0 to be working and available provided that its dependencies are pinned too (that's where a complete SBOM becomes very crucial ) even if a bug is found 🤔

Yes. Part of the reason is that the public Rekor itself isn't GA but we are relying on it. Unfortunately Rekor is making changes that break clients and will likely continue to do so until it's GA. I added #958 and #959 to add docs on our use of Rekor and to better handle transient errors.

This also broke the sigstore-python release: https://github.com/sigstore/sigstore-python/actions/runs/3186405998/jobs/5196979993

Can I get a rough ETA on when the new version of the action will be cut? Trying to decide if I should roll back provenance generation to get the release out or just wait a bit.

I'm not sure I can commit to anything personally but I'll see if we can get a release out by EOW.

@di
Copy link
Member

di commented Oct 5, 2022

Just wanted to get a sense of whether it was n(hours, days, weeks). But I ended up reverting, so no pressure at all from me!

@jenstroeger
Copy link
Author

I think this conversation illustrates the importance and difficulty of deep supply chain integrity. My workflow failed because the SLSA provenance generator failed because the Rekor service failed…

And the above discussions raise a few more points that should be considered:

  • Does Rekor version its own API, so that tools (like this SLSA provenance generator) can build against a stable service even if/when Rekor changes?
  • Does this SLSA provenance generator bump its own version according to Rekor’s changes, if they impact end-users? What other external services does SLSA itself depend on?

Unfortunately Rekor is making changes that break clients and will likely continue to do so until it's GA.

@ianlewis I think GA is no safeguard against breakage. I’d argue that this SLSA provenance generator should be tolerant to Rekor failures beyond GA. For example @asraa’s proposed fix above uses an undocumented feature, which should probably be documented for users because it decouples from yet another level of dependencies.

Further still, it may even make sense to make compile-generator: true the default because it avoids that extra dependency on Rekor. Personally, I prefer a slower self-contained action over a faster one with dependencies on a potentially failing external service (thus jeopardizing stability). It’s kind-of ironic that a tool checking software supply chain itself depends on a more complex supply chain.

@ianlewis
Copy link
Member

ianlewis commented Oct 6, 2022

I think this conversation illustrates the importance and difficulty of deep supply chain integrity. My workflow failed because the SLSA provenance generator failed because the Rekor service failed…

I agree. We haven't documented well the guarantees that are being provided by using the workflow including dependencies of the workflow like Rekor.

And the above discussions raise a few more points that should be considered:

  • Does Rekor version its own API, so that tools (like this SLSA provenance generator) can build against a stable service even if/when Rekor changes?

Currently it doesn't Rekor isn't providing a lot of API guarantees to clients like us. We are currently working on organizing and providing feedback to the sigstore project regarding this stability.

  • Does this SLSA provenance generator bump its own version according to Rekor’s changes, if they impact end-users? What other external services does SLSA itself depend on?

Rekor deploys new versions of the public server independently of slsa-github-generator. The sigstore team doesn't currently have a way to notify clients like us when there are breaking changes. Long term, I don't think it's feasible for all Rekor clients to coordinate and upgrade every time there is a new version of the server.

Unfortunately Rekor is making changes that break clients and will likely continue to do so until it's GA.

@ianlewis I think GA is no safeguard against breakage. I’d argue that this SLSA provenance generator should be tolerant to Rekor failures beyond GA. For example @asraa’s proposed fix above uses an undocumented feature, which should probably be documented for users because it decouples from yet another level of dependencies.

Part of the reason why it's not documented is because it's not really intended to be used under normal circumstances because it lowers the security provided by the workflow. This is because it allows building from unreleased/untested versions of the workflow (e.g. from HEAD). We only have the input there at all in order to support our pre-submit tests. @asraa suggested it because it's an escape hatch that can unblock users temporarily but it's really an internal API.

While we would like to be resilient to Rekor failures we do need it to create records in the transparency log so that it can be used during verification. #34, #958, #959 are issues we are tracking currently to allow users to mitigate the issue but I think that the core problem won't ultimately be solved without better API guarantees for Rekor's API.

Further still, it may even make sense to make compile-generator: true the default because it avoids that extra dependency on Rekor. Personally, I prefer a slower self-contained action over a faster one with dependencies on a potentially failing external service (thus jeopardizing stability). It’s kind-of ironic that a tool checking software supply chain itself depends on a more complex supply chain.

While it avoids a dependency on Rekor when verifying the builder binaries used by the workflow, it doesn't completely remove the dependency on Rekor because it still needs to be used to create new entries in the transparency log. As mentioned above it also is a security trade off which I don't really want to recommend to users.

@behnazh-w
Copy link
Contributor

This is because it allows building from unreleased/untested versions of the workflow (e.g. from HEAD)

As a new feature, how about allowing building from source with the correct version? Would there still be any security concerns?

@di
Copy link
Member

di commented Oct 13, 2022

Is there some way to follow along to be notified when this is fixed/released? A different issue that hasn't already been closed, maybe?

Edit: Is it #968?

@asraa
Copy link
Collaborator

asraa commented Oct 13, 2022

I’d argue that this SLSA provenance generator should be tolerant to Rekor failures beyond GA.

Two things we've prioritized on this front for the next quarter:

  1. Supporting offline validation; so in case there's a Rekor failure or breakage, we're robust to verification and will continue doing so. [feature] Support offline attestation verification: .sigstore file or persisted SET  #716
  2. Supporting testing against staging to block Rekor releases in case they break us. [feature] Support a configurable Rekor URL #372

While providing a security tool though, we also did require the most robust security verification (with an online verification from Rekor directly) like Ian mentioned. I can see the argument both ways though, and reducing its reliance on an online service or transparency log is helpful for the future.

This is because it allows building from unreleased/untested versions of the workflow (e.g. from HEAD). We only have the input there at all in order to support our pre-submit tests.

@ianlewis @laurentsimon can we support a build from the tag checkout using a secure-checkout that ensures that tag resolves to a safe/attested SHA? This is similar to @behnazh-w suggestion.

Edit: Is it #968?

Yes, I'll add a note on release of v1.3.1 in that title.

@jenstroeger
Copy link
Author

Looks like people keep running into this problem (and post on the SLSA Slack workspace) which hasn’t actually been resolved yet. To also address @di’s request above — could we please reopen this issue until it’s actually been addressed?

@ianlewis ianlewis reopened this Oct 14, 2022
@ianlewis
Copy link
Member

We can keep it open until we've done the release. I plan on going through the release process today. To make a long story short the release process we have is complicated and has a number of unrelated issues which blocked us for a while.

@laurentsimon
Copy link
Collaborator

I’d argue that this SLSA provenance generator should be tolerant to Rekor failures beyond GA.

Two things we've prioritized on this front for the next quarter:

  1. Supporting offline validation; so in case there's a Rekor failure or breakage, we're robust to verification and will continue doing so. [feature] Support offline attestation verification: .sigstore file or persisted SET  #716

+1 to this solution. Some users have complained about the time it takes to build if compile-builder is set to true by default.

@ianlewis @laurentsimon can we support a build from the tag checkout using a secure-checkout that ensures that tag resolves to a safe/attested SHA? This is similar to @behnazh-w suggestion.

Should be do-able.

@di
Copy link
Member

di commented Oct 20, 2022

I see https://github.com/slsa-framework/slsa-github-generator/releases/tag/v1.2.1 exists, should this be closed?

@laurentsimon
Copy link
Collaborator

Let's close

@ianlewis
Copy link
Member

@jenstroeger We finally made a release with the fix for this. Please use v1.2.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:generic Issue with the generic generator type:bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants