Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: fix inclusion proof verification flake #956

Merged
merged 2 commits into from Aug 9, 2022

Conversation

asraa
Copy link
Contributor

@asraa asraa commented Aug 5, 2022

Signed-off-by: Asra Ali asraa@google.com

Fixes:

Summary

This fixes a flakey problem with Rekor entry verification.

Every once in a while, we receive errors in validating the inclusion proof from Rekor:

it may either be an error verifying the inclusion proof:

 validating log entry 37: verifying inclusion proof: calculated root:
[73 180 215 55 7 58 247 212 29 155 130 228 142 72 218 20 131 130 3 179 61 150 19 180 243 114 50 157 177 182 130 69]
 does not match expected root:
[237 220 193 22 243 233 200 112 169 46 110 170 96 240 121 161 104 148 224 127 132 219 64 234 88 87 57 75 232 144 114 197]

or may be caught earlier than that with incorrect inclusion proof size:

2022/08/04 15:00:38 validating log entry 25: verifying inclusion proof: wrong proof size 3, want 4

This happens because we

  1. get the current tree size X
  2. ANOTHER client uploads an entry increasing the tree size to Y and then
  3. trillian returns an inclusion proof for the index for tree size X.

See code here

The trillian response for the inclusion proof includes a SignedLogRoot for the tree size at size Y.
https://github.com/google/trillian/blob/44841b0bad99d6b7ed5ab20ff24cfa5ca6add9d3/trillian_log_api.proto#L231-L235

The Rekor server validates the inclusion proof at size X, returning the proof response successfully.

if err := proof.VerifyInclusion(rfc6962.DefaultHasher, uint64(index), root.TreeSize, resp.GetLeaf().MerkleLeafHash, resp.Proof.Hashes, root.RootHash); err != nil {

Rekor returns the proof response with the SignedLogRoot at tree size Y, client attempts to validate with this. This errors out.

return logEntryFromLeaf(ctx, api.signer, tc, leaf, result.SignedLogRoot, result.Proof, tid, api.logRanges)

This PR returns the SignedLogRoot for the tree size in the requested proof.

On a side note, I dislike very much that Trillian's proof response does not contain the requested tree size X.

Release Note

  • bug fix: fix inclusion proof response in Rekor server that causes flakey verification problems

Documentation

@asraa asraa requested a review from a team as a code owner August 5, 2022 21:13
@asraa asraa force-pushed the demonstrate-verification-problem branch from a5340bf to 94eff70 Compare August 5, 2022 21:14
Signed-off-by: Asra Ali <asraa@google.com>

update with fix

Signed-off-by: Asra Ali <asraa@google.com>

fix with root resp

Signed-off-by: Asra Ali <asraa@google.com>

fix

Signed-off-by: Asra Ali <asraa@google.com>

fix

Signed-off-by: Asra Ali <asraa@google.com>

fix

Signed-off-by: Asra Ali <asraa@google.com>

update

Signed-off-by: Asra Ali <asraa@google.com>
@asraa asraa force-pushed the demonstrate-verification-problem branch from 94eff70 to 675eed9 Compare August 5, 2022 21:17
@dlorenc
Copy link
Member

dlorenc commented Aug 5, 2022

Amazing find!

@haydentherapper
Copy link
Contributor

This is really fantastic, great work!

Can you add a comment briefly summarizing this design decision in the client so we have context for why it's designed this way?

Copy link
Member

@bobcallaway bobcallaway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great catch and RCA - just some recommendations to reuse your helper function here.

any thoughts on how we could add regression testing for this?

pkg/api/trillian_client.go Outdated Show resolved Hide resolved
pkg/api/trillian_client.go Outdated Show resolved Hide resolved
@asraa
Copy link
Contributor Author

asraa commented Aug 8, 2022

any thoughts on how we could add regression testing for this?

Adding a regression test that fires two concurrent goroutine uploading and verifying. I'm able to repro the failure at HEAD, and fix here. I had to use 50 uploads/verifies, although I suspect the number can be less.

Signed-off-by: Asra Ali <asraa@google.com>
@codecov-commenter
Copy link

Codecov Report

Merging #956 (ba5726e) into main (102dc64) will increase coverage by 0.14%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #956      +/-   ##
==========================================
+ Coverage   48.20%   48.35%   +0.14%     
==========================================
  Files          61       61              
  Lines        5383     5383              
==========================================
+ Hits         2595     2603       +8     
+ Misses       2506     2500       -6     
+ Partials      282      280       -2     
Impacted Files Coverage Δ
pkg/types/rekord/v0.0.1/entry.go 49.49% <0.00%> (+0.66%) ⬆️
pkg/types/alpine/v0.0.1/entry.go 56.72% <0.00%> (+2.52%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Member

@bobcallaway bobcallaway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work @asraa!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants