Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve broken cid.Builder testing for CidBuilder #93

Merged
merged 3 commits into from
Nov 23, 2022

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Oct 31, 2022

If you don't use something derived from a cid.Prefix then it'll test execute your hasher to make sure it doesn't error.

The reason for this is to avoid more cases where a panic could occur when encoding a ProtoNode.

@rvagg rvagg requested a review from Jorropo October 31, 2022 12:21
node.go Outdated Show resolved Hide resolved
Copy link
Member

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the changes suggested to checking Sum, should we also sanity check values returned by GetCodec and WithCodec? Is there a fixed set of codecs/behaviours we expect the builder to provide when those two functions are called?

If so, let's assert those too while at it.

merkledag_test.go Outdated Show resolved Hide resolved
node.go Outdated Show resolved Hide resolved
node.go Outdated
} else {
// have to test it's a usable hasher directly
// this is only a basic check, there are still ways it may break
_, err := builder.Sum([]byte{0})
Copy link
Member

@masih masih Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @Jorropo's suggestion on using an array here.

Thinking out loud I also wonder if we can do better:
Disclaimer: I am not fully familiar with how this API is used.
We could get the digest value back, read the multihash code from the beginning and call checkHasher function from my earlier suggestion to see if the multi hash code fits?

@rvagg rvagg force-pushed the rvagg/better-broken-cid-builder-testing branch 2 times, most recently from 064ec5a to d910fbc Compare November 22, 2022 03:44
@rvagg
Copy link
Member Author

rvagg commented Nov 22, 2022

@Jorropo @masih see second-to-latest commit incorporating feedback (d910fbc).

The latest commit is not essential and I'd really like feedback on this one before we go ahead with it—I've entirely removed the panic() possibilities in that commit, returning default values. The reason this is less than ideal is because the default values don't relate to the input data, they're just zero-blocks and zero-CIDs, so the error is hidden. The docs tell you how to do it "better" but I doubt that path will be used. But since we can't error on these APIs, perhaps default values are better than panics.


Some more discussion on SetCidBuilder and how rigorous we should be: this is, and should remain (I think) a flexible extension point for this API. Mostly we expect to receive a straightforward factory that can generate CIDs for us; but we should allow for some flexibility in what users might plug in here. An obvious example is the InlineBuilder which can come in various forms—it has conditional logic internally that will return a different type of multihash depending on what it's fed—by default it looks at the length to make that decision but that length is determined by the user and we see various sizes used in practice. It's also conceivable that there are use-cases where other conditional logic may inform different outputs and I don't think we should be too restrictive here in what we expect to be used.

With that in mind, we could go ahead and check the output of a Sum to make sure it's what we expect, but as long as we're doing that then we may as well remove the ability to set a custom builder and just make them set a hasher and CID version and do it ourselves; which removes this as an extension point.

There's an additional problem here that this is set for each instance of a ProtoNode, so if someone's building a large DAG then we're invoking this new checking logic each time a new block is built. For a builder that's not a cid.Prefix type, we're going to be calling Sum() for each new ProtoNode and then again when the actual CID needs to be created, so a CID requires a double hash. Which is not terrible, but it is additional overhead that wasn't there before.

My opinion on this is something like—there's a place here where you could provide a bad CidBuilder that could invoke an error because it's not quite right, but you could just as easily introduce one that has a func (panicBuilder) Sum([]byte) { panic("suckers"); }. I think it's on upstream users to make sure that there's some reasonableness check going on before feeding a builder into this API.

@rvagg rvagg requested a review from willscott November 22, 2022 04:47
node.go Outdated
func (n *ProtoNode) RawData() []byte {
out, err := n.EncodeProtobuf(false)
if err != nil {
panic(err)
return []byte{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • nil slice length is 0 I think returning nil is practically the same as returning an empty slice.

  • If this project already depends on ipfs/go-log maybe log the error since we can't return it?

Suggested change
return []byte{}
return nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it doesn't depend on go-log directly which is why I didn't do that here, but there's a few indirect uses of it and considering where this library will be used it's probably not going to be a big deal to include it; I've added some log entries for it, feels a bit better.

@willscott
Copy link
Contributor

the latest commit to avoid potential for panics seems okay -

  • should we also introduce a parallel method for Cid() that allows the return of an error in addition to the Cid, and deprecate the non-error variant such that we move to a world where we can communicate this failure state?

@rvagg
Copy link
Member Author

rvagg commented Nov 23, 2022

should we also introduce a parallel method for Cid() that allows the return of an error

we're dealing with legacy go-ipld-format etc. interfaces here so it's not going to be practical to introduce new APIs to that and work through all the uses of it, especially since we are trying to deprecate these! the EncodeProtobuf(false) call should be enough to check for errors and that's included in the docs for these API calls if folks want to check.

I've added log messages and returned nil from RawBytes(), although it does make me a bit nervous what potential that has for downstream users. Better than a panic hopefully.

rvagg and others added 3 commits November 24, 2022 08:42
If you don't use something derived from a cid.Prefix then it'll test
execute your hasher to make sure it doesn't error.

The reason for this is to avoid more cases where a panic could occur
when encoding a ProtoNode.

fix: apply code-review feedback

Co-authored-by: Masih H. Derkani <m@derkani.org>
@rvagg rvagg force-pushed the rvagg/better-broken-cid-builder-testing branch from e00a51a to af63637 Compare November 23, 2022 21:43
@github-actions
Copy link

Suggested version: v0.8.1
Comparing to: v0.8.0 (diff)

Changes in go.mod file(s):

diff --git a/go.mod b/go.mod
index 197ce60..2f8ca57 100644
--- a/go.mod
+++ b/go.mod
@@ -12,6 +12,7 @@ require (
 	github.com/ipfs/go-ipld-cbor v0.0.5
 	github.com/ipfs/go-ipld-format v0.3.0
 	github.com/ipfs/go-ipld-legacy v0.1.0
+	github.com/ipfs/go-log/v2 v2.5.1
 	github.com/ipld/go-codec-dagpb v1.3.1
 	github.com/ipld/go-ipld-prime v0.16.0
 	github.com/multiformats/go-multihash v0.2.1
@@ -33,7 +34,6 @@ require (
 	github.com/ipfs/go-ipfs-pq v0.0.2 // indirect
 	github.com/ipfs/go-ipfs-routing v0.2.1 // indirect
 	github.com/ipfs/go-log v1.0.5 // indirect
-	github.com/ipfs/go-log/v2 v2.3.0 // indirect
 	github.com/ipfs/go-metrics-interface v0.0.1 // indirect
 	github.com/ipfs/go-peertaskqueue v0.7.0 // indirect
 	github.com/ipfs/go-verifcid v0.0.1 // indirect
@@ -58,7 +58,7 @@ require (
 	github.com/libp2p/go-netroute v0.1.6 // indirect
 	github.com/libp2p/go-openssl v0.0.7 // indirect
 	github.com/libp2p/go-sockaddr v0.1.1 // indirect
-	github.com/mattn/go-isatty v0.0.13 // indirect
+	github.com/mattn/go-isatty v0.0.14 // indirect
 	github.com/miekg/dns v1.1.41 // indirect
 	github.com/minio/sha256-simd v1.0.0 // indirect
 	github.com/mr-tron/base58 v1.2.0 // indirect
@@ -78,10 +78,10 @@ require (
 	github.com/whyrusleeping/cbor-gen v0.0.0-20200123233031-1cdf64d27158 // indirect
 	go.uber.org/atomic v1.7.0 // indirect
 	go.uber.org/multierr v1.6.0 // indirect
-	go.uber.org/zap v1.16.0 // indirect
+	go.uber.org/zap v1.19.1 // indirect
 	golang.org/x/crypto v0.0.0-20220525230936-793ad666bf5e // indirect
 	golang.org/x/net v0.0.0-20211112202133-69e39bad7dc2 // indirect
-	golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1 // indirect
+	golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c // indirect
 	golang.org/x/text v0.3.6 // indirect
 	golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
 	google.golang.org/grpc v1.33.2 // indirect

gorelease says:

# diagnostics
required module github.com/microcosm-cc/bluemonday@v1.0.1 retracted by module author: Retract older versions as only latest is to be depended upon

# summary
Suggested version: v0.9.0

gocompat says:

(empty)

@rvagg rvagg merged commit 503ecaa into master Nov 23, 2022
@rvagg rvagg deleted the rvagg/better-broken-cid-builder-testing branch November 23, 2022 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

4 participants