Provide an efficient API to check whether a CID has IDENTITY multihash code #133

Closed
2 tasks done
masih opened this issue Sep 21, 2021 · 3 comments
Labels
need/triage Needs initial labeling and prioritization


@masih
Member

masih commented Sep 21, 2021

CIDs with the IDENTITY multihash code typically require special handling when encountered in blockstores. This is because such CIDs contain their data within themselves: the data is simply the multihash digest of the CID, since the IDENTITY multihash code denotes the identity function, which copies its input unchanged.

To handle them gracefully, a check is needed to determine whether a given CID has the IDENTITY code, and that check would have to run for almost every operation on the blockstore API. It is therefore highly desirable to make the check as efficient as possible.

The current APIs provide two ways to perform the check:

  1. cid.Prefix().MhType
  2. decoding cid.Hash() via the go-multihash API to extract the code

Blockstore implementations would benefit from an API that checks, in a "fail-fast" manner, whether a given CID (or the multihash of a CID) has the IDENTITY code. That is, the check would return as soon as possible when a CID is not IDENTITY, instead of first validating the CID, then decoding the digest, and only then comparing the multihash code.

The rationale for a "fail-fast" check is:

  1. if a CID does not have the IDENTITY multihash code, it does not always need to be fully decoded for a block to be returned (e.g. when the CID is used as a key in a map);
  2. the majority of CIDs encountered are not IDENTITY, so the cost of decoding should be paid only when necessary, and certainly not on every call to the blockstore.
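A fail-fast check of this kind could be sketched with only the standard library by reading the leading varints of the CID's binary form (for example, the bytes returned by go-cid's `Cid.Bytes()`). This is an illustrative sketch under that assumption, not the API proposed in this issue; `isIdentityFailFast` is a hypothetical name. It relies on CIDv0 being a bare SHA2-256 multihash and on CIDv1 being laid out as `<version><codec><multihash code>...`, and it deliberately skips validating the rest of the CID.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// identityCode is the multihash code for IDENTITY.
const identityCode = 0x00

// isIdentityFailFast (hypothetical helper) reports whether the binary CID b
// uses the IDENTITY multihash code. It reads only the leading varints and does
// not validate the digest length or the remaining bytes.
func isIdentityFailFast(b []byte) bool {
	// CIDv0 is a bare SHA2-256 multihash (0x12 0x20 + 32-byte digest): never IDENTITY.
	if len(b) == 34 && b[0] == 0x12 && b[1] == 0x20 {
		return false
	}
	// CIDv1: <version varint><codec varint><multihash code varint>...
	version, n := binary.Uvarint(b)
	if n <= 0 || version != 1 {
		return false
	}
	b = b[n:]
	// Skip the content codec varint without interpreting it.
	if _, n = binary.Uvarint(b); n <= 0 {
		return false
	}
	b = b[n:]
	code, n := binary.Uvarint(b)
	return n > 0 && code == identityCode
}

func main() {
	// CIDv1, raw codec (0x55), IDENTITY multihash with the 5-byte digest "hello".
	identity := append([]byte{0x01, 0x55, 0x00, 0x05}, "hello"...)
	// CIDv1, raw codec, SHA2-256 multihash (digest bytes zeroed for brevity).
	sha := append([]byte{0x01, 0x55, 0x12, 0x20}, make([]byte, 32)...)
	fmt.Println(isIdentityFailFast(identity)) // true
	fmt.Println(isIdentityFailFast(sha))      // false
}
```

A non-IDENTITY CID is rejected after reading at most three varints, which is the "pay only when we have to" behaviour described above.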

I therefore propose to:

  • Write benchmarks that compare the efficiency of the current APIs when checking for IDENTITY code.
  • Provide an alternative API that aims to improve efficiency for the checks.
@masih masih added the need/triage Needs initial labeling and prioritization label Sep 21, 2021

@BigLep BigLep added this to In Progress in Maintenance Priorities - Go Sep 21, 2021
masih added a commit that referenced this issue Sep 24, 2021
Implement a fail-fast function that checks whether the code of a CID
is `multihash.IDENTITY` or not.

Add benchmarks that compare three ways of checking for
`multihash.IDENTITY` code:
1. `Cid.Prefix().MhType`
2. Decode of `Cid.Hash()`
3. The new `Cid.IsIdentity()` API

Fixes #133
@BigLep BigLep moved this from In Progress to In Review in Maintenance Priorities - Go Sep 24, 2021
@masih
Member Author

masih commented Sep 27, 2021

As shown by the benchmarks in #134, the gains over using the existing Cid.Prefix are small.

A dedicated API is therefore not warranted: users who wish to check for IDENTITY should use Cid.Prefix, which is more efficient than multihash.Decode.

@masih masih closed this as completed Sep 27, 2021
Maintenance Priorities - Go automation moved this from In Review to Done Sep 27, 2021
masih added a commit that referenced this issue Sep 27, 2021
Add benchmarks that compare two ways of checking for
`multihash.IDENTITY` code:
1. `Cid.Prefix().MhType`
2. Decode of `Cid.Hash()`

Relates to #133
@masih
Member Author

masih commented Sep 27, 2021

To document the efficiency of the existing APIs for the IDENTITY check, benchmarks are added in #135.

masih added a commit that referenced this issue Sep 27, 2021
Add benchmarks that compare two ways of checking for
`multihash.IDENTITY` code:
1. `Cid.Prefix().MhType`
2. Decode of `Cid.Hash()`

Relates to #133
masih added a commit that referenced this issue Nov 8, 2021
Add benchmarks that compare two ways of checking for
`multihash.IDENTITY` code:
1. `Cid.Prefix().MhType`
2. Decode of `Cid.Hash()`

This benchmark illustrates that using Cid.Prefix is more efficient than
`multihash.Decode`. Users wishing to perform such a check should use
`Cid.Prefix`.

`Cid.Prefix` is already efficient enough that introducing a dedicated
API for this check would likely yield only small gains.

Relates to #133
masih added a commit that referenced this issue Nov 8, 2021
Add benchmarks that compare two ways of checking for
`multihash.IDENTITY` code:
1. `Cid.Prefix().MhType`
2. Decode of `Cid.Hash()`

This benchmark illustrates that using Cid.Prefix is more efficient than
`multihash.Decode`. Users wishing to perform such a check should use
`Cid.Prefix`.

`Cid.Prefix` is already efficient enough that introducing a dedicated
API for this check would likely yield only small gains.

Relates to #133