Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Update MultihashHasher interface so it could support multiple hashing algorithms #252

Open
Gozala opened this issue Apr 18, 2023 · 4 comments

Comments

@Gozala
Copy link
Contributor

Gozala commented Apr 18, 2023

Goal

As per ipld/js-car#123 we do need a solution for verifying hashes in CAR files without shipping all the MultihashHasher-s with the CAR library.

Proposal

  1. Introduce backwards compatible change to the MultihashHasher by introducing second OPTIONAL code argument to the digest method.
    • If omitted it would default to own code.
    • If different code is passed it would error.
  2. Rename MultihashHasher interface to MultihashHasherCase and repurpose it for single algorithm use cases.
  3. Introduce multi algorithm MultihashHasherVariant interface with the same digest method and a method that returns map of MultihashHasherCase it's comprised of.
  4. Define MultihashHasher as union type discriminated by code field.

Here is the the sketch:

interface MultihashHasherCase<Code extends number = number> {
    code: Code
    digest(bytes: Uint8Array, code?: Code): Digest<Code>
}

interface MultihashHasherVariant<Code extends number = number> {
    // We define `code` as optional which can never have a value. This will allow
    // us to `code` as a discriminant in the union
    code?: never

    cases(): Record<Code, MultihashHasherCase<Code>>

    digest <CodeCase extends Code> (bytes: Uint8Array, code?: CodeCase): Digest<CodeCase>
    
    or <Case extends number> (hasher: MultihashHasher<Case>): MultihashHasherVariant<Code|Case>
}

type MultihashHasher<Code extends number = number> =
    | MultihashHasherCase<Code>
    | MultihashHasherVariant<Code>

Note that above makes current MultihashHasher-s compatible with proposed MultihashHasher type that is to say code that will require new type will accept all existing hashers without changes to them.

  • With such a type in place js-car library would be able to require MultihashHasher parameter in order to be able to verify digests.
  • Added MultihashHasherVariant would allow composing many hashers into one.
@Gozala Gozala changed the title Proposal: Define and implement MultihashVerifier interface Proposal: Update MultihashHasher interface so it could support multiple hashing algorithms Apr 18, 2023
@rvagg
Copy link
Member

rvagg commented Apr 24, 2023

OK, I like the direction, the details are worth discussing though:

  1. I don't think this is actually backward compatible, is it? Current MultihashHasher looks like this:
export interface MultihashHasher<Code extends number = number> {
  digest: (input: Uint8Array) => Promise<MultihashDigest<Code>> | MultihashDigest<Code>
  name: string
  code: Code
}

export interface SyncMultihashHasher<Code extends number = number> extends MultihashHasher<Code> {
  digest: (input: Uint8Array) => MultihashDigest<Code>
}

Did you just leave off the Promise return and name for brevity, would they be in the final form? And would MultihashHasherVariant get a name?

  1. Why do we really need the two separate forms? Could we just make them all composable with an or? How would this work when you're starting off with nothing, you have to have an empty Variant that you can or(hasher) on? And what do you see as the utility of cases()? Do we need that?

  2. Could we work on the naming a bit? "Case" is a bit too out of place for a JS/TS code isn't it and is likely to be something that folks trip over. If we could come up with something a bit more idiomatic and less Rusty then it might be easier for outsiders to wrangle. Even "Variant" is a bit odd. I reach for words like "Impl", "Alg", "Concrete", "Actual", "Single" for what you're calling "Case". And "Variant" is more of a "Bundle", "Collection", "Group", "Multi". I'm having a hard time seeing documentation explain this in a straightforward way with the current naming, without having to also explain the naming.

  3. Sync version, do we need to work on one as well to cover all of this? Is that going to end up multiplying the pain of implementing hashers?

@Gozala
Copy link
Contributor Author

Gozala commented Apr 24, 2023

Did you just leave off the Promise return

That was an accident, I didn’t mean to change return type

and name for brevity

We can keep name field, but I do find it on the interface unnecessary and would suggest dropping unless we find it’s actually needed.

Why do we really need the two separate forms?

Every time I try to hide distinction I find myself regretting it for one reason or the other. I think having two distinct when you care and third unified (type union) when you don’t is the best compromise.

?Could we just make them all composable with an or?

We could impose or on MultihashHasherCase but that would imply every hasher alg now needs to implement that as well or import utility. I think exposing empty variant from this lib is a better option. That way external defs only need to implement one function and allow user to import this lib if they need to compose. If they want to implement variant as well they’re free to do so as well

How would this work when you're starting off with nothing, you have to have an empty Variant that you can or(hasher) on? And what do you see as the utility of cases()? Do we need that?

so you could decompose it into individual hashers. Also makes it possible to have or static function that is variant implementation agnostic

@Gozala
Copy link
Contributor Author

Gozala commented Apr 24, 2023

Could we work on the naming a bit? "Case" is a bit too out of place for a JS/TS code isn't it and is likely to be something that folks trip over. If we could come up with something a bit more idiomatic and less Rusty then it might be easier for outsiders to wrangle. Even "Variant" is a bit odd. I reach for words like "Impl", "Alg", "Concrete", "Actual", "Single" for what you're calling "Case". And "Variant" is more of a "Bundle", "Collection", "Group", "Multi". I'm having a hard time seeing documentation explain this in a straightforward way with the current naming, without having to also explain the naming.

Sure I’m fine with whatever names. Out of proposed ones I’d pick

ConcreteMultihashHasher and CompositeMultihashHasher

P.S.: I would use Multi prefix if it was not overloaded in this context

@Gozala
Copy link
Contributor Author

Gozala commented Apr 24, 2023

Sync version, do we need to work on one as well to cover all of this?

Yes, I just left out to reduce noise

Is that going to end up multiplying the pain of implementing hashers?

I don’t think it would even though it would multiply number of types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants