Skip to content

Consider alternative to asCID signaling mechanism #213

Closed
@Gozala

Description

@Gozala
Contributor

As per #212 making asCID enumerable creates a problem, it also seems to be motivating factor for some of the work in #211.

For the background me and @mikeal have chosen cid.asCID === cid as a signaling mechanism that value is CID with following properties:

  1. Signal is retained when transferring value across the JS threads / realms because structural cloning respects self referentiality.
  2. It has very low collision chance - It is very unlikely that codec will be passed a data where cid.asCID === cid is true, while user was meant to pass something unrelated.

At the time we also considered following alternative cid['/'] === cid.bytes (which was inspired by DAG-JSON spec, but ultimately chose cid.asCID === cid because it seemed like it had even less collision opportunity.

Maybe given recent problems it is worth considering switch to cid['/'] === cid.bytes which would address those problems

Activity

achingbrain

achingbrain commented on Oct 18, 2022

@achingbrain
Member

Pulling the below out of #214 (comment) as it's relevant here:

After looking at dag-json spec I'm realizing that / supposed to point to CID string not bytes so maybe this is not ideal. However general point still is:

  1. We need an own (signalling) property with a very low collision probability.
  2. This property should be retained across JS realms (implying it can not be symbol or getter)

So basically cid.wellKnownPropertyName === cid.bytes could check both boxes above and harder it is to type property name less likely it is to collide.

achingbrain

achingbrain commented on Oct 18, 2022

@achingbrain
Member

Does wellKnownPropertyName have to reference bytes? If the property name is '/', can the value be a string instead to be consistent with dag-json?

If consistency with dag-json doesn't add any value, and wellKnownPropertyName has to have a low probability of collision, why not call it something really unidiomatic like ___cid_bytes___?

That said, I've experimented with cid['/'] === cid.bytes locally and it does seem to solve the problem, so I'm happy with that if other people are.

achingbrain

achingbrain commented on Oct 18, 2022

@achingbrain
Member

I've put my cid['/'] === cid.bytes experiment up in #217

achingbrain

achingbrain commented on Oct 18, 2022

@achingbrain
Member

@Gozala do you think you'll have time to look at this? It's blocking updating multiformats in @ipld/dag-cbor, @ipld/dag-json and @ipld/car, which is in turn blocking ipfs-unixfs, ipfs-repo and ipfs itself and I'd really like to get a new version shipped in time for IPFS Camp.

Gozala

Gozala commented on Oct 19, 2022

@Gozala
ContributorAuthor

@Gozala do you think you'll have time to look at this? It's blocking updating multiformats in @ipld/dag-cbor, @ipld/dag-json and @ipld/car, which is in turn blocking ipfs-unixfs, ipfs-repo and ipfs itself and I'd really like to get a new version shipped in time for IPFS Camp.

I have reviewed #217 and provided my feedback. I'd love all those things to upgrade to new multiformats, yet I wonder if there is a way we can unblock that without having to rush this.

I have tried to capture some of the problems I see with the way CID / Link interface is defined and provided sketch of the alternative that I think would work better. It also seems more aligned with discussions that took place elsewhere. I would love to at least get your and @rvagg input on that.

It also may be worth considering name different from ['/'] because it used to be that JS engines were not as well optimized for such property access and that again may have unfortunate performance implications.

If consistency with dag-json doesn't add any value, and wellKnownPropertyName has to have a low probability of collision, why not call it something really unidiomatic like cid_bytes?

I think that is reasonable as well, although I do think that representing CIDs simply as boxed Uint8Array's e.g. { CID: Uint8Array } may have low enough collision probability and does seem to make more sense than having two fields pointing to same Uint8Array.

Finally I also have been thinking of a way to avoid having to box Uint8Arrays as per #214 (comment) and if such solution is found I think that would be the best of all.

achingbrain

achingbrain commented on Oct 19, 2022

@achingbrain
Member

It also may be worth considering name different from ['/'] because it used to be that JS engines were not as well optimized for such property access and that again may have unfortunate performance implications.

I ran a quick test, just accessing / and __cid on an object in a loop 10e9 times and could see no difference, so good news there.

#214 sounds good for further discussion.

3 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @Gozala@achingbrain

      Issue actions

        Consider alternative to `asCID` signaling mechanism · Issue #213 · multiformats/js-multiformats