Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use standard IPLD codec names across the CLI/HTTP API #8471

Closed
aschmahmann opened this issue Sep 30, 2021 · 30 comments · Fixed by #8568
Closed

Use standard IPLD codec names across the CLI/HTTP API #8471

aschmahmann opened this issue Sep 30, 2021 · 30 comments · Fixed by #8568
Assignees
Labels
kind/enhancement A net-new feature or improvement to an existing feature kind/maintenance Work required to avoid breaking changes or harm to project's status quo P1 High: Likely tackled by core team if no one steps up topic/api Topic api topic/docs Documentation
Milestone

Comments

@aschmahmann
Copy link
Contributor

aschmahmann commented Sep 30, 2021

Update 11/26 (@schomatis):

For review: #8568.

Working in draft branch with working notes and annotated code.

First attempt at adding the multicodec option for block put in ipfs/interface-go-ipfs-core#80.

Need to confirm this issue's description is in sync with conclusions from thread here and other multicodec issues.


While the recent #7976 related changes were a big step forwards in terms of both using the standardized codec names from https://github.com/multiformats/multicodec/blob/master/table.csv and making better use of the codecs they've left a couple of our APIs as confusing.

ipfs dag put and ipfs dag get both use codec names correctly as pulled from the table above. However, ipfs block put and ipfs cid codecs both use the older set of names defined in https://github.com/ipfs/go-cid/blob/5640b0169f6284562b1b8636b10e32dd85ea0a8a/cid.go#L85-L112.

We should switch both of those commands to use the correct names.

Some recommendations for particulars about the changes:

  • ipfs cid codecs should contain a --supported flag that lists which codecs are known to go-ipfs per Enumerate available encodings for dag get #8171 (review). This has some tradeoffs if different commands support different subsets of codecs, but for now this seems reasonable.
  • ipfs block put --format should probably be kept and deprecated alongside a new ipfs block put --store-codec and if both flags are used the user should receive an error. This would allow us to not break existing users, but still move forward.
    • Unlike with ipfs dag put where there was a backwards compatible changes anyway it seems useful to not break users here since we shouldn't have to.
@aschmahmann aschmahmann added the kind/enhancement A net-new feature or improvement to an existing feature label Sep 30, 2021
@lidel lidel added kind/maintenance Work required to avoid breaking changes or harm to project's status quo P1 High: Likely tackled by core team if no one steps up labels Sep 30, 2021
@lidel lidel added this to Next Release Backlog in Go IPFS Roadmap Sep 30, 2021
@lidel
Copy link
Member

lidel commented Sep 30, 2021

👍 + when this is done we should also do a quick grep over content at https://docs.ipfs.io and https://proto.school

@lidel lidel added topic/api Topic api topic/docs Documentation labels Sep 30, 2021
@BigLep BigLep moved this from Next Release Backlog to In Progress in Go IPFS Roadmap Oct 1, 2021
@BigLep BigLep added this to In Progress in Maintenance Priorities - Go Oct 1, 2021
@BigLep BigLep added this to the go-ipfs 0.11 milestone Oct 1, 2021
@BigLep BigLep moved this from In Progress to Current Release Backlog in Go IPFS Roadmap Oct 5, 2021
@schomatis
Copy link
Contributor

Looking into this.

@schomatis
Copy link
Contributor

@aschmahmann Looking at the block put command: we don't set the CID prefix directly as dag put does but go through the github.com/ipfs/interface-go-ipfs-core abstraction (which I'm not sure what's its purpose nor see much documentation about).

Should the added --store-codec just bypass all this and construct the CID prefix locally as dag put does? (As you said this would go in parallel with the current --format option which would still be using interface-go-ipfs-core; the two being mutually exclusive.)

@aschmahmann
Copy link
Contributor Author

@schomatis I suspect there are some changes to be made both in go-ipfs and in interface-go-ipfs-core. Generally interface-go-ipfs-core is meant as an abstraction that can be used without coupling too tightly to the underlying IPFS implementation (e.g. https://github.com/ipfs/go-ipfs-http-client uses it as well).

Some things that are probably needed here:

  • To support ipfs cid codecs we need a list of all the IPLD codecs. This list can either be manually generated (as it is in go-cid), or a likely better approach is to modify https://github.com/multiformats/go-multicodec to allow retrieving codes by type (e.g. the IPLD type). WDYT @mvdan ?
    • This could then be leveraged inside of go-cid to generate the list of CIDs on init by depending on go-multicodec
  • To support ipfs cid codecs --supported we can for now leverage the ipld-prime global codec registry https://github.com/ipld/go-ipld-prime/blob/b9a89e847312334af141788b303fd78b4f076fe3/multicodec/defaultRegistry.go#L23
  • Special support around handling the old name types is needed (particular attention to be paid to the names v0, protobuf, cbor). Supporting these names in interface-go-ipfs-core by having two options there (format and store-codec) is ok, although it's also fair game to make interface-go-ipfs-core and go-ipfs-http-client make a breaking change to only take the new format and push the legacy handling to the commands in go-ipfs itself since changing types in Go is an easy compile time error, whereas changing bash scripts or HTTP requests only give run time errors.

@mvdan
Copy link
Contributor

mvdan commented Nov 10, 2021

go-multicodec currently does not have a way to enumerate all known codecs, but we could certainly look into that. I've raised multiformats/go-multicodec#58

It's worth noting that the ipld tag in the multicodec table is broad. It includes CID versions as well as IPLD codecs - some of which are supported by go-ipld-prime out of the box, and some which require external implementations.

@aschmahmann
Copy link
Contributor Author

It's worth noting that the ipld tag in the multicodec table is broad. It includes CID versions as well as IPLD codecs

Thanks for the flag here. I wonder if it's worth adding any more differentiating information into the type? Probably not a big deal since we can just manually ignore the CID codes for now, but using ipld-codec doesn't seem too wacky.

some of which are supported by go-ipld-prime out of the box, and some which require external implementations.

Yep, this is fine. I'm thinking there's some value in figuring out what's a valid IPLD codec separate from what's available. This is what go-cid currently does https://github.com/ipfs/go-cid/blob/28f4a5eab6c28351e4c0c3769a258dd779531ac7/cid.go#L85. Separately we'll want to list what codecs are available, but those can be pulled from the go-ipld-prime registry (whether global or a local one).

@mvdan
Copy link
Contributor

mvdan commented Nov 10, 2021

Thanks for the flag here. I wonder if it's worth adding any more differentiating information into the type? Probably not a big deal since we can just manually ignore the CID codes for now, but using ipld-codec doesn't seem too wacky.

I assume you mean altering the multicodec table in some form. I don't think we should teach go-multicodec about extra information that isn't on the table, because then we'd start getting into the Go implementation having out-of-spec information that other languages might not have or might differ on.

@schomatis
Copy link
Contributor

@aschmahmann Why didn't we use the interface-go-ipfs-core in dag put but we do in block put? (Not challenging the decision but wanting to better understand the purpose and rationale behind that interface to know how to best apply it.)

@schomatis
Copy link
Contributor

Got delayed with sharding, resuming this. Will start pushing some code annotations to evaluate how to move forward with this. There are still some rough edges around the multicodec table I'm wrapping my head around.

@schomatis
Copy link
Contributor

@aschmahmann Follow-up to #8471 (comment): it seems dag put does not accept v0 anymore, is this intended? If this is the case, are we dropping support for v0 in block put?

@aschmahmann
Copy link
Contributor Author

it seems dag put does not accept v0 anymore, is this intended?

Yes, we were forced to make sufficient breaking changes there that this made sense.

If this is the case, are we dropping support for v0 in block put?

No, since we don't have to make any breaking changes to ipfs block put we shouldn't do them.

@aschmahmann
Copy link
Contributor Author

@mvdan perhaps a gotcha in this plan is that 0x0200 is json which is labeled as serialization rather than ipld.

Some options:

  1. Ditch this plan of using the "type" in the table (and likely go full manual)
  2. Change 0x0200 to be ipld
  3. Have the place where we use the logic (e.g. go-cid) manually add 0x0200 and hope nobody else misuses the code table to assume the ipld tag is used for all IPLD codecs
  4. Allow for multiple tags (e.g. serialization,ipld in the table)
  5. Any better ideas?

@mvdan
Copy link
Contributor

mvdan commented Nov 18, 2021

@rvagg and @vmx touched a bit on serialization vs ipld here: multiformats/multicodec#242 (comment)

I don't think we want to change json to be tag == "ipld" for "IPLD codec", because it isn't - you can't just take any IPLD node and roundtrip it through json and expect it to work like any other codec. There's no support for links, for example.

As for your options - I think it depends on what you want. Do you want to allow all known IPLD codecs? Then filter by tag == "ipld". Do you also want to allow encodings that aren't full IPLD codecs, like json? Then filter by tag == "ipld" || tag == "serialization". I imagine you want the former, unless I'm missing some reason why json must be supported as well as dag-json.

Going full manual is always an option, and might be the best in terms of "only accept codec multicodecs that we implement". Again this depends on what go-ipfs wants :) But I think we'd all like to avoid the manual approach in the name of not having to keep a separate list up to date, especially as ipld-prime or other modules might register more codec implementations in the future.

@mvdan
Copy link
Contributor

mvdan commented Nov 18, 2021

TL;DR if you do indeed want to allow json, I would go with your option 4.

@rvagg
Copy link
Member

rvagg commented Nov 18, 2021

Aside: I think I disagree with @vmx on this particular point, json is a valid ipld codec an I'd prefer to list it as such in the table. It's a hobbled codec and can't do bytes or links, but dag-pb is also a hobbled codec, and dag-json is also a hobbled codec (can't do some maps in the '/' key reserved space) and Eric will tell you all about how our map key sorting makes both dag-json and dag-cbor hobbled codecs that can't represent arbitrary user-meaningful map ordering.

IMO if it can meaningfully go in a CID then we could (not "should") call it an ipld codec and list it as such in the table. This serialization thing makes more sense for protobuf which we can't do anything with in IPLD.

@mvdan
Copy link
Contributor

mvdan commented Nov 18, 2021

@rvagg good point, consider me convinced. I'd love for us to reach an agreement there this week and merge a few tiny PRs into the multicodec table, so we can unblock this thread :)

@vmx
Copy link
Member

vmx commented Nov 18, 2021

Aside: I think I disagree with @vmx on this particular point, json is a valid ipld codec an I'd prefer to list it as such in the table

Nope. We agree here :) I also think json is an valid ipld codec. It is not implementing the full data model, but we cannot expect that from a codec. I e.g. want to write a GeoTIFF IPLD ocdec for a long time. It surely won't support the full data model.

@mvdan
Copy link
Contributor

mvdan commented Nov 18, 2021

If either of you send a multicodec PR, I'll approve that. Are there any other serialization entries that should be ipld?

@schomatis
Copy link
Contributor

@aschmahmann Could you please make sure any new conclusions from this discussion is reflected in the issue description, please? Alternatively we can leave this issue for general discussions on the topic and create a new tracking issue for the work I need to land.

@rvagg
Copy link
Member

rvagg commented Nov 19, 2021

@schomatis
Copy link
Contributor

Update 11/19 (@schomatis):

Working in draft branch with working notes and annotated code.

First attempt at adding the multicodec option for block put in ipfs/interface-go-ipfs-core#80.

@mvdan
Copy link
Contributor

mvdan commented Nov 19, 2021

As soon as multiformats/multicodec#244 is merged, I'll update go-multicodec as per multiformats/go-multicodec#58 (comment). That should be what go-ipfs needs here, I think.

mvdan pushed a commit to multiformats/multicodec that referenced this issue Nov 22, 2021
This change moves us toward the "ipld" tag meaning something like
"can logically be used as a codec code in a CID";
with the recognition that this is a bit squishy and there is some amount
of gentle abuse around the edges of CIDs which we willingly turn a blind eye to.

Ref: #242
Ref: ipfs/kubo#8471
@guseggert guseggert mentioned this issue Nov 23, 2021
80 tasks
@BigLep BigLep modified the milestones: go-ipfs 0.11, go-ipfs 0.13 Nov 23, 2021
@mvdan
Copy link
Contributor

mvdan commented Nov 23, 2021

As soon as multiformats/multicodec#244 is merged, I'll update go-multicodec as per multiformats/go-multicodec#58 (comment). That should be what go-ipfs needs here, I think.

PR now up at multiformats/go-multicodec#59; reviews welcome.

@schomatis
Copy link
Contributor

Update 11/26 (@schomatis):

For review: #8568.

@BigLep BigLep moved this from Current Release Backlog to In Progress in Go IPFS Roadmap Dec 2, 2021
@aschmahmann
Copy link
Contributor Author

TODO: Let's deprecate the codec maps in go-cid and create aliases to the newer functions in go-multicodec since they can be updated automatically by codegen.

@BigLep BigLep moved this from In Progress to In Review in Go IPFS Roadmap Jan 5, 2022
@BigLep
Copy link
Contributor

BigLep commented Jan 5, 2022

2022-01-05: there is a PR to look at (#8568 ) and the request to make sure the description is accurate (#8471 (comment) )

@BigLep
Copy link
Contributor

BigLep commented Mar 11, 2022

2022-03-11 conversation: we need to decide on the approach especially in light of the 0.12 migration. @aschmahmann will give input here. That may mean doing something different than ipfs/interface-go-ipfs-core#80, but we'll wait for Adin comments.

@lidel
Copy link
Member

lidel commented Apr 21, 2022

cid codecs and block put fixed in #8568, scheduled to ship in go-ipfs 0.13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement A net-new feature or improvement to an existing feature kind/maintenance Work required to avoid breaking changes or harm to project's status quo P1 High: Likely tackled by core team if no one steps up topic/api Topic api topic/docs Documentation
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

7 participants