Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

routing api: clarify string in the protocol field #403

Closed
wants to merge 3 commits into from

Conversation

lidel
Copy link
Member

@lidel lidel commented Apr 21, 2023

This PR is based on discussions from last week and this notion.

Aims to clarify

  • what values can be put in the field to improve robustness
  • that protocols without multicodec registered should do to ensure global uniqueness of Protocol and Schema fields

and close #377

This clarifies what values can be put in the field to improve
robustness,  and that protocols without multicodec registered should
ensure global uniqueness.
@lidel lidel requested review from masih and guseggert April 21, 2023 16:12
Copy link
Contributor

@willscott willscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @masih - I think this is a reasonable compromise and language that IPNI software is happy to support.

@@ -54,10 +54,13 @@ Both read and write provider records have a minimal required schema as follows:

Where:

- `Protocol` is the multicodec name of the transfer protocol or an opaque string (for experimenting with novel protocols without a multicodec)
- `Protocol` is a globlly unique opaque string that represents the transfer protocol used by a content provider.
- If a multicodec has been registered for the protocol, it can be referred to by either its `name` or `code` in `0xHEX` notation. Using the `code` is preferred since it is what is sent on the wire and won't break if the plain text `name` label is ever changed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly out of energy to care about what we name things, however a couple things I'd like to call out in relation to the fact that this discussion has been had multiple times in the last year alone and it's not obvious to me what's changed/changing.

  • The protocol in Routing V1 IIRC was initially using the code in the table, but that was scrapped because people wanted prettier names
  • When the protocol in Routing V1 was based on the code it was using decimal rather than hex since that's more native to JSON and saves on bytes
  • At the time that it was decimal it was also brought up that the same argument could be made about multiaddrs which are represented in Routing V1 as text rather than as representations of the binary encoding

So high level questions I'd like to see answered for when someone proposes RoutingV2 which ends up revisiting this again:

  1. Why are codes preferred over text?
  2. Why a string hex representation rather than something else like decimal?
  3. Why 0x rather than multibase and f?
  4. Why use text for multiaddrs but not data transfer?

To be clear, I don't feel strongly about the answers to any of these and I think there are viable answers for all of them. However, for when this inevitably ends up revisited in the future (we've done it at least twice in the last year or so) I'd like to have a comment I can point at where someone can respond with "we are doing it differently because in retrospect we disagree with the logic around decision X".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are codes preferred over text?

I think text names have been changed at least one time before (I think for json and dag-json, or something like that) and it caused a painful migration. Names can change, while the codes are static.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Names can change, while the codes are static.

I mentioned this above already, but if the idea is that names can change why are we using text for multiaddrs? I'm not insisting we have to be consistent here and use text for both or codes for both protocol identifiers and multiaddrs, but this will come up again and for those in support of the change I'd like the reasoning documented which it currently is not.

Co-authored-by: Alex Potsides <alex@achingbrain.net>
@willscott
Copy link
Contributor

Why are codes preferred over text?

We use codes here as they are more compact in storage. Because the wire values are signed, use of codes leads to efficiency in stored and transferred records.

Why a string hex representation rather than something else like decimal?

Since codes are an unbounded numeric space, it is conceivable members may experiment using codes that are outside of the standard decimal representation of javascript. As such, a string representation is stable for transfer across representations.

Why 0x rather than multibase and f?

I would be okay using multibase + f, although I don't know which library I would actually reach for to do the packing/unpacking. In practice, i expect I would just do exact string matching so it wouldn't matter. I think My main argument here is that the "0x" format is what is present in e.g. the multicodec table today, so it's easy to correlate against.

Why use text for multiaddrs but not data transfer?

We do have codes for the compact representation of multiaddrs, so I think this differentiation is more about the encoding for use on the wire here.

I see two differences:

  1. Data transfer records have higher cardinality than multiaddrs - there is one set of multiaddrs per provider, vs many more CIDs, each potentially with a unique data transfer record.
  2. We have more diversity and extensibility of segments within multiaddrs - where there are many multiaddr protocols that are not recognized by all implementations. Names ease human debugging of 'why can't i connect'

@@ -54,10 +54,13 @@ Both read and write provider records have a minimal required schema as follows:

Where:

- `Protocol` is the multicodec name of the transfer protocol or an opaque string (for experimenting with novel protocols without a multicodec)
- `Protocol` is a globally unique opaque string that represents the transfer protocol used by a content provider.
Copy link
Member

@SgtPooki SgtPooki May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

globally unique and opaque are somewhat conflicting, aren't they? If it's opaque, how do we guarantee that it's globally unique? maybe i'm misunderstanding how opaque applies here, but maybe opaque is a "golang-esque" terminology that shouldn't be in the spec?

side note: I think we over(mis)use the term "opaque"

@lidel
Copy link
Member Author

lidel commented Oct 3, 2023

Closing, as I believe this is obsoleted by content changes from #417, which moves away from trying to follow names from multiformats/multicodec/table.csv, and specifies nothing more than "protocol" being a string ecosystem implementations agree upon.

My expectation is that the ecosystem will want to add a list of "well known" ones to the spec (even if it is in "notes for implementers" Appendix), but that will happen once we have trustless gateway protocol used in addition to bitswap for a while (Kubo 0.23 will ship experiment, but it will be a few releases before we have client for this and are comfortable enabling it by default).

@lidel lidel closed this Oct 3, 2023
@lidel lidel deleted the clarify-protocol-in-routing-api branch October 3, 2023 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specify how delegated /routing/v1 API can support new protocols
7 participants