Handle unparsable multiaddrs #70

Stebalien · 2018-07-18T15:57:10Z

Due to the fact that binary multiaddrs don't include protocol definitions, we can't parse them unless we know all the relevant protocols.

However, we can parse a prefix. Therefore, I'd like to propose a special, string-only "unknown" protocol that takes a single multibase encoded argument. That is: `/ip4/1.2.3.4/tcp/123/unknown/bxyz". This would only exist in the string format and would allow us to keep and use multiaddrs we don't fully understand.

This should fix some of the problems described in #6.

alanshaw · 2018-11-13T11:46:25Z

A few things I'd like to explore:

How can we retain the original string? In terms of printing these addrs out, it's more useful to be able to print the representation that we recieved, not a /unknown/whatever even if it cannot be used. The use case is an old js-ipfs-api requesting swarm.addrs to be printed out on a webpage. It might not understand how to parse QUIC addrs but it's preferable to print the QUIC addr than an "unknown" addr
How can we distinguish between a multiaddr that is valid or not? If we can pass it any string protocol - how do we determine if it's valid (at least in our version of the codec table)? What would valid mean now? What do we do when passing a string/buffer to the multiaddr constructor?
If we create a "string only" multiaddr - what happens when we request the buffer version - throw error?

alanshaw · 2018-11-13T11:48:56Z

@jacobheun it might be useful to have your opinion on this?

olizilla · 2018-11-13T12:01:20Z

The requirement that a multiaddr can be represented as a buffer is tricky. Every user of multiaddr has a stale copy of the buffer-to-protocol mapping. It's trivial to syntacitically validate a multiaddr string, but hard to semantically validate that protocols are valid if the definition of valid is their existance in the mapping table.

We will keep running into situations where peers and apps and api clients are running with different versions of multiaddr, with different mapping tables, and older ones unable to parse newer ones.

Having a string only format for multiaddr that doesn't guarantee it can be converted to a buffer would at least tick the points 1 to 4 on the problem statement https://github.com/multiformats/multiaddr#introduction

Stebalien · 2018-11-14T01:05:32Z

I think we're talking past each other a bit here. My primary concern is relay dialing (which should have been in the issue description...). That is, I need to be able to use a multiaddr prefix (encoded) without the rest.

For example: /ip4/1.2.3.4/tcp/1234/p2p/QmRelay/p2p-circuit/ip4/tcp/1234/special_protocol/p2p/QmTarget.

QmRelay and QmTarget may understand special_protocol while I may not.

Basically, I'm worried about creating unnecessary network partitions because we're being too strict in parsing our inputs.

We still need to solve the UX issues.

How can we retain the original string?

We usually don't and can't know the original string (e.g., binary peer address records from the DHT).

How can we distinguish between a multiaddr that is valid or not? If we can pass it any string protocol - how do we determine if it's valid (at least in our version of the codec table)? What would valid mean now? What do we do when passing a string/buffer to the multiaddr constructor?

This proposal tries to cover the use-case where I don't really care about the rest of the multiaddr. Ideally, I'd be able to validate it, but even if I could, I'm just going to pass it off to someone else anyways.

If we create a "string only" multiaddr - what happens when we request the buffer version - throw error?

By string-only, I meant that /unknown/... would only appear in the string representation. When converting back to the binary representation, we'd parse $x/unknown/$y back into ParseMultiaddr($x) + CastMultiaddrBytes(DecodeMultibase($y)). That is, everything after unknown would just be the part of the multiaddr we failed to parse (binary), multibase-encoded.

Now, there is the case where a user gives me /ip4/.../...../someproto and I don't know what someproto is. This proposal doesn't handle that case but I guess we could:

Allocate a special "string" path multicodec.
Use that multicodec and encode the rest of the string as the argument.

That is, if we ran across $x/someproto/$y, we'd encode this to BinaryMultiaddr($x) + StringMultiaddrCodec + Length + $y. When parsing such a multiaddr, implementations would do their best to lower it to a the correct binary format.

However, really, "upgrade your client" may also be a valid solution here.

alanshaw · 2018-11-19T16:34:05Z

Thanks for your thoughts!

That is, if we ran across $x/someproto/$y, we'd encode this to BinaryMultiaddr($x) + StringMultiaddrCodec + Length + $y. When parsing such a multiaddr, implementations would do their best to lower it to a the correct binary format.

Do we need a StringMultiaddrCodec - could we just use the unknown codec?

That aside, I'm interested to see if/how this would work in code - I'll see if I can work up a PR to js-multiaddr and report back :D

Stebalien · 2018-11-19T21:45:52Z

Do we need a StringMultiaddrCodec - could we just use the unknown codec?

So, there wouldn't really be an unknown "codec". Basically, we're trying to handle two inverse cases:

We have a string and need to turn it into binary. In this case, we need a protocol that only exists in binary form. When converting back to a string, the protocol would disappear.
We have a binary address and need to turn it into a string. In this case, we need a protocol that only exists in string form. When encoding to the binary representation, it would disappear as we already know the binary protocol code.

Basically, we need a StringInBinary protocol (StringMultiaddrCodec) and a BinaryInString protocol (the "unknown" protocol).

That aside, I'm interested to see if/how this would work in code - I'll see if I can work up a PR to js-multiaddr and report back :D

The second half (binary in strings) has a PR here: multiformats/go-multiaddr#74.

However, I'm still worried about multiple binary representations of multiaddrs. That's why I'm less sure we want StringInBinary.

That way, we can always tell if something is a path or something else. We may also be able to take advantage of this later to combine a few concepts and get rid of the "multiaddrs look like paths but are totally not" problem. However, we can think about that later. This PR just reserves the code so we don't run into problems later. * Remove the distinction between string/binary multiaddrs. Instead, the "string" will *also* be a valid binary multiaddr. * Define a new multipath spec to combine multiaddrs and other paths. Related to: multiformats/multiaddr#70

ghost · 2019-03-27T15:50:10Z

Could the problem of unknown/new address protocols be solved by the following steps?

require that all new protocols are length-prefixed
require that all implementations support the existing non-length-prefixed and non-value protocols

That'd seem like a simple way forward to me.

ntninja · 2019-03-27T21:22:30Z

@lgierth: How do we know whether a protocol even requires an argument at all? /utp, /quic and about 10 others do not, for instance.
Also why not simply require all protocols to be length-prefixed then? While it would increase many binary representations by 1 byte per protocol, it would make parsing byte representations much simpler in general and solve the “unknown protocol” issue almost as a side-effect. (Although it would not help with the problem parsing unknown string representations, of course.)

ghost · 2019-03-27T22:49:23Z

How do we know whether a protocol even requires an argument at all? /utp, /quic and about 10 others do not, for instance.

Yeah, actually, point 2 above should include most or all existing protocols, not just the length-prefixed ones.

Also why not simply require all protocols to be length-prefixed then?

Backward compatibility. It'd be a huuuuge hassle to fundamentally change existing multiaddr protocols, while it's easy to freeze them as-is and introduce new rules for new protocols.

We must avoid breaking changes at all cost.

Stebalien · 2019-03-28T02:49:18Z

We must avoid breaking changes at all cost.

Note: We can also transition. That is, we can say introduce a new "multiaddr-2" protocol and parse everything after that using the new system. Eventually, we can remove support for multiaddr-1 (the network will forget old multiaddrs pretty quickly). This would give us a chance to revisit this from scratch.

saul-jb · 2022-03-13T21:43:40Z

Has there been any more progress made on this?

This would make multiaddr much more useful for projects that need to add complexity to what is already defined by multiaddr without requiring per-application changes to the table.

Stebalien · 2022-03-20T20:30:23Z

Unfortunately, no. If you're interested, I'd start by forking go-multiaddr and adding an implementation there. I say "fork" because adding support for this will require a non-trivial refactor of everything depending on multiaddrs, so I expect we'll end up with two "flavors" of multiaddrs: partially validated ones and fully validated ones".

This was referenced Jul 18, 2018

feat: Adds ws and wss multiaddr multiformats/go-multiaddr#72

Closed

[WIP] add the "unknown" multiaddr multiformats/go-multiaddr#74

Open

ghost mentioned this issue Aug 16, 2018

Handling unknown codecs ipld/ipld#46

Open

Stebalien mentioned this issue Sep 20, 2018

Address book ordering, and private addresses libp2p/go-libp2p-peerstore#11

Closed

This was referenced Nov 8, 2018

Bug: missing protocol breaks the peers page ipfs/ipfs-webui#878

Closed

fix: handle peer-info validation errors ipfs-inactive/js-ipfs-http-client#887

Merged

Stebalien mentioned this issue Nov 26, 2018

reserve 0x2f (ASCII /) as the *path* multicodec multiformats/multicodec#93

Merged

alanshaw mentioned this issue Jan 17, 2019

Using constants for multicodec ipfs/js-ipfs#1809

Closed

Stebalien mentioned this issue Mar 13, 2019

Concept docs: nat traversal and relay libp2p/docs#24

Merged

This was referenced Jul 18, 2019

fix Deprecated packages and swarm peers convert error ipfs/go-ipfs-http-client#21

Merged

Move all protocol definitions into this package multiformats/go-multiaddr#108

Closed

ntninja mentioned this issue Jan 10, 2020

Document MultiAddr codecs implemented by py-multiaddr #101

Open

saul-jb mentioned this issue Mar 13, 2022

Supporting ZeroMQ #125

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle unparsable multiaddrs #70

Handle unparsable multiaddrs #70

Stebalien commented Jul 18, 2018 •

edited

alanshaw commented Nov 13, 2018

alanshaw commented Nov 13, 2018

olizilla commented Nov 13, 2018

Stebalien commented Nov 14, 2018

alanshaw commented Nov 19, 2018

Stebalien commented Nov 19, 2018

ghost commented Mar 27, 2019 •

edited by ghost

ntninja commented Mar 27, 2019

ghost commented Mar 27, 2019 •

edited by ghost

Stebalien commented Mar 28, 2019

saul-jb commented Mar 13, 2022

Stebalien commented Mar 20, 2022

Handle unparsable multiaddrs #70

Handle unparsable multiaddrs #70

Comments

Stebalien commented Jul 18, 2018 • edited

alanshaw commented Nov 13, 2018

alanshaw commented Nov 13, 2018

olizilla commented Nov 13, 2018

Stebalien commented Nov 14, 2018

alanshaw commented Nov 19, 2018

Stebalien commented Nov 19, 2018

ghost commented Mar 27, 2019 • edited by ghost

ntninja commented Mar 27, 2019

ghost commented Mar 27, 2019 • edited by ghost

Stebalien commented Mar 28, 2019

saul-jb commented Mar 13, 2022

Stebalien commented Mar 20, 2022

Stebalien commented Jul 18, 2018 •

edited

ghost commented Mar 27, 2019 •

edited by ghost

ghost commented Mar 27, 2019 •

edited by ghost