Support direct HTTP retrieval from /https providers #125

Open
lidel opened this issue Apr 22, 2024 · 4 comments
Comments

@lidel
Member

lidel commented Apr 22, 2024

This is the Go version of ipfs/service-worker-gateway#72.

We want rainbow to benefit from /https providers (example) and use them in addition to bitswap.

Ideally, we would prioritize HTTP retrieval over bitswap where possible, as it lowers costs for content providers and incentivizes them to configure, expose, and announce HTTPS endpoints.

MVP scope

Focus should be on block (application/vnd.ipld.raw, ?format=raw) requests, as these will always work across all implementations and provide the best cacheability for the HTTP infrastructure we have.
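
A minimal sketch of what such a raw block request could look like from Go, using only the standard library. The helper names (rawBlockURL, newRawBlockRequest, fetchRawBlock) and the 2 MiB read cap are assumptions made for illustration, not existing rainbow or boxo APIs.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// rawBlockURL builds the URL for a single-block request.
// Hypothetical helper: gatewayURL is the provider's base URL.
func rawBlockURL(gatewayURL, cidStr string) string {
	return strings.TrimRight(gatewayURL, "/") + "/ipfs/" + cidStr + "?format=raw"
}

// newRawBlockRequest asks for the block both ways mentioned above:
// the ?format=raw query parameter and the application/vnd.ipld.raw Accept header.
func newRawBlockRequest(ctx context.Context, gatewayURL, cidStr string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, rawBlockURL(gatewayURL, cidStr), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/vnd.ipld.raw")
	return req, nil
}

// fetchRawBlock sends the request and returns the response body.
// A real client must also hash the bytes and compare them with the CID;
// that verification step is deliberately left out of this sketch.
func fetchRawBlock(ctx context.Context, c *http.Client, gatewayURL, cidStr string) ([]byte, error) {
	req, err := newRawBlockRequest(ctx, gatewayURL, cidStr)
	if err != nil {
		return nil, err
	}
	resp, err := c.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d for %s", resp.StatusCode, cidStr)
	}
	// Assumed cap of 2 MiB so a misbehaving server cannot stream forever.
	return io.ReadAll(io.LimitReader(resp.Body, 2<<20))
}
```

Because every request targets exactly one CID, responses are immutable and can be cached by any HTTP layer in between.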

CAR with IPIP-402 may be more involved, and may lead to duplicated block retrievals because of the way a page with a dozen subresources loads: all subresources share the same parent and are fetched in parallel, which can produce a race where parent blocks are fetched multiple times, slowing down page loads.

@hacdias
Member

hacdias commented Apr 23, 2024

Before continuing, I want to lay down some notes to make sure we're all on the same page about what needs to be done and about the current challenges with accepting the /https providers.

Most providers with HTTPS multiaddresses are unusable

Most, if not all, providers advertising /https multiaddresses are unusable from a standards perspective: they do not follow the proper peer schema. We can certainly hammer the code to accept them, but I would rather have the original provider of the records implement the correct schema instead. So, instead of:

{
  "Addrs": ["/dns4/dag.w3s.link/tcp/443/https"],
  "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
  "Metadata": "oBIA",
  "Protocol": "transport-ipfs-gateway-http",
  "Schema": "unknown"
},

We should be getting this:

{
  "Schema": "peer",
  "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
  "Addrs": ["/dns4/dag.w3s.link/tcp/443/https"],
  "Protocols": ["transport-ipfs-gateway-http"]
}
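
To make the difference concrete, here is a hedged sketch of a decoder that accepts only the standard form. The peerRecord type and parseGatewayPeer function are hypothetical names used for illustration and mirror only the fields shown in the two examples above; they are not Boxo types.

```go
package main

import "encoding/json"

// peerRecord mirrors only the fields shown in the "peer" example above.
type peerRecord struct {
	Schema    string   `json:"Schema"`
	ID        string   `json:"ID"`
	Addrs     []string `json:"Addrs"`
	Protocols []string `json:"Protocols"`
}

const gatewayProtocol = "transport-ipfs-gateway-http"

// parseGatewayPeer decodes one record and reports whether it is a standard
// "peer" record that announces the HTTP gateway protocol.
func parseGatewayPeer(raw []byte) (peerRecord, bool, error) {
	var rec peerRecord
	if err := json.Unmarshal(raw, &rec); err != nil {
		return peerRecord{}, false, err
	}
	if rec.Schema != "peer" {
		// Records with "Schema": "unknown" end up here: their singular
		// "Protocol" field is simply not part of the peer schema.
		return rec, false, nil
	}
	for _, p := range rec.Protocols {
		if p == gatewayProtocol {
			return rec, true, nil
		}
	}
	return rec, false, nil
}
```

Fed the first record, this returns false, because the schema is "unknown" and the protocol sits in a singular "Protocol" field; fed the second, it returns true.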

As I said, the code can be hammered to accept the non-standard form (albeit a bit harder in Go). But I would rather not go down that route. We already plan to completely remove support for "Schema": "bitswap" (e.g.: from Pinata) from Boxo. Supporting one more non-standardized schema would just make things more complicated than they need to be.

Fetching the block via HTTPS

The current flow to fetch a block, from the Blockservice perspective, is as follows:

  1. Blockservice gets asked for a block
  2. Blockservice checks with Blockstore, if it has it, return it. Otherwise,
  3. Blockservice asks the Exchange, which currently is just Bitswap
  4. Bitswap looks up providers using a routing.ContentRouter. This routing.ContentRouter only has Bitswap-related peers. All other peers are ignored, even if they come from a /routing/v1 endpoint.
  5. Bitswap tries fetching it, returns, etc, etc.
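
The first three steps can be sketched as follows. This is a simplification under stated assumptions: the interfaces below are hypothetical stand-ins rather than the real Boxo ones, CIDs are plain strings, and context parameters are omitted to keep the example dependency-free.

```go
package main

import "errors"

// errNotFound signals a local miss (assumed sentinel for this sketch).
var errNotFound = errors.New("block not found locally")

// blockstore and exchange are simplified stand-ins for the real interfaces.
type blockstore interface {
	Get(cid string) ([]byte, error)
}

type exchange interface {
	GetBlock(cid string) ([]byte, error)
}

type blockService struct {
	store blockstore
	exch  exchange
}

// GetBlock follows the flow above: local blockstore first, exchange second.
func (s *blockService) GetBlock(cid string) ([]byte, error) {
	data, err := s.store.Get(cid)
	if err == nil {
		return data, nil // step 2: served locally
	}
	if !errors.Is(err, errNotFound) {
		return nil, err // a real storage failure, not a miss
	}
	return s.exch.GetBlock(cid) // step 3: today this is only Bitswap
}

// mapStore and exchangeFunc are in-memory helpers for trying the sketch out.
type mapStore map[string][]byte

func (m mapStore) Get(cid string) ([]byte, error) {
	if b, ok := m[cid]; ok {
		return b, nil
	}
	return nil, errNotFound
}

type exchangeFunc func(cid string) ([]byte, error)

func (f exchangeFunc) GetBlock(cid string) ([]byte, error) { return f(cid) }
```

Since the exchange is the only seam between the Blockservice and the network, both options below plug in at that point.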

I see a few ways of potentially solving this.

(a) Parallel Exchanges

Create a parallel exchange that calls both Bitswap and a new exchange that can take advantage of the Delegated Routing endpoint results that have non-Bitswap peers.

Challenges I see:

  1. Duplicate HTTP requests to delegated routing endpoints, done by both exchanges.

(b) Smarter Exchange

An exchange where you can register sub-exchanges (or fetchers) for certain protocol types. This exchange would call FindProviders itself and, depending on the results, would parallelize calls to different fetchers (Bitswap, Gateways, etc).

Challenges I see:

  1. We need a way to tell the Bitswap client that peer X is already known to have block Y, so that it does not do the FindPeers request again. Maybe it's already possible, but I'm not familiar enough with the code. Needs investigation.
  2. Reconcile Delegated Routing lookups with DHT lookups. Boxo only provides code for the opposite case: delegated routing to libp2p routers, ignoring everything that is not Bitswap. The reconciliation is already done in someguy, which parallelizes DHT and Delegated Routing endpoints into a Delegated Routing-like interface. We'll likely want to re-use the code.

(b) seems technically more complicated (at least without looking at what is currently possible), but it is likely better at avoiding duplicated HTTP requests and wasted resources. We can also probably reuse the new RemoteBlockstore from boxo/gateway to fetch remote blocks from the /https peers.
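
A rough sketch of the shape option (b) could take, assuming a simplified provider record and a fetcher registry keyed by protocol name. All types and names here are hypothetical, CIDs are plain strings, and contexts are left out for brevity; none of this is existing Boxo code.

```go
package main

import "errors"

var errNoFetcher = errors.New("no fetcher registered for any provider")

// provider is a simplified routing result: a peer and the protocols it announced.
type provider struct {
	ID        string
	Addrs     []string
	Protocols []string
}

// fetcher retrieves one block from one provider over one protocol.
type fetcher func(p provider, cid string) ([]byte, error)

type smartExchange struct {
	findProviders func(cid string) ([]provider, error)
	fetchers      map[string]fetcher // keyed by protocol name
}

// GetBlock does a single provider lookup, then races every matching fetcher
// and returns the first successful result.
func (e *smartExchange) GetBlock(cid string) ([]byte, error) {
	provs, err := e.findProviders(cid)
	if err != nil {
		return nil, err
	}
	type job struct {
		f fetcher
		p provider
	}
	var jobs []job
	for _, p := range provs {
		for _, proto := range p.Protocols {
			if f, ok := e.fetchers[proto]; ok {
				jobs = append(jobs, job{f, p})
			}
		}
	}
	if len(jobs) == 0 {
		return nil, errNoFetcher
	}
	type result struct {
		data []byte
		err  error
	}
	// Buffered so slower goroutines can finish after we have returned.
	results := make(chan result, len(jobs))
	for _, j := range jobs {
		go func(j job) {
			data, err := j.f(j.p, cid)
			results <- result{data, err}
		}(j)
	}
	var errs []error
	for range jobs {
		r := <-results
		if r.err == nil {
			return r.data, nil
		}
		errs = append(errs, r.err)
	}
	return nil, errors.Join(errs...)
}
```

The routing lookup happens once per block, which is the saving (b) has over (a). The sketch does not cancel losing fetchers and does not address challenge 1, handing known providers to the Bitswap client.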

@lidel
Member Author

lidel commented Apr 23, 2024

Triage:

  • Most providers with HTTPS multiaddresses are unusable
    • try to clean up cid.contact/routing/v1 responses, switch both http and bitswap providers to modern peer schema so we can remove hacks from boxo/kubo/rainbow
  • look into Smarter Exchange
    • assumption: we always have some peerid and some Addrs, so we can reuse interfaces from libp2p
    • try libp2p identify to learn the peer's p2p protocols, and use bitswap if present
    • if /http/tls or /https or /http is present, attempt HTTP retrieval instead of bitswap
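
The last bullet could be sketched like this. To stay dependency-free, the example parses multiaddrs as plain strings; production code would more likely use the go-multiaddr package. The function names and the default ports are assumptions for illustration.

```go
package main

import (
	"net"
	"strings"
)

// httpURLFromMultiaddr turns a string multiaddr such as
// "/dns4/dag.w3s.link/tcp/443/https" into a base URL. ok is false when
// the address carries no HTTP transport (/https, /tls/http or /http).
func httpURLFromMultiaddr(addr string) (url string, ok bool) {
	parts := strings.Split(strings.Trim(addr, "/"), "/")
	var host, port, scheme string
	sawTLS := false
	for i := 0; i < len(parts); i++ {
		switch parts[i] {
		case "dns", "dns4", "dns6", "ip4", "ip6":
			if i+1 < len(parts) {
				host = parts[i+1]
				i++ // skip the value we just consumed
			}
		case "tcp":
			if i+1 < len(parts) {
				port = parts[i+1]
				i++
			}
		case "tls":
			sawTLS = true
		case "https":
			scheme = "https"
		case "http":
			if sawTLS {
				scheme = "https" // written as /tls/http in the address
			} else {
				scheme = "http"
			}
		}
	}
	if host == "" || scheme == "" {
		return "", false
	}
	if port == "" {
		// Assumed defaults when the multiaddr has no /tcp component.
		port = map[string]string{"http": "80", "https": "443"}[scheme]
	}
	return scheme + "://" + net.JoinHostPort(host, port), true
}

// firstHTTPURL returns the first HTTP-capable address of a peer, if any.
// A caller would attempt HTTP retrieval when ok is true, bitswap otherwise.
func firstHTTPURL(addrs []string) (string, bool) {
	for _, a := range addrs {
		if u, ok := httpURLFromMultiaddr(a); ok {
			return u, true
		}
	}
	return "", false
}
```

A peer with only "/ip4/1.2.3.4/tcp/4001" yields no URL and would fall through to bitswap.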

@hacdias
Member

hacdias commented Apr 24, 2024

Update:

@lidel
Member Author

lidel commented May 6, 2024

Something we could try without changing too much, and without touching higher-level abstractions like exchanges, is an opportunistic HTTP fetch in boxo/bitswap itself.

Wrote initial thoughts in ipfs/boxo#608 – pinged some folks, looking for feasibility feedback.
