gateway: batching raw block requests (AKA userland traversal of DAGs with unknown codecs) #427

Open
8 tasks
lidel opened this issue Jul 18, 2023 · 0 comments
Labels
need/triage Needs initial labeling and prioritization

lidel commented Jul 18, 2023

Performance gap in trustless retrieval

One need we identified during user research is the ability to fetch multiple arbitrary raw blocks in a single request.

This comes up in two use cases:

  • content-addressed data built with custom codecs (such as blockchains or BitTorrent)
  • error handling in retrieval clients, and sharding or resuming partial downloads (for example, fetching only a specific layer of a DAG)

In both cases the DAG is not traversable on the backend, but the client is still able to retrieve it block by block: reading the root, learning about its child branches, and then requesting each of them as application/vnd.ipld.raw.

The downside is the number of unnecessary round trips: every block costs its own request, even when multiple CIDs could be requested at the same time.
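
The block-by-block retrieval described above can be sketched as a level-by-level walk. This is only an illustration of the round-trip cost, not a reference client: `fetch_raw` and `decode_links` are hypothetical callables supplied by the caller (one raw block request, and a userland decoder for a codec the gateway does not understand), and the toy DAG below uses made-up names instead of real CIDs.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical signatures, for illustration only:
#   fetch_raw(cid) -> bytes           one request for application/vnd.ipld.raw
#   decode_links(block) -> [cid, ...] userland decoder for a custom codec
FetchRaw = Callable[[str], bytes]
DecodeLinks = Callable[[bytes], Iterable[str]]


def traverse(
    root: str, fetch_raw: FetchRaw, decode_links: DecodeLinks
) -> Tuple[Dict[str, bytes], int, int]:
    """Walk a DAG block by block and return (blocks, requests, levels)."""
    blocks: Dict[str, bytes] = {}
    seen: Set[str] = {root}
    frontier: List[str] = [root]
    requests = 0
    levels = 0
    while frontier:
        levels += 1
        next_frontier: List[str] = []
        for cid in frontier:
            block = fetch_raw(cid)  # today: one round trip per block
            requests += 1
            blocks[cid] = block
            for child in decode_links(block):
                if child not in seen:  # fetch shared children only once
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return blocks, requests, levels


# Toy DAG with a made-up "codec": a block is a comma-separated list of links.
TOY_STORE = {"root": b"a,b", "a": b"c", "b": b"c", "c": b""}


def toy_decode_links(block: bytes) -> List[str]:
    return [part.decode() for part in block.split(b",") if part]


toy_blocks, toy_requests, toy_levels = traverse(
    "root", TOY_STORE.__getitem__, toy_decode_links
)
```

Here `toy_requests` counts what a client pays today (one request per block), while `toy_levels` is the lower bound if every level of the DAG could be requested as one batch. The gap between the two grows with the fan-out of the DAG.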

We want to remove the gateway as an innovation choke point and improve performance in use cases where content-addressable data can't be traversed by the gateway, but can still be retrieved block by block.

The need

The specification should include a canonical way to batch multiple application/vnd.ipld.raw requests into a single request: asking a trustless gateway for N CIDs and getting the corresponding blocks back without spending resources on multiple requests.

It could be a new request-response type, or a clarification around multiplexing present in HTTP/2 and HTTP/3.
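
To make the multiplexing option concrete, here is a minimal sketch of issuing many raw block requests concurrently with a bound on in-flight requests. It assumes the existing trustless gateway request shape (`/ipfs/{cid}?format=raw` with `Accept: application/vnd.ipld.raw`). The `send` callable is hypothetical: it stands in for whatever HTTP/2- or HTTP/3-capable client is actually used, and `max_in_flight` is an arbitrary illustrative default, not a recommended value.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple

RAW = "application/vnd.ipld.raw"

# Hypothetical transport: (url, headers) -> response body.
Send = Callable[[str, Dict[str, str]], Awaitable[bytes]]


def raw_block_request(gateway: str, cid: str) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for a single raw block request."""
    return f"{gateway.rstrip('/')}/ipfs/{cid}?format=raw", {"Accept": RAW}


async def fetch_many(
    cids: Iterable[str], gateway: str, send: Send, max_in_flight: int = 16
) -> Dict[str, bytes]:
    """Fetch blocks concurrently; a multiplexing client can share one connection."""
    limit = asyncio.Semaphore(max_in_flight)

    async def fetch_one(cid: str) -> Tuple[str, bytes]:
        url, headers = raw_block_request(gateway, cid)
        async with limit:  # cap the number of concurrent requests
            return cid, await send(url, headers)

    pairs = await asyncio.gather(*(fetch_one(cid) for cid in cids))
    return dict(pairs)
```

Whether this is "enough" is exactly what the benchmark in the TODO list below is meant to answer: each block is still a separate request and a separate cacheable response, and only the transport hides the latency.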

TODO

  • benchmark and evaluate whether a new request-response type is actually needed, or whether sending many application/vnd.ipld.raw requests is enough
    • HTTP/2 suffers from head-of-line blocking at the TCP layer, but maybe it is enough?
    • HTTP/3 brings true multiplexing over QUIC
    • Given that HTTP caching should cache individual block responses across any CDNs and other HTTP middleware, do we need a solution for HTTP/1.1?
  • IF we don't need a new response type, document best practices for maximizing performance on HTTP/2 and HTTP/3.
  • IF we need a new response type
    • figure out how to ask for multiple unrelated CIDs in a single request
    • figure out what should be the response format
      • initial idea: return application/vnd.ipld.car with roots being the requested CIDs
    • propose IPIP for https://specs.ipfs.tech/http-gateways/trustless-gateway/
      • include limit of CIDs requested in a single batch
      • include notes on CID ordering / cache control (batch responses should be cacheable the same way application/vnd.ipld.raw responses are)
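
As one possible way to think about the batch limit and CID ordering items above, the sketch below splits a set of CIDs into batches under a limit, after deduplicating and sorting them. Everything here is an assumption for illustration: `plan_batches` and `max_batch` are hypothetical names, the issue does not propose a limit value, and sorting is just one candidate for a canonical order that would let the same set of CIDs map to the same cacheable request.

```python
from typing import Iterable, List


def plan_batches(cids: Iterable[str], max_batch: int) -> List[List[str]]:
    """Split CIDs into batches of at most max_batch, in a canonical order.

    Assumption: deduplicating and sorting makes the batches independent of
    the order in which the caller discovered the CIDs.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    unique = sorted(set(cids))
    return [unique[i:i + max_batch] for i in range(0, len(unique), max_batch)]
```

With this approach, `plan_batches(["c", "a", "b"], 2)` and `plan_batches(["b", "c", "a", "a"], 2)` produce identical batches. That is the property a cache-friendly batch request would need, whatever wire format ends up being chosen.
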
@lidel lidel added the need/triage Needs initial labeling and prioritization label Jul 18, 2023