IPIP-359: Multi gateway client #359

markg85 · 2022-12-16T18:24:04Z

A spec to describe how multi gateway clients (formerly known as racing gateways) should behave. This is very much a companion spec to #356.

To clarify how the two relate:
This spec describes how multi gateway clients work and how they should be implemented.
#356 would use this spec in its implementation.

meandavejustice · 2022-12-22T22:03:57Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+## Motivation
+
+When developing an application with IPFS functionality you'd ideally want more then 1 gateway and distribute the requests among N gateways. This spec relies on IPIP-0280 (gateways file).


Leaving a note to remind you to link to IPIP-0280 once it is merged

meandavejustice · 2022-12-22T22:08:03Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+### Keeping the usable gateway list fresh in the background
+
+Getting this list of gateways and maintaining if they should be used can take quite some time. The adviced approach here is to run each request in an async matter where the async flow follows the same flow as the above flowchart.


adviced => advised*

meandavejustice · 2022-12-22T22:09:01Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+### Configuration options
+
+`concurrent requests` Defaults to 10. There must be a way to specify how many concurrent requests the `multi gateway client` does per IPFS request.


I think the default should be 6 requests here, since most modern browsers allow six concurrent connections per host. See https://docs.diffusiondata.com/cloud/latest/manual/html/designguide/solution/support/connection_limitations.html#:~:text=Most%20modern%20browsers%20allow%20six,with%20any%20server%20or%20proxy.

meandavejustice · 2022-12-22T22:09:54Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+`max simultaneous cids` Defaults to 5. There must be a way to define how many simultaneous IPFS requests the `multi gateway client` can handle at any given time.
+
+`max total gateways in use` Defaults to 25. There must be a way to specify how many total gateways can be used for the `multi gateway client` as a whole.


What is the reason for having a limit here? Seems to me that more would always be better

meandavejustice · 2022-12-22T22:10:35Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+`racing` Defaults to false. There must be a way to specify if `racing` should be used. Racing means the `multi gateway client` will ask at most the number of `concurrent requests` to all download the same data. The one who downloads it first if the one whose output is used, the rest is ignored.
+
+`verify raw` Defaults to true. This tells the `multi gateway client` implementation to verify RAW data as wel as CAR data. Setting this option to true (the default) means the `multi gateway client` is guaranteed to only give back valid data. If this option is set to false then raw data is returned as-is, unverified.


typo: wel => well

meandavejustice · 2022-12-22T22:12:25Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+The data retrieval for a given CID must adhere to the configuration options.
+
+There must be an async way to get the data represented by that CID. While the `multi gateway client` can handle any CID data, in it's default settings all data is being verified. If `verify raw` is set to false then raw data is passed back as-is. CAR data is always verified.


Is there any use case for returning CAR data without verifying it? Probably not, but if there is, we should include an option for that as well.


meandavejustice · 2022-12-22T22:14:03Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+`concurrent requests` Defaults to 10. There must be a way to specify how many concurrent requests the `multi gateway client` does per IPFS request.
+
+`max simultaneous cids` Defaults to 5. There must be a way to define how many simultaneous IPFS requests the `multi gateway client` can handle at any given time.


Is there a reason "5" was chosen here? It may make sense to set to 6 also for the reasons above
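
For illustration only, the queueing behaviour the draft describes for `max simultaneous cids` (requests beyond the limit wait until a slot frees up) could be sketched with a semaphore. The names `fetch_all` and `fetch_one` are hypothetical and not part of the draft; the default of 5 is the draft's value, and the limit is a parameter so that 6 can be tried as suggested above.

```python
import asyncio

MAX_SIMULTANEOUS_CIDS = 5  # draft default; this review suggests considering 6


async def fetch_all(cids, fetch_one, limit=MAX_SIMULTANEOUS_CIDS):
    """Run fetch_one(cid) for every CID, never more than `limit` at once.

    `fetch_one` is a hypothetical coroutine function supplied by the caller.
    """
    slots = asyncio.Semaphore(limit)

    async def guarded(cid):
        # Requests beyond the limit wait here until a slot is released.
        async with slots:
            return await fetch_one(cid)

    # gather() returns results in the same order as the input CIDs.
    return await asyncio.gather(*(guarded(cid) for cid in cids))
```

Whatever default is chosen, the limit stays a single tunable number in this shape, so the discussion about 5 versus 6 does not affect the structure.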

meandavejustice · 2022-12-22T22:17:44Z

I think there should be more info on the defined behavior when racing is set to false, especially since it is the default.
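
For reference, the racing behaviour that the draft does define (ask up to `concurrent requests` gateways for the same data, use the first response, ignore the rest) might look roughly like the sketch below. `race` and `fetch` are hypothetical names, and treating a raised exception as a failed gateway is an assumption. The non-racing behaviour is not defined in the draft, so it is deliberately not shown.

```python
import asyncio


async def race(gateways, fetch, concurrent_requests=10):
    """Ask up to `concurrent_requests` gateways for the same data.

    Returns the first successful result and cancels the remaining requests.
    `fetch` is a hypothetical coroutine function taking one gateway.
    """
    pending = {asyncio.create_task(fetch(g)) for g in gateways[:concurrent_requests]}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                # Assumption: a gateway that raised simply lost the race.
                if task.exception() is None:
                    return task.result()
        raise RuntimeError("no gateway returned the data")
    finally:
        # Ignore the losers: cancel whatever is still in flight.
        for task in pending:
            task.cancel()
```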

meandavejustice · 2022-12-22T22:19:59Z

Leaving a note so that we remember to capture #356 (comment)

lidel · 2023-01-19T21:43:12Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+### Finding new gateways
+
+The `gateways` file is parsed to know the initial - bootstrap - gateways. Each line in this file is a single gateway. This list of gateways should be stored internally in this `multi gateway client` implementation.


How are lines separated? Make it clear that both \n and \r\n are supported.

How should each line be parsed? (Trim whitespace and parse as a URL per https://url.spec.whatwg.org?)
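
One possible answer, sketched under assumptions: split on both line endings, trim whitespace, skip blank lines, and keep only absolute http(s) URLs. `parse_gateways` is a hypothetical name, the http(s)-only rule is an assumption rather than something the draft states, and Python's `urllib.parse` is not a WHATWG URL parser, so it only approximates the parsing rules asked about above.

```python
from urllib.parse import urlsplit


def parse_gateways(text):
    """Return the gateway URLs listed one per line in a `gateways` file."""
    gateways = []
    # str.splitlines() treats both "\n" and "\r\n" as line breaks.
    for line in text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # skip blank lines
        try:
            parts = urlsplit(candidate)
        except ValueError:
            continue  # malformed entry, e.g. a broken IPv6 literal
        # Assumption: only absolute http(s) URLs with a host count as gateways.
        if parts.scheme in ("http", "https") and parts.netloc:
            gateways.append(candidate)
    return gateways
```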

lidel · 2023-01-19T21:46:40Z

integrations/MULTI_GATEWAY_CLIENT.md

+https://ipfs.io
+```
+
+From this point on the client should iterate over those gateways and request each of them to give a list of [gateways that it knows](#Gateway-returns-list-of-gateways-it-knows). Based on the return, this should result in a vastly bigger list of potentially usable gateways:


Flagging that there is no protocol for this at the moment.

For your awareness, there is a vaguely similar proposal for ambient discovery of HTTP content routers (IPIP-342), and we also talked about an HTTP transport based on gateways at MTB5.

Unless you plan to hold this IPIP until we have something, consider removing "gateway discovery" and limiting the scope to manual management done by client implementations.

lidel · 2023-01-19T21:49:34Z

integrations/MULTI_GATEWAY_CLIENT.md

+      G --> H[Store gateway];
+```
+
+The `200ms` threshold here is arbitrarily picked. From a decentralized point of view, 200ms allows you to go roughly halfway across the globe assuming your internet connection is stable. From a data retrieval point of view 200ms can be slow but can be just fine too. For example, if a site loads with 1 connection at a time with each connection having a 200ms latency then you will experience that site to "load slow". But if you load the same site with multiple concurrent connections where "some" might hit the 200ms threshold then you won't see much difference.


> 200ms allows you to go roughly halfway across the globe assuming your internet connection is stable

A big part of the planet can only dream of latency this low.
I suggest replacing it with a dynamic value based on the median latency across all gateways, plus some arbitrary timeout.
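
A minimal sketch of such a dynamic threshold, assuming latencies are measured in milliseconds per gateway. The function names and the 100 ms margin are placeholders picked for illustration, not values from the draft.

```python
from statistics import median


def latency_threshold_ms(latencies_ms, margin_ms=100.0):
    """Median latency across all gateways plus an arbitrary margin."""
    return median(latencies_ms) + margin_ms


def usable_gateways(latency_by_gateway, margin_ms=100.0):
    """Keep the gateways whose measured latency is within the dynamic threshold."""
    limit = latency_threshold_ms(list(latency_by_gateway.values()), margin_ms)
    return [gw for gw, ms in latency_by_gateway.items() if ms <= limit]
```

With this approach a client on a slow link keeps roughly the better half of its gateways instead of rejecting nearly all of them against a fixed 200 ms cutoff.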

lidel · 2023-01-19T21:53:00Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+The data retrieval for a given CID must adhere to the configuration options.
+
+There must be an async way to get the data represented by that CID. While the `multi gateway client` can handle any CID data, in it's default settings all data is being verified. If `verify raw` is set to false then raw data is passed back as-is. CAR data is always verified.


I would remove this, and clearly state the spec should always verify received bytes against expected CIDs.
There should not be any footgun that allows MITM/spoofing of user data.
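
To make the "always verify" requirement concrete, here is a rough sketch of checking a single block against its CID using only the standard library. It assumes a base32 CIDv1 with a sha2-256 multihash; other multibases, CID versions, and hash functions are out of scope, and `verify_block` is a hypothetical name. A real client would more likely use an existing CID library.

```python
import base64
import hashlib


def _read_varint(buf, pos):
    """Decode one unsigned varint from buf starting at pos."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def verify_block(cid, data):
    """Return True if `data` hashes to the sha2-256 digest embedded in `cid`."""
    if not cid.startswith("b"):
        raise ValueError("only base32 ('b' multibase prefix) CIDs are handled here")
    body = cid[1:].upper()
    # CIDs omit base32 padding; restore it before decoding.
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    version, pos = _read_varint(raw, 0)
    _codec, pos = _read_varint(raw, pos)      # content type, not needed for hashing
    hash_code, pos = _read_varint(raw, pos)   # 0x12 is sha2-256
    length, pos = _read_varint(raw, pos)
    if version != 1 or hash_code != 0x12 or length != 32:
        raise ValueError("only CIDv1 with sha2-256 is handled here")
    return hashlib.sha256(data).digest() == raw[pos:pos + length]
```

A block for which this returns False would be discarded and requested from another gateway.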

lidel · 2023-01-19T21:57:34Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+### Request method
+
+There must be a method to allow IPFS data retrieval. The input for this method must be an IPFS url in these forms: `ipfs://<cid>` and `ipns://<cid>`.


If we introduce ipns://, we need to add some paragraphs that answer the questions below:

How is IPNS resolved? Does it support DNSLink and IPNS records, or only one of them?

For IPNS record add dependency on IPIP-351 for end-to-end verification of IPNS.

For DNSLink, how should client resolve TXT records? OS resolver? DNS-over-HTTPS? Oblivious DNS?
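
Independently of how IPNS ends up being resolved, the input handling for the request method could be sketched as below. `parse_ipfs_url` is a hypothetical helper; it only splits the URL and does not attempt any IPNS or DNSLink resolution, which the questions above leave open.

```python
from urllib.parse import urlsplit


def parse_ipfs_url(url):
    """Split an ipfs:// or ipns:// URL into (namespace, root, path)."""
    parts = urlsplit(url)
    if parts.scheme not in ("ipfs", "ipns"):
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    # Use netloc, not hostname: hostname is lowercased, which would
    # corrupt case-sensitive identifiers such as CIDv0 ("Qm...").
    if not parts.netloc:
        raise ValueError("missing CID or IPNS name")
    return parts.scheme, parts.netloc, parts.path
```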

lidel · 2023-01-19T22:03:00Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+### Security
+
+N/A


Note that clients must verify received blocks before using them, and discard ones which do not match expected CID.

If ipns:// is to be supported, note if / how to handle DNSLinks

lidel · 2023-01-19T22:04:44Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+### Compatibility
+
+N/A


Refer to:

- existing gateway specs, namely the one that returns blocks and CARs, which is implemented by Kubo >=0.13.
- the CARv1 specification (if you use it)

Maybe mention https://github.com/ipfs/specs/blob/main/http-gateways/PATH_GATEWAY.md#only-if-cached-head-behavior as a mechanism for prioritizing gateways which already have the data? Shotgunning a fetch request to 5 gateways and getting the same data back 5 times is super wasteful.
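
A rough sketch of such a probe, assuming a path gateway (`<gateway>/ipfs/<cid>`) that implements the linked `only-if-cached` HEAD behaviour. The helper names are hypothetical, and any error response or network failure is simply treated as "not cached".

```python
import urllib.error
import urllib.request


def cached_probe_request(gateway, cid):
    """Build a HEAD request asking the gateway to answer from its cache only."""
    url = "%s/ipfs/%s" % (gateway.rstrip("/"), cid)
    return urllib.request.Request(
        url, method="HEAD", headers={"Cache-Control": "only-if-cached"}
    )


def has_cached(gateway, cid, timeout=2.0):
    """Return True if the gateway reports that it already holds the data."""
    try:
        request = cached_probe_request(gateway, cid)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Error statuses, timeouts, and unreachable gateways all count as a miss.
        return False
```

Gateways for which `has_cached` returns True could then be tried first, with a full fetch sent to the others only when nobody reports a hit.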

lidel · 2023-01-19T22:11:41Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+Is 2 requests. These count at `max simultaneous cids` where the default is 5 maximum. If there are more then `max simultaneous cids` then those that don't get handled will be put on a queue to be handled as soon as a slot becomes available.
+
+Internally that CID is represented by N different CIDs (each block). Say `bafyA` consists of 100 blocks (simplified depiction):


Does this mean the client always sends the first request for a single block, deserializes it, and then sends a CAR request for its branches? This is fine for an MVP I guess, but it is hard to make a good decision about when to switch from block to CAR for a deeper DAG.

Hannah made a demo during MTB5 and had some good ideas about adding an option to fetch a CAR with non-leaf blocks first (metadata), and then fetching the leaves with the actual data at the end. I wrote some notes in #348 (comment). It also included byte range requests, which are important for use cases like video seeking.

I feel we should strongly consider adding these parameters to CAR requests before this IPIP is finalized.
(It is OK for a PoC implementation to use naive block/full-CAR requests for now, but we want a better spec and implementation at the end of the road.)

lidel · 2023-01-19T22:14:19Z

integrations/MULTI_GATEWAY_CLIENT.md

+
+### CAR verification file
+
+Besides verifying for response headers, we should also define which blob we actually expect. Like a "Hello world" or "Hello IPFS".


For a quick heartbeat check, a CAR with a single root for a zero-length block will be enough, and won't waste much bandwidth.

markg85 mentioned this pull request Dec 16, 2022

IPIP-356: IPFSClient API #356

Draft

Initial draft multi gateway client spec.

e2e80a8

meandavejustice reviewed Dec 22, 2022

lidel changed the title ~~IPIP-0000: Multi gateway client~~ IPIP-359: Multi gateway client Jan 19, 2023

lidel reviewed Jan 19, 2023

lidel reviewed May 10, 2023

chore: editorials

e54f547
