
IPIP-462: Ipfs-Path-Affinity on Gateways #462

Open
wants to merge 5 commits into
base: main


lidel
Member

@lidel lidel commented Feb 16, 2024

TLDR

Extends Gateway specs with optional Ipfs-Path-Affinity request header.
That is all.

If the header is present in a request, the gateway can use this optional hint to improve content routing.

The idea is that trustless clients like https://www.npmjs.com/package/@helia/verified-fetch, when requesting a block or CAR, have additional context that the gateway could leverage.

Background

Endpoints that implement https://specs.ipfs.tech/http-gateways/trustless-gateway/ may receive requests for a single block, or a CAR request for a sub-DAG of a bigger tree.

Not every CID is announced today: some providers limit announcements to top-level root CIDs.
Over time, both clients and servers should get smarter about the concept of "affinity": when processing a request for a content path that is deeper than a root CID, leverage parent segments as an additional hint for content routing lookup.

cc ipfs/kubo#8676 ipfs/kubo#10251 ipfs/kubo#10365

@lidel lidel requested a review from a team as a code owner February 16, 2024 18:16
@lidel lidel changed the title IPIP-0461: Ipfs-Path-Affinity on Gateways IPIP-0462: Ipfs-Path-Affinity on Gateways Feb 16, 2024
@lidel lidel changed the title IPIP-0462: Ipfs-Path-Affinity on Gateways IPIP-462: Ipfs-Path-Affinity on Gateways Feb 16, 2024
Comment on lines +36 to +37
Introduce `Ipfs-Path-Affinity` HTTP request header to allow HTTP client to
inform gateway about the context of block/CAR request.
Contributor

What is the format of the data that goes here? Is it ipfs://<cid>/<some>/<path> is it /ipfs/... is it just a CID?

Member

IMO, it should be /ipfs/... (or ipfs://cid), not just a CID.

Member

I would much prefer an ipfs:// protocol instead of pathing, but I guess whatever the user is likely to have is better UX.

Member Author

@lidel lidel Mar 22, 2024

Went with content path, as we already talk about them all over the specs.

NOTE: because UnixFS can have whitespace, :, and arbitrary bytes in labels, we have to percent-encode the content path.

Clarified format in trustless-gateway.md (35a5eed)
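
To illustrate why percent-encoding matters here, the sketch below encodes raw path bytes so that whitespace, `:`, and non-UTF-8 bytes all survive transport in a header value. The helper names, the `bafyExampleRoot` placeholder, and the choice to leave only RFC 3986 unreserved characters unescaped are assumptions made for illustration; the exact set of characters to escape is defined by trustless-gateway.md, not by this snippet.

```typescript
// Hypothetical helper, not part of any spec or library: percent-encode the raw
// bytes of a content path, keeping only RFC 3986 "unreserved" characters as-is.
function percentEncodeBytes(bytes: Uint8Array): string {
  let out = ''
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i]
    const isUnreserved =
      (b >= 0x30 && b <= 0x39) || // 0-9
      (b >= 0x41 && b <= 0x5a) || // A-Z
      (b >= 0x61 && b <= 0x7a) || // a-z
      b === 0x2d || b === 0x2e || b === 0x5f || b === 0x7e // - . _ ~
    out += isUnreserved
      ? String.fromCharCode(b)
      : '%' + (b < 16 ? '0' : '') + b.toString(16).toUpperCase()
  }
  return out
}

// Hypothetical helper: turn an ASCII-only string into bytes for the demo below.
function asciiBytes(s: string): Uint8Array {
  const out = new Uint8Array(s.length)
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i) & 0xff
  return out
}

// A label with whitespace and ':' under a made-up root ("bafyExampleRoot" is a placeholder).
const encodedText = percentEncodeBytes(asciiBytes('/ipfs/bafyExampleRoot/my file: v2.txt'))
// encodedText === '%2Fipfs%2FbafyExampleRoot%2Fmy%20file%3A%20v2.txt'

// A label containing a byte that is not valid UTF-8 on its own: "/a" followed by 0xFF.
const encodedRaw = percentEncodeBytes(new Uint8Array([0x2f, 0x61, 0xff]))
// encodedRaw === '%2Fa%FF'
```

Working on bytes rather than strings is a deliberate choice in this sketch: `encodeURIComponent` only accepts strings, so it cannot represent labels that are not valid UTF-8.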


### Alternatives

N/A
Contributor

Might be worth explaining alternatives here like:

  • Why not just an arbitrary identifier the user could use to establish a relationship between requests?
  • Why only one path affinity rather than more?

Member Author

Thanks. Added an answer for the first one in 61518c6.
The second one is no longer relevant: I've updated trustless-gateway.md, and we now allow more than one header to be sent in a request.
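
As a rough illustration of sending more than one hint, the sketch below builds one `Ipfs-Path-Affinity` entry per content path. The function name is hypothetical, and the second path (`/ipns/en.wikipedia-on-ipfs.org`) is only an illustrative value; how a gateway treats multiple hints is up to the spec and the implementation.

```typescript
// Hypothetical helper: one header entry per content path, each percent-encoded.
function affinityHeaders(contentPaths: string[]): Array<[string, string]> {
  return contentPaths.map(
    (p): [string, string] => ['Ipfs-Path-Affinity', encodeURIComponent(p)]
  )
}

const multiHintHeaders = affinityHeaders([
  '/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/Books',
  '/ipns/en.wikipedia-on-ipfs.org' // illustrative second hint
])
```

An array of name/value pairs like this can be passed as the `headers` option of `fetch`, where entries with the same name are appended rather than overwritten.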

Comment on lines +39 to +40
Client asking gateway for a block SHOULD provide a hint about the DAG the block
belongs to, if such information is available.
Contributor

Is it only a strict IPLD DAG where we'd recommend this? It seems like you could plausibly do this for a set of related data that aren't explicitly linked via IPLD (e.g. a website that has HTML that loads jpegs from within the same or a different root DAG).

Member

This reminds me of some chats we've had about how a provider of "bafyFoo1" would likely also have "bafyFoo2".

Member Author

The way the field format is specified, these can be arbitrary content paths. It is up to the client to provide a meaningful hint.

Most of the time it will be the content path the client tries to load, but it could also be /ipns/other-website.

Comment on lines 28 to 29
Not every CID is announced today, some providers limit announcements to
top-level root CIDs due to time and cost.
Contributor

Might be worth adding a reminder here that every piece of data that is supposed to be able to be accessed independently should be advertised.

Want to remind people to not walk themselves into silly problems like having data only accessible when passing the affinity header and then being confused when they can't get the data when it's included in another DAG. (e.g. I have the IPFS logo inside the DAG for docs.ipfs.tech but not advertised, but then someone uses the CID for the logo on explore.ipld.io but can't get the data because the affinity is wrong).

Member Author

@lidel lidel Mar 22, 2024

Reworded Motivation section in de0b231 a bit, but I think we need to have a page about advertisement and content routing on https://docs.ipfs.tech rather than specs.


(loose digression, just to clarify what I am optimizing for)

While I get what the logo example was supposed to illustrate, it won't be a real world scenario: even when only top dir is announced, and no files, whoever authored explore.ipld.io would have the logo cached locally, and once the DAG is published, the affinity of /ipns/explore.ipld.io will be enough to read it, even if it is still not announced directly.

Saying "announce things you want to access independently" sounds good on paper but does not work for real world use cases such as HTTP Range requests for video streaming or resuming downloads of big files.

Streaming a 32GiB video is an example of something where we can mostly agree we should not announce every internal CID, even though one could say those internal CIDs are "accessed independently" when I pause and resume the player (which generates a new range request for specific internal CIDs).

IPFS clients should leverage knowledge of affinity path, and ask providers of the root CID of the parent entity (in this case, the root of UnixFS DAG, or a directory it is in) for the data.
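
A minimal sketch of how a receiver might extract the root of the parent entity from a header value, so that it can be used as an extra key for provider lookup. The `AffinityHint` type and `parseAffinity` function are hypothetical names; the sketch assumes the value is a percent-encoded content path and treats anything else as "no hint".

```typescript
interface AffinityHint {
  namespace: 'ipfs' | 'ipns'
  root: string // CID or IPNS name, kept as an opaque string in this sketch
  segments: string[] // remaining path segments, if any
}

// Hypothetical parser: returns null for anything it does not understand,
// because the header is only an optional hint and must never fail a request.
function parseAffinity(headerValue: string): AffinityHint | null {
  let path: string
  try {
    path = decodeURIComponent(headerValue.trim())
  } catch {
    return null // malformed percent-encoding, or bytes that are not valid UTF-8
  }
  const parts = path.split('/')
  // A content path looks like "/ipfs/<root>/...", so parts[0] is the empty string.
  if (parts.length < 3 || parts[0] !== '') return null
  const namespace = parts[1]
  const root = parts[2]
  if (namespace !== 'ipfs' && namespace !== 'ipns') return null
  if (root === '') return null
  return { namespace, root, segments: parts.slice(3).filter((s) => s !== '') }
}

const hint = parseAffinity(
  '%2Fipfs%2Fbafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze%2Fwiki%2FBooks'
)
// hint.root is the root CID whose providers could be asked for the deeper blocks.
```

Note that `decodeURIComponent` rejects byte sequences that are not valid UTF-8, so a production parser would likely need to decode at the byte level instead.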

efficiently, resume downloads faster.
- Gateway operators are able to leverage the hint and save resources related to
provider lookup.
- Content providers are able to implement smarter announcement mechanisms,
Member

What exactly do you mean by "smarter announcement mechanisms"?
Is it just a matter of whether only roots or all CIDs are announced?

Contributor

"roots" isn't really a word here. Root of what? At every layer in the DAG going up you could slice off the top of the tree and declare a new root.

Every piece of content that needs to be independently addressable should be advertised. See https://github.com/ipfs/specs/pull/462/files#r1492996318

So at the very least if you make a block-request for the middle of a tar.gz file (where no part of the file really needs to be addressed on its own) you should be able to find it even if the provider has only advertised the root of the file.

As mentioned in the linked comment I do think we need to be careful not to mislead people though.

Member

> Every piece of content that needs to be independently addressable should be advertised.

Yes, that makes sense in theory, but if we go to the tar.gz example:

> So at the very least if you make a block-request for the middle of a tar.gz file (where no part of the file really needs to be addressed on its own) you should be able to find it even if the provider has only advertised the root of the file.

Here, the only independently addressable content is the tar.gz file and for range block-requests in the middle, you'd need to pass the affinity header to be able to fetch that.

If we're on the same page thus far, what would be smarter announcement mechanisms/strategies? Is the idea for those to be codec-aware in the sense that you could tell the node to advertise all UnixFS Files and Directories?

Member Author

@lidel lidel Mar 22, 2024

@2color we have some ideas for smarter announcement mechanisms/strategies in ipfs/kubo#8676 and more actionable ipfs/kubo#10365 (which has wip implementation).

ps. we have a concept of entity, it was introduced in IPIP-402, we can use it in discussions like this, to say "only entity root CIDs (see IPIP-402)", making it more specific.
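
To make "only entity root CIDs" concrete, here is a toy sketch of an announcement strategy that announces directories and files but skips the internal blocks of multi-block files. The `DagNode` model, the `kind` labels, and the placeholder CIDs are all assumptions for illustration; this is not what ipfs/kubo#10365 implements, and real code would walk IPLD blocks rather than an in-memory tree.

```typescript
// Hypothetical in-memory model of a DAG.
interface DagNode {
  cid: string
  kind: 'directory' | 'file' | 'chunk' // 'chunk' = internal block of a multi-block file
  children: DagNode[]
}

// Collect only the CIDs of entities (directories and files), skipping internal chunks.
function entityRootCids(node: DagNode): string[] {
  if (node.kind === 'chunk') return []
  const out = [node.cid]
  for (const child of node.children) {
    out.push(...entityRootCids(child))
  }
  return out
}

// A directory holding one file that is split into two chunks (placeholder CIDs).
const exampleDag: DagNode = {
  cid: 'cid-dir',
  kind: 'directory',
  children: [
    {
      cid: 'cid-file',
      kind: 'file',
      children: [
        { cid: 'cid-chunk-1', kind: 'chunk', children: [] },
        { cid: 'cid-chunk-2', kind: 'chunk', children: [] }
      ]
    }
  ]
}

const toAnnounce = entityRootCids(exampleDag)
// toAnnounce is ['cid-dir', 'cid-file']; the chunks stay unannounced.
```

Under such a strategy, a request for one of the chunks would rely on the affinity hint pointing at the file or directory to find a provider.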

Member

@2color 2color Mar 22, 2024

It sounds like "smarter" in this case means "frugal but smart", in the sense that it involves potentially fewer announcements that are still effective enough for routing to work.

Member Author

Exactly, there are "frugal" things we could do on both client and server to announce less, but remain as efficient at retrieval in real world usage patterns (website browsing, video streaming, download resume, parallel downloads, etc.).

Member

2color commented Feb 19, 2024

What would be the general expectation of a server when a client requests, for example, a binary leaf block that isn't announced, and the affinity header of an announced CID is passed?

A client requests bafk..ufqy and passes bafy...nkuq in the Ipfs-Path-Affinity header.
image

What does this mean for the ecosystem? It should adapt. Over time, both clients and
servers should leverage the concept of "affinity".

## Detailed design
Member

I really feel like this is missing some guidance on what the values should or can be.

With existing wording, it's too vague and could result in significant client pain of having to implement different values for different providers.

Also: we should really speak to the how of implementing this on the client and server sides: at least some best practices. E.g. how are we going to implement this in @helia/verified-fetch and rainbow?

Member Author

Clarified format in trustless-gateway.md, the gist is: you put the content path that you try to load via block or CAR request.

For example, when https://en-wikipedia--on--ipfs-org.ipns.inbrowser.dev/wiki/Books fetches the necessary blocks from https://trustless-gateway.link, it would send the encodeURIComponent('/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/Books') value:

Ipfs-Path-Affinity: %2Fipfs%2Fbafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze%2Fwiki%2FBooks

in every ?format=raw request.
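
The example above could be sketched on the client side roughly as follows. `buildBlockRequest` and `BlockRequest` are hypothetical names, and the block CID passed in is whatever block the client currently needs; only the header name, the `?format=raw` parameter, and the encoded value come from this thread.

```typescript
interface BlockRequest {
  url: string
  headers: Array<[string, string]>
}

// Hypothetical helper: describe a raw block request that carries the affinity hint.
function buildBlockRequest(
  gateway: string,
  blockCid: string,
  contentPath: string
): BlockRequest {
  return {
    url: `${gateway}/ipfs/${blockCid}?format=raw`,
    // The hint is the content path the client is trying to load, percent-encoded.
    headers: [['Ipfs-Path-Affinity', encodeURIComponent(contentPath)]]
  }
}

const rootCid = 'bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze'
const blockRequest = buildBlockRequest(
  'https://trustless-gateway.link',
  rootCid, // here the requested block happens to be the root block itself
  `/ipfs/${rootCid}/wiki/Books`
)
// Usage, assuming a fetch-capable runtime:
//   const res = await fetch(blockRequest.url, { headers: blockRequest.headers })
```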

Co-authored-by: Daniel Norman <1992255+2color@users.noreply.github.com>
Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
lidel added a commit to ipfs/boxo that referenced this pull request Mar 22, 2024
This is a first stab at leveraging these hints within the existing
boxo/gateway codebase.

It is pretty blunt, but will enable smart clients fetching sub-DAGs
to work around any content routing gaps

For more info and header semantics see ipfs/specs#462
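
One way a gateway might work around a routing gap with such hints is sketched below: look up providers for the requested block first, and fall back to the roots named in the hints only if that finds nothing. This ordering, the `FindProviders` callback, and the restriction to `/ipfs/` hints are assumptions made for illustration, not a description of what the boxo change does.

```typescript
// Hypothetical routing callback: resolves to provider identifiers for a CID.
type FindProviders = (cid: string) => Promise<string[]>

async function providersWithAffinity(
  blockCid: string,
  affinityValues: string[],
  findProviders: FindProviders
): Promise<string[]> {
  // 1. Regular lookup for the block itself.
  const direct = await findProviders(blockCid)
  if (direct.length > 0) return direct

  // 2. Fallback: ask for providers of each hinted root, in header order.
  for (const value of affinityValues) {
    let path: string
    try {
      path = decodeURIComponent(value)
    } catch {
      continue // ignore malformed hints, they are optional
    }
    const [, namespace, root] = path.split('/')
    // Sketch limitation: /ipns/ hints would need name resolution first, so skip them.
    if (namespace !== 'ipfs' || !root) continue
    const viaHint = await findProviders(root)
    if (viaHint.length > 0) return viaHint
  }
  return []
}
```

Injecting `findProviders` keeps the sketch independent of any particular routing system and makes the fallback easy to exercise with a stub.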
Member Author

lidel commented Mar 23, 2024

  • @2color the idea is that the hint would be the content path of the parent entity that triggered the block request. So if we want to load one of the internal blocks of astro.jpg, Ipfs-Path-Affinity would point at /ipfs/bafy..jomu/astro.jpg
  • Addressed some feedback, added a description of the field format in trustless-gateway.md
  • Opened "feat: basic support for Ipfs-Path-Affinity from IPIP-462" (boxo#592) with a PoC implementation for boxo/gateway

Projects
Status: 🔍 Ready for Final Reviews

5 participants