Gateway: fast check if CID is in local datastore cache (only-if-cached) #8783

Closed
lidel opened this issue Mar 10, 2022 · 4 comments · Fixed by #9082
Labels
kind/enhancement A net-new feature or improvement to an existing feature topic/gateway Topic gateway

Comments

@lidel
Member

lidel commented Mar 10, 2022

Ecosystem context

Raw block and CAR stream Gateway response types are added in #8758 and https://github.com/ipfs/go-ipfs/issues/8769.

This unlocks exciting features in the IPFS ecosystem:

  • light clients that fetch data from multiple gateways in trustless fashion
    • Mobile web browsers with low impact on battery
    • IoT devices fetching firmware updates
    • etc.
  • "transport gateways"
    • easier to implement: no HTML hosting, only Block / CAR
    • lower risk, no automated DMCA takedowns

What is missing

The ability to quickly test whether the data behind a CID is already present in the Gateway's local cache.

Essentially an equivalent for what we already can test in CLI/RPC API:

  • ipfs block get --offline /ipfs/{cid} → errors if block is not present locally
  • ipfs dag stat --offline /ipfs/{cid} → errors if full DAG is not present locally

I think doing this only per block should be enough – checking if entire DAG is present may be too expensive / overkill in practice.
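For reference, the per-block CLI check listed above can be wrapped in a few lines. This is only a rough sketch: it assumes an `ipfs` binary on PATH, the function name `block_is_local` and the injectable `run` parameter are made up for illustration, and it relies solely on the exit code (the command errors if the block is not present locally).

```python
import subprocess


def block_is_local(cid, run=subprocess.run):
    """Return True if `ipfs block get --offline /ipfs/{cid}` exits successfully.

    `run` is injectable (hypothetical parameter) so the check can be exercised
    without a real IPFS node; it defaults to subprocess.run.
    """
    result = run(
        ["ipfs", "block", "get", "--offline", f"/ipfs/{cid}"],
        stdout=subprocess.DEVNULL,  # the block payload itself is not needed
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit code means "not local", not a crash
    )
    return result.returncode == 0
```

The Gateway feature requested here would give HTTP clients the same yes/no answer without shelling out to a local node.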

Why we need it

  • Enables inexpensive checks that do not trigger data retrieval → reduced cost of running gateway
  • Light clients are able to use multiple gateways more efficiently → improved performance on the client
    • It is fair to assume light clients will have a gateway pool (list). Such a client should be able to probe which gateways have the data in their local cache and can respond with it immediately (without hitting DHT/retrieval), and then send a GET request to one of them.
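To make the gateway pool idea concrete, here is a minimal sketch of the selection step on the client. Everything in it is an assumption for illustration: the function names, the `probe(gateway, cid)` callable (which stands in for whatever cheap "is it cached?" check ends up being available), and the fallback to the first gateway in the pool when nobody has the data.

```python
def gateways_with_cached_copy(gateways, cid, probe):
    """Return the gateways for which probe(gateway, cid) reports a local copy."""
    return [gateway for gateway in gateways if probe(gateway, cid)]


def pick_gateway(gateways, cid, probe):
    """Prefer a gateway that already has the data in its local cache.

    Falling back to the first gateway in the pool is an assumption of this
    sketch; a real client may prefer a different strategy.
    """
    cached = gateways_with_cached_copy(gateways, cid, probe)
    if cached:
        return cached[0]
    if not gateways:
        raise ValueError("gateway pool is empty")
    return gateways[0]
```

The client would then send its regular GET request to whatever `pick_gateway` returns.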

How to implement this

HTTP HEAD request is a good candidate. It does not return any payload, only HTTP headers.

Right now, HEAD request is being used for shallow preload of root blocks: depending on resource type, it usually triggers block fetch events along the requested content path up to the root block of the final path segment. We can't change this, because it works as expected – clients use HEAD to read Content-Length of unixfs files and raw blocks, and that is why root block has to be fetched if it is not present in the local datastore.

This means we need some additional flag to signal we want to do a local datastore check without triggering any additional work.

How to indicate "no-remote-fetch" when sending HTTP HEAD request for /ipfs/{cid}?

Perhaps RFC 7234#only-if-cached?

Cache-Control: only-if-cached could be used for requesting payload only if the gateway already has the data and can return it immediately. If data is not cached locally, and the response requires an expensive remote fetch, a 504 (Gateway Timeout) status code should be returned.

HEAD + Cache-Control: only-if-cached + optional Accept seem to cover the needs of light clients.
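As an illustration, the request a light client would send could be built like this with Python's standard library. It is only a sketch: the helper name `build_cache_probe` and the gateway URL are hypothetical, and the `application/vnd.ipld.raw` value used in the usage note is assumed to be the raw block type from the linked PR.

```python
import urllib.request


def build_cache_probe(gateway, cid, accept=None):
    """Build a HEAD request asking the gateway to answer only from its local cache."""
    headers = {"Cache-Control": "only-if-cached"}
    if accept is not None:
        # Optional: name the response type the follow-up GET would ask for.
        headers["Accept"] = accept
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    return urllib.request.Request(url, headers=headers, method="HEAD")
```

For example, `build_cache_probe("https://gateway.example", cid, accept="application/vnd.ipld.raw")` yields a request that can be passed to `urllib.request.urlopen`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, so a cache miss would surface as an exception carrying the status code.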

@lidel lidel added kind/enhancement A net-new feature or improvement to an existing feature topic/gateway Topic gateway labels Mar 10, 2022
@lidel lidel changed the title Gateway: fast check if CID is in local datastore Gateway: fast check if CID is in local datastore cache Mar 29, 2022
@Jorropo
Contributor

Jorropo commented Apr 14, 2022

About the type, I do not have any opinion about which Header to use; any one of them is fine with me.

However, about the URL, I think that for the exact same reason we support both the URL and Header forms for the request format, we should support both here too.

  • 404 - Not Found in local cache

What about sending a different code to differentiate bad unixfs paths from block not found?

With your proposal it would be impossible to differentiate between accessing Qmfoo/aaa where Qmfoo has no aaa field, and Qmfoo/aaa where aaa isn't in the blockstore (without parsing the error type, which is not fun to do).

I think we should use:

  • 410 | Gone

This is a client failure, indicating that the data could not be found and that no redirection is known. That is not strictly true, since we know one: it would be to use online mode. But I think that, out of the 4xx codes, it is the one that makes the most sense.

@lidel
Member Author

lidel commented Apr 14, 2022

Makes sense. Also, note from #8880

we would need application/vnd.ipfs.cache.block for checking a single block, and application/vnd.ipfs.cache.dag for full DAG. The latter check would be equivalent to running ipfs dag stat --offline $CID for every child to see if the entire DAG is present in local store, without making any network requests.

@lidel
Member Author

lidel commented May 25, 2022

I think I've found an elegant way of doing this without inventing any new headers or fake content types.
See RFC 7234#only-if-cached.

Cache-Control: only-if-cached could be used for requesting payload only if the gateway already has the data and can return it immediately. If data is not cached locally, and the response requires an expensive remote fetch, a 504 (Gateway Timeout) status code should be returned.

HEAD + Cache-Control: only-if-cached + optional Accept seem to cover the needs of light clients.

We most likely don't want to go with HTTP code 504: anything >499 produces scary red errors in the browser console, making it awkward when a video player is asking 10 gateways and only 2 have the data cached.

This is not an error state, but part of our content routing cycle.
We should use a 4XX instead, 412 Precondition Failed sounds like a sensible match – see ipfs/specs@5435910
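A rough sketch of the decision a gateway would make under this proposal, in case it helps review. This is not the actual go-ipfs handler: the function name and the `has_local_copy` flag are hypothetical, and the lookup that produces the flag is assumed to be a local-only datastore check.

```python
from http import HTTPStatus


def only_if_cached_status(cache_control, has_local_copy):
    """Decide how to answer a request that may carry Cache-Control: only-if-cached.

    Returns None when the directive is absent, meaning the request should go
    through the normal path (which may trigger a remote fetch).
    """
    # Cache-Control is a comma-separated list of case-insensitive directives.
    directives = {part.strip().lower() for part in (cache_control or "").split(",")}
    if "only-if-cached" not in directives:
        return None
    if has_local_copy:
        return HTTPStatus.OK
    # Cache miss: answer right away with 412 instead of fetching remotely.
    return HTTPStatus.PRECONDITION_FAILED
```

A client probing 10 gateways would then treat 412 as "not cached here, try the next one" rather than as a failure.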

@lidel lidel changed the title Gateway: fast check if CID is in local datastore cache Gateway: fast check if CID is in local datastore cache (only-if-cached) Jun 7, 2022
lidel added a commit to ipfs/specs that referenced this issue Jun 9, 2022
@lidel lidel self-assigned this Jun 23, 2022
lidel added a commit to ipfs/specs that referenced this issue Jul 1, 2022
* feat: initial HTTP gateway specs

This adds gateway specs under ./http-gateways directory.

The aim is to document _current_ behavior (implementation in go-ipfs 0.13)
and switch the way we do the gateway work to be specs-driven.

Long term goal is to provide language and implementation agnostic
specification that anyone can use to implement compatible gateways.

* gateway: add Content-Range

* gateway: registerProtocolHandler uri router

* CODEOWNERS: add lidel for ./http-gateways

* gateway: resolving an advanced DNSLink chain

* gateway: only-if-cached HEAD behavior

* gateway: suggestions from reviewers

Co-authored-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
Co-authored-by: Vasco Santos <vasco.santos@moxy.studio>
Co-authored-by: Oli Evans <oli.evans@gmail.com>

* gateway: include CIDv1 node in summary

* gateway: reorder URI router section

As suggested in #283 (comment)

* gateway: add Denylists section

* gateway: switch only-if-cached miss to 412

Rationale: ipfs/kubo#8783 (comment)

* gateway: apply suggestions from review

Co-authored-by: Thibault Meunier <thibmeu@users.noreply.github.com>

* gateway: apply suggestions from Cloudflare

#283 (review)

* gateway: add X-Content-Type-Options

* gateway: simplify dnslink summary

https://github.com/ipfs/specs/pull/283/files#r898709569

* gateway: document 412 Precondition Failed

https://github.com/ipfs/specs/pull/283/files#r898686654

* gateway: link to ipld codecs explainer

https://github.com/ipfs/specs/pull/283/files#r898687052

* gateway: stub about handling traversal errors

https://github.com/ipfs/specs/pull/283/files#r892845860

* gateway: expand HTTP caching considerations

* gateway: editorial fixes

Co-authored-by: Steve Loeppky <stvn@loeppky.com>

* gateway: expand on Host header parsing

https://github.com/ipfs/specs/pull/283/files#r898703765

* gateway: editorial fixes

* gateway: X-Forwarded-Proto and X-Forwarded-Host

* gateway: editorial fixes

* gateway: X-Trace-Id

optional header suggested in:
#283 (comment)

rationale: having specific name as a suggestion of 'best practice' in
the specs will simplify debugging across ecosystem

* gateway: Generated HTML with directory index

Synthesis of ideas from:
ipfs/kubo#8455
and
ipfs/kubo#9058

Co-authored-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
Co-authored-by: Vasco Santos <vasco.santos@moxy.studio>
Co-authored-by: Oli Evans <oli.evans@gmail.com>
Co-authored-by: Thibault Meunier <thibmeu@users.noreply.github.com>
Co-authored-by: Steve Loeppky <stvn@loeppky.com>
@lidel
Member Author

lidel commented Jul 5, 2022

PR ready for review in #9082
