Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gw): Cache-Control: only-if-cached #9082

Merged
merged 1 commit into from Jul 7, 2022
Merged

Conversation

lidel
Copy link
Member

@lidel lidel commented Jul 5, 2022

This PR implements support for requests sent with only-if-cached prerequisite, as documented in Gateway specs:

PATH_GATEWAY.md#cache-control-request-header:

Client can send Cache-Control: only-if-cached to request data only if the gateway already has the data (e.g. in local datastore) and can return it immediately.

If data is not cached locally, and the response requires an expensive remote fetch, a 412 Precondition Failed HTTP status code should be returned by the gateway without any payload or specific HTTP headers.

PATH_GATEWAY.md#only-if-cached-head-behavior:

HTTP client can send HEAD request with Cache-Control: only-if-cached to disable IPFS data transfer and inexpensively probe if the gateway has the data cached.

Implementation MUST ensure that handling only-if-cached HEAD response is fast and does not generate any additional I/O such as IPFS data transfer. This allows light clients to probe and prioritize gateways which already have the data.

Why we need this?

How does this PR work?

Implementation from this PR is minimal: when only-if-cached is present in a request,
the Gateway will try to read block size for requested CID using API in offline mode (only local blockstore).

  • Error means the block is not in the local datastore, and 412 Precondition Failed is returned.
  • Success with HEAD method returns early with 200 OK without doing any additional IO

return true
}
}
return false
Copy link
Contributor

@Jorropo Jorropo Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have no happy path for a GET request so it will exit by that point.

In other words.
Let's assume I only store the root block of a file.
You first stat the root block, no error, request is not HEAD and so you return false, and the request continue as usual.
However when the following code tries to actually send the unixfs file, they will hit the network to fetch the childs.

It's not complient with my interpretation of the gateway spec:

if the gateway already has the data

Well we don't know at that point, we know we have the root but more ? Who knows ?

I think there is two solutions here:

  1. For now, the spec is just making things up until it matches whatever our code do.
    You can add a SHOULD (RFC2119) to the spec sentence accepting that cheap checks might be better.
  2. At that point handleOnlyIfCached return a dagstore object that the rest of the code uses to fetch the data.
    So the rest of the code tries to serve the request as usual but using an offline dagstore.

Sadly I think we cannot do the second option (even tho I like it more) because we cannot send an HTTP error after starting to send the payload (why are we using HTTP again ? browsers .... 😞).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jorropo thanks for noting this.

I've made a conscious decision to only check root block and use block stat --offline for everything (even DAG requests) instead of dag stat --offline as the latter could act as inexpensive DoS vecor (dag stat on Wikipedia DAG does not sound fun, even if all blocks are in the datastore).

The motivation for only-if-cached is to identify gateways which don't have any part of a DAG cached, and prioritize ones that do.

In that spirit, I agree that the spec should provide guidance to implementers, who will have to make a similar decision as I did (or implement custom heuristic/index)

Opened ipfs/specs#297 to clarify spec, but let's merge this PR as-is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidel I think that the performance gain block stat --offline can be worth it. I'm ok with this just wanted the spec to be in accordance.

I didn't thought you would dag stat --offline
But rather do dagService = offlineBlockservice

So the directory index code (or file one if that a file, ...) would run in offline mode. It wouldn't be a DOS vector since it's really the same thing as usual but without networking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened ipfs/specs#297 to clarify spec, but let's merge this PR as-is.

🥳

Copy link
Contributor

@Jorropo Jorropo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (need specs fixups but thoses are open)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Gateway: fast check if CID is in local datastore cache (only-if-cached)
2 participants