Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gateway: CAR handler shouldn't return 200 & a CAR header if data is unavailable #458

Open
rvagg opened this issue Aug 30, 2023 · 4 comments
Labels
effort/days Estimated to take multiple days, but less than a week help wanted Seeking public contribution on this issue kind/enhancement A net-new feature or improvement to an existing feature P2 Medium: Good to have, but can wait until someone steps up topic/gateway Issues related to HTTP Gateway

Comments

@rvagg
Copy link
Member

rvagg commented Aug 30, 2023

Background

Consider the straightforward case that a block is unfetchable by the block source—this is a somewhat common occurrence for Boost, where there may be a disconnect between advertised CIDs and those that are available unsealed. So requests may come in for CIDs that are supposed to be there, but the block fetcher returns an error that the block is unavailable.

The Boxo gateway code will return a 200, a set of valid headers, and a CARv1 header with the requested CID in the roots array, but nothing else, and termination of the stream is clean. There is no indication of a problem without parsing the CAR. Of course in the Trustless Gateway paradigm, it's up to the user to validate that the CAR contains the expected blocks, so from that perspective we have what we need to determine whether there's a problem or not. But this does present complications for debugging problems. In particular when debugging Rhea retrieval problems I need to have access to Boost logs on the server side to see what the problem might be, I have no indication from the outside that Boost even thinks that there's a problem.

Desired behaviour

While the spec doesn't cover this, here's what I think should happen and how we built the Lassie HTTP handler:

  1. Wait until there is at least one block to return before setting any headers or a CAR header; only when there is at least one block should we start sending data.
  2. If an error occurs before we get a single block, we return an error response
    a. If we can't get any candidates from the indexer then we can do a 404 with the body no candidates found
    b. Other failures are treated as a "gateway timeout", 504, with the body failed to fetch CID: <error message>.

Code exploration

  • handler#serveCar starts setting headers immediately, with no opportunity to change course if there's a problem:
    setContentDispositionHeader(w, name, "attachment")
  • It defers to BlocksBackend#GetCAR to return a Reader for the CAR and simply does an io.Copy of that data to the output body.
  • BlocksBackend#GetCAR immediately sets up a pipe and starts writing a CAR:
    r, w := io.Pipe()
    go func() {
    cw, err := storage.NewWritable(
    w,
    []cid.Cid{pathMetadata.LastSegment.Cid()},
    car.WriteAsCarV1(true),
    car.AllowDuplicatePuts(params.Duplicates.Bool()),
    )
  • The use of storage.NewWritable with a root will immediately write a CARv1 header to the Writer it's given.

Hence with a valid trustless gateway /ipfs request, we will always get a valid CARv1 header with the root we request, regardless of whether the requested root CID is even fetchable.

In Lassie, we deal with this in two ways, both encapsulated in DeferredWriter which is compatible with github.com/ipld/go-ipld-prime/storage/WritableStorage, like what the github.com/ipld/go-car/v2/storage/CarWriter is. https://github.com/filecoin-project/lassie/blob/main/pkg/storage/deferredcarwriter.go

  1. Don't set up the CAR writer until we have our first Put operation
  2. Provide an OnPut event listener that lets us watch for the first put and set the headers in expectation of a CAR with at least one block.
@rvagg rvagg added the need/triage Needs initial labeling and prioritization label Aug 30, 2023
@aschmahmann
Copy link
Contributor

Wait until there is at least one block to return before setting any headers or a CAR header; only when there is at least one block should we start sending data.

This seems fine. It's some added complexity to give a status code error in one particular case where almost any other case would result in a streaming error. I suspect the same applies to the other functions where we return streamable content from the backend (e.g. Get and GetAll). It's just about how much buffering is expected to be done (and whether that's by the backends or the gateway implementation).

I don't have a strong preference for if the added complexity shows up in the gateway or the backends as long as any modification to the contracts between the components is explicitly defined on the interface.

a. If we can't get any candidates from the indexer then we can do a 404 with the body no candidates found

That's not what the spec indicates 404s are for https://specs.ipfs.tech/http-gateways/path-gateway/#404-not-found so I don't see boxo having that behavior unless there's a change to the spec.

@rvagg
Copy link
Member Author

rvagg commented Aug 30, 2023

I'm not arguing for buffering, there shouldn't be any need to hold bytes up for what I'm suggesting—just don't start anything until you know you can continue with more than just a header.

a. If we can't get any candidates from the indexer then we can do a 404 with the body no candidates found

That's not what the spec indicates 404s

It is in the Lassie case:

missing DAG node

Because it is missing, we have no way of finding it.

I'm not suggesting that the same be done here, just explaining what we're doing.

rvagg added a commit to ipld/frisbii that referenced this issue Sep 2, 2023
rvagg added a commit to ipld/frisbii that referenced this issue Sep 2, 2023
rvagg added a commit to ipld/frisbii that referenced this issue Sep 2, 2023
rvagg added a commit to ipld/frisbii that referenced this issue Sep 2, 2023
rvagg added a commit to ipld/frisbii that referenced this issue Sep 4, 2023
rvagg added a commit to ipld/frisbii that referenced this issue Sep 4, 2023
@lidel
Copy link
Member

lidel commented Sep 4, 2023

iiuc the gist here is to delay sending HTTP status code until the first block is retrieved from the backend (which may be a remote HTTP server, like in Rhea/Saturn).

Waiting for the first block and returning HTTP 5XX on error/missing sounds sensible, but

  • this might increase TTFB for CAR responses in Rhea (I assume we are ok with that), but should not impact TTLB
  • it will be tricky to do with the current IPFSBackend API and how we use it in handler_car.go

After all the Rhea refactors in bogo/gateway we haveIPFSBackend.GetCAR that returns the CAR stream as io.ReadCloser or an error , and handler_car.go acts on it and sets headers etc.

If we want a surgical fix, we would keep GetCAR intact and contain delay logic in handler_car.go. That will save work by avoiding breaking and refactoring two GetCAR implementations: block backend used in kubo here and bifrost-gateway's graph backend here.

a. If we can't get any candidates from the indexer then we can do a 404 with the body no candidates found

nit: on IPFS gateways this produces 502 not 404. PL's Grafana for ipfs.io/Rhea also expects 502 not 404 for routing errors (semantically, the fact that HTTP server at cid.contact can't find providers does not make the content not exist on the planet).

I've filled ipfs/specs#435 to fill the gap is gateway specs around this.

@lidel lidel added kind/enhancement A net-new feature or improvement to an existing feature need/author-input Needs input from the original author topic/gateway Issues related to HTTP Gateway and removed need/triage Needs initial labeling and prioritization labels Sep 18, 2023
@github-actions
Copy link

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

@lidel lidel added P2 Medium: Good to have, but can wait until someone steps up help wanted Seeking public contribution on this issue effort/days Estimated to take multiple days, but less than a week and removed need/author-input Needs input from the original author kind/stale labels Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/days Estimated to take multiple days, but less than a week help wanted Seeking public contribution on this issue kind/enhancement A net-new feature or improvement to an existing feature P2 Medium: Good to have, but can wait until someone steps up topic/gateway Issues related to HTTP Gateway
Projects
None yet
Development

No branches or pull requests

3 participants