Verify HTTP Car Requests #195

hannahhoward · 2023-04-21T13:09:45Z

Goals

Building on #193, add verification of CAR blocks as we download them.

Implementation

My recommendation for implementation:

- open a linear reader on the incoming CAR file from the response body
- start a selector traversal based on the parameters of the request
- for the block read opener:
  - each time it is called, read the next block from the CARv1 stream (a blocking read if data is not yet available from the HTTP response)
  - hash it and verify that the bytes match the CID passed to the block read opener
    - if matched, write the block into the outgoing HTTP CARv1 from Lassie and return the bytes as a reader to the selector traversal (a TeeReader probably works well here)
    - if not matched, return an error
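The steps above could be sketched roughly as follows. This is a minimal illustration using only the Go standard library, not Lassie's actual code: `digest` stands in for a CID (here just a SHA-256 of the block bytes), `blockSource` is a hypothetical stand-in for the linear CARv1 reader over the response body, and `relay` simulates a traversal that asks for blocks in a fixed order. A real implementation would use the go-car and IPLD link system types and would also write the CID and length prefix for each block into the outgoing CAR.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"io"
)

// digest stands in for a CID; in this sketch it is only the SHA-256 of the block bytes.
type digest [sha256.Size]byte

// sum computes the stand-in "CID" for a block.
func sum(data []byte) digest { return sha256.Sum256(data) }

// blockSource is a hypothetical stand-in for a linear CARv1 reader over the
// HTTP response body. Next blocks until the next block has arrived.
type blockSource interface {
	Next() ([]byte, error)
}

// sliceSource serves blocks from memory so the sketch can run without a network.
type sliceSource struct{ blocks [][]byte }

func (s *sliceSource) Next() ([]byte, error) {
	if len(s.blocks) == 0 {
		return nil, io.EOF
	}
	b := s.blocks[0]
	s.blocks = s.blocks[1:]
	return b, nil
}

var errMismatch = errors.New("block bytes do not match the requested digest")

// newOpener returns a function shaped like a block read opener: given the
// digest the traversal wants, it reads the next block from src, verifies it,
// and returns a reader that copies the bytes to out as they are consumed.
func newOpener(src blockSource, out io.Writer) func(want digest) (io.Reader, error) {
	return func(want digest) (io.Reader, error) {
		data, err := src.Next()
		if err != nil {
			return nil, err
		}
		if sum(data) != want {
			return nil, errMismatch
		}
		return io.TeeReader(bytes.NewReader(data), out), nil
	}
}

// relay simulates a traversal that requests wants in order and returns
// everything that was written to the outgoing stream.
func relay(blocks [][]byte, wants []digest) ([]byte, error) {
	var out bytes.Buffer
	open := newOpener(&sliceSource{blocks: blocks}, &out)
	for _, w := range wants {
		r, err := open(w)
		if err != nil {
			return out.Bytes(), err
		}
		// Draining the reader is what pushes the bytes through the TeeReader.
		if _, err := io.Copy(io.Discard, r); err != nil {
			return out.Bytes(), err
		}
	}
	return out.Bytes(), nil
}
```

Because the hash check happens before the reader is handed back, a bad block never reaches the outgoing stream; only blocks that were already verified and consumed have been written by the time an error surfaces.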

rvagg · 2023-04-24T06:07:05Z

The way I imagined this being done is similar to how I did the go-car stdin extractor: https://github.com/ipld/go-car/blob/3476971d97cd992991ade75087d9273adcf659e6/cmd/car/extract.go#L378-L444

But the big catch with this approach is buffering.

1. Currently that code doesn't delete used blocks. I don't think it can, because I think it's possible for a Get() to be called for the same CID during a UnixFS unpack; but in our case I think we can ditch blocks once we've read and checked them.
2. Out-of-order blocks are a big problem that we need to figure out how to deal with. If an upstream provider is giving us blocks in the wrong order, we either need to reject the response as "malformed" (I'd love to be able to do this, but I can imagine resistance to such an approach), or we need to buffer the blocks until we get the next one we expect. If we allow out-of-order responses, then we can reformat the CAR we're sending out to the user based on our own traversal logic so that it's always well-formed regardless of the upstream provider, but there's an OOM risk involved in doing this.
3. What's the current status of discussion on well-formedness of HTTP CARs @willscott? I was hoping that we'd get to a spec with some really clear rules and associated strictness around ordering, such that we could even have fixtures that resolve to output CARs that are byte-for-byte perfect matches. It's one of the reasons I was working on "feat(test): file/dir name generator that's more realistic" (#185): to get to some nice, complex but shareable fixtures for the various cases.
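To make the out-of-order trade-off described above concrete, here is a rough sketch of a bounded reorder buffer, using only the Go standard library. All names are hypothetical and `digest` again stands in for a CID as a plain SHA-256. Setting `maxBytes` to 0 gives the strict "reject as malformed" behaviour, while a positive limit tolerates some reordering and caps the memory that a misbehaving upstream can make us hold.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"io"
)

// digest stands in for a CID; in this sketch it is only the SHA-256 of the block bytes.
type digest [sha256.Size]byte

func sum(data []byte) digest { return sha256.Sum256(data) }

var (
	errBufferFull = errors.New("out-of-order buffer limit exceeded")
	errMissing    = errors.New("stream ended before the expected block arrived")
)

// reorderBuffer holds blocks that arrived before the traversal asked for them.
type reorderBuffer struct {
	next     func() ([]byte, error) // hypothetical: next upstream block, io.EOF at end
	held     map[digest][]byte
	heldSize int
	maxBytes int // 0 means strict ordering: any early block is an error
}

// get returns the block for want, reading ahead and buffering if necessary.
func (r *reorderBuffer) get(want digest) ([]byte, error) {
	if data, ok := r.held[want]; ok {
		// Ditch the block once it has been handed to the traversal.
		delete(r.held, want)
		r.heldSize -= len(data)
		return data, nil
	}
	for {
		data, err := r.next()
		if err == io.EOF {
			return nil, errMissing
		}
		if err != nil {
			return nil, err
		}
		d := sum(data)
		if d == want {
			return data, nil
		}
		if _, dup := r.held[d]; dup {
			continue
		}
		if r.heldSize+len(data) > r.maxBytes {
			return nil, errBufferFull
		}
		r.held[d] = data
		r.heldSize += len(data)
	}
}

// fetchInOrder simulates a traversal asking for wants in order and returns
// the concatenated block bytes in traversal order.
func fetchInOrder(upstream [][]byte, wants []digest, maxBytes int) (string, error) {
	i := 0
	rb := &reorderBuffer{
		held:     map[digest][]byte{},
		maxBytes: maxBytes,
		next: func() ([]byte, error) {
			if i >= len(upstream) {
				return nil, io.EOF
			}
			i++
			return upstream[i-1], nil
		},
	}
	var out []byte
	for _, w := range wants {
		data, err := rb.get(w)
		if err != nil {
			return string(out), err
		}
		out = append(out, data...)
	}
	return string(out), nil
}
```

With a positive limit the output is always in traversal order regardless of the upstream order, which is the "reformat the CAR" option; the limit is what bounds the OOM risk.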

willscott · 2023-04-24T06:18:09Z

Ideally this should use the same validation that the bifrost-gateway uses.
I think that's happening in ipfs/bifrost-gateway#75

willscott · 2023-04-26T15:52:02Z

Coming back to elaborate on 2/3:

My understanding of the intention of the trustless gateway spec is to require a strict traversal ordering. That should remove potential for out of order blocks / should allow for deterministic / byte-for-byte output matches.

aschmahmann · 2023-05-02T20:16:01Z

> I think that's happening in ipfs/bifrost-gateway#75

I'm not planning to handle this in that PR, but that'd be a follow up (see ipfs/bifrost-gateway#62, it's currently listed as step 4 but maybe 5 will end up coming first).

> My understanding of the intention of the trustless gateway spec is to require a strict traversal ordering. That should remove potential for out of order blocks

This should make verification simpler to implement. I don't know whether users should be allowed to ask for more data (and hope gzip saves them) in order to reduce memory usage from buffering, as Rod alluded to. That's something to figure out in the spec PR, based on what people want here.

> should allow for deterministic / byte-for-byte output matches.

That's not necessarily true. There are a few more things that might need to be specified beyond traversal order; see ipfs/specs#402 (comment) and the related thread.

hannahhoward mentioned this issue Apr 21, 2023: Add an HTTP retrieval protocol #193 (merged)

hannahhoward assigned rvagg Apr 21, 2023

rvagg mentioned this issue Apr 26, 2023: HTTP protocol #204 (merged)

hannahhoward closed this as completed May 11, 2023
