Skip to content

Commit

Permalink
Merge pull request #402 from ipip-402
Browse files Browse the repository at this point in the history
IPIP-402: Partial CAR Support on Trustless Gateways
  • Loading branch information
lidel committed Jul 27, 2023
2 parents 17e46c6 + 917efb9 commit 1b9a954
Show file tree
Hide file tree
Showing 4 changed files with 595 additions and 23 deletions.
14 changes: 7 additions & 7 deletions ipip-template.md
Expand Up @@ -36,13 +36,6 @@ interoperable implementations.
When modifying an existing specification file, this section should provide a
summary of changes. When adding new specification files, list all of them.

## Test fixtures

List relevant CIDs. Describe how implementations can use them to determine
specification compliance. This section can be skipped if IPIP does not deal
with the way IPFS handles content-addressed data, or the modified specification
file already includes this information.

## Design rationale

The rationale fleshes out the specification by describing what motivated
Expand All @@ -67,6 +60,13 @@ Explain the security implications/considerations relevant to the proposed change

Describe alternate designs that were considered and related work.

## Test fixtures

List relevant CIDs. Describe how implementations can use them to determine
specification compliance. This section can be skipped if IPIP does not deal
with the way IPFS handles content-addressed data, or the modified specification
file already includes this information.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
19 changes: 13 additions & 6 deletions src/http-gateways/path-gateway.md
Expand Up @@ -21,6 +21,7 @@ editors:
url: https://hacdias.com/
xref:
- url
- trustless-gateway
tags: ['httpGateways', 'lowLevelHttpGateways']
order: 0
---
Expand Down Expand Up @@ -214,11 +215,13 @@ These are the equivalents:
- `format=cbor``Accept: application/cbor`
- `format=ipns-record``Accept: application/vnd.ipfs.ipns-record`

<!-- TODO Planned: https://github.com/ipfs/go-ipfs/issues/8769
- `selector=<cid>` can be used for passing a CID with [IPLD selector](https://ipld.io/specs/selectors)
- Selector should be in dag-json or dag-cbor format
- This is a powerful primitive that allows for fetching subsets of data in specific order, either as raw bytes, or a CAR stream. Think “HTTP range requests”, but for IPLD, and more powerful.
-->
### `dag-scope` (request query parameter)

Only used on CAR requests, same as :ref[dag-scope] from :cite[trustless-gateway].

### `entity-bytes` (request query parameter)

Only used on CAR requests, same as :ref[entity-bytes] from :cite[trustless-gateway].

# HTTP Response

Expand Down Expand Up @@ -592,7 +595,11 @@ The following response types require an explicit opt-in, can only be requested w
- Raw Block (`?format=raw`)
- Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw).
- CAR (`?format=car`)
- Arbitrary DAG as a verifiable CAR file or a stream, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car).
- A CAR file or a stream that contains all blocks required to trustlessly verify the requested content path query, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) and :cite[trustless-gateway].
- **Note:** by default, block order in CAR response is not deterministic,
blocks can be returned in different order, depending on implementation
choices (traversal, speed at which blocks arrive from the network, etc).
An opt-in ordered CAR responses MAY be introduced in a future IPIP.
- TAR (`?format=tar`)
- Deserialized UnixFS files and directories as a TAR file or a stream, see :cite[ipip-0288].
- IPNS Record
Expand Down
190 changes: 180 additions & 10 deletions src/http-gateways/trustless-gateway.md
Expand Up @@ -4,7 +4,7 @@ description: >
Trustless Gateways are a minimal subset of Path Gateways that allow light IPFS
clients to retrieve data behind a CID and verify its integrity without delegating any
trust to the gateway itself.
date: 2023-03-30
date: 2023-06-20
maturity: reliable
editors:
- name: Marcin Rataj
Expand All @@ -17,25 +17,33 @@ tags: ['httpGateways', 'lowLevelHttpGateways']
order: 1
---

Trustless Gateway is a minimal _subset_ of :cite[path-gateway]
Trustless Gateway is a _subset_ of :cite[path-gateway]
that allows light IPFS clients to retrieve data behind a CID and verify its
integrity without delegating any trust to the gateway itself.

The minimal implementation means:

- data is requested by CID, only supported path is `/ipfs/{cid}`
- no path traversal or recursive resolution, no UnixFS/IPLD decoding server-side
- response type is always fully verifiable: client can decide between a raw block or a CAR stream
- no UnixFS/IPLD deserialization
- for CAR files:
- the behavior is identical to :cite[path-gateway]
- for raw blocks:
- data is requested by CID, only supported path is `/ipfs/{cid}`
- no path traversal or recursive resolution

# HTTP API

A subset of "HTTP API" of :cite[path-gateway].

## `GET /ipfs/{cid}[?{params}]`
## `GET /ipfs/{cid}[/{path}][?{params}]`

Downloads data at specified CID.
Downloads verifiable data for the specified **immutable** content path.

## `HEAD /ipfs/{cid}[?{params}]`
Optional `path` is permitted for requests that specify CAR format (`application/vnd.ipld.car`).

For RAW requests, only `GET /ipfs/{cid}[?{params}]` is supported.

## `HEAD /ipfs/{cid}[/{path}][?{params}]`

Same as GET, but does not return any payload.

Expand All @@ -45,13 +53,13 @@ Downloads data at specified IPNS Key. Verifiable :cite[ipns-record] can be reque

## `HEAD /ipns/{key}[?{params}]`

same as GET, but does not return any payload.
Same as GET, but does not return any payload.

# HTTP Request

Same as in :cite[path-gateway], but with limited number of supported response types.

## HTTP Request Headers
## Request Headers

### `Accept` (request header)

Expand All @@ -67,12 +75,174 @@ Below response types SHOULD to be supported:
Gateway SHOULD return HTTP 400 Bad Request when running in strict trustless
mode (no deserialized responses) and `Accept` header is missing.

## Request Query Parameters

### :dfn[dag-scope] (request query parameter)

Optional, `dag-scope=(block|entity|all)` with default value `all`, only available for CAR requests.

Describes the shape of the DAG fetched the terminus of the specified path whose blocks
are included in the returned CAR file after the blocks required to traverse
path segments.

- `block` - Only the root block at the end of the path is returned after blocks
required to verify the specified path segments.

- `entity` - For queries that traverse UnixFS data, `entity` roughly means return
blocks needed to verify the terminating element of the requested content path.
For UnixFS, all the blocks needed to read an entire UnixFS file, or enumerate a UnixFS directory.
For all queries that reference non-UnixFS data, `entity` is equivalent to `block`

- `all` - Transmit the entire contiguous DAG that begins at the end of the path
query, after blocks required to verify path segments

When present, returned `Etag` must include unique prefix based on the passed scope type.

### :dfn[entity-bytes] (request query parameter)

The optional `entity-bytes=from:to` parameter is available only for CAR
requests.

It implies `dag-scope=entity` and serves as a trustless equivalent of an HTTP
Range Request.

When the terminating entity at the end of the specified content path:

- can be interpreted as a continuous array of bytes (such as a UnixFS file), a
Gateway MUST return only the minimal set of blocks necessary to verify the
specified byte range of that entity.

- When dealing with a sharded UnixFS file (`dag-pb`, `0x70`) and a non-zero
`from` value, the UnixFS data and `blocksizes` determine the
corresponding starting block for a given `from` offset.

- cannot be interpreted as a continuous array of bytes (such as a DAG-CBOR/JSON
map or UnixFS directory), the parameter MUST be ignored, and the request is
equivalent to `dag-scope=entity`.

Allowed values for `from` and `to` follow a subset of section 14.1.2 from
:cite[rfc9110], where they are defined as offset integers that limit the
returned blocks to only those necessary to satisfy the range `[from,to]`:

- `from` value gives the byte-offset of the first byte in a range.
- `to` value gives the byte-offset of the last byte in the range;
that is, the byte positions specified are inclusive.

The following additional values are supported:

- `*` can be substituted for end-of-file
- `entity-bytes=0:*` is the entire file (a verifiable version of HTTP request for `Range: 0-`)
- Negative numbers can be used for referring to bytes from the end of a file
- `entity-bytes=-1024:*` is the last 1024 bytes of a file
(verifiable version of HTTP request for `Range: -1024`)
- It is also permissible (unlike with HTTP Range Requests) to ask for the
range of 500 bytes from the beginning of the file to 1000 bytes from the
end: `entity-bytes=499:-1000`

A Gateway MUST augment the returned `Etag` based on the passed `entity-bytes`.

A Gateway SHOULD return an HTTP 400 Bad Request error when the requested range
cannot be parsed as valid offset positions.

In more nuanced error scenarios, a Gateway MUST return a valid CAR response
that includes enough blocks for the client to understand why the requested
`entity-bytes` was incorrect or why only a part of the requested byte range was
returned:

- If the requested `entity-bytes` resolves to a range that partially falls
outside of the entity's byte range, the response MUST include the subset of
blocks within the entity's bytes.
- This allows clients to request valid ranges of the entity without needing
to know its total size beforehand, and it does not require the Gateway to
buffer the entire entity before returning the response.

- If the requested `entity-bytes` resolves to a zero-length range or falls
fully outside of the entity's bytes, the response is equivalent to
`dag-scope=block`.
- This allows client to produce a meaningful error (e.g, in case of UnixFS,
leverage `Data.blocksizes` information present in the root `dag-pb` block).

- In streaming scenarios, if a Gateway is capable of returning the root block
but lacks prior knowledge of the final component of the requested content
path being invalid or absent in the DAG, a Gateway SHOULD respond with HTTP 200.
- This behavior is a consequence of HTTP streaming limitations: blocks are
not buffered, by the time a related parent block is being parsed and
returned to the client, the HTTP status code has already been sent to the
client.

# HTTP Response

Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway].

## HTTP Response Headers
## Response Headers

### `Content-Type` (response header)

MUST be returned and include additional format-specific parameters when possible.

If a CAR stream was requested, the response MUST include the parameter specifying CAR version.
For example: `Content-Type: application/vnd.ipld.car; version=1`

### `Content-Disposition` (response header)

MUST be returned and set to `attachment` to ensure requested bytes are not rendered by a web browser.

## Response Payload

### Block Response

An opaque bytes matching the requested block CID
([application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw)).

The Body hash MUST match the Multihash from the requested CID.

### CAR Response

A CAR stream for the requested
[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
content type, path and optional `dag-scope` and `entity-bytes` URL parameters.

#### CAR version

Value returned in
[`CarV1Header.version`](https://ipld.io/specs/transport/car/carv1/#header)
field MUST match the `version` parameter returned in `Content-Type` header.

#### CAR roots

The behavior associated with the
[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) field
is not currently specified.

Clients MAY ignore it.

:::issue

As of 2023-06-20, the behavior of the `roots` CAR field remains an [unresolved item within the CARv1 specification](https://web.archive.org/web/20230328013837/https://ipld.io/specs/transport/car/carv1/#unresolved-items).

:::

#### CAR determinism

The default CAR header and block order in a CAR response is not specified and is non-deterministic.

Clients MUST NOT assume that CAR responses are deterministic (byte-for-byte identical) across different gateways.

Clients MUST NOT assume that CAR includes CIDs and their blocks in the same order across different gateways.

:::issue

In controlled environments, clients MAY choose to rely on undocumented CAR determinism,
subject to the agreement of the following conditions between the client and the
gateway:
- CAR version
- content of [`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) field
- order of blocks
- status of duplicate blocks

In the future, there may be an introduction of a convention to indicate aspects
of determinism in CAR responses. Please refer to
[IPIP-412](https://github.com/ipfs/specs/pull/412) for potential developments
in this area.

:::

0 comments on commit 1b9a954

Please sign in to comment.