ipip-0402: car-scope=file → dag-scope=entity & bytes → entity-bytes
This change incorporates feedback from Adin, Rod and Juan:

- bytes: #402 (review)
- car-scope: #402 (comment)

I really hope these names will be good enough, but I am running on
artisan, recycled electrons so can do this all day :-)
lidel committed Jul 6, 2023
1 parent 278a277 commit faf4a0b
Showing 3 changed files with 103 additions and 54 deletions.
30 changes: 4 additions & 26 deletions src/http-gateways/path-gateway.md
@@ -214,35 +214,13 @@ These are the equivalents:
- `format=cbor` → `Accept: application/cbor`
- `format=ipns-record` → `Accept: application/vnd.ipfs.ipns-record`

## Query Parameters for CAR Requests
### `dag-scope` (request query parameter)

The following query parameters are only available for requests made with either a `format=car` query parameter or an `Accept: application/vnd.ipld.car` request header. These parameters modify shape of the IPLD graph returned within the car file.
Only used on CAR requests; same as [dag-scope](/http-gateways/trustless-gateway/#dag-scope-request-query-parameter) from :cite[trustless-gateway].

### `car-scope` (request query parameter)
### `entity-bytes` (request query parameter)

Optional, `car-scope=(block|file|all)` with default value 'all', describes the shape of the dag fetched the terminus of the specified path whose blocks are included in the returned CAR file after the blocks required to traverse path segments.

`block` - Only the root block at the end of the path is returned After blocks required to verify the specified path segments.

`file` - For queries that traverse UnixFS data, `file` roughly means return blocks needed to verify the end of the path as a filesystem entity. In other words, all the blocks needed to 'cat' a UnixFS file at the end of the specified path, or to 'ls' a UnixFS directory at the end of the specified path. For all queries that do not reference non-UnixFS data, `file` is equivalent to `block`

`all` - Transmit the entire contiguous DAG that begins at the end of the path query, after blocks required to verify path segments

### `bytes` (request query parameter)

Optional, `bytes=x:y` with default value `0:*`. When the entity at the end of the specified path can be intepreted as a contingous array of bytes (such as a UnixFS file), returns only the blocks required to verify the specified byte range of said entity. Put another way, the `bytes` parameters can serve as a trustless form of an HTTP range request. If the entity at the end of the path cannot be interpreted as a continguous array of bytes (such as a CBOR/JSON map), this parameter has no effect. Allowed values for `x` and `y` are positive integers where y >= x, which limit the return blocks to needed to satify the range [x, y]. In addition the following additional values are permitted:

- `*` can be substituted for end-of-file
- `?bytes=0:*` is the entire file (i.e. to fulfill HTTP Range Request `x-` requests)
- Negative numbers can be used for referring to bytes from the end of a file
- `?bytes=-1024:*` is the last 1024 bytes of a file (i.e. to fulfill HTTP Range Request `-y` requests)
- It is also permissible (unlike with HTTP Range Requests) to ask for the range of 500 bytes from the beginning of the file to 1000 bytes from the end by `?bytes=499:-1000`

<!-- TODO Planned: https://github.com/ipfs/go-ipfs/issues/8769
- `selector=<cid>` can be used for passing a CID with [IPLD selector](https://ipld.io/specs/selectors)
- Selector should be in dag-json or dag-cbor format
- This is a powerful primitive that allows for fetching subsets of data in specific order, either as raw bytes, or a CAR stream. Think “HTTP range requests”, but for IPLD, and more powerful.
-->
Only used on CAR requests; same as [entity-bytes](/http-gateways/trustless-gateway/#entity-bytes-request-query-parameter) from :cite[trustless-gateway].

# HTTP Response

59 changes: 58 additions & 1 deletion src/http-gateways/trustless-gateway.md
@@ -59,7 +59,7 @@ Same as GET, but does not return any payload.

Same as in :cite[path-gateway], but with limited number of supported response types.

## HTTP Request Headers
## Request Headers

### `Accept` (request header)

@@ -75,6 +75,63 @@ Below response types SHOULD to be supported:
Gateway SHOULD return HTTP 400 Bad Request when running in strict trustless
mode (no deserialized responses) and `Accept` header is missing.

## Request Query Parameters

### :dfn[dag-scope] (request query parameter)

Optional, `dag-scope=(block|entity|all)` with default value `all`, only available for CAR requests.

Describes the shape of the DAG fetched at the terminus of the specified path, whose blocks
are included in the returned CAR file after the blocks required to traverse
path segments.

- `block` - Only the root block at the end of the path is returned, after the blocks
required to verify the specified path segments.

- `entity` - For queries that traverse UnixFS data, `entity` roughly means returning
the blocks needed to verify the terminating element of the requested content path:
all the blocks needed to read an entire UnixFS file, or to enumerate a UnixFS directory.
For all queries that reference non-UnixFS data, `entity` is equivalent to `block`.

- `all` - Transmit the entire contiguous DAG that begins at the end of the path
query, after the blocks required to verify path segments.

When present, the returned `Etag` must include a unique prefix based on the passed scope type.
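
As a rough client-side illustration of the `dag-scope` parameter described above, the following Python sketch builds a CAR request URL and rejects scope values other than `block`, `entity` and `all`. The helper name `car_request_url` and the gateway host `gateway.example` are hypothetical and not part of the specification.

```python
from urllib.parse import urlencode

# Scope values listed in the dag-scope definition.
VALID_DAG_SCOPES = ("block", "entity", "all")


def car_request_url(gateway, content_path, dag_scope="all"):
    """Build a CAR request URL (hypothetical helper, for illustration only).

    `gateway` is a base URL such as "https://gateway.example" and
    `content_path` is a content path such as "/ipfs/<cid>/sub/path".
    """
    if dag_scope not in VALID_DAG_SCOPES:
        raise ValueError(f"unsupported dag-scope: {dag_scope!r}")
    # format=car selects the CAR response; dag-scope narrows the returned blocks.
    query = urlencode({"format": "car", "dag-scope": dag_scope})
    return f"{gateway.rstrip('/')}{content_path}?{query}"


url = car_request_url("https://gateway.example", "/ipfs/cid/wiki/", "entity")
```

Validating the value on the client is a design choice of this sketch; the text above does not say how a gateway should respond to an unknown scope.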

### :dfn[entity-bytes] (request query parameter)

Optional, `entity-bytes=from:to` with the default value `0:*`, only available for CAR requests.
Serves as a trustless form of an HTTP Range Request.

When the terminating entity at the end of the specified content path can be
interpreted as a continuous array of bytes (such as a UnixFS file), returns
only the minimal set of blocks required to verify the specified byte range of
said entity.

Allowed values for `from` and `to` are non-negative integers where `to` >= `from`, which
limit the returned blocks to those needed to satisfy the range `[from,to]`:

- `from` value gives the byte-offset of the first byte in a range.
- `to` value gives the byte-offset of the last byte in the range; that is,
the byte positions specified are inclusive. Byte offsets start at zero.

If the entity at the end of the path cannot be interpreted as a continuous
array of bytes (such as a DAG-CBOR/JSON map, or UnixFS directory), this
parameter has no effect.

The following additional values are supported:

- `*` can be substituted for end-of-file
- `entity-bytes=0:*` is the entire file (a verifiable version of HTTP request for `Range: 0-`)
- Negative numbers can be used for referring to bytes from the end of a file
- `entity-bytes=-1024:*` is the last 1024 bytes of a file
(verifiable version of HTTP request for `Range: -1024`)
- It is also permissible (unlike with HTTP Range Requests) to ask for the
range of 500 bytes from the beginning of the file to 1000 bytes from the
end: `entity-bytes=499:-1000`

When present, the returned `Etag` must include a unique prefix based on the passed range.
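
To make the `from:to` rules above concrete, here is a minimal Python sketch that resolves an `entity-bytes` value against a known entity size into absolute, inclusive byte offsets. The function name `resolve_entity_bytes` is hypothetical. Two behaviors are assumptions of this sketch rather than statements from the text: a negative `to` is taken to mean the offset `size + to`, and offsets past the end of the entity are clamped.

```python
def resolve_entity_bytes(value, size):
    """Resolve an entity-bytes value to (first, last) inclusive offsets.

    Returns None when the resolved range is empty. Hypothetical helper.
    """
    raw_from, sep, raw_to = value.partition(":")
    if not sep:
        raise ValueError("expected from:to")

    first = int(raw_from)
    if first < 0:
        # Negative `from` counts back from the end of the entity.
        first = max(size + first, 0)

    if raw_to == "*":
        # `*` stands for end-of-file.
        last = size - 1
    else:
        last = int(raw_to)
        if last < 0:
            # Assumption: negative `to` is the offset `size + to`.
            last = size + last
        # Assumption: clamp ranges that run past the end.
        last = min(last, size - 1)

    if last < first:
        return None
    return (first, last)


whole = resolve_entity_bytes("0:*", 10)
tail = resolve_entity_bytes("-1024:*", 4096)
middle = resolve_entity_bytes("499:-1000", 4096)
```

Under these assumptions, `-1024:*` on a 4096-byte file resolves to the last 1024 bytes, matching the `Range: -1024` analogy above.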

# HTTP Response

Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway].
68 changes: 41 additions & 27 deletions src/ipips/ipip-0402.md
@@ -5,6 +5,10 @@ ipip: proposal
editors:
- name: Hannah Howard
github: hannahhoward
- name: Adin Schmahmann
github: aschmahmann
- name: Rod Vagg
github: rvagg
- name: Marcin Rataj
github: lidel
url: https://lidel.org/
@@ -39,11 +43,15 @@ Save round-trips, allow more efficient resume and parallel downloads.

The solution is to allow the :cite[trustless-gateway] to support partial
responses by:

- allowing for requesting sub-paths within a DAG, and getting blocks necessary
for traversing all path segments for end-to-end verification
- opt-in `car-scope` parameter that allows for narrowing down returned blocks
to a `block`, `file` (aka logical IPLD entity), or `all` (default)
- opt-in `bytes` parameter that allows for returning only a subset of blocks

- opt-in `dag-scope` parameter that allows for narrowing down returned blocks
to a `block`, `entity` (a logical IPLD entity, such as a file, directory,
CBOR document), or `all` (default)

- opt-in `entity-bytes` parameter that allows for returning only a subset of blocks
within a logical IPLD entity

Details are in :cite[trustless-gateway].
@@ -66,14 +74,15 @@ Terse rationale for each feature:
- The ability to narrow down CAR response based on logical scope or specific byte
range within an entity comes directly from the types of requests existing
path gateways need to handle.
- `car-scope=block` allows for resolving content paths to the final CID, and
- `dag-scope=block` allows for resolving content paths to the final CID and
learning its type (UnixFS file/directory, or a custom codec)
- `car-scope=file` covers the majority of website hosting needs (returning a
file, or enumerating directory contents)
- `car-scope=all` returns all blocks in a DAG: was the existing behavior and
- `dag-scope=entity` covers the majority of website hosting needs (returning a
file, enumerating directory contents, or any other IPLD entity)
- `dag-scope=all` returns all blocks in a DAG; this was the existing behavior and
remains the implicit default
- `bytes=from:to` enables efficient, verifiable analog to HTTP Range Requests
- `entity-bytes=from:to` enables an efficient, verifiable analog to HTTP Range Requests
(resuming downloads or seeking within bigger files, such as videos)
- `from` and `to` match the behavior of HTTP Range Requests.

### User benefit

@@ -121,7 +130,7 @@ introduce additional blocks required for verifying.
As long as the client was written in a trustless manner and was discarding
unexpected blocks, this will be a backward-compatible change.

#### CAR format with `bytes` and `car-scope` parameters
#### CAR format with `entity-bytes` and `dag-scope` parameters

These parameters are opt-in, which means no breaking changes.

@@ -159,7 +168,7 @@ risks, and weak value proposition, as [discussed during IPFS Thing 2022](https:/
#### Additional "Web" Scope

A request for
`/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/?format=car&car-scope=file`
`/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/?format=car&dag-scope=entity`
returns all blocks required for enumeration of the big HAMT `/wiki` directory,
and then an additional request for `index.html` needs to be issued.

@@ -181,7 +190,7 @@ It is impossible to know if some entity on a sub-path is a file or a directory,
without sending a probe for the root block, which introduces one round-trip overhead
per entity.

This problem is not present in the case of `car-scope=file`, which shifts the
This problem is not present in the case of `dag-scope=entity`, which shifts the
decision to the server, and allows for fetching an unknown UnixFS entity with a
single request.

@@ -197,7 +206,7 @@ The main utility of this scope is saving round-trips when retrieving a specific
entity as a member of a bigger DAG.

To test, request a small file that fits in a single block from a sub-path. The
returned CAR MUST include both the block with the `file` data and blocks
returned CAR MUST include both the block with the file data and all blocks
necessary for traversing from the root CID to the terminating element (all
parents, root CID and a subdirectory below it).

@@ -213,7 +222,7 @@ Fixtures:

:::

### Testing `car-scope=block`
### Testing `dag-scope=block`

The main utility of this scope is resolving content paths. This means a CAR
response with blocks related to path traversal, and the root block of the
@@ -227,13 +236,13 @@ Fixtures:

:::example

- TODO(gateway-conformance): `/ipfs/cid/parent/directory?format=car&car-scope=block` (UnixFS directory on a path)
- TODO(gateway-conformance): `/ipfs/cid/parent/directory?format=car&dag-scope=block` (UnixFS directory on a path)

- TODO(gateway-conformance): `/ipfs/cid/parent1/parent2/file?format=car&car-scope=block` (UnixFS file on a path)
- TODO(gateway-conformance): `/ipfs/cid/parent1/parent2/file?format=car&dag-scope=block` (UnixFS file on a path)

:::

### Testing `car-scope=file`
### Testing `dag-scope=entity`

The main utility of this scope is retrieving all blocks related to a meaningful
IPLD entity. Currently, the most popular entity types are:
@@ -252,48 +261,48 @@ Fixtures:

:::example

- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&car-scope=file`
- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&dag-scope=entity`
- Request a `chunked-dag-pb-file` (UnixFS file encoded with `dag-pb` with
more than one chunk). Returned blocks MUST be enough to deserialize the file.

- TODO(gateway-conformance): `/ipfs/cid/dag-cbor-with-link?format=car&car-scope=file`
- TODO(gateway-conformance): `/ipfs/cid/dag-cbor-with-link?format=car&dag-scope=entity`
- Request a `dag-cbor-with-link` (DAG-CBOR document with CBOR Tag 42 pointing
at a third-party CID). The response MUST include the terminating entity (DAG-CBOR)
and MUST NOT include the CID from the Tag 42 (IPLD Link).

- TODO(gateway-conformance): `/ipfs/cid/flat-directory/file?format=car&car-scope=file`
- TODO(gateway-conformance): `/ipfs/cid/flat-directory/file?format=car&dag-scope=entity`
- Request UnixFS `flat-directory`. The response MUST include the minimal set of
blocks required for enumeration of directory contents, and no blocks that
belong to child entities.

- TODO(gateway-conformance): `/ipfs/cid/hamt-directory/file?format=car&car-scope=file`
- TODO(gateway-conformance): `/ipfs/cid/hamt-directory/file?format=car&dag-scope=entity`
- Request UnixFS `hamt-directory`. The response MUST include the minimal set of
blocks required for enumeration of directory contents, and no blocks that
belong to child entities.

:::

### Testing `car-scope=all`
### Testing `dag-scope=all`

This is the implicit default used when `car-scope` is not present,
This is the implicit default used when `dag-scope` is not present,
and explicitly used in the context of proxy gateway supporting :cite[ipip-0288].

Fixtures:

:::example

- TODO(gateway-conformance): `/ipfs/cid-of-a-directory?format=car&car-scope=all`
- TODO(gateway-conformance): `/ipfs/cid-of-a-directory?format=car&dag-scope=all`
- Request a CID of UnixFS `directory` which contains two files. The response MUST
contain all blocks that can be accessed by recursively traversing all IPLD
Links from the root CID.

- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&car-scope=all`
- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&dag-scope=all`
- Request a CID of UnixFS `file` encoded with `dag-pb` codec and more than
one chunk. The response MUST contain blocks for all `file` chunks.

:::

### Testing `bytes=from:to`
### Testing `entity-bytes=from:to`

This type of CAR response is used for facilitating HTTP Range Requests and
byte seek within bigger entities.
@@ -302,20 +311,25 @@ byte seek within bigger entities.

Properly testing this type of response requires a synthetic DAG that is only
partially retrievable. This ensures systems that perform internal caching
won't pass the test due to the entire DAG being cached.
won't pass the test due to the entire DAG being precached, or fetched in full.

:::

Use of the below fixture is highly recommended:

:::example

- TODO(gateway-conformance): `/ipfs/dag-pb-file?format=car&bytes=40000000000-40000000002`
- TODO(gateway-conformance): `/ipfs/dag-pb-file?format=car&entity-bytes=40000000000:40000000002`

- Request a byte range from the middle of a big UnixFS `file`. The response MUST
contain only the minimal set of blocks necessary for fulfilling the range
request.

- TODO(gateway-conformance): `/ipfs/10-bytes-cid?format=car&entity-bytes=4:-2`

- Request a byte range from the middle of a small file to 2 bytes before its end.
- (TODO confirm we want to keep this -- added since it was explicitly stated as a supported thing in path-gateway.md)

:::
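
The "minimal set of blocks" expectation in the byte-range fixtures above can be sketched for the simplest case, a file whose data is split into consecutive leaf chunks. The Python below picks the leaf indices that overlap an inclusive byte range. The function name `leaves_for_range` and the flat list of chunk sizes are assumptions for illustration; a real UnixFS file may have intermediate nodes, which a verifiable response would also need and which this sketch ignores.

```python
def leaves_for_range(chunk_sizes, first, last):
    """Return indices of leaf chunks overlapping bytes [first, last].

    `chunk_sizes` is a hypothetical flat list of leaf sizes in file order.
    """
    needed = []
    offset = 0
    for index, size in enumerate(chunk_sizes):
        chunk_first = offset
        chunk_last = offset + size - 1
        # A non-empty chunk is needed when its span intersects the range.
        if size > 0 and chunk_first <= last and chunk_last >= first:
            needed.append(index)
        offset += size
    return needed


# Three 4-byte chunks; bytes 3..4 straddle the first two chunks.
straddling = leaves_for_range([4, 4, 4], 3, 4)
```

A conformance check along these lines could compare the leaves present in the returned CAR with the computed indices, so that a response containing every leaf would fail.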

### Copyright
