IPIP-288: TAR Gateway Response Format #288

Merged
merged 29 commits into from Nov 9, 2022
Changes from 16 commits
Commits
6be7dd1
docs: add TAR format
hacdias Jun 10, 2022
e7a3572
refactor: cleanup markdown
hacdias Jul 15, 2022
3bdec5f
refactor: cleanup markdown
hacdias Jul 15, 2022
992a0f3
fix: grammar errors
hacdias Jul 22, 2022
60f6891
ipip(tar): apply suggestions from review
lidel Sep 29, 2022
34d51a4
Merge branch 'main' into feat/gateway-tar
hacdias Oct 3, 2022
623b905
ipip(tar): add security notice
hacdias Oct 3, 2022
20ad7dc
ipip(tar): update security concerns to include .car
hacdias Oct 6, 2022
b0efa1f
ipip(tar): update security to include CAR
hacdias Oct 6, 2022
4a7138d
ipip(tar): add error info
hacdias Oct 6, 2022
bff09c4
ipip(tar): remove .car, update error
hacdias Oct 7, 2022
8ea9543
cleanup summary and motivation
hacdias Oct 7, 2022
aff0fb2
add fixture info
hacdias Oct 7, 2022
a29735a
cleanup security
hacdias Oct 7, 2022
c1ebab6
ipip(tar): update wording
hacdias Oct 10, 2022
a00ed4d
ipip(tar): add info about root file/dir
hacdias Oct 10, 2022
46aca5a
Update IPIP/0000-gateway-tar-response-format.md
hacdias Oct 10, 2022
64db0f0
improve test fixtures
hacdias Oct 12, 2022
354719f
Merge branch 'main' into feat/gateway-tar
hacdias Oct 12, 2022
75dc76d
lint path gateway
hacdias Oct 12, 2022
49053ec
fix lint
hacdias Oct 12, 2022
dbd656c
update title
hacdias Oct 12, 2022
3766a6b
ipip(tar): editorial tweaks
lidel Oct 13, 2022
e3bc88b
rfc --> ipip
hacdias Oct 18, 2022
101adf2
must -> should
hacdias Oct 18, 2022
26fb3be
add TAR to response payload
hacdias Oct 19, 2022
8483a3b
Merge branch 'main' into feat/gateway-tar
hacdias Oct 20, 2022
eea310a
Merge branch 'main' into feat/gateway-tar
hacdias Nov 7, 2022
8fe745a
chore: editorial fixes
lidel Nov 9, 2022
90 changes: 90 additions & 0 deletions IPIP/0000-gateway-tar-response-format.md
@@ -0,0 +1,90 @@
# IPIP 0000: Gateway TAR Response Format

- Start Date: 2022-06-10
- Related Issues:
  - [ipfs/specs/pull/288](https://github.com/ipfs/specs/pull/288)
  - [ipfs/go-ipfs/pull/9029](https://github.com/ipfs/go-ipfs/pull/9029)
  - [ipfs/go-ipfs/pull/9034](https://github.com/ipfs/go-ipfs/pull/9034)

## Summary

Add a TAR response format to the [HTTP Gateway](../http-gateways/).

## Motivation

Currently, the HTTP Gateway only allows downloading single files or CAR
archives. However, a CAR archive is not always what users need: they may
simply want to download an entire directory.

An example use case is the IPFS Web UI, which currently lets users download
directories through a workaround. The workaround relies on an API that only
supports `POST` requests, so the Web UI has to hold the entire directory in
memory before the user can download it. Supporting TAR responses on the HTTP
Gateway gives users a direct way to download entire directories.

## Detailed design

The solution is to allow the Gateway to produce TAR archives when they are
requested via either the `Accept` HTTP header or the `format` URL query
parameter.
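
As an illustration of the two request styles, here is a minimal Python sketch that builds (but does not send) each kind of request. The helper name `build_tar_request` and the placeholder CID are hypothetical; the `application/x-tar` media type, the `format=tar` query value, and the `/ipfs/{cid}` path shape are the ones used elsewhere in this pull request.

```python
from urllib.parse import urlencode
from urllib.request import Request

TAR_MEDIA_TYPE = "application/x-tar"


def build_tar_request(gateway: str, cid: str, use_accept_header: bool = True) -> Request:
    """Build (but do not send) a GET request asking a gateway for a TAR archive.

    `build_tar_request` is a hypothetical helper written for this example.
    """
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    if use_accept_header:
        # Option 1: content negotiation through the Accept header.
        return Request(url, headers={"Accept": TAR_MEDIA_TYPE}, method="GET")
    # Option 2: the URL-friendly `format` query parameter.
    return Request(f"{url}?{urlencode({'format': 'tar'})}", method="GET")
```

Either request could then be passed to `urllib.request.urlopen`; whether a given gateway honors it depends on that gateway implementing this proposal.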

## Test fixtures

Existing `curl` and `tar` tools can be used by implementers for testing.

Providing static test vectors has little value here, as different TAR libraries may produce
files that differ byte for byte due to the unspecified ordering of files and directories inside.
However, there are relevant fixtures for testing certain behaviors. These are
referred to by their CID in the following sections.

## Design rationale

The current gateway already supports different response formats via the
`Accept` HTTP header and the `format` URL query parameter. This IPIP proposes
adding one more supported format to that list.

### User benefit

Users will be able to download UnixFS directories directly from the gateway. In the Web UI,
for example, we will be able to create a direct link to download the directory, instead of using
the API to put the whole directory in memory before downloading it, saving resources and avoiding bugs.

CLI users will be able to download a directory with existing tools like `curl` and `tar`.
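
For readers scripting this instead of using `curl` and `tar`, the following Python sketch lists the entries of an uncompressed TAR stream as it is read. It is only an illustration: `list_tar_entries` is a hypothetical helper, and the stream is assumed to be any binary file-like object, such as an HTTP response body.

```python
import tarfile
from typing import BinaryIO, List


def list_tar_entries(stream: BinaryIO) -> List[str]:
    """Return entry names from an uncompressed TAR stream, read sequentially.

    `list_tar_entries` is a hypothetical helper written for this example.
    """
    # Mode "r|" reads a non-seekable stream of uncompressed TAR blocks,
    # so the whole archive never has to be buffered in memory.
    with tarfile.open(fileobj=stream, mode="r|") as archive:
        return [member.name for member in archive]
```

Because the archive is consumed sequentially, the same approach should work on a response object returned by `urllib.request.urlopen` without first saving the download to disk.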

### Compatibility

This IPIP is backwards compatible.

### Security

Manually created UnixFS DAGs can be turned into malicious TAR files. For example,
if a UnixFS directory contains a file that points at a relative path outside of
its root, the unpacking of the TAR file may overwrite local files.

In order to prevent this, if the UnixFS directory contains a file whose path
points outside of the root, the TAR file download **must** fail by force-closing
the HTTP connection, leading to a network error.

To test this, we provide two CAR files:

* ✔ [bafybeibfevfxlvxp5vxobr5oapczpf7resxnleb7tkqmdorc4gl5cdva3y](https://dweb.link/ipfs/bafybeibfevfxlvxp5vxobr5oapczpf7resxnleb7tkqmdorc4gl5cdva3y) is a UnixFS
DAG that contains a file with a relative path that points inside the root directory.
Downloading it as a TAR must work.
* ✘ [bafybeicaj7kvxpcv4neaqzwhrqqmdstu4dhrwfpknrgebq6nzcecfucvyu](https://dweb.link/ipfs/bafybeicaj7kvxpcv4neaqzwhrqqmdstu4dhrwfpknrgebq6nzcecfucvyu) is a UnixFS
DAG that contains a file with a relative path that points outside the root directory.
Downloading it as a TAR must error.

Users who want to download the raw files should be advised to use a CAR file instead.
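
The check described above can be sketched as follows. This is a minimal Python illustration, not the go-ipfs implementation: `escapes_root` is a hypothetical name, and treating absolute paths as escaping the root is an assumption made for this example.

```python
import posixpath


def escapes_root(entry_path: str) -> bool:
    """Return True if an entry path resolves outside the archive root.

    `escapes_root` is a hypothetical helper written for this example.
    """
    # Assumption: absolute paths are treated as pointing outside the root.
    if entry_path.startswith("/"):
        return True
    # Collapse "." and ".." segments, then see whether any ".." remains in front.
    normalized = posixpath.normpath(entry_path)
    return normalized == ".." or normalized.startswith("../")
```

A gateway following this proposal would abort the response when such a check returns `True` for any entry. A path like `dir/../file.txt` stays inside the root and passes, which matches the behavior of the first fixture.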

### Alternatives

One alternative that was discussed is supporting uncompressed ZIP files. However, TAR and
TAR-related libraries are already supported and implemented for UnixFS files, which makes
adding a TAR response format easier.

In addition, we considered supporting [Gzipped TAR](https://github.com/ipfs/go-ipfs/pull/9034).
However, it may be a vector for DoS attacks, since compression is CPU-intensive.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
13 changes: 9 additions & 4 deletions http-gateways/PATH_GATEWAY.md
@@ -181,6 +181,7 @@ For example:

- [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable raw [block](https://docs.ipfs.io/concepts/glossary/#block) to be returned
- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable [CAR](https://docs.ipfs.io/concepts/glossary/#car) stream to be returned
- [application/x-tar](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types) – returns a UnixFS file or directory as a [TAR](https://en.wikipedia.org/wiki/Tar_(computing)) stream. The root of the TAR archive contains a file or directory named with the CID of the content. Produces 400 Bad Request for content that is not UnixFS.
<!-- TODO: https://github.com/ipfs/go-ipfs/issues/8823
- application/vnd.ipld.dag-json OR application/json – requests IPLD Data Model representation serialized into [DAG-JSON format](https://ipld.io/docs/codecs/known/dag-json/)
- application/vnd.ipld.dag-cbor OR application/cbor - requests IPLD Data Model representation serialized into [DAG-CBOR format](https://ipld.io/docs/codecs/known/dag-cbor/)
@@ -250,6 +251,8 @@ This is a URL-friendly alternative to sending
`Accept: application/vnd.ipld.<format>` header, see [`Accept`](#accept-request-header)
for more details.

In case of `Accept: application/x-tar`, the `?format=` equivalent is `tar`.

<!-- TODO Planned: https://github.com/ipfs/go-ipfs/issues/8769
- `selector=<cid>` can be used for passing a CID with [IPLD selector](https://ipld.io/specs/selectors)
- Selector should be in dag-json or dag-cbor format
@@ -354,7 +357,7 @@ and CDNs, implementations should base it on both CID and response type:

- By default, etag should be based on requested CID. Example: `Etag: "bafy…foo"`

- If a custom `format` was requested (such as a raw block or a CAR), the
- If a custom `format` was requested (such as a raw block, CAR), the
returned etag should be modified to include it. It could be a suffix.
- Example: `Etag: "bafy…foo.raw"`

@@ -366,7 +369,9 @@ and CDNs, implementations should base it on both CID and response type:

- When a gateway can’t guarantee byte-for-byte identical responses, a “weak”
etag should be used. For example, if CAR is streamed, and blocks arrive in
non-deterministic order, the response should have `Etag: W/"bafy…foo.car"`
non-deterministic order, the response should have `Etag: W/"bafy…foo.car"`.
If TAR is generated by traversing a UnixFS directory in non-deterministic
order, the response should have `Etag: W/"bafy…foo.tar"`.

- When responding to [`Range`](#range-request-header) request, a strong `Etag`
should be based on requested range in addition to CID and response format:
@@ -457,7 +462,7 @@ The remainder is an optional `filename` parameter that will be prefilled in the

NOTE: when the `filename` includes non-ASCII characters, the header must
include both ASCII and UTF-8 representations for compatibility with legacy user
agents and existing web browsers.
agents and existing web browsers.

To illustrate, `?filename=testтест.pdf` should produce:
`Content-Disposition: inline; filename="test____.pdf"; filename*=UTF-8''test%D1%82%D0%B5%D1%81%D1%82.pdf`
@@ -614,7 +619,7 @@ IPLD data, starting from that data which the CID identified.
**Note:** Other types of gateway may allow for passing CID by other means, such
as `Host` header, changing the rules behind path splitting.
(See [SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md)
and [DNSLINK_GATEWAY.md](./DNSLINK_GATEWAY.md)).
and [DNSLINK_GATEWAY.md](./DNSLINK_GATEWAY.md)).

### Traversing remaining path
