IPIP-332: Streaming Error Handling on Web Gateways #332

Closed
wants to merge 11 commits into from
66 changes: 66 additions & 0 deletions IPIP/0000-gateway-error-handling.md
@@ -0,0 +1,66 @@
# IPIP 0000: Streaming Error Handling in HTTP Gateways

- Start Date: 2022-10-12
- Related Issues:
  - [ipfs/kubo/pull/9333](https://github.com/ipfs/kubo/pull/9333)
  - [mdn/browser-compat-data/issues/14703](https://github.com/mdn/browser-compat-data/issues/14703)

## Summary

Ensure streaming error handling in web gateways is clear and consistent.

## Motivation

Web gateways offer several ways for users to download files.
The download of these files is streamed from the server to the client over HTTP.
However, there is no good way to present to the client an error that happens
during the stream.

For example, if during the download of a TAR file, the server detects some error
and is not able to continue, the user can get a valid, yet incomplete TAR. However,
the user will not know that the TAR is incomplete. By introducing consistent error
handling, the server attempts to notify the user.

## Detailed design

If the server encounters an error before streaming the contents to the client,
the server must fail with the respective `4xx` or `5xx` HTTP status code (no change).

If the server encounters an error while streaming the contents, the server must
force-close the HTTP connection to the user. This way, the user will receive a
network error, making it clear that the downloaded file is not valid.
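
The behaviour described above can be sketched in Go. This is only an illustration under stated assumptions, not the implementation used by any particular gateway: `streamThenAbort` and `fetchFromAbortingGateway` are hypothetical names, and the mid-stream error is simulated. The sketch relies on the standard library's `http.ErrAbortHandler`, which makes `net/http` abort the response (closing the connection on HTTP/1.x, or sending `RST_STREAM` on HTTP/2) instead of terminating the body cleanly.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
)

// streamThenAbort is a hypothetical handler: it sends the headers and a first
// chunk of the body, then simulates an error detected mid-stream.
func streamThenAbort(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK) // the status code is committed from here on
	_, _ = w.Write([]byte("partial data"))
	if f, ok := w.(http.Flusher); ok {
		f.Flush() // push the first chunk to the client
	}

	// It is too late for a 4xx/5xx status. Returning normally would end the
	// chunked body cleanly and make the truncated response look complete,
	// so the handler aborts the response instead.
	panic(http.ErrAbortHandler)
}

// fetchFromAbortingGateway starts a throwaway test server with the handler
// above and returns whatever the client managed to read, plus the read error.
func fetchFromAbortingGateway() ([]byte, error) {
	srv := httptest.NewServer(http.HandlerFunc(streamThenAbort))
	defer srv.Close()

	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Reading the body fails once the server drops the connection.
	return io.ReadAll(resp.Body)
}
```

Calling `fetchFromAbortingGateway()` from a `main` function or a test should return a non-nil error alongside the bytes received so far, which is the signal a client can use to discard the incomplete file.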

## Test fixtures

There are no relevant test fixtures for this IPIP.

## Design rationale

Before starting to stream the body of the response, the server is able to set
an HTTP status code for the error. However, once the HTTP headers have been sent
and the body has started streaming, the HTTP specification offers no clear way
to signal an error. Since the gateway is browser-first, it is
important to show an error and avoid users receiving an incomplete file.
Therefore, the server can force-close the HTTP connection, leading to a network
error. This tells the user that an error happened.

### User benefit

The user will know that an error happened while receiving the file. Otherwise,
the user might receive incomplete, but still valid, files that could be mistaken
for the real file.

### Compatibility

This IPIP is backwards compatible.

### Alternatives

Using [`Trailer`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Trailer) HTTP headers
was considered. However, trailer headers are [not supported in browsers](https://github.com/mdn/browser-compat-data/issues/14703).
In addition, even if trailer headers were supported in browsers, there is no clear
standard for which header would be used to indicate errors.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
29 changes: 19 additions & 10 deletions http-gateways/PATH_GATEWAY.md
@@ -1,6 +1,6 @@
# Path Gateway Specification

![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square)
![Status: Work In Progress](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square)

**Authors**:

@@ -83,6 +83,7 @@ where client prefers to perform all validation locally.
- [Best practices for HTTP caching](#best-practices-for-http-caching)
- [Denylists](#denylists)
- [Generated HTML with directory index](#generated-html-with-directory-index)
- [Streaming errors](#streaming-errors)

# HTTP API

@@ -194,7 +195,6 @@ blocks.
Gateway implementations SHOULD be smart enough to require only the minimal DAG subset
necessary for handling the range request.


NOTE: for more advanced use cases such as partial DAG/CAR streaming, or
non-UnixFS data structures, see the `selector` query parameter
[proposal](https://github.com/ipfs/go-ipfs/issues/8769).
@@ -256,7 +256,6 @@ for more details.
- This is a powerful primitive that allows for fetching subsets of data in specific order, either as raw bytes, or a CAR stream. Think “HTTP range requests”, but for IPLD, and more powerful.
-->


# HTTP Response

## Response Status Codes
@@ -372,7 +371,6 @@ and CDNs, implementations should base it on both CID and response type:
should be based on requested range in addition to CID and response format:
`Etag: "bafy..foo.0-42"`


### `Cache-Control` (response header)

Used for HTTP caching.
@@ -433,6 +431,7 @@ or optional [`filename`](#filename-request-query-parameter) parameter)
and magic bytes to improve the utility of produced responses.

For example:

- detect plain text file
and return `Content-Type: text/plain` instead of `application/octet-stream`
- detect SVG image
@@ -446,6 +445,7 @@ Returned when `download`, `filename` query parameter, or a custom response
The first parameter passed in this header indicates if content should be
displayed `inline` by the browser, or sent as an `attachment` that opens the
“Save As” dialog:

- `Content-Disposition: inline` is the default, returned when request was made
with `download=false` or a custom `filename` was provided with the request
without any explicit `download` parameter.
@@ -457,13 +457,14 @@ The remainder is an optional `filename` parameter that will be prefilled in the

NOTE: when the `filename` includes non-ASCII characters, the header must
include both ASCII and UTF-8 representations for compatibility with legacy user
agents and existing web browsers.
agents and existing web browsers.

To illustrate, `?filename=testтест.jpg` should produce:
`Content-Disposition: inline; filename="test____.jpg"; filename*=UTF-8''test%D1%82%D0%B5%D1%81%D1%82.jpg`
- ASCII representation must have non-ASCII characters replaced with `_`
- UTF-8 representation must be wrapped in Percent Encoding ([RFC 3986, Section 2.1](https://www.rfc-editor.org/rfc/rfc3986.html#section-2.1)).
- NOTE: `UTF-8''` is not a typo – see [Examples in RFC5987](https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.2)

- ASCII representation must have non-ASCII characters replaced with `_`
- UTF-8 representation must be wrapped in Percent Encoding ([RFC 3986, Section 2.1](https://www.rfc-editor.org/rfc/rfc3986.html#section-2.1)).
- NOTE: `UTF-8''` is not a typo – see [Examples in RFC5987](https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.2)
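
The two representations described above can be sketched in Go as follows. `contentDisposition` is a hypothetical helper written for illustration, not an API of any gateway implementation. It assumes `url.PathEscape` is an acceptable way to percent-encode the UTF-8 name, and it does not escape quotes or backslashes in the ASCII fallback, which a real implementation would need to handle.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// contentDisposition builds a Content-Disposition header value that carries
// both an ASCII fallback and a percent-encoded UTF-8 filename.
// disposition is expected to be "inline" or "attachment".
func contentDisposition(disposition, filename string) string {
	// ASCII representation: every non-ASCII character becomes "_".
	asciiName := strings.Map(func(r rune) rune {
		if r > 127 {
			return '_'
		}
		return r
	}, filename)

	// UTF-8 representation: percent-encode the raw UTF-8 bytes.
	utf8Name := url.PathEscape(filename)

	return fmt.Sprintf("%s; filename=\"%s\"; filename*=UTF-8''%s",
		disposition, asciiName, utf8Name)
}
```

For example, `contentDisposition("inline", "testтест.jpg")` yields `inline; filename="test____.jpg"; filename*=UTF-8''test%D1%82%D0%B5%D1%81%D1%82.jpg`.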

`Content-Disposition` must be also set when a binary response format was requested:

@@ -510,8 +511,9 @@ This header is more widely used in [SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md

Gateway MUST return a redirect when a valid UnixFS directory was requested
without the trailing `/`, for example:

- response for `https://ipfs.io/ipns/en.wikipedia-on-ipfs.org/wiki`
(no trailing slash) will be HTTP 301 redirect with
(no trailing slash) will be HTTP 301 redirect with
`Location: /ipns/en.wikipedia-on-ipfs.org/wiki/`
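
A minimal Go sketch of the redirect rule above. `trailingSlashRedirect` is a hypothetical helper, and the `isDir` flag stands in for whatever check an implementation uses to decide that the path resolves to a UnixFS directory. Preserving the query string in the `Location` value is an assumption of this sketch, not something the text above requires.

```go
package main

import "strings"

// trailingSlashRedirect returns the Location value for an HTTP 301 redirect
// when a directory was requested without a trailing slash.
// The second return value is false when no redirect is needed.
func trailingSlashRedirect(path, rawQuery string, isDir bool) (string, bool) {
	if !isDir || strings.HasSuffix(path, "/") {
		return "", false
	}
	location := path + "/"
	if rawQuery != "" {
		// Assumption: keep the original query parameters on the redirect.
		location += "?" + rawQuery
	}
	return location, true
}
```

A handler could call this with the escaped request path and, when the second value is true, respond with status `301` and the returned `Location`.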

### `X-Ipfs-Path` (response header)
@@ -614,7 +616,7 @@ IPLD data, starting from that data which the CID identified.
**Note:** Other types of gateway may allow for passing CID by other means, such
as `Host` header, changing the rules behind path splitting.
(See [SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md)
and [DNSLINK_GATEWAY.md](./DNSLINK_GATEWAY.md)).
and [DNSLINK_GATEWAY.md](./DNSLINK_GATEWAY.md)).

### Traversing remaining path

@@ -628,6 +630,7 @@ low level logical pathing from IPLD:
### Handling traversal errors

Gateway MUST respond with HTTP error when it is not possible to traverse the requested content path:

- [`404 Not Found`](#404-not-found) should be returned when the root CID is valid and traversable, but
the DAG it represents does not include content path remainder.
- Error response body should indicate which part of immutable content path (`/ipfs/{cid}/path/to/file`) is missing
@@ -655,6 +658,7 @@ Implementations are encouraged to support pluggable denylists to allow IPFS
node operators to opt into not hosting previously flagged content.

Gateway MUST respond with HTTP error when requested CID is on any of active denylists:

- [410 Gone](#410-gone) returned when CID is denied for non-legal reasons, or when the exact reason is unknown
- [451 Unavailable For Legal Reasons](#451-unavailable-for-legal-reasons) returned when denylist indicates that content was blocked on legal basis

@@ -694,3 +698,8 @@ The usual optimizations involve:
limiting the cost of a single page load.
- The downside of this approach is that it will always be slower than
skipping child block resolution.

## Streaming errors

Gateways MUST force-close the HTTP connection if they detect an error while
streaming a file, so that users do not mistake an incomplete, yet seemingly
valid, file for the complete one.