Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: serve CBOR encoded DAG nodes from the gateway #8037

Closed
wants to merge 3 commits into from
Closed
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
63 changes: 51 additions & 12 deletions core/corehttp/gateway_handler.go
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
package corehttp

import (
"bytes"
"context"
"encoding/json"
"fmt"
"html/template"
"io"
Expand All @@ -21,6 +23,7 @@ import (
"github.com/ipfs/go-cid"
files "github.com/ipfs/go-ipfs-files"
assets "github.com/ipfs/go-ipfs/assets"
format "github.com/ipfs/go-ipld-format"
dag "github.com/ipfs/go-merkledag"
mfs "github.com/ipfs/go-mfs"
path "github.com/ipfs/go-path"
Expand Down Expand Up @@ -264,6 +267,16 @@ func (i *gatewayHandler) getOrHeadHandler(w http.ResponseWriter, r *http.Request
return
}

if resolvedPath.Cid().Prefix().Codec == cid.DagCBOR {
n, err := i.api.Dag().Get(r.Context(), resolvedPath.Cid())
if err != nil {
webError(w, "ipfs dag get "+escapedURLPath, err, http.StatusNotFound)
return
}
i.serveCBOR(w, r, n)
return
}

dr, err := i.api.Unixfs().Get(r.Context(), resolvedPath)
if err != nil {
webError(w, "ipfs cat "+escapedURLPath, err, http.StatusNotFound)
Expand Down Expand Up @@ -309,18 +322,9 @@ func (i *gatewayHandler) getOrHeadHandler(w http.ResponseWriter, r *http.Request
modtime = time.Unix(1, 0)
}

urlFilename := r.URL.Query().Get("filename")
var name string
if urlFilename != "" {
disposition := "inline"
if r.URL.Query().Get("download") == "true" {
disposition = "attachment"
}
utf8Name := url.PathEscape(urlFilename)
asciiName := url.PathEscape(onlyAscii.ReplaceAllLiteralString(urlFilename, "_"))
w.Header().Set("Content-Disposition", fmt.Sprintf("%s; filename=\"%s\"; filename*=UTF-8''%s", disposition, asciiName, utf8Name))
name = urlFilename
} else {
maybeSetContentDispositionHeader(r.URL.Query(), w.Header())
name := r.URL.Query().Get("filename")
if name == "" {
name = getFilename(urlPath)
}
i.serveFile(w, r, name, modtime, f)
Expand Down Expand Up @@ -523,6 +527,25 @@ func (i *gatewayHandler) serveFile(w http.ResponseWriter, req *http.Request, nam
http.ServeContent(w, req, name, modtime, content)
}

func (i *gatewayHandler) serveCBOR(w http.ResponseWriter, r *http.Request, n format.Node) {
w.Header().Set("Cache-Control", "public, max-age=29030400, immutable")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is invalid if DAG is fetched from anything other than /ipfs/ like /ipns/ (not immutable). See gateway_handler.go#L318-L319.
This may require a bigger refactor, as we want to avoid separate code paths for unixfs and cbor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

w.Header().Set("Content-Type", "application/json")
Copy link
Member

@lidel lidel Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this. We need to make a decision on default behavior before this ships:

  • @warpfork suggested we should aim at returning more user-friendly HTML by default, similar to https://explore.ipld.io, for all non-unixfs DAGs.
    • I kinda agree: we already return custom HTML for directory listings (and not JSON)
    • We want gateways to support everything, not just dag-cbor
  • What if for dag-cbor we return JSON only when request had an explicit Content-Type: application/json header?
    • This way makes semantic sense (I explicitly ask for JSON representation of dag-cbor) and won't break if we add HTML explorer for IPLD universe as the default response
    • For now, if content-type is missing from request, return error saying that "To preview dag-cbor via gateway, request with Content-type: application/json header"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikeal pointed out that

there’s a lot of caches out there that aren’t smart about this and you’ll get responses from cache that don’t match the accept header

This is really unfortunate, but likely relevant concern.

I was thinking maybe gateway code could consider file extension if there is one and infer Content-Type from there ? E.g. if content targeted is ipfs://${cid}/metadata.json in dag-cbor format it's probably reasonable to assume that content type application/json should be used. In fact that seems like a reasonable approach to dag-pb as well.

Although I realize this is getting uncomfortable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be ok with streaming the bytes as application/cbor and add a querystring parameter ?json or ?encoding=json to have it converted to JSON with a application/json header.

IMHO we should be serving data from the gateway that is by default the easiest format to consume from a web application. JSON is that. Also, right now we have NO support, and this is significantly better than nothing and what's here is consistent with the CLI.

So I'd prefer to keep what's happening here and maybe add ?encoding=raw for getting the raw cbor bytes.

I like @Gozala's idea but it feels like too many special cases.

Copy link
Member

@lidel lidel Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should keep complexity at bay and avoid special-casing different DAG types too much.
I propose we introduce DAG CBOR support in (IMO) the least controversial way:

  • Stream application/cbor by default
    • Returning the original, space-efficient format makes more sense than JSON
      • Rationale: Gateway should prioritize returning original bytes if possible and avoid conversion in default mode. If someone wants JSON by default, then they should use dag-json – can we add dag-json as part of this PR as well?
    • IIUC dag-cbor is a subset of cbor, so this is a valid content-type (I'd like to avoid introducing proprietary one if we don't have to).
  • JSON requires conversion step, so we should support application/json as an explicit opt-in by either:
    • requesting Content-Type: application/json
      • Rationale: standards matter and we can't be sloppy if we want to be respected by serious vendors and standards bodies
    • OR passing ?enc=json parameter (as a fallback way for use in poorly written software)

Update: I no longer feel as strong about keeping CBOR as default, see #8037 (review)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just require the header and the querystring. The problem you’ll run into with it only being a fallback is that the user, client developer, and the meddling proxy/cache are different actors and can’t always coordinate whether or not to explicitly opt into a feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean we should require both?
That is too wacky, and bad for DX/UX: makes it impossible to load JSON via address bar in browser.

If someone uses ductape-based CDN/middleware and Content-Type does not work, then they add ?enc=json, but we should not make redundant opt-in mandatory.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on the address bar. It would actually have inconsistent behavior based on your browser plugins because everyone with JSON viewer plugin includes the content type in browser requests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's continue in ipfs/in-web-browsers#182

maybeSetContentDispositionHeader(r.URL.Query(), w.Header())

name := r.URL.Query().Get("filename")
if name == "" {
name = getFilename(r.URL.Path)
}
modtime := time.Unix(1, 0)

b, err := json.Marshal(n)
if err != nil {
internalWebError(w, err)
return
}
http.ServeContent(w, r, name, modtime, bytes.NewReader(b))
}

func (i *gatewayHandler) servePretty404IfPresent(w http.ResponseWriter, r *http.Request, parsedPath ipath.Path) bool {
resolved404Path, ctype, err := i.searchUpTreeFor404(r, parsedPath)
if err != nil {
Expand Down Expand Up @@ -838,3 +861,19 @@ func fixupSuperfluousNamespace(w http.ResponseWriter, urlPath string, urlQuery s
ErrorMsg: fmt.Sprintf("invalid path: %q should be %q", urlPath, intendedPath.String()),
}) == nil
}

// maybeSetContentDispositionHeader sets the Content-Disposition header if a
// "filename" was included in the URL querystring.
func maybeSetContentDispositionHeader(qs url.Values, h http.Header) {
urlFilename := qs.Get("filename")
if urlFilename == "" {
return
}
disposition := "inline"
if qs.Get("download") == "true" {
disposition = "attachment"
}
utf8Name := url.PathEscape(urlFilename)
asciiName := url.PathEscape(onlyAscii.ReplaceAllLiteralString(urlFilename, "_"))
h.Set("Content-Disposition", fmt.Sprintf("%s; filename=\"%s\"; filename*=UTF-8''%s", disposition, asciiName, utf8Name))
}