Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: serve CBOR encoded DAG nodes from the gateway #8037

Closed
wants to merge 3 commits into from

Conversation

alanshaw
Copy link
Member

@alanshaw alanshaw commented Mar 30, 2021

This PR enables the gateway to serve CBOR encoded DAG nodes. They are returned as JSON objects, the same encoding used on the CLI.

Closes #8823 and ipfs/in-web-browsers#182

Copy link
Contributor

@ribasushi ribasushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one suggestion. Good to merge as-is.

core/corehttp/gateway_handler.go Outdated Show resolved Hide resolved
@lidel lidel added the topic/gateway Topic gateway label Mar 30, 2021
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am supportive for this, with some caveats:

  • CBOR is not JSON, so the request should explicitly ask for JSON (details inline)
  • Cache-Control needs more nuance (details inline)
  • Needs sharness tests similar to test/sharness/t0115-gateway-dir-listing.sh

Note that we want to support CAR export too: ipfs/in-web-browsers#170

@@ -523,6 +527,25 @@ func (i *gatewayHandler) serveFile(w http.ResponseWriter, req *http.Request, nam
http.ServeContent(w, req, name, modtime, content)
}

func (i *gatewayHandler) serveCBOR(w http.ResponseWriter, r *http.Request, n format.Node) {
w.Header().Set("Cache-Control", "public, max-age=29030400, immutable")
w.Header().Set("Content-Type", "application/json")
Copy link
Member

@lidel lidel Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this. We need to make a decision on default behavior before this ships:

  • @warpfork suggested we should aim at returning more user-friendly HTML by default, similar to https://explore.ipld.io, for all non-unixfs DAGs.
    • I kinda agree: we already return custom HTML for directory listings (and not JSON)
    • We want gateways to support everything, not just dag-cbor
  • What if for dag-cbor we return JSON only when request had an explicit Content-Type: application/json header?
    • This way makes semantic sense (I explicitly ask for JSON representation of dag-cbor) and won't break if we add HTML explorer for IPLD universe as the default response
    • For now, if content-type is missing from request, return error saying that "To preview dag-cbor via gateway, request with Content-type: application/json header"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikeal pointed out that

there’s a lot of caches out there that aren’t smart about this and you’ll get responses from cache that don’t match the accept header

This is really unfortunate, but likely relevant concern.

I was thinking maybe gateway code could consider file extension if there is one and infer Content-Type from there ? E.g. if content targeted is ipfs://${cid}/metadata.json in dag-cbor format it's probably reasonable to assume that content type application/json should be used. In fact that seems like a reasonable approach to dag-pb as well.

Although I realize this is getting uncomfortable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be ok with streaming the bytes as application/cbor and add a querystring parameter ?json or ?encoding=json to have it converted to JSON with a application/json header.

IMHO we should be serving data from the gateway that is by default the easiest format to consume from a web application. JSON is that. Also, right now we have NO support, and this is significantly better than nothing and what's here is consistent with the CLI.

So I'd prefer to keep what's happening here and maybe add ?encoding=raw for getting the raw cbor bytes.

I like @Gozala's idea but it feels like too many special cases.

Copy link
Member

@lidel lidel Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should keep complexity at bay and avoid special-casing different DAG types too much.
I propose we introduce DAG CBOR support in (IMO) the least controversial way:

  • Stream application/cbor by default
    • Returning the original, space-efficient format makes more sense than JSON
      • Rationale: Gateway should prioritize returning original bytes if possible and avoid conversion in default mode. If someone wants JSON by default, then they should use dag-json – can we add dag-json as part of this PR as well?
    • IIUC dag-cbor is a subset of cbor, so this is a valid content-type (I'd like to avoid introducing proprietary one if we don't have to).
  • JSON requires conversion step, so we should support application/json as an explicit opt-in by either:
    • requesting Content-Type: application/json
      • Rationale: standards matter and we can't be sloppy if we want to be respected by serious vendors and standards bodies
    • OR passing ?enc=json parameter (as a fallback way for use in poorly written software)

Update: I no longer feel as strong about keeping CBOR as default, see #8037 (review)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just require the header and the querystring. The problem you’ll run into with it only being a fallback is that the user, client developer, and the meddling proxy/cache are different actors and can’t always coordinate whether or not to explicitly opt into a feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean we should require both?
That is too wacky, and bad for DX/UX: makes it impossible to load JSON via address bar in browser.

If someone uses ductape-based CDN/middleware and Content-Type does not work, then they add ?enc=json, but we should not make redundant opt-in mandatory.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on the address bar. It would actually have inconsistent behavior based on your browser plugins because everyone with JSON viewer plugin includes the content type in browser requests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's continue in ipfs/in-web-browsers#182

@@ -523,6 +527,25 @@ func (i *gatewayHandler) serveFile(w http.ResponseWriter, req *http.Request, nam
http.ServeContent(w, req, name, modtime, content)
}

func (i *gatewayHandler) serveCBOR(w http.ResponseWriter, r *http.Request, n format.Node) {
w.Header().Set("Cache-Control", "public, max-age=29030400, immutable")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is invalid if DAG is fetched from anything other than /ipfs/ like /ipns/ (not immutable). See gateway_handler.go#L318-L319.
This may require a bigger refactor, as we want to avoid separate code paths for unixfs and cbor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about this some more, and playing with this PR, I feel @alanshaw convinced me that defaulting dag-cbor response to application/json contributes to a way better onboarding flow for newcomers and also better interop (numbers game, more tools speak JSON than CBOR, including web browsers).

I created ipfs/in-web-browsers#182 to discuss the overall strategy when it comes to IPLD on gateways, the default behavior is one of open questions there – would like to hear more voices before we commit to this.

As for this PR, needs sharness tests for dag-pb requests ensuring expected response with:

  • default behavior (we may flip it to JSON before merging this)
  • request with Content-Type: application/cbor
  • request with ?enc=cbor
  • request with Content-Type: application/json
  • request with ?enc=json

@alanshaw lmk if you need help with sharness

@alanshaw
Copy link
Member Author

alanshaw commented Apr 7, 2021

I've come to realise a possible reason for CBOR (or other encoding) not being implemented yet is pathing within a node - typically you can /ipfs/{cid}/internal/path to get the value 138 out of a node like { internal: { path: 138 } } - this PR does not take this into account and I'm not sure how it would work right now if we allow returning raw cbor (I guess re-encode 138 in CBOR?).

@lidel
Copy link
Member

lidel commented Apr 7, 2021

@alanshaw iiuc our path traversal semantics, pathing makes sense only when we talk about separate documents, so it should work only for CID tags within dag-cbor. This aspect of this PR lgtm.

Also, pathing makes it impossible to extract more than one value, so I suggested support for graphql-like queries that return only a subset of fields from a document in ipfs/in-web-browsers#182, but that requires more design discussion (needs to be future-proof and compatible with IPLD selectors), lets descope it from this PR.

Remaining work here (so we dont miss anything):

  • informative error when no explicit format is requested
  • add sharness tests
  • make another pass at HTTP headers (ensure we include the same set of headers as for dag-pb)

@lidel
Copy link
Member

lidel commented Apr 8, 2021

@alanshaw After sleeping on this I think we should not return JSON by default, but instead return an error informing user to retry request with explicit format (JSON or CBOR) either via content-type header or ?enc= param.

That ensures devs will build on deterministic response types and enable us to introduce user-friendly UI for exploring DAGs in the future, as suggested in ipfs/in-web-browsers#182 (comment)

@warpfork
Copy link
Member

warpfork commented Apr 8, 2021

I think we should not return JSON by default, but instead return an error informing user to retry request with explicit format (JSON or CBOR) either via content-type header or ?enc= param.

That ensures devs will build on deterministic response types and enable us to introduce user-friendly UI for exploring DAGs in the future

Big agree, 💯 . I think we should leave room for that kind of experience to be built into the Gateway. It would be consistent with what we do for unixfsv1 directories already and we should leave room to carry that design logic further; and for the encoding types, explicit is good.

@anorth
Copy link
Member

anorth commented Apr 27, 2021

Let me join the chorus suggesting against converting objects to JSON by default.

Despite it being almost the only case right now, unixfs is a special case of IPLD data that can be interpreted by a gateway. When a client asks the gateway for some CID and that CID is a unixfs root, the gateway doesn't give the client what they actually asked for (an IPLD node), but infers a request for the DAG data to be traversed and rendered to a flat file (which does not have the same hash). Or analogous behaviour for a directory. This is great for the use case of URLs in hyperlinks, rendering in browsers etc. We don't ask the client to pass headers or query parameters for this "special" rendering behaviour because it's too awkward for the browser URL bar case; so the default is to render.

Imagine some node that the gateway currently doesn't know how to render, but in the future might be told how to render as something a browser can handle. We should preserve a similar default approach of assuming that the request is from a browser URL bar. Maybe some third party has an alternative schema for chunking flat files, and we later add support for it too. We don't want to change the response data and break old clients relying on the default JSON.

Most other contexts outside a browser URL bar have easy capacity to pass parameters describing what content they want (JS code in browsers, server-side code, curl invocations, ...). It seems a bit backwards that "give me the raw data with this CID" is not the no-parameter default, but the browser case determines that. So IMO all the other response types should be explicit, controlled by parameters from the client. Maybe we even need an ?enc= (blank value) to request no transformation, in case the caller doesn't know (or want to work out from CID) what the native encoding is.

When serving IPLD data, another reason not to JSONify everything is that it breaks the verifiability of the content from the CID. It's ok for that to be an explicit request, but unfortunate as a default behaviour. JS code can parse CBOR too.

IMHO we should be serving data from the gateway that is by default the easiest format to consume from a web application.

I would tweak this. We should be serving data from the gateway that is by default the easiest format to consume from a web browser, because the URL bar is the least good place to require parameters describing what the client wants, and is already set by precedent. But web applications are just as capable as other clients of requesting a particular data format – the request and response will both usually be handled by code.

@lidel lidel added topic/design-ux UX strategy, research, not solely visual design topic/devexp Developer Experience topic/interop Interoperability labels Apr 27, 2021
@aschmahmann aschmahmann marked this pull request as draft May 10, 2021 15:38
@aschmahmann
Copy link
Contributor

Switched to draft while this is still undergoing some design discussion and isn't quite ready for mergable-reviewing.

@willscott
Copy link
Contributor

This might become much easier once we have #7995 which will allow for a dag get to use arbitrary ipld-prime codec encodings. cc @aschmahmann for recommendation on if this implies the cmds route for encoding vs as enumerated encoders on the specific subcommands

return
var contentType string
var data []byte
if r.URL.Query().Get("enc") == "json" || r.Header.Get("Content-Type") == "application/json" {
Copy link

@jeremy-im jeremy-im Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be: r.Header.Get("Accept")
"Content-Type" in the request is about the body of the request, not about the response.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... oh, except "Accept" can be a comma-separated list, so the header value needs to be parsed - not just == compared.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to work pretty well:

diff --git a/core/corehttp/gateway_handler.go b/core/corehttp/gateway_handler.go
index 11e5db671..5b847ae10 100644
--- a/core/corehttp/gateway_handler.go
+++ b/core/corehttp/gateway_handler.go
@@ -39,6 +39,7 @@ const (
 )
 
 var onlyAscii = regexp.MustCompile("[[:^ascii:]]")
+var anyJSON = regexp.MustCompile(`^application/(|.+\+)json$`)
 
 // HTML-based redirect for errors which can be recovered from, but we want
 // to provide hint to people that they should fix things on their end.
@@ -537,9 +538,24 @@ func (i *gatewayHandler) serveCBOR(w http.ResponseWriter, r *http.Request, n for
 	}
 	modtime := time.Unix(1, 0)
 
+	useJSON := false
+	if r.URL.Query().Get("enc") == "json" {
+		useJSON = true
+	} else {
+		for _, v := range r.Header.Values("Accept") {
+			for _, a := range strings.Split(v, ",") {
+				mediatype, _, err := mime.ParseMediaType(a)
+				if err == nil && anyJSON.MatchString(mediatype) {
+					useJSON = true
+					break
+				}
+			}
+		}
+	}
+
 	var contentType string
 	var data []byte
-	if r.URL.Query().Get("enc") == "json" || r.Header.Get("Content-Type") == "application/json" {
+	if useJSON {
 		contentType = "application/json"
 		b, err := json.Marshal(n)
 		if err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d'oh! : forgot the label on the outer loop so the break can break all the way out.

@lidel
Copy link
Member

lidel commented Jul 2, 2021

Quick check-in note:

@lidel
Copy link
Member

lidel commented Mar 3, 2022

Note to self: after #8758 lands we revisit this and add dag-cbor and dag-json formats using the newly created reusable code for cache control and content disposition.

@lidel lidel added the status/blocked Unable to be worked further until needs are met label Mar 3, 2022
@BigLep BigLep added this to the TBD milestone Mar 8, 2022
@BigLep
Copy link
Contributor

BigLep commented Mar 23, 2022

@lidel : given #8758 has landed, I assume this is unblocked? I'll put this down for go-ipfs 0.14 or are you thinking to do sooner or suggest someone else in the community take it?

@BigLep BigLep removed the status/blocked Unable to be worked further until needs are met label Mar 23, 2022
@BigLep BigLep modified the milestones: TBD, go-ipfs 0.14 Mar 23, 2022
@lidel
Copy link
Member

lidel commented Mar 24, 2022

@BigLep yes, this is unblocked, I've created #8823 with some additional notes, so it is not blocked on my availability.

Aiming at 0.14 sounds good: ideally we would resolve #8568 before this ships, as people will start using dag-json and dag-cbor way more thanks to this, and we want good UX around it, but in this specific case it is not a hard blocker.

Implementation wise, it should be pretty easy to add on top of refactor that happened in #8758:

https://github.com/ipfs/go-ipfs/blob/5b43672a8733525ad9fab9be03d14d983f3f9914/core/corehttp/gateway_handler.go#L307-L326

Anyone with spare time could do it, given a spare day.
Writing tests will take most of the time here, but we could reuse test fixtures from:

(at minimum, we want to test the real dag-json and dag-cbor + have one for dag-pb represented as dag-json)

@lidel lidel modified the milestones: kubo 0.14, kubo 0.15 Jul 20, 2022
@lidel lidel self-assigned this Jul 26, 2022
@lidel
Copy link
Member

lidel commented Oct 11, 2022

Continued in #9335

@lidel lidel closed this Oct 11, 2022
@hacdias hacdias deleted the feat/gateway-cbor branch May 9, 2023 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/design-ux UX strategy, research, not solely visual design topic/devexp Developer Experience topic/gateway Topic gateway topic/interop Interoperability
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Gateway support for dag-json and dag-cbor responses