Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unixfs: suggest, but do not strictly mandate unixfs<>dagpb #294

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Jun 27, 2022

The strict coupling poses some problems for IPLD implementations that:

  1. do not retain codec information between the serialization and UnixFS (ADL) reification; and/or
  2. do not have a mechanism to strictly mandate that UnixFS data be only encoded in a particular codec.

A mandate in the specification that strictly defines the layering on top of the codec makes it difficult to implement it as an ADL, which also presents difficulty for using tooling that builds on IPLD lens-style layering.

Ref: ipfs/go-unixfsnode#27
Ref: #271
Ref: ipld/go-car#304


So, this is basically a take-two on #271, but with an extra helping of nuance in an attempt to open the door slightly, but not too far that this complicates life for existing implementations. The main aim here is to allow for the kind of layering approach that we have with go-ipld-prime + ADLs. As per ipld/go-car#304 we already have situations where we have enough distance between block decode and ADL interpret that we can't properly apply a type check. This gives us wriggle-room to pass the data through the generic go-ipld-prime "lens" paradigm and let it come out as UnixFS; even if in practice we tightly couple it when encoding the data and may impose an expectation when decoding, where possible.

But also, it'd be good to have at least the potential for separation so we can treat the UnixFS ADL as a full ADL and not a special-cased beast.

At the same time this PR updates the language of this section to current IPLD parlance.

Any stomach for nuance?

The strict coupling poses some problems for IPLD implementations that:

 1. do not retain codec information between the serialization and UnixFS (ADL)
    reification; and/or
 2. do not have a mechanism to strictly mandate that UnixFS data be _only_
    encoded in a particular codec.

A mandate in the specification that strictly defines the layering on top of
the codec makes it difficult to implement it as an ADL, which also presents
difficulty for using tooling that builds on IPLD lens-style layering.

Ref: ipfs/go-unixfsnode#27
Ref: #271
Ref: ipld/go-car#304
@willscott
Copy link
Contributor

As in #271 I would prefer that the ADL works as an ADL, so i would be in favor of this spec change.

UNIXFS.md Outdated Show resolved Hide resolved
@@ -78,7 +78,7 @@ This `Data` object is used for all non-leaf nodes in Unixfs.

For files that are comprised of more than a single block, the 'Type' field will be set to 'File', the 'filesize' field will be set to the total number of bytes in the file (not the graph structure) represented by this node, and 'blocksizes' will contain a list of the filesizes of each child node.

This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. Typically this is encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementations of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.
This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. As such, a serialized UnixFS block must conform to the [DAG-PB Logical Format](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format) schema. Typically this is then encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementation of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. As such, a serialized UnixFS block must conform to the [DAG-PB Logical Format](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format) schema. Typically this is then encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementation of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.
This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. As such, a serialized UnixFS block **MUST** conform to the [DAG-PB **Logical Format**](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format) schema. Typically this is then encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementation of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg this works! Adding a little more screamers to make sure people do not miss it, but good enough. Thank you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would normally highlight SHOULD, MUST, MAY, etc. in a spec but this one doesn't already have that where it uses those words and I didn't want to make this section too different, nor do I want to increase the scope of the PR.
Happy to change it if others feel strongly enough though.

Copy link
Member

@warpfork warpfork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2c:

  • this is a desirable change, directionally (same reasoning as Will)
  • I think this reads pretty great -- the parlance updates are good and clear, and the cautions in all directions about compatibility seem accurate and appropriate.

Well written, @rvagg <3

@willscott
Copy link
Contributor

what else is needed to merge this?

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open the door slightly, but not too far that this complicates life for existing implementations

I am late to the party, so apologies if I miss the point here, but how confident are we the below won't happen?

  1. something like relaxed reification go-unixfsnode#27 allows a dag-pb node to have been serialized to cbor or json, and still be interpreted as unixfs data – it gets merged, every go app supports it now
  2. we relax spec here
  3. people see both spec and go library allow cbor, start producing UnixFS dags with cbor instead of dag-pb because "it works in Kubo and Gateways already, and this way we use cbor/json everywhere"
  4. complicated life begins:
    • someone tries to implement a minimal IPFS node in new language from scratch, and they now need to debug things because their implementation can't correctly read "valid UnixFSv1"
    • legacy node is unable to read new "valid UnixFSv1" created by some alternative IPFS implementation that decided to use cbor instead of pb by default because both specs and most common go-library allow it, and they don't care about "legacy" dag-pb

In my mind, this PR should bump the version to UnixFS v2 because it allows for a new type of UnixFS DAGs to be created, that are no longer interoperable with existing implementations that support v1 and v1.5.
Am I missing something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants