Recursive structures of primitive codecs #174

ConsciousCode · 2020-05-14T22:59:23Z

Instead of simply enumerating every possible codec, why don't we define an exhaustive set of primitives and composing operations which can be assigned an id for reduced size? At present they're hardly self-describing, an implementation has to already know how to decode a codec.

In essence, the current multicodec table would be given an additional column representing decomposition into more primitive types, and codes assigned to more primitive types and compositional operations. This way, an implementation that doesn't know a higher-order codec can fetch an up-to-date table and know how to decompose it. A "primitive" is defined as a row that lacks a decomposition column. Initially, all codecs are primitives, but as decomposition encoding improves, they can be redefined as higher-order without affecting older implementations. Older implementations already have logic defined for decomposing former primitives, while newer implementations either define similar special decoders or defer to the decomposition.

For some vague examples of what this might look like:

name	tag	code	composition	description
compact	composition	?	byte_order:{be, le} bit_order:{be, le} bits:int	Bit-for-bit encoding of an integer of arbitrary size
be-uint8	memory	?	compact(be, be, 8)	Big-endian byte order, big-endian bit order, 8-bit unsigned integer
ones	composition	?	bits:int offset:int $encodes:int $value:(2**bits - offset)	One's complement signed integer
ieee-754-binary32	memory	?	sign:bit exponent:ones(be, 8, 127) fraction:compact(be, be, 22)	IEEE 32-bit floating point

The most primitive type would be a bit, and compositions would emerge from operations which consume those bits. Since the multicodec table is already filled with some highly compositional types under this scheme, maybe it'd be better to create a new project?

Stebalien · 2020-05-15T00:00:46Z

What you're describing here is a general-purpose type language. I'd love that doesn't really obviate the need for the codec table in general. In addition to the structural information, the codecs give semantic information.

For example, the "sha256" codec doesn't tell you what the structure of the data is, it tells you that the data is a sha256 digest.

ConsciousCode · 2020-05-15T00:15:20Z

Semantic information can be done via an equivalent of C's typedef. I'd be kind of surprised if no one's made such a language, surely it has plenty of utility? I'm finding it hard to google for

Stebalien · 2020-05-15T00:27:43Z

That's basically what this table is. It's the type declarations without the type definitions. The codes are effectively "short names" for the type names.

You're right, there are already languages like this. For semantic information, there's an entire field of research (search for semantic web). For types, you'd need a language to describe a mapping between arbitrary serialized objects to structured data. Google points to https://en.wikipedia.org/wiki/Data_Format_Description_Language.

warpfork · 2020-05-15T07:24:50Z

+1 to Stebalien's comments.

My favorite tincture when I get runaway thoughts in this area is to read about Gödel's Constructable Universes -- specifically the fact that they're not unique -- and remind myself that Gödel numberings are therefore unavoidable. If one decomposes "primitives" far enough, one always reaches a depth where further increases in generality mean proportional decreases in meaning itself. One reaches a Gödel numbering.

Multicodecs are just jumping to the point and making the numbering at a level that's practical and useful to us in the level of the world where we're worried about distinguishing, well, codecs. The more we decompose it, the longer of a Church encoding we get; and while that's a fascinating exercise, it doesn't help do much...

...until the point one turns around and uses that to build an interpreter. But then, ah, there we have it: we've reached the point where it's effectively inventing a new language. (And signing on to all the joys of designing, building, maintaining, and growing an ecosystem around that language/interpreter!) And if we do that...

At some point, someone might come along and observe that there's nothing particularly special about this encoding we've made versus some other Church encoding someone else has made. Because both encodings are effectively Gödelian Constructable Universes, that observation would be correct! Neither is clearly "purer" than the other. At the same time, if we each did the work particularly effectively, the Shannon entropy of the encoding should be maximum, so each encoding is indistinguishable from any pattern matching on its contents! And so...

Then to distinguish them, you'd need...

A magic number. Prepended somewhere.

... you see how it recurses :) At some point, one simply has to choose a place to stand one's ground.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Recursive structures of primitive codecs #174

Recursive structures of primitive codecs #174

ConsciousCode commented May 14, 2020 •

edited

Stebalien commented May 15, 2020

ConsciousCode commented May 15, 2020 •

edited

Stebalien commented May 15, 2020 •

edited

warpfork commented May 15, 2020 •

edited

Recursive structures of primitive codecs #174

Recursive structures of primitive codecs #174

Comments

ConsciousCode commented May 14, 2020 • edited

Stebalien commented May 15, 2020

ConsciousCode commented May 15, 2020 • edited

Stebalien commented May 15, 2020 • edited

warpfork commented May 15, 2020 • edited

ConsciousCode commented May 14, 2020 •

edited

ConsciousCode commented May 15, 2020 •

edited

Stebalien commented May 15, 2020 •

edited

warpfork commented May 15, 2020 •

edited