Proposal to remove integrity zomes, so that a DNA is just a single Wasm file #2454

maackle · 2023-06-02T19:40:04Z

maackle
Jun 2, 2023
Maintainer

Proposal

In this proposal I question whether we should do away with Integrity Zomes and redefine "DNA" as a single wasm with entry defs, link defs, and validation rules. In other words, the DNA becomes the fundamental unit of composition at the integrity layer, and is expressed as a single Wasm module.

My hypothesis is that we can obtain all the functionality we want from Zomes by two existing tools:

Use Rust crates (and other package management for other languages) to satisfy reusability and modularity
Use Wasm static linking to satisfy wholesale reuse, and cross-language support

This proposal is a call to investigate whether we can replace zomes with these other methods of modularity and composition for the sake of simplicity, so that a DNA consists of just a single Wasm module.

The proposal is not to do this any time soon, or even necessarily to ever do this, but rather to evaluate the soundness of the idea, regardless of the practical steps it would take to get there, so that we could then imagine what the practical cost/benefit tradeoff is.

In particular, I want to see if we would actually lose the ability to do anything useful that we can currently do but couldn't do if we got rid of integrity zomes. Feedback on this is particularly welcome.

Note: in this doc, whenever I say "zome", I mean "integrity zome".

Motivations

In the Great Integrity-Coordinator Split, our original concept of Zome got split into two distinct concepts with distinct purposes. This split was done as directly as possible, doing the minimum work to distill a "coordination" layer away from the "integrity" layer, which has been successful to that end. However, both types of "zome" have retained the qualities and constraints of their common parent, and now that both "types" of zome have had a chance to breathe and show how they actually want to be used, it feels like time to drop unnecessary restrictions and simplify both sides of the split.

In particular, the original definition of a zome was that it was (1) a building block of a DNA, and (2) it was meant to be modular and composable with other zomes for code re-use. Now that we've split "zome" into two layers, we see that these two properties are mainly related to one or the other layer, not both:

Only the integrity zome is involved in defining the DNA, meaning the coordinator zomes are completely outside of the DNA (something we missed in the Split). So, it was erroneous to ever think that coordinators make up a DNA.
The modularity and reusability of zomes is much simpler to achieve with Coordinator logic than with Integrity logic.

Expanding on that second point: Composing two coordinators together is straightforward and well-defined: the zome functions are namespaced by zome name anyway, so they don't even really get combined, and then the callbacks have a rule about how they get called when defined in multiple zomes. But gluing two integrity zomes together is a lot more complicated: we have a complex Rust macro to help us keep track of all the entry types introduced by each zome, a special indexing system to include a "global entry type index" into Actions, and even then not every kind of composition is possible. For instance, we cannot specify that a datatype in one zome should be incorporated into a datatype in another zome. Integrity composition is more complex, and needs to be tended to in a more explicit way to unlock the full range of possibilities.

Another motivation is simplicity. Having multiple integrity zomes per DNA creates a lot of complexity:

We need to calculate a zome index for each Action, using some very magical HDK macros, making it harder for devs to write their Wasm without using our HDK, which was an original design goal.
There are also some fundamental implementation details that devs need to track, like whether a given Op runs validation for only its originating zome, or for all zomes in the DNA. This can be confusing, and just today (2023-05-08) a bug was found by Guillem around an inconsistency in whether one zome or all zomes are called for RegisterDelete ops.
In some preliminary conversations with devs, I haven't yet gotten feedback that having multiple zomes is actually useful. In one instance, someone needs to either do a funky workaround to make two zomes work together, or just collapse the two into one.
In the conversation about making Coordinators first-class entities, there are lots of details to work out, and simplifying the structure of DNA would simplify the reworking of Coordinators as well.

Finally, there is still a desire to remove some weird naming from Holochain. Some newcomers are turned off by our wacky naming. If we eliminate integrity zomes, and fully flesh out coordinator zomes as their own entities, we can do away with the term "zome" altogether.

Questions

I want to explore the question of whether we need integrity zomes by asking and answering some other leading questions first.

What do we want Integrity Zomes to accomplish for us?
Are they accomplishing what we want them to accomplish?
Could what we want to be accomplished be accomplished without zomes?
Do we still need Integrity Zomes?

What do we want Integrity Zomes to accomplish for us?

As I understand it, in the original design, Zomes as units of composition had two purposes:

To allow DNAs to be expressed in modular, reusable parts which can be easily assembled together to create new DNAs
To support different execution contexts, allowing a single DNA to be written in two or more different languages

I understood (1) in its most ideal form to be something like a Zome Store, where devs can find reusable integrity zomes with their own validated entry and link types to include into their DNAs. The zomes would have their own settable properties and would specify their dependencies on other integrity zomes. Validation rules could be extended, entry types of one zome could be included into the entry type of another, etc. Fully composable in all the ways, and completely modular.

(2) is a bit less appealing now that Wasm is our single compilation target, and the HDK has become complex enough that even adding support for another source language is far from easy, but if that were to be done, conceivably it could be useful to reuse some code from another project in a different project written in a different language, facilitated by having two different zomes from two different source languages.

Are Zomes accomplishing what we want them to accomplish?

Both of the above claims of what we expect Zomes to accomplish are questionable.

Zomes are not really reusable building blocks for building arbitrary DNAs. Zome devs are encouraged to include knowledge of their surrounding DNA context, meaning they can't be used in other contexts. Zomes don't have their own Modifiers or manifests, and have to rely on the modifiers and manifest of the DNA they're a part of. Zomes can call dna_info(), and there is a strong pull to even allow calls to app_info(). While a Zome could be written to be a useful standalone building block, we don't encourage that pattern, or communicate that as desirable, or even show how it's done, so we can't expect many instances of that out in the wild. We would expect most zomes to be written for a particular DNA, or maybe a few DNAs within the same project. We would not expect anything like a "zome store" for devs to mix and match bits of functionality to assemble DNAs with.

Zomes do not encapsulate their own context. The only thing zomes provide in terms of reuse is encapsulation of their code.

As for writing in multiple languages, this could be a legitimate use case for zomes. If some complex logic is written in one language and wants to be used in another project written in another language, it would be nice to make that code reusable across projects. However, there is a problem with this: we currently rely on a lot of complex Rust macro magic to make zomes usable at all, and especially to combine the entry types of each zome together into a single DNA. I don't think we can reasonably say that we will support an HDK in every language with a Wasm compilation target that allows for putting those Wasms from different source languages together in the same DNA. I don't think we can even say we will support a single other language any time soon. So, this ability of zomes will remain untapped for a long time, if not indefinitely.

Also, because zomes are not very useful as modular building blocks within a single project, it seems unlikely that a particular project would be written with well-defined modular zomes at all, so it is also unlikely that you could simply import some other project's zome into your own without modifying the source code -- you will probably have to isolate the part you want anyway and recompile that borrowed code into a new, smaller wasm.

Could what we want to be accomplished be accomplished without zomes?

Since zomes do not encapsulate their own context and only encapsulate their code, we can look to other solutions which provide that. Code encapsulation can be accomplished very well by using the Crates and Modules system of Rust. Every other modern language has a similar system of modules and packages which will be familiar to users of those languages. So, I claim that these existing systems can replace the code reuse we get from Zomes perfectly well, with some tools for properly exporting and importing entry types similar to what we already have now. I claim that a Rust crate provides just as much encapsulation and reuse as a Zome does.

As for supporting multiple languages, if all we want from zomes is to reuse some code from another project in another language, this may be better accomplished at the level of Wasm itself. If all we want is to be able to reuse some code from an existing Wasm, we could instead provide support for linking the wasms at compile time, to produce a single Wasm from an existing one and our own source code. Then we don't have to reinvent the wheel of providing a Wasm interface to other Wasms in a shared execution context, since Wasm supports this itself using its own module system.

Do we still need Integrity Zomes?

To summarize the above, it seems that Zomes are reinventing two different wheels, but not in a way that provides any extra value. The existing solutions of language-specific package management and linking Wasm modules could cover what zomes provide for us now. If we're not gaining any extra use out of Zomes, we could remove them from our ontology altogether to simplify the mental model of Holochain, simplify devs' experiences, simplify the HDK, and simplify ongoing work to redefine other aspects of the system.

This is just an initial line of thought. It may turn out that there are problems with using crates or linking, and that Zomes actually do provide something else useful. The proposal here is not to definitely do away with Zomes -- it's to first investigate whether removing them is feasible and desirable. If so, we can talk about actually doing it.

LeadJavaliner · 2023-06-06T08:11:16Z

LeadJavaliner
Jun 6, 2023

1 reply

maackle Jun 6, 2023
Maintainer Author

We have an extensive set of tests of the existing zome system, which would need to be adapted to this new system, which would then serve as the comprehensive set of tests for the new system.

Note that this proposed system is a lot simpler than what we currently have, and is actually a subset of our current functionality. Since we already have the ability to use multiple zomes, we inherently have the ability to use a single zome as well, which is what this proposal boils down to. So there is almost no new functionality, it's actually carving away functionality that I believe is unnecessary to provide the expressive power that devs need.

The larger concern is whether this is considered usable and desirable for hApp devs, in terms of expressivity and ease of use.

guillemcordoba · 2023-06-06T08:56:26Z

guillemcordoba
Jun 6, 2023
Collaborator

At first glance @maackle, I'd say this assessment of the state of affairs is pretty far away from my experience writing happs, and I think this is a sentiment that could be shared by part of the HC community as well. If you take a look at https://github.com/holochain-open-dev/ and holochain-open-dev.github.io/, you can see how at least I and some others in the community are thinking about and developing reusable building blocks in holochain.

A lot of the development done right now consists of coding a carefully constructed building block as a combination of UI + coordinator zome + integrity zome. You can see in the open-dev repo that we already have a bunch of them, dealing with different domains and use cases (profiles, membrane-invitations, peer-status, file-storage, calendar-events, reactions...). Right now the profiles zome is the most used one, being used in 6-7 different projects, saving a bunch of time for people. The key to these modules is actually UI reusability, with custom elements that you can plug and play to take over chunks of your UI and that are smart enough to connect to their appropriate holochain backend and read and write from that. This pattern of including UIs directly in your applications only works because those UIs can assume both the data types from the integrity zome and the zome functions of the coordinator zomes that access them. I'd strongly oppose any change that makes it impossible to write such modules with UIs that assume these properties of the backend, unless it provides other ways of doing this. For me this is one of the main things that makes developing HC apps so much fun.

The fact that a zome can access its dna_info and app_info does mean that a zome that is not reusable can be developed, but I'd say that so far it hasn't been a problem at all. Even more, it is beneficial in some cases. For example, in the file-storage module, there are two zomes: the file-storage provider (the main one, which stores files), and an optional file-storage consumer zome. The only function of the file-storage consumer is to enable having the file-storage provider zome in another DNA that only high capacity nodes join; the consumer zome contains the logic of broadcasting "hey I am a high capacity node" and the bridging logic to the other DNA. In this case, you need calls like this. It is true that the file-storage consumer depends on the file-storage provider, but with good enough documentation I'd say that's a good thing, saving newer devs the trouble of implementing this.

I understand where your proposal is coming from, but I do think there needs to be more catching up between the core team and the community in terms of the patterns already being developed and used, to go deep into what works and needs to be preserved, and what doesn't and needs to be changed. I know of others (Connor, Pospi, Nick, etc.) that also have their own opinions and patterns that they implement in their projects.

9 replies

ddd-mtl Jun 7, 2023

My previous reply was related to your assertion @maackle that Zomes are currently not reusable building blocks:

Zomes are not really reusable building blocks for building arbitrary DNAs.

That's not my experience, and the limitations I do encounter when reusing zomes are related to the current limitations in the HDK.

At minimum, a Zome dev needs to define the enum of LinkTypes, the entry types and the validation code for those entry types.
That's all Zome specific, so I don't see how you can define that at the DNA level.
What's currently missing is a way to define Zome Properties, so a DNA dev can just import the crate and set different properties for validation (ex: minimum length of a profile name).
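To make the "Zome Properties" idea concrete, here is a minimal sketch of a reusable integrity crate whose validation is parameterized by properties chosen by the DNA dev. Everything in it is hypothetical and for illustration only: `ProfilesProperties`, `Profile`, `ValidationOutcome` and `validate_profile` are made-up names, not HDK types, and how the properties would actually reach the zome is left open.

```rust
/// Hypothetical per-zome properties a DNA dev could set when importing the crate.
#[derive(Debug, Clone, PartialEq)]
pub struct ProfilesProperties {
    /// Minimum allowed length (in characters) of a profile nickname.
    pub min_nickname_length: usize,
}

/// Simplified stand-in for an entry type defined by the reusable zome.
#[derive(Debug, Clone, PartialEq)]
pub struct Profile {
    pub nickname: String,
}

/// Simplified validation result (an assumption, not the HDK's own result type).
#[derive(Debug, Clone, PartialEq)]
pub enum ValidationOutcome {
    Valid,
    Invalid(String),
}

/// Validation logic shipped by the crate, parameterized by the properties.
pub fn validate_profile(props: &ProfilesProperties, profile: &Profile) -> ValidationOutcome {
    let len = profile.nickname.chars().count();
    if len >= props.min_nickname_length {
        ValidationOutcome::Valid
    } else {
        ValidationOutcome::Invalid(format!(
            "nickname has {} characters, minimum is {}",
            len, props.min_nickname_length
        ))
    }
}
```

Two DNAs could then import the same crate and differ only in the `ProfilesProperties` value they pass in, without touching the validation code.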

I don't see why each zome should be its own wasm, and I'm fine with a DNA being shipped as one wasm.
That means you have some kind of DNA crate that imports all zomes and aggregates all the LinkTypes and EntryTypes into one enum. This could all be generated from the dna.yaml. Instead of referencing wasm files, you'd be referencing cargo crates.
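A rough sketch of what such an aggregating DNA crate might look like, assuming two independently written integrity crates. The module names `profiles_integrity` and `posts_integrity`, the `DnaEntryTypes` enum and the `origin_crate` helper are all hypothetical; the two modules stand in for external cargo crates, and the aggregate enum is the part that could in principle be generated.

```rust
// Stand-ins for two integrity crates that would normally be cargo dependencies.
pub mod profiles_integrity {
    #[derive(Debug, Clone, PartialEq)]
    pub enum EntryTypes {
        Profile { nickname: String },
    }
}

pub mod posts_integrity {
    #[derive(Debug, Clone, PartialEq)]
    pub enum EntryTypes {
        Post { title: String },
        Comment { text: String },
    }
}

/// Hypothetical aggregate defined by the DNA crate: one variant per integrity crate.
#[derive(Debug, Clone, PartialEq)]
pub enum DnaEntryTypes {
    Profiles(profiles_integrity::EntryTypes),
    Posts(posts_integrity::EntryTypes),
}

impl From<profiles_integrity::EntryTypes> for DnaEntryTypes {
    fn from(e: profiles_integrity::EntryTypes) -> Self {
        DnaEntryTypes::Profiles(e)
    }
}

impl From<posts_integrity::EntryTypes> for DnaEntryTypes {
    fn from(e: posts_integrity::EntryTypes) -> Self {
        DnaEntryTypes::Posts(e)
    }
}

/// Name of the integrity crate that a DNA-level entry type came from.
pub fn origin_crate(entry: &DnaEntryTypes) -> &'static str {
    match entry {
        DnaEntryTypes::Profiles(_) => "profiles_integrity",
        DnaEntryTypes::Posts(_) => "posts_integrity",
    }
}
```

The same pattern would apply to LinkTypes. Each integrity crate keeps its own enum, and only the DNA crate knows the full set.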

maackle Jun 7, 2023
Maintainer Author

OK, sorry for misunderstanding that you weren't misunderstanding me! 😄

Zomes are not really reusable building blocks for building arbitrary DNAs.

Yeah, this was not phrased well. I meant something like, zomes are not arbitrarily reusable. You can reuse zomes, but only in some circumstances, if you design them right and don't do certain things. Guillem wrote about that well. I just meant that they are falling short of what I think was originally expected: that you could some day just take a handful of zomes that other people wrote and slap them together unchanged to get a new DNA.

I hear the concern about washing away huge amounts of work on large projects. I would want to consult with all large projects before considering such a change.

I hear the question of how to bundle integrity code and coordinator code together with a UI module, where the coordinator code references entry types from the integrity code, in such a way that the integrity code can be composed with other integrity code and the coordinator code can be composed together with other coordinator code, and that the resulting Coordinator still has all the right references to the DNA entry types. That would need to be solved. The existing system already uses a fair amount of black magic to solve this, so I imagine any other approach would require a similar amount. I will explore that in a separate comment.

My takeaway is that we need to make sure a single integrity zome allows for everything that currently works, and that there's a sane migration path for existing projects, and this thread pointed out to me another thing that needs to work that I hadn't considered.

guillemcordoba Jun 7, 2023
Collaborator

I'm happy with this comment :)

jost-s Jun 13, 2023
Maintainer

I understand this proposal to essentially remove the intermediate layer of zomes, mainly integrity zomes. Currently a DNA consists of zomes which contain any number of crates that contain the zome functions. The zome layer isn't needed and isn't shareable either, because in the DNA manifest you're not pointing to a remote zome file but to a locally compiled one which is based on an external crate. So the zome layer is redundant, it's not needed and doesn't give rise to reusability.

The new structure would be DNA -> crates -> (zome) functions.

maackle Jun 13, 2023
Maintainer Author

I understand this proposal to essentially remove the intermediate layer of zomes, mainly integrity zomes

Yes, and to identify everything else needed to make the same guarantees we have now. They'll only be fully redundant when we do what's needed to make them so. That includes the stuff in this comment, as well as paving the way for static linking of Wasms, and making sure that all looks pretty similar to what we've already got so that the migration path is reasonable.

maackle · 2023-06-07T17:13:20Z

maackle
Jun 7, 2023
Maintainer Author

DISCLAIMER: this comment could have been more appropriate in my personal notes, but I kind of wanted to think out loud. No need to comment on this, and no guarantees that it will make sense.

This comment from @guillemcordoba speaks to another thing that needs to work that I didn't think of yet:

As of right now... You can't build reusable modules in this style with only one integrity zome. This is because when creating entries, the coordinator zome needs the EntryTypes definition (create_entry(EntryTypes::Post(post))), which is found in the integrity zome. And at the time of writing the coordinator zome of the reusable module, the consuming happ's single integrity zome hasn't been built yet, so it's not possible to do (as I said, right now). This could change if instead of the entry types enum the coordinator references a single entry definition id that the rust struct defines, but then we are exposed to clashes of entry def id... If there are black magic rust macro ways of solving this I'd be happy with it.

So I just want to think out loud about this here.

The proposal is to represent the dependency of a DNA on integrity modules, and the dependency of a Coordinator on coordinator modules, through the Rust crate system (or any other language's packaging system). But we have this other kind of dependency too, where an individual coordinator module needs to depend on one or more integrity modules.

This diagram shows both kinds of dependencies. Solid lines are for bundling modules together into the final product (a DNA or Coordinator), and dotted lines show the coordinator-integrity dependencies Guillem highlighted. C means "coordinator module" and Z means "integrity module".

graph TD
    Coordinator -.-> DNA
    Coordinator --> C1
    Coordinator --> C2
    Coordinator --> C3
    DNA ---> Z1
    DNA ---> Z2
    DNA ---> Z3
    C1 -.-> Z1
    C2 -.-> Z2
    C3 -.-> Z3

What this problem amounts to is: how can the dotted line from Coordinator to DNA be properly implemented in terms of all of the dotted lines from a C to a Z?

I think in Rust maybe this can be done with build scripts. The Coordinator crate knows this entire dependency tree, because it is downstream of them all, and the DNA crate knows a subset of this (all the integrity crates). So, overall, we know which integrity crates were included in the DNA in which order, and the names of each crate, as well as knowing which coordinator crates depended on which integrity crates, so I think with all of that info combined, the DNA crate could expose the integrity types in a way that the Coordinator would be able to properly reference, using crate names as the common link (since crate names have to be unique in Cargo.toml). Yes, there would need to be some annotations and a bit of magic, but probably not more than we currently have.

One idea for the right kind of "magic" might be: the coordinator crate would have an injectable HDK, rather than a predefined one. Coordinators can be written against the types in individual integrity crates, but the HDK functions are generic in that they depend on an HDK being injected at some later time. Then, if a bunch of coordinator crates are combined together and used against a DNA, then we need to construct an HDK which is aware of the combination of integrity crates, so that every function which requires an entry type enum or link type enum actually requires an enum of enums, with Into impls from each sub-enum into the larger enum. Then the coordinators which were written against a subset of all entry types will automatically have their function arguments lifted into the over-arching enum by the Into. Put another way, the mapping of subsets of integrity types into the full set of integrity types would be handled by this injectable HDK.
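To think through the injectable-HDK idea in code, here is a minimal sketch. It is not the real HDK: the `Hdk` trait, `InMemoryHdk`, the `Z1EntryTypes` / `Z2EntryTypes` enums and the `c1_create_post` / `c2_create_profile` functions are all hypothetical names. The point is only that coordinator code can be written against one integrity crate's types while staying generic over whatever DNA-wide enum gets injected later.

```rust
/// Entry types owned by integrity crate Z1 (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Z1EntryTypes {
    Post(String),
}

/// Entry types owned by integrity crate Z2 (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Z2EntryTypes {
    Profile(String),
}

/// The "injectable HDK": generic over the DNA-wide entry type enum.
pub trait Hdk {
    /// The enum of enums for the composed DNA.
    type DnaEntry;
    /// Stand-in for a create_entry host call.
    fn create_entry(&mut self, entry: Self::DnaEntry);
}

/// Coordinator C1 only knows Z1's types; `into()` lifts them into the DNA-wide enum.
pub fn c1_create_post<H>(hdk: &mut H, body: &str)
where
    H: Hdk,
    H::DnaEntry: From<Z1EntryTypes>,
{
    hdk.create_entry(Z1EntryTypes::Post(body.to_string()).into());
}

/// Coordinator C2 only knows Z2's types.
pub fn c2_create_profile<H>(hdk: &mut H, name: &str)
where
    H: Hdk,
    H::DnaEntry: From<Z2EntryTypes>,
{
    hdk.create_entry(Z2EntryTypes::Profile(name.to_string()).into());
}

// Everything below is only known once the DNA is composed.

/// DNA-wide enum of enums.
#[derive(Debug, Clone, PartialEq)]
pub enum DnaEntryTypes {
    Z1(Z1EntryTypes),
    Z2(Z2EntryTypes),
}

impl From<Z1EntryTypes> for DnaEntryTypes {
    fn from(e: Z1EntryTypes) -> Self {
        DnaEntryTypes::Z1(e)
    }
}

impl From<Z2EntryTypes> for DnaEntryTypes {
    fn from(e: Z2EntryTypes) -> Self {
        DnaEntryTypes::Z2(e)
    }
}

/// A toy HDK for the composed DNA that just records what was created.
#[derive(Debug, Default)]
pub struct InMemoryHdk {
    pub created: Vec<DnaEntryTypes>,
}

impl Hdk for InMemoryHdk {
    type DnaEntry = DnaEntryTypes;
    fn create_entry(&mut self, entry: DnaEntryTypes) {
        self.created.push(entry);
    }
}
```

Calling `c1_create_post` and `c2_create_profile` with the same `InMemoryHdk` records a `DnaEntryTypes::Z1(..)` followed by a `DnaEntryTypes::Z2(..)`, even though neither coordinator function mentions `DnaEntryTypes`. Whether this style of generics is workable across a Wasm boundary is a separate question that this sketch does not address.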

23 replies

jost-s Jul 1, 2023
Maintainer

This explicitness helps my understanding tremendously. I understand your proposal to inject an HDK with the relevant mapping from local entry type to global entry type index, and much prefer it over the existing macro magic.

We have a unique crate name/index and local entry type index. Can't we store both of them for each action, instead of coming up with a mapping?

maackle Jul 3, 2023
Maintainer Author

We have a unique crate name/index and local entry type index

The unique crate name/index is not known until after all integrity modules are composed into a DNA, so still, C1 can't know this part at its own compile time, and has to be told what its crate name / ordering index is after it was compiled.

jost-s Jul 3, 2023
Maintainer

That's the crux of it and the part I don't understand. Why does C1 have to know it? And it will be a different one depending on the DNA where C1 is used, so how is that consolidated?

maackle Jul 3, 2023
Maintainer Author

Yes, that's the crux. C1 has to know the full type "path" because it has to be able to construct Actions for the DNA which contain the full "path" (whether a global index, or a crate name + local index, or whatever we choose). That's why that extra info has to be provided to C1 at runtime through some kind of mapping from the subset of types it knows about, into the global set of types for the whole DNA. C1 doesn't have enough info to be able to construct a valid Action at compile time.

To be a little more concrete: Imagine C1 creating a Create action. Create needs an entry_type provided which will be valid for the DNA that it's a part of. It doesn't matter how this is represented: what we're getting at here is that C1 does not have all the info it needs to specify the entry_type at compile time. It needs a way to express the intent to create an Action using one of the entry types it does know about (because that's all it can do), and have that be translated into actually creating the proper Action with the proper entry type.
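As an illustration of that translation step, here is a small sketch in which the coordinator names an entry type by (integrity crate name, local index) and a mapping supplied by the composed DNA turns that into the DNA-wide value. The representation is an assumption made for the example: `DnaEntryType`, `EntryTypeMap` and the crate names `"z1"`, `"z2"`, `"z3"` are hypothetical, and the real Action format is not modeled.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the DNA-wide entry type "path" written into an Action.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DnaEntryType {
    /// Position of the integrity crate within this particular DNA.
    pub integrity_index: u8,
    /// Index of the entry type within its own integrity crate.
    pub local_index: u8,
}

/// Mapping that only exists once the DNA has been composed.
pub struct EntryTypeMap {
    integrity_order: HashMap<String, u8>,
}

impl EntryTypeMap {
    /// Build the mapping from integrity crate names, in the order the DNA lists them.
    pub fn new(crates_in_order: &[&str]) -> Self {
        let integrity_order = crates_in_order
            .iter()
            .enumerate()
            .map(|(i, name)| (name.to_string(), i as u8))
            .collect();
        Self { integrity_order }
    }

    /// Translate a crate-local entry type into the DNA-wide one.
    /// Returns None if the crate is not part of this DNA.
    pub fn resolve(&self, crate_name: &str, local_index: u8) -> Option<DnaEntryType> {
        self.integrity_order
            .get(crate_name)
            .map(|&integrity_index| DnaEntryType {
                integrity_index,
                local_index,
            })
    }
}
```

With a DNA composed as `["z1", "z2"]`, `resolve("z1", 0)` yields integrity index 0, while in a DNA composed as `["z3", "z1"]` the same request yields integrity index 1. The coordinator code is identical in both cases; only the injected mapping differs.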

jost-s Jul 5, 2023
Maintainer

I got it now. Bottom line is that we need the entry type in an Action. Entry types are not guaranteed to be unique across integrities, so we need to capture the entry type relative to its integrity. Therefore there has to be a way to identify an integrity within a DNA, which leads to some kind of mapping as you mentioned. Translating entry types local to integrities to DNA entry types has to happen for every HDK call that deals with entries, both ways, for writing and reading actions.

Instead of a mapping we could use a deterministic quality of an integrity that can be computed, like its hash. That hash could be written to the Action instead of integrity index. However, the hash takes up more space than the index and in terms of code complexity there's not a great deal of reduction.

Your suggestion to equip the HDK with the mapping function seems best to me. I suppose that currently that mapping function resides somewhere in the conductor and passes the zome and entry type index to the HDK functions?
