
OBSOLETE MemoryManagedMessage

haberman edited this page May 15, 2011 · 1 revision

This page is obsolete, and is kept around only for historical interest.

The memory-managed message layer uses the same in-memory format as read-only messages but adds memory-management semantics. This makes it possible to mutate message fields that point to strings, arrays, and submessages while still guaranteeing that the memory is collected at the appropriate time.

This memory-management scheme is designed so that multiple memory managers can reference the same message without interfering with one another. In practical terms, a Python program could create a protocol message and pass it to C code, which passes it to Ruby; Ruby could then modify that message, and the changes would be visible from Python. The message will not be collected until both Ruby and Python decide it is dead.

The in-memory scheme for accomplishing this is as follows:

The memory-managed messages have exactly the same format as the immutable messages in the layer below. The difference is that the pointer that sat idle in that format is put to use: it is the head of a linked list of references to this message. This resembles reference counting, except that instead of just counting references, we keep a separate record for each reference in a struct upb_mm_ref. Though it is not depicted in the diagram above (to keep it from getting too busy), each reference also has a pointer back to the message.
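The layout described above can be sketched roughly as follows. Only the name upb_mm_ref comes from this page; its fields, the example_msg type, and the example_ref_add and example_ref_count helpers are hypothetical and chosen for illustration, not taken from upb's actual headers. The sketch shows the message's idle pointer slot reused as a list head, with one record per reference rather than a single counter.

```c
#include <assert.h>
#include <stddef.h>

struct upb_mm_ref;

/* Hypothetical message layout: the pointer slot that is idle in the
 * read-only format is reused as the head of the reference list. */
typedef struct {
  struct upb_mm_ref *refs;  /* head of the reference list; NULL if unreferenced */
  int value;                /* stand-in for the message's ordinary fields */
} example_msg;

/* One record per reference (field names are illustrative assumptions). */
struct upb_mm_ref {
  struct upb_mm_ref *next;  /* next reference to the same message */
  example_msg *msg;         /* back pointer to the referenced message */
};

/* Record a new reference by pushing it onto the message's list. */
static void example_ref_add(example_msg *msg, struct upb_mm_ref *ref) {
  assert(msg != NULL && ref != NULL);
  ref->msg = msg;
  ref->next = msg->refs;
  msg->refs = ref;
}

/* Unlike a plain refcount, each reference is a distinct record,
 * so the count is obtained by walking the list. */
static size_t example_ref_count(const example_msg *msg) {
  size_t n = 0;
  for (const struct upb_mm_ref *r = msg->refs; r != NULL; r = r->next) n++;
  return n;
}
```

Pushing at the head keeps insertion O(1); the trade-off is that removing a reference later requires a walk of the list.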

We arrange things this way primarily for the benefit of dynamic languages, which need an object structure to track each instance of the object in that language. For those languages, the upb_mm_ref doubles as the language's object structure. Since the ref has a pointer back to the message, the language-specific object can handle methods such as field accesses by following that pointer to the actual struct.
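One plausible way for a binding to make the ref double as its object structure is to embed the ref as the first member of the language's wrapper object, as in the sketch below. The wrapper type example_lang_obj, the example_msg type, the ref's field names, and the getter/setter helpers are all hypothetical; the page does not say how any particular binding lays out its objects. The sketch relies on the C rule that a pointer to a struct may be converted to a pointer to its first member and back.

```c
#include <assert.h>
#include <stddef.h>

typedef struct example_msg example_msg;

struct upb_mm_ref {
  struct upb_mm_ref *next;  /* next reference to the same message */
  example_msg *msg;         /* back pointer to the message */
};

struct example_msg {
  struct upb_mm_ref *refs;  /* head of the reference list */
  int id;                   /* stand-in field */
};

/* Hypothetical wrapper object for a dynamic language. Because the ref is
 * the first member, a pointer to the ref is also a pointer to the wrapper. */
typedef struct {
  struct upb_mm_ref ref;    /* must remain the first member */
  int lang_flags;           /* language-specific bookkeeping (illustrative) */
} example_lang_obj;

/* Recover the language object from its embedded ref. */
static example_lang_obj *example_obj_from_ref(struct upb_mm_ref *ref) {
  return (example_lang_obj *)ref;  /* valid since ref is the first member */
}

/* Field accessors as a binding might implement them: follow the back pointer. */
static int example_obj_get_id(const example_lang_obj *obj) {
  assert(obj->ref.msg != NULL);
  return obj->ref.msg->id;
}

static void example_obj_set_id(example_lang_obj *obj, int id) {
  assert(obj->ref.msg != NULL);
  obj->ref.msg->id = id;
}
```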

The key idea in this scheme is that all of the different languages’ references point to the same message. That is why they can modify the message and see each other’s changes.
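To make the shared-message idea concrete, here is a small sketch in which two wrapper types, standing in for a Python object and a Ruby object, each hold a ref to the same message. The types example_py_obj and example_rb_obj, the example_attach helper, and the ref's field names are hypothetical; real bindings would allocate and manage these objects through their own runtimes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct example_msg example_msg;

struct upb_mm_ref {
  struct upb_mm_ref *next;  /* next reference to the same message */
  example_msg *msg;         /* back pointer to the shared message */
};

struct example_msg {
  struct upb_mm_ref *refs;  /* head of the reference list */
  int count;                /* stand-in field */
};

/* Two hypothetical wrappers, as two language runtimes might define them. */
typedef struct { struct upb_mm_ref ref; long py_hash; } example_py_obj;
typedef struct { struct upb_mm_ref ref; unsigned rb_flags; } example_rb_obj;

/* Attach a not-yet-used ref to a message by pushing it onto the list. */
static void example_attach(example_msg *msg, struct upb_mm_ref *ref) {
  assert(ref->msg == NULL);  /* the ref must not already be attached */
  ref->msg = msg;
  ref->next = msg->refs;
  msg->refs = ref;
}
```

Because both refs point at one example_msg, a write made through the "Ruby" wrapper (for example `rb.ref.msg->count = 5`) is immediately visible through the "Python" wrapper; no copying or synchronization step is involved.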

In this scheme, no message is deleted until all of its references disappear. upb itself performs no reference counting or garbage collection on top of this; that is left to the individual memory managers. For example, Ruby uses mark-and-sweep garbage collection. When Ruby collects a Ruby reference to a memory-managed message, it calls the memory-managed message code to remove its reference. If that was the last reference to the message, the message itself is collected.
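The removal step might look roughly like the function below, which a memory manager would call when it drops its reference (for instance from a garbage-collector finalizer). This is a sketch under stated assumptions: the name example_ref_release, the ref's field names, and the use of malloc/free for the message are hypothetical, not upb's actual interface. It unlinks the ref from the message's list and frees the message only when the list becomes empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct example_msg example_msg;

struct upb_mm_ref {
  struct upb_mm_ref *next;  /* next reference to the same message */
  example_msg *msg;         /* back pointer to the message */
};

struct example_msg {
  struct upb_mm_ref *refs;  /* head of the reference list */
};

/* Hypothetical release hook: unlink `ref` from its message's list and
 * free the message if this was the last reference. Returns true if freed. */
static bool example_ref_release(struct upb_mm_ref *ref) {
  example_msg *msg = ref->msg;
  struct upb_mm_ref **link = &msg->refs;
  while (*link != ref) {
    assert(*link != NULL);  /* ref must be on this message's list */
    link = &(*link)->next;
  }
  *link = ref->next;  /* unlink this reference */
  ref->next = NULL;
  ref->msg = NULL;
  if (msg->refs == NULL) {
    free(msg);  /* assumes the message was allocated with malloc */
    return true;
  }
  return false;
}
```

Note that the function never decides on its own when a reference is dead; it only reacts when a memory manager reports that it has dropped one, which matches the division of labor described above.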