Joshua Haberman edited this page Aug 28, 2013 · 30 revisions

μpb (or more commonly, “upb”) is an implementation of Protocol Buffers, the serialization format that Google released in mid-2008. The Greek letter mu (μ) is the SI prefix for “micro”, which reflects the goal of keeping upb as small as possible while still providing a great deal of flexibility and functionality.

upb is written in C99 and also has a first-class, zero-overhead C++ API.

Why another Protocol Buffers implementation?

upb’s design differs significantly from that of other Protocol Buffers implementations. It delivers a unique combination of flexibility and performance, and the performance is powered by a JIT compiler.

Traditional protobuf implementations tightly couple the decoder/parser with a data structure representation. You parse a protobuf with an API call like:

my_proto.ParseFromString(serialized);

This is convenient if a fully-populated instance of my_proto is what you want. But what if your use case is any of these:

  • you want an instance of my_proto, but only need a few fields and don’t want to pay the cost of decoding all submessages.
  • you want to decode the protobuf into your own custom data structure.
  • you want to do stream processing of the protobuf.

upb supports all of these use cases by offering an event-based parser, like SAX for XML. The parser calls your handler functions every time a value is parsed off the wire, so you can focus on consuming the data without having to know anything about the wire format. In fact, the same handler code can consume output from multiple wire formats, for example both binary format and text format.

Everything about the parser is configurable at runtime. You can create or load your own schemas at runtime using the upb::MessageDef, upb::FieldDef, and upb::EnumDef classes. You can create your own tables of handlers using upb::Handlers, with one handler for each field in the upb::MessageDef; any field you don’t set a handler for is simply skipped. And from the upb::Handlers you can create any number of parsers for any supported wire format. Besides protobuf binary and text format, there are plans to support JSON as well.
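To make the event-based model concrete, here is a minimal, self-contained sketch of the idea in plain C++. It does not use upb’s actual API: HandlerTable, ReadVarint, Parse, Collected, and RunExample are all hypothetical names invented for this illustration. The sketch walks the protobuf binary wire format, calls a handler for each field that has one registered, and skips every other field. It only delivers varint and length-delimited values to handlers; fixed-width values are skipped and groups are rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical handler table (NOT upb::Handlers): field number -> callback.
struct HandlerTable {
  std::map<uint32_t, std::function<void(uint64_t)>> on_varint;
  std::map<uint32_t, std::function<void(const std::string&)>> on_bytes;
};

// Reads a base-128 varint at *pos and advances *pos past it.
// Returns false if the input is truncated or the varint exceeds 10 bytes.
bool ReadVarint(const std::string& buf, size_t* pos, uint64_t* out) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (*pos >= buf.size()) return false;
    uint8_t byte = static_cast<uint8_t>(buf[(*pos)++]);
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Walks one serialized message and fires handlers as values come off the wire.
// Fields with no registered handler are skipped. Returns false on malformed input.
bool Parse(const std::string& buf, const HandlerTable& handlers) {
  size_t pos = 0;
  while (pos < buf.size()) {
    uint64_t tag;
    if (!ReadVarint(buf, &pos, &tag)) return false;
    uint32_t field = static_cast<uint32_t>(tag >> 3);
    uint32_t wire_type = static_cast<uint32_t>(tag & 7);
    switch (wire_type) {
      case 0: {  // varint
        uint64_t value;
        if (!ReadVarint(buf, &pos, &value)) return false;
        auto it = handlers.on_varint.find(field);
        if (it != handlers.on_varint.end()) it->second(value);
        break;
      }
      case 1:  // 64-bit fixed: always skipped in this sketch
        if (buf.size() - pos < 8) return false;
        pos += 8;
        break;
      case 2: {  // length-delimited (strings, bytes, submessages)
        uint64_t len;
        if (!ReadVarint(buf, &pos, &len)) return false;
        if (len > buf.size() - pos) return false;
        auto it = handlers.on_bytes.find(field);
        if (it != handlers.on_bytes.end()) {
          it->second(buf.substr(pos, static_cast<size_t>(len)));
        }
        pos += static_cast<size_t>(len);
        break;
      }
      case 5:  // 32-bit fixed: always skipped in this sketch
        if (buf.size() - pos < 4) return false;
        pos += 4;
        break;
      default:  // groups (3, 4) and unknown wire types are not handled here
        return false;
    }
  }
  return true;
}

// The caller's own data structure, filled directly by the handlers.
struct Collected {
  uint64_t id = 0;
  std::string name;
  bool ok = false;
};

// Parses a hand-encoded message: field 1 = 150, field 2 = "testing", field 3 = 7.
// Only fields 1 and 2 have handlers, so field 3 is skipped.
Collected RunExample() {
  const unsigned char raw[] = {0x08, 0x96, 0x01,                      // field 1
                               0x12, 0x07, 't', 'e', 's', 't', 'i', 'n', 'g',  // field 2
                               0x18, 0x07};                           // field 3
  const std::string msg(reinterpret_cast<const char*>(raw), sizeof raw);

  Collected out;
  HandlerTable handlers;
  handlers.on_varint[1] = [&out](uint64_t v) { out.id = v; };
  handlers.on_bytes[2] = [&out](const std::string& s) { out.name = s; };
  out.ok = Parse(msg, handlers);
  return out;
}
```

Calling RunExample() fills the caller’s own Collected struct straight from the wire, with no intermediate message object. A real implementation would also need handlers for fixed-width values, submessages, and repeated fields, which this sketch leaves out.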

The upb interface is powerful enough that the upb parser can parse protobufs into Google’s C++ protobuf classes at roughly the same speed as the built-in parsers for those classes, which have been meticulously optimized over years of internal use at Google. This attests to both upb’s speed and its flexibility.

Support for Dynamic Languages

An explicit goal for upb is to enable simpler and faster dynamic language support for Protocol Buffers. All of the upb classes for representing protobuf schemas (upb::MessageDef et al.) use a memory management scheme that makes them much more amenable to wrapping than Google’s google::protobuf::Descriptor classes. upb is also much more hands-off about memory management and threading semantics, which should make it easier to integrate into language runtimes.