
OBSOLETE StreamParserSerializer

haberman edited this page May 15, 2011 · 1 revision

This page is obsolete, and is kept around only for historical purposes.

The stream parser allows you to parse a protocol buffer byte stream into events, much as SAX parsers do for XML. One major improvement over SAX parsers is that the stream parser can skip submessages very efficiently, because each submessage is prefixed with its length. XML offers no such shortcut: an element carries no length, so a parser has to scan the text to find the matching end tag.
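To make the skipping argument concrete, here is a minimal sketch of how a length-delimited field can be stepped over without inspecting its contents. It relies only on the protocol buffer wire format (a varint length followed by that many bytes). The function names `read_varint` and `skip_delimited` are hypothetical and are not taken from upb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of upb: decodes a base-128 varint at
 * buf[*pos]. Returns 1 and advances *pos on success. Returns 0 if the
 * buffer ends first or the varint is longer than 10 bytes. */
static int read_varint(const uint8_t *buf, size_t len, size_t *pos,
                       uint64_t *out) {
  assert(buf != NULL && pos != NULL && out != NULL);
  uint64_t result = 0;
  size_t p = *pos;
  for (int shift = 0; shift < 70; shift += 7) {
    if (p >= len) return 0;               /* ran out of input */
    uint8_t byte = buf[p++];
    result |= (uint64_t)(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {             /* high bit clear: last byte */
      *pos = p;
      *out = result;
      return 1;
    }
  }
  return 0;                               /* more than 10 bytes: malformed */
}

/* Hypothetical helper: *pos points just past the tag of a length-delimited
 * field. Reads the length prefix and jumps over the payload in one step.
 * Returns 0 if the payload is not fully contained in the buffer. */
static int skip_delimited(const uint8_t *buf, size_t len, size_t *pos) {
  size_t p = *pos;
  uint64_t n;
  if (!read_varint(buf, len, &p, &n)) return 0;
  if (n > len - p) return 0;              /* payload extends past the buffer */
  *pos = p + (size_t)n;
  return 1;
}
```

The cost of the skip is the cost of decoding one varint, regardless of how large or deeply nested the submessage is.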

The stream parser is fully streaming-capable: you can pass it a buffer that holds only part of a protobuf, and it will parse as much as it can. When you have more data, you can call the stream parser again and it will pick up where it left off, even if it stopped while nested several submessages deep.
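The idea of picking up where the parser left off can be illustrated at the smallest scale with a varint that is split across two buffers. The sketch below is an assumption about how such state could be kept, not upb's implementation; `varint_state`, `varint_status`, and `varint_feed` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { VARINT_NEED_MORE, VARINT_DONE, VARINT_ERROR } varint_status;

/* Hypothetical resumable decoder state. Zero-initialize before first use. */
typedef struct {
  uint64_t value; /* bits accumulated so far */
  int shift;      /* bit position for the next 7-bit group */
} varint_state;

/* Consumes bytes from buf until the varint completes or the buffer ends.
 * *consumed receives the number of bytes used from this buffer. On
 * VARINT_NEED_MORE the caller keeps `s` and calls again with the next
 * buffer. */
static varint_status varint_feed(varint_state *s, const uint8_t *buf,
                                 size_t len, size_t *consumed) {
  assert(s != NULL && consumed != NULL);
  size_t i = 0;
  while (i < len) {
    if (s->shift > 63) {                  /* an 11th byte: malformed */
      *consumed = i;
      return VARINT_ERROR;
    }
    uint8_t byte = buf[i++];
    s->value |= (uint64_t)(byte & 0x7F) << s->shift;
    s->shift += 7;
    if ((byte & 0x80) == 0) {             /* high bit clear: last byte */
      *consumed = i;
      return VARINT_DONE;
    }
  }
  *consumed = i;
  return VARINT_NEED_MORE;                /* buffer ended mid-varint */
}
```

A full stream parser would need to save more than this, for instance a stack recording where each enclosing submessage ends, but the principle is the same: all progress lives in a state object rather than on the call stack.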

For the most part, the stream parser simply yields a stream of data events. There is also a “tag callback”, which is not part of the data stream. The tag callback is also implemented by the client, and serves two very important purposes:

  • it tells the stream parser, given a field number, what the .proto type of that field is. This information is not contained in the stream, because protocol buffers are not fully self-describing, and the parser needs it to continue the parse correctly. For example, if the parser sees a delimited field, it needs to know whether it is a string or a submessage.
  • it tells the stream parser whether to parse the field or to skip it. Skipping a field can be significantly more efficient than parsing it.
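The two roles of the tag callback can be sketched as a single function that answers both questions at once. Everything below is illustrative: the types `field_type` and `tag_decision`, the `tag_callback` signature, and the field numbers of the imagined message are assumptions, not the interface declared in src/upb_parse.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of .proto field types. */
typedef enum { FIELD_INT32, FIELD_STRING, FIELD_MESSAGE, FIELD_UNKNOWN } field_type;

/* Hypothetical answer returned to the parser for one field number. */
typedef struct {
  field_type type; /* how to interpret the bytes on the wire */
  bool parse;      /* true: deliver events; false: skip the field */
} tag_decision;

/* Hypothetical callback signature. */
typedef tag_decision (*tag_callback)(void *user_data, uint32_t field_number);

/* Example client for an imagined message with fields
 * 1 = name (string), 2 = id (int32), 4 = phones (submessage). */
static tag_decision person_tag_cb(void *user_data, uint32_t field_number) {
  (void)user_data;
  switch (field_number) {
    case 1:  return (tag_decision){FIELD_STRING, true};
    case 2:  return (tag_decision){FIELD_INT32, true};
    case 4:  return (tag_decision){FIELD_MESSAGE, false}; /* known, but skipped */
    default: return (tag_decision){FIELD_UNKNOWN, false}; /* not in the schema */
  }
}
```

Field 4 shows why the two answers are separate: the client knows the field is a submessage but has no interest in it, so the parser can jump over the whole thing using its length prefix.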

The parser yields the following events, which are also implemented as user callbacks:

  • value callback, called when a scalar value is encountered.
  • string callback, called when a string value is encountered. The callback is provided a pointer back into the client’s own buffer, so that the client can process the data without copying if desired.
  • start/end submessage callbacks, called when entering or finishing a submessage.
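As a rough sketch of what a client of these events might look like, the code below groups the callbacks into a table and implements a client that only gathers statistics. The `parse_handlers` struct and every signature in it are hypothetical; the real declarations are the ones in src/upb_parse.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table, one entry per event kind. */
typedef struct {
  void (*on_value)(void *ud, uint32_t field_number, uint64_t value);
  /* data points into the client's own input buffer and is not copied. */
  void (*on_string)(void *ud, uint32_t field_number, const char *data, size_t len);
  void (*on_start_submsg)(void *ud, uint32_t field_number);
  void (*on_end_submsg)(void *ud);
} parse_handlers;

/* Example client state: counts events and tracks nesting depth. */
typedef struct {
  size_t values;
  size_t string_bytes;
  int depth;
  int max_depth;
} parse_stats;

static void stats_value(void *ud, uint32_t field_number, uint64_t value) {
  (void)field_number; (void)value;
  ((parse_stats *)ud)->values++;
}

static void stats_string(void *ud, uint32_t field_number, const char *data,
                         size_t len) {
  (void)field_number; (void)data;        /* only the length is needed here */
  ((parse_stats *)ud)->string_bytes += len;
}

static void stats_start(void *ud, uint32_t field_number) {
  parse_stats *s = (parse_stats *)ud;
  (void)field_number;
  if (++s->depth > s->max_depth) s->max_depth = s->depth;
}

static void stats_end(void *ud) {
  parse_stats *s = (parse_stats *)ud;
  assert(s->depth > 0);                  /* every end must match a start */
  s->depth--;
}

static const parse_handlers stats_handlers = {
  stats_value, stats_string, stats_start, stats_end
};
```

Because the string callback receives a pointer and a length rather than a copy, a client like this one can measure or hash string data without allocating anything.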

This parser interface is defined in src/upb_parse.h.

TODO: document the stream serializer as well.