The C Code Generator


Table of Contents

Design
The Generated Code
The protobuf-c Library
protobuf-c: the Descriptor structures
protobuf-c: helper structures and typedefs
Buffers
protobuf-c: packing and unpacking messages
Author

Design

The overall goal is to keep the code-generator as simple as possible. Hopefully performance isn't sacrificed to that end!

We generate very little code: mostly structure definitions (for example, enums and structures for messages) and some metadata, which is basically reflection-type data.

Serializing and deserializing are implemented in a library, called libprotobuf-c, rather than in generated code.

The Generated Code

For each enum, we generate a C enum. For each message, we generate a C structure which can be cast to a ProtobufCMessage.

For each enum and message, we generate a descriptor object that allows us to implement a kind of reflection on the structures.

First, some naming conventions:

  • The name of the type for enums and messages and services is camel case (meaning WordsAreCrammedTogether) except that double-underscores are used to delimit scopes. For example:

          package foo.bar;
          message BazBah {
            required int32 val = 1;
          }
         

    would generate a C type Foo__Bar__BazBah.

  • Functions and globals are all lowercase, with camel-case words separated by single underscores. For example:

         Foo__Bar__BazBah *foo__bar__baz_bah__unpack
                               (ProtobufCAllocator  *allocator,
                                size_t               length,
                                const unsigned char *data);
        

  • Enum values are all uppercase.

  • Anything we add to your symbol names is also separated by a double-underscore, for example the __unpack suffix of the function above.
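
The naming conventions above can be sketched as two small string transformations. This is only an illustration: the function names mangle_type_name and mangle_lower_name are hypothetical, and the real generator may treat digits, acronyms, or nested scopes differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Append c to out if there is still room for it plus a trailing NUL. */
static void put_char(char *out, size_t cap, size_t *pos, char c)
{
    if (*pos + 1 < cap)
        out[(*pos)++] = c;
}

/* Type-style name: "foo.bar.BazBah" -> "Foo__Bar__BazBah".
   Each dot becomes "__" and the first letter of each scope is uppercased. */
static void mangle_type_name(const char *full_name, char *out, size_t cap)
{
    size_t pos = 0;
    int start_of_scope = 1;
    assert(cap > 0);
    for (const char *p = full_name; *p != '\0'; p++) {
        if (*p == '.') {
            put_char(out, cap, &pos, '_');
            put_char(out, cap, &pos, '_');
            start_of_scope = 1;
        } else {
            char c = start_of_scope ? (char) toupper((unsigned char) *p) : *p;
            put_char(out, cap, &pos, c);
            start_of_scope = 0;
        }
    }
    out[pos] = '\0';
}

/* Function-style name: "foo.bar.BazBah" -> "foo__bar__baz_bah".
   Each dot becomes "__"; an uppercase letter inside a scope starts a new
   word, so it is lowercased and preceded by a single underscore. */
static void mangle_lower_name(const char *full_name, char *out, size_t cap)
{
    size_t pos = 0;
    int start_of_scope = 1;
    assert(cap > 0);
    for (const char *p = full_name; *p != '\0'; p++) {
        if (*p == '.') {
            put_char(out, cap, &pos, '_');
            put_char(out, cap, &pos, '_');
            start_of_scope = 1;
            continue;
        }
        if (isupper((unsigned char) *p) && !start_of_scope)
            put_char(out, cap, &pos, '_');
        put_char(out, cap, &pos, (char) tolower((unsigned char) *p));
        start_of_scope = 0;
    }
    out[pos] = '\0';
}
```

Generated suffixes such as __unpack or __descriptor would then be appended to the lowercase form.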

We also generate descriptor objects for messages and enums. These are declared in the .h files:

   extern const ProtobufCMessageDescriptor
                     foo__bar__baz_bah__descriptor;
  

The message structures all begin with a ProtobufCMessageDescriptor* member, which is sufficient to allow them to be cast to ProtobufCMessage.
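
A minimal sketch of why this layout makes the cast work. All Demo* names are stand-ins invented for this example, not the library's types; here the descriptor pointer is wrapped in a small base struct so that the cast is simply a conversion to the first member.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for ProtobufCMessageDescriptor (only the name is kept). */
typedef struct {
    const char *name;
} DemoMessageDescriptor;

/* Stand-in for ProtobufCMessage: all it has is the descriptor. */
typedef struct {
    const DemoMessageDescriptor *descriptor;
} DemoMessage;

/* Stand-in for a generated message such as Foo__Bar__BazBah.
   The descriptor pointer sits at offset 0, so a pointer to this
   struct can be converted to a pointer to DemoMessage. */
typedef struct {
    DemoMessage base;
    int val;
} DemoBazBah;

static const DemoMessageDescriptor demo_baz_bah_descriptor = { "foo.bar.BazBah" };

/* Generic code sees only the base type and uses the descriptor. */
static const char *demo_message_name(const DemoMessage *message)
{
    assert(message != NULL);
    return message->descriptor->name;
}

/* Checked downcast: returns NULL unless the descriptor matches. */
static const DemoBazBah *demo_as_baz_bah(const DemoMessage *message)
{
    if (message == NULL || message->descriptor != &demo_baz_bah_descriptor)
        return NULL;
    return (const DemoBazBah *) message;
}
```

Comparing descriptor pointers, as demo_as_baz_bah does, is one simple way to recover the concrete type from a generic message pointer.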

We generate some functions for each message:

  • unpack(). Unpack data for a particular message-format:

         Foo__Bar__BazBah *
         foo__bar__baz_bah__unpack  (ProtobufCAllocator  *allocator,
                                     size_t               length,
                                     const unsigned char *data);
       

    Note that allocator may be NULL, in which case the default allocator is used.

  • free_unpacked(). Free a message that you obtained with the unpack method:

         void
         foo__bar__baz_bah__free_unpacked  (Foo__Bar__BazBah   *baz_bah,
                                            ProtobufCAllocator *allocator);
       

  • get_packed_size(). Find how long the serialized representation of the message will be:

         size_t
         foo__bar__baz_bah__get_packed_size
                            (const Foo__Bar__BazBah *message);
       

  • pack(). Pack a message into a buffer. This assumes that the buffer is long enough (use get_packed_size first!):

         size_t
         foo__bar__baz_bah__pack
                            (const Foo__Bar__BazBah *message,
                             unsigned char          *packed_data_out);
       

  • pack_to_buffer(). Pack a message into a virtual buffer:

         size_t
         foo__bar__baz_bah__pack_to_buffer
                            (const Foo__Bar__BazBah *message,
                             ProtobufCBuffer        *buffer);
       

The protobuf-c Library

This library is used by the generated code; it includes common structures and enums, as well as functions that most users of the generated code will want.

There are three main components:

  1. the Descriptor structures

  2. helper structures and objects

  3. packing and unpacking code

protobuf-c: the Descriptor structures

For example, enums are described in terms of structures:

    struct _ProtobufCEnumValue
    {
      const char *name;
      const char *c_name;
      int value;
    };

    struct _ProtobufCEnumDescriptor
    {
      const char *name;
      const char *short_name;
      const char *package_name;

      /* sorted by value */
      unsigned n_values;
      const ProtobufCEnumValue *values;

      /* sorted by name */
      unsigned n_value_names;
      const ProtobufCEnumValue *values_by_name;
    };
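
Because the values array is sorted by value, a value can be mapped back to its name with a binary search. The sketch below uses a stand-in struct with the same fields as ProtobufCEnumValue; the DemoEnumValue type, the demo_color_values table, and demo_enum_find_by_value are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in mirroring the fields of ProtobufCEnumValue shown above. */
typedef struct {
    const char *name;
    const char *c_name;
    int value;
} DemoEnumValue;

/* Hypothetical table for an enum with three values, sorted by value. */
static const DemoEnumValue demo_color_values[] = {
    { "RED",   "DEMO__COLOR__RED",   0 },
    { "GREEN", "DEMO__COLOR__GREEN", 1 },
    { "BLUE",  "DEMO__COLOR__BLUE",  5 },
};

/* Binary search over a table sorted by value; returns NULL if absent. */
static const DemoEnumValue *
demo_enum_find_by_value(const DemoEnumValue *values, unsigned n_values, int value)
{
    unsigned lo = 0, hi = n_values;
    assert(values != NULL || n_values == 0);
    while (lo < hi) {
        unsigned mid = lo + (hi - lo) / 2;
        if (values[mid].value == value)
            return &values[mid];
        if (values[mid].value < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```

A lookup by name would work the same way over values_by_name, comparing strings instead of integers.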

Likewise, messages are described by:

      struct _ProtobufCFieldDescriptor
      {
        const char *name;
        int id;
        ProtobufCFieldLabel label;
        ProtobufCFieldType type;
        unsigned quantifier_offset;
        unsigned offset;
        void *descriptor;       /* for MESSAGE and ENUM types */
      };
      struct _ProtobufCMessageDescriptor
      {
        const char *name;
        const char *short_name;
        const char *package_name;

        /* sorted by field-id */
        unsigned n_fields;
        const ProtobufCFieldDescriptor *fields;
      };
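
The id and offset members are what make generic access possible: a field is found by id in the sorted array, and its value is then read at the recorded byte offset inside the message structure. The following sketch illustrates the idea with stand-in types; DemoFieldDescriptor, DemoBazBah, the second field count, and both helper functions are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in with a subset of the ProtobufCFieldDescriptor members. */
typedef struct {
    const char *name;
    int id;
    unsigned offset;    /* byte offset of the member in the C struct */
} DemoFieldDescriptor;

/* Hypothetical message struct with two int32 fields. */
typedef struct {
    int32_t val;
    int32_t count;
} DemoBazBah;

/* Field table sorted by field id, as in the message descriptor. */
static const DemoFieldDescriptor demo_fields[] = {
    { "val",   1, offsetof(DemoBazBah, val)   },
    { "count", 4, offsetof(DemoBazBah, count) },
};

/* Binary search by field id; returns NULL for an unknown id. */
static const DemoFieldDescriptor *
demo_find_field(const DemoFieldDescriptor *fields, unsigned n_fields, int id)
{
    unsigned lo = 0, hi = n_fields;
    while (lo < hi) {
        unsigned mid = lo + (hi - lo) / 2;
        if (fields[mid].id == id)
            return &fields[mid];
        if (fields[mid].id < id)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Read an int32 member without knowing the concrete struct type. */
static int32_t demo_get_int32(const void *message, const DemoFieldDescriptor *field)
{
    int32_t out;
    assert(message != NULL && field != NULL);
    memcpy(&out, (const char *) message + field->offset, sizeof out);
    return out;
}
```

A real implementation would also consult the type and label members before deciding how to read the bytes at that offset.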

And finally services are described by:

      struct _ProtobufCMethodDescriptor
      {
        const char *name;
        const ProtobufCMessageDescriptor *input;
        const ProtobufCMessageDescriptor *output;
      };
      struct _ProtobufCServiceDescriptor
      {
        const char *name;
        unsigned n_methods;
        ProtobufCMethodDescriptor *methods;             // sorted by name
      };
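
Since the methods array is sorted by name, a method can be located with the standard bsearch function. This is a sketch only: DemoMethodDescriptor keeps just the name member, and the method names and demo_find_method are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for ProtobufCMethodDescriptor; input/output are omitted. */
typedef struct {
    const char *name;
} DemoMethodDescriptor;

/* Hypothetical method table, sorted by name in strcmp order. */
static const DemoMethodDescriptor demo_methods[] = {
    { "Delete" },
    { "Get" },
    { "Put" },
};

/* bsearch comparator: the key is a C string, the element a descriptor. */
static int demo_compare_method(const void *key, const void *elem)
{
    const DemoMethodDescriptor *method = elem;
    return strcmp((const char *) key, method->name);
}

/* Returns the matching descriptor, or NULL if no method has that name. */
static const DemoMethodDescriptor *
demo_find_method(const DemoMethodDescriptor *methods, unsigned n_methods,
                 const char *name)
{
    assert(name != NULL);
    return bsearch(name, methods, n_methods, sizeof methods[0],
                   demo_compare_method);
}
```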

protobuf-c: helper structures and typedefs

We define typedefs for a few types which are used in .proto files but do not have obvious standard C equivalents:

  • a boolean type (protobuf_c_boolean)

  • a binary-data (bytes) type (ProtobufCBinaryData)

  • the various int types (int32_t, uint32_t, int64_t, uint64_t) are obtained by including inttypes.h

We also define a simple allocator object, ProtobufCAllocator, that lets you control how allocations are done. This is predominantly used for parsing.

There is a virtual buffer facility that only has to implement a method to append binary-data to the buffer. This can be used to serialize messages to different targets (instead of a flat slab of data).

We define a base-type for all messages, for code that handles messages generically. All it has is the descriptor object.

Buffers

One important helper type is the ProtobufCBuffer which allows you to abstract the target of serialization. The only thing that a buffer has is an append method:

   struct _ProtobufCBuffer
   {
     void (*append)(ProtobufCBuffer     *buffer,
                    size_t               len,
                    const unsigned char *data);
   };

ProtobufCBuffer subclasses are often defined on the stack.

For example, to write to a FILE you could make:

   typedef struct
   {
     ProtobufCBuffer base;   /* must be first, so the cast below is valid */
     FILE *fp;
   } BufferAppendToFile;

   static void my_buffer_file_append (ProtobufCBuffer     *buffer,
                                      size_t               len,
                                      const unsigned char *data)
   {
     BufferAppendToFile *file_buf = (BufferAppendToFile *) buffer;
     fwrite (data, len, 1, file_buf->fp);  // XXX: no error handling!
   }

To use this new type of Buffer, you would do something like:

     ...
     BufferAppendToFile tmp;
     tmp.base.append = my_buffer_file_append;
     tmp.fp = fp;
     protobuf_c_message_pack_to_buffer (&message, &tmp.base);
     ...
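
The same subclassing pattern works for any target. As a self-contained sketch, here is a buffer that only counts the bytes appended to it, which can be handy for measuring output without storing it. DemoBuffer is a stand-in with the same single append member as ProtobufCBuffer; DemoCountingBuffer and its functions are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with the same shape as ProtobufCBuffer. */
typedef struct DemoBuffer DemoBuffer;
struct DemoBuffer {
    void (*append)(DemoBuffer *buffer, size_t len, const unsigned char *data);
};

/* Subclass: the base must be the first member so that a pointer to the
   base can be cast back to the subclass inside the append callback. */
typedef struct {
    DemoBuffer base;
    size_t total;       /* number of bytes appended so far */
} DemoCountingBuffer;

static void demo_counting_append(DemoBuffer *buffer, size_t len,
                                 const unsigned char *data)
{
    DemoCountingBuffer *self = (DemoCountingBuffer *) buffer;
    (void) data;        /* the bytes themselves are discarded */
    self->total += len;
}

static void demo_counting_init(DemoCountingBuffer *buf)
{
    assert(buf != NULL);
    buf->base.append = demo_counting_append;
    buf->total = 0;
}
```

Code that serializes to a buffer only ever calls base.append, so it does not need to know which subclass it is writing to.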

A commonly used built-in subtype is ProtobufCBufferSimple, which is declared on the stack and uses a scratch buffer provided by the user for its initial allocation. When the scratch buffer fills up, it grows by exponential resizing. To create one, use code like:

    unsigned char pad[128];
    ProtobufCBufferSimple buf = PROTOBUF_C_BUFFER_SIMPLE_INIT (pad);
    ProtobufCBuffer *buffer = (ProtobufCBuffer *) &buf;
    protobuf_c_buffer_append (buffer, 6, (const unsigned char *) "hi mom");
  

You can access the data as buf.len and buf.data. For example,

   assert (buf.len == 6);
   assert (memcmp (buf.data, "hi mom", 6) == 0);
 

To finish up, use:

    PROTOBUF_C_BUFFER_SIMPLE_CLEAR (&buf);
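
To make the scratch-buffer idea concrete, here is a rough sketch of how such a buffer could behave: it writes into the caller's array until that is full, then switches to heap storage that doubles in size, and the clear step frees only heap storage. This is not the library's implementation; DemoSimpleBuffer, its members, and the demo_simple_* functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t alloced;         /* capacity of data */
    size_t len;             /* bytes in use */
    unsigned char *data;
    int must_free_data;     /* 0 while data still points at the scratch array */
} DemoSimpleBuffer;

static void demo_simple_init(DemoSimpleBuffer *buf,
                             unsigned char *scratch, size_t scratch_size)
{
    assert(buf != NULL && scratch != NULL && scratch_size > 0);
    buf->alloced = scratch_size;
    buf->len = 0;
    buf->data = scratch;
    buf->must_free_data = 0;
}

/* Returns 0 on success, -1 if memory could not be allocated. */
static int demo_simple_append(DemoSimpleBuffer *buf, size_t len,
                              const unsigned char *data)
{
    if (len > buf->alloced - buf->len) {
        /* Double the capacity until the new data fits. */
        size_t new_alloced = buf->alloced;
        while (new_alloced - buf->len < len)
            new_alloced *= 2;
        unsigned char *new_data = malloc(new_alloced);
        if (new_data == NULL)
            return -1;
        memcpy(new_data, buf->data, buf->len);
        if (buf->must_free_data)
            free(buf->data);
        buf->data = new_data;
        buf->alloced = new_alloced;
        buf->must_free_data = 1;
    }
    memcpy(buf->data + buf->len, data, len);
    buf->len += len;
    return 0;
}

/* Release heap storage, if any; the scratch array belongs to the caller. */
static void demo_simple_clear(DemoSimpleBuffer *buf)
{
    if (buf->must_free_data)
        free(buf->data);
    buf->data = NULL;
    buf->alloced = 0;
    buf->len = 0;
    buf->must_free_data = 0;
}
```

The benefit of this design is that small messages never touch the heap, while large ones still work; forgetting the clear step leaks memory only in the second case.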
  

protobuf-c: packing and unpacking messages

To pack a message, first compute its packed size, then provide a buffer to pack into.

    size_t protobuf_c_message_get_packed_size
                                     (ProtobufCMessage *message);
    void   protobuf_c_message_pack   (ProtobufCMessage *message,
                                      unsigned char    *out);

Or you can use the "streaming" approach:

    void   protobuf_c_message_pack_to_buffer
                                     (ProtobufCMessage *message,
                                      ProtobufCBuffer  *buffer);

where ProtobufCBuffer is a base object with an append method. See the section called “Buffers”.

To unpack a message, simply call:

      ProtobufCMessage *
         protobuf_c_message_unpack (const ProtobufCMessageDescriptor *,
                                    ProtobufCAllocator  *allocator,
                                    size_t               len,
                                    const unsigned char *data);

If you pass NULL for allocator, then the default allocator will be used.

You can cast the result to the type that matches the descriptor.

The result of unpacking should be freed with protobuf_c_message_free().
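
The two-step packing pattern described at the start of this section (compute the size, then pack into a buffer of that size) can be sketched without the library. The demo_* functions below are hand-written stand-ins, not libprotobuf-c code: they assume a message with a single required int32 field numbered 1 and encode it using the standard protocol-buffer varint wire format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in message with one int32 field (assumed field number 1). */
typedef struct {
    int32_t val;
} DemoBazBah;

/* Number of bytes a base-128 varint needs for v. */
static size_t demo_varint_size(uint64_t v)
{
    size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        n++;
    }
    return n;
}

/* Write v as a varint, low 7 bits first; returns the bytes written. */
static size_t demo_varint_pack(uint64_t v, unsigned char *out)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (unsigned char) ((v & 0x7f) | 0x80);
        v >>= 7;
    }
    out[n++] = (unsigned char) v;
    return n;
}

/* Step 1: one tag byte plus the varint (int32 is sign-extended to 64 bits). */
static size_t demo_get_packed_size(const DemoBazBah *message)
{
    return 1 + demo_varint_size((uint64_t) (int64_t) message->val);
}

/* Step 2: pack into a buffer the caller has sized with step 1. */
static size_t demo_pack(const DemoBazBah *message, unsigned char *out)
{
    out[0] = 0x08;      /* (field number 1 << 3) | wire type 0 (varint) */
    return 1 + demo_varint_pack((uint64_t) (int64_t) message->val, out + 1);
}

/* Both steps together; the caller frees the result with free(). */
static unsigned char *demo_pack_alloc(const DemoBazBah *message, size_t *len_out)
{
    size_t len = demo_get_packed_size(message);
    unsigned char *data = malloc(len);
    if (data == NULL)
        return NULL;
    size_t written = demo_pack(message, data);
    assert(written == len);
    (void) written;
    *len_out = len;
    return data;
}
```

With the real library the shape is the same: call the get_packed_size function, allocate that many bytes, then call the pack function on the allocated buffer.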

Author

Dave Benson.