<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" >
<article>
 <title>The C Code Generator</title>

 <section>
  <title>Design</title>

<para>The overall goal is to keep the code generator as simple
as possible, ideally without sacrificing performance.</para>

<para>We generate very little code: mostly
structure definitions (for example, enums and structures
for messages) and some metadata, which is essentially
reflection data.</para>

<para>Serialization and deserialization are implemented in a library,
libprotobuf-c, rather than in generated code.</para>

 </section>
 <section>
  <title>The Generated Code</title>
  <para>
   For each enum, we generate a C enum.
   For each message, we generate a C structure
   which can be cast to a <type>ProtobufCMessage</type>.
  </para>
  <para>
   For each enum and message, we generate a descriptor
   object that allows us to implement a kind of reflection
   on the structures.
  </para>
  <para>First, some naming conventions:
   <itemizedlist>
    <listitem><para>
     Type names for enums, messages, and services
     are camel case (meaning WordsAreCrammedTogether),
     except that double underscores are used to delimit
     scopes.  For example:
     <programlisting><![CDATA[
      package foo.bar;
      message BazBah {
        required int32 val = 1;
      }
     ]]></programlisting>
     would generate a C type <type>Foo__Bar__BazBah</type>.</para>
   </listitem><listitem>
    <para>Functions and globals are all lowercase, with camel-case
    words separated by single underscores.
    For example:
     <programlisting><![CDATA[
     Foo__Bar__BazBah *foo__bar__baz_bah__unpack
                           (ProtobufCAllocator  *allocator,
                            size_t               length,
                            const unsigned char *data);
    ]]></programlisting>
    </para>
    </listitem><listitem>
    <para>Enum values are all uppercase.</para>
    </listitem>
    <listitem><para>
     Anything we add to your symbol names is also
     separated by a double underscore, for example
     the <function>unpack</function> suffix in the function above.</para></listitem>
   </itemizedlist>
  </para>
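  <para>
   The naming rules above can be sketched in C.  The two helpers below are
   hypothetical: they are not part of protobuf-c or its code generator, and
   they cover only the simple cases described in this section.  The real
   generator may treat other inputs, such as runs of capital letters, differently.
   <programlisting><![CDATA[
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: "foo.bar.BazBah" -> "Foo__Bar__BazBah".
 * Returns 0 on success, -1 if `out` is too small. */
static int
sketch_type_name (const char *full_name, char *out, size_t out_size)
{
  size_t n = 0;
  int at_scope_start = 1;
  assert (full_name != NULL && out != NULL);
  if (out_size == 0)
    return -1;
  for (; *full_name != '\0'; full_name++)
    {
      unsigned char c = (unsigned char) *full_name;
      if (n + 2 >= out_size)    /* conservatively keep room for 2 chars + NUL */
        return -1;
      if (c == '.')
        {
          out[n++] = '_';       /* scopes are joined by a double underscore */
          out[n++] = '_';
          at_scope_start = 1;
          continue;
        }
      out[n++] = (char) (at_scope_start ? toupper (c) : c);
      at_scope_start = 0;
    }
  out[n] = '\0';
  return 0;
}

/* Hypothetical helper: "foo.bar.BazBah" -> "foo__bar__baz_bah".
 * Returns 0 on success, -1 if `out` is too small. */
static int
sketch_symbol_prefix (const char *full_name, char *out, size_t out_size)
{
  size_t n = 0;
  int at_scope_start = 1;
  assert (full_name != NULL && out != NULL);
  if (out_size == 0)
    return -1;
  for (; *full_name != '\0'; full_name++)
    {
      unsigned char c = (unsigned char) *full_name;
      if (n + 2 >= out_size)    /* conservatively keep room for 2 chars + NUL */
        return -1;
      if (c == '.')
        {
          out[n++] = '_';
          out[n++] = '_';
          at_scope_start = 1;
          continue;
        }
      if (isupper (c) && !at_scope_start)
        out[n++] = '_';         /* camel-case words get a single underscore */
      out[n++] = (char) tolower (c);
      at_scope_start = 0;
    }
  out[n] = '\0';
  return 0;
}
   ]]></programlisting>
   Appending <literal>__unpack</literal> to the second result gives the
   function name used in the example above.
  </para>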
  <para>
  We also generate descriptor objects for messages
  and enums.  These are declared in the .h files:
  <programlisting><![CDATA[
   extern const ProtobufCMessageDescriptor
                     foo__bar__baz_bah__descriptor;
  ]]></programlisting>
  </para>
  <para>
   Every message structure begins with a <type>ProtobufCMessageDescriptor*</type> member,
   which is sufficient to allow it to be cast to <type>ProtobufCMessage</type>.
  </para>
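  <para>
   The cast can be illustrated with local stand-in types.  Everything named
   <literal>Sketch...</literal> below is hypothetical and mirrors only what this
   section states; the sketch assumes the base type holds nothing but the
   descriptor pointer and is embedded as the first member of the message structure.
   <programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the descriptor and the generic base type (assumed shapes). */
typedef struct
{
  const char *name;
} SketchMessageDescriptor;

typedef struct
{
  const SketchMessageDescriptor *descriptor;
} SketchMessage;

/* What a generated structure might look like: the base comes first,
 * so the structure starts with the descriptor pointer. */
typedef struct
{
  SketchMessage base;
  int val;
} SketchBazBah;

static const SketchMessageDescriptor sketch_baz_bah_descriptor = { "BazBah" };

/* Generic code sees only the base type and consults the descriptor. */
static const char *
sketch_message_name (const SketchMessage *message)
{
  assert (message != NULL && message->descriptor != NULL);
  return message->descriptor->name;
}

/* Build a specific message and hand it to the generic code via a cast. */
static const char *
sketch_demo (void)
{
  SketchBazBah baz = { { &sketch_baz_bah_descriptor }, 42 };
  return sketch_message_name ((const SketchMessage *) &baz);
}
   ]]></programlisting>
   The cast is valid because a pointer to a structure can always be converted
   to a pointer to its first member.
  </para>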
  <para>
   We generate some functions for each message:
   <itemizedlist>
   <listitem>
   <para><function>unpack()</function>.  Unpack data for a particular
   message-format:
   <programlisting><![CDATA[
     Foo__Bar__BazBah *
     foo__bar__baz_bah__unpack  (ProtobufCAllocator *allocator,
                                 size_t length,
                                 const unsigned char *data);
   ]]></programlisting>
   Note that <parameter>allocator</parameter> may be NULL.
   </para>
   </listitem>
   <listitem>
   <para><function>free_unpacked()</function>.  Free a message
   that you obtained with the unpack method:
   <programlisting><![CDATA[
     void
     foo__bar__baz_bah__free_unpacked  (Foo__Bar__BazBah *baz_bah,
                                        ProtobufCAllocator *allocator);
   ]]></programlisting>
   </para>
   </listitem>
   <listitem>
   <para><function>get_packed_size()</function>.  Find how long
   the serialized representation of the message will be:
   <programlisting><![CDATA[
     size_t
     foo__bar__baz_bah__get_packed_size
                        (const Foo__Bar__BazBah *message);
   ]]></programlisting>
   </para>
   </listitem>
   <listitem>
   <para><function>pack()</function>.  Pack a message
   into a buffer.  The buffer is assumed to be long enough
   (call <function>get_packed_size()</function> first).
   <programlisting><![CDATA[
     size_t
     foo__bar__baz_bah__pack
                        (const Foo__Bar__BazBah *message,
                         unsigned char *packed_data_out);
   ]]></programlisting>
   </para>
   </listitem>
   <listitem>
   <para><function>pack_to_buffer()</function>.  Pack a message
   into a virtual buffer.
   <programlisting><![CDATA[
     size_t
     foo__bar__baz_bah__pack_to_buffer
                        (const Foo__Bar__BazBah *message,
                         ProtobufCBuffer *buffer);
   ]]></programlisting>
   </para>
   </listitem>
  </itemizedlist>
 </para>

 </section>

 <section>
  <title>The protobuf-c Library</title>

<para>This library is used by the generated code;
it includes common structures and enums,
as well as functions that most users of the generated code
will want.</para>

<para>
There are three main components:
 <orderedlist>
  <listitem><para>the Descriptor structures</para></listitem>
  <listitem><para>helper structures and objects</para></listitem>
  <listitem><para>packing and unpacking code</para></listitem>
 </orderedlist>
</para>

 </section>
 <section>
  <title>protobuf-c:  the Descriptor structures</title>

<para>For example, enums are described in terms of structures:

<programlisting><![CDATA[
    struct _ProtobufCEnumValue
    {
      const char *name;
      const char *c_name;
      int value;
    };

    struct _ProtobufCEnumDescriptor
    {
      const char *name;
      const char *short_name;
      const char *package_name;

      /* sorted by value */
      unsigned n_values;
      const ProtobufCEnumValue *values;

      /* sorted by name */
      unsigned n_value_names;
      const ProtobufCEnumValue *values_by_name;
    };
]]></programlisting></para>
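
<para>Because <structfield>values</structfield> is sorted by value, a lookup
by number can use a binary search.  The following is a minimal sketch, not the
library's own lookup code; it uses trimmed-down local copies of the structures
above, and the <literal>Color</literal> enum is a hypothetical example.

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed-down local copies of the structures above. */
typedef struct
{
  const char *name;
  int value;
} SketchEnumValue;

typedef struct
{
  const char *name;
  unsigned n_values;
  const SketchEnumValue *values;        /* sorted by value */
} SketchEnumDescriptor;

/* Hypothetical enum: RED = 0, GREEN = 1, BLUE = 5. */
static const SketchEnumValue sketch_color_values[] = {
  { "RED", 0 }, { "GREEN", 1 }, { "BLUE", 5 }
};
static const SketchEnumDescriptor sketch_color_descriptor = {
  "Color", 3, sketch_color_values
};

/* Binary search over the half-open range [lo, hi).
 * Returns NULL if no entry has the requested value. */
static const SketchEnumValue *
sketch_enum_value_by_number (const SketchEnumDescriptor *desc, int value)
{
  unsigned lo = 0, hi;
  assert (desc != NULL);
  hi = desc->n_values;
  while (lo < hi)
    {
      unsigned mid = lo + (hi - lo) / 2;
      if (desc->values[mid].value == value)
        return &desc->values[mid];
      if (desc->values[mid].value < value)
        lo = mid + 1;
      else
        hi = mid;
    }
  return NULL;
}
]]></programlisting>
A lookup by name would work the same way over
<structfield>values_by_name</structfield>, comparing with
<function>strcmp</function> instead.</para>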

<para>Likewise, messages are described by:

<programlisting><![CDATA[
      struct _ProtobufCFieldDescriptor
      {
        const char *name;
        int id;
        ProtobufCFieldLabel label;
        ProtobufCFieldType type;
        unsigned quantifier_offset;
        unsigned offset;
        void *descriptor;       /* for MESSAGE and ENUM types */
      };
      struct _ProtobufCMessageDescriptor
      {
        const char *name;
        const char *short_name;
        const char *package_name;

        /* sorted by field-id */
        unsigned n_fields;
        const ProtobufCFieldDescriptor *fields;
      };
]]></programlisting></para>
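
<para>The <structfield>offset</structfield> member is what makes reflection
possible: generic code can find a field by id and then read the member at that
byte offset.  The sketch below assumes trimmed-down local copies of the
structures above and a hypothetical message with two <type>int32_t</type>
members.  It skips the checks on <structfield>type</structfield> and
<structfield>label</structfield> that real code would need.

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed-down local copies of the descriptor structures above. */
typedef struct
{
  const char *name;
  int id;
  unsigned offset;              /* byte offset of the member in the C struct */
} SketchFieldDescriptor;

typedef struct
{
  const char *name;
  unsigned n_fields;
  const SketchFieldDescriptor *fields;  /* sorted by field-id */
} SketchMessageDescriptor;

/* Hypothetical message with two int32 fields, ids 1 and 4. */
typedef struct
{
  int32_t val;
  int32_t count;
} SketchBazBah;

static const SketchFieldDescriptor sketch_baz_bah_fields[] = {
  { "val",   1, (unsigned) offsetof (SketchBazBah, val) },
  { "count", 4, (unsigned) offsetof (SketchBazBah, count) },
};
static const SketchMessageDescriptor sketch_baz_bah_descriptor = {
  "BazBah", 2, sketch_baz_bah_fields
};

/* bsearch() comparator: the key is an int id, the element a field. */
static int
sketch_compare_id (const void *key, const void *elem)
{
  int id = *(const int *) key;
  int field_id = ((const SketchFieldDescriptor *) elem)->id;
  return (id > field_id) - (id < field_id);
}

/* Read the int32 member for field `id`.  Returns 0, or -1 if unknown. */
static int
sketch_get_int32 (const SketchMessageDescriptor *desc, const void *message,
                  int id, int32_t *value_out)
{
  const SketchFieldDescriptor *field;
  assert (desc != NULL && message != NULL && value_out != NULL);
  field = bsearch (&id, desc->fields, desc->n_fields,
                   sizeof desc->fields[0], sketch_compare_id);
  if (field == NULL)
    return -1;
  memcpy (value_out, (const char *) message + field->offset,
          sizeof *value_out);
  return 0;
}
]]></programlisting>
Keeping the fields sorted by id is what allows the standard
<function>bsearch</function> to be used here.</para>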

<para>
And finally services are described by:

<programlisting><![CDATA[
      struct _ProtobufCMethodDescriptor
      {
        const char *name;
        const ProtobufCMessageDescriptor *input;
        const ProtobufCMessageDescriptor *output;
      };
      struct _ProtobufCServiceDescriptor
      {
        const char *name;
        unsigned n_methods;
        ProtobufCMethodDescriptor *methods;             // sorted by name
      };
]]></programlisting></para>

 </section>
 <section>
  <title>protobuf-c:  helper structures and typedefs</title>

<para>We define typedefs for a few types
that are used in .proto files but do not
have obvious standard C equivalents:
<itemizedlist>
<listitem><para>a boolean type (<type>protobuf_c_boolean</type>)</para></listitem>
<listitem><para>a binary-data (bytes) type (<type>ProtobufCBinaryData</type>)</para></listitem>
<listitem><para>the various int types (<type>int32_t</type>, <type>uint32_t</type>, <type>int64_t</type>, <type>uint64_t</type>)
are obtained by including <filename>inttypes.h</filename></para></listitem>
</itemizedlist>
</para>

<para>We also define a simple allocator object, <type>ProtobufCAllocator</type>,
that lets you control how allocations are done.
This is predominantly used for parsing.</para>

<para>There is a virtual buffer facility: an implementation
only has to provide a method that appends binary data
to the buffer.  This can be used to serialize messages
to different targets (instead of a flat slab of data).</para>

<para>We define a base-type for all messages,
for code that handles messages generically.
All it has is the descriptor object.</para>

<section id="buffers">
 <title>Buffers</title>
 <para>One important helper type is the <type>ProtobufCBuffer</type>
 which allows you to abstract the target of serialization.  The only
 thing that a buffer has is an <function>append</function> method:
<programlisting><![CDATA[
   struct _ProtobufCBuffer
   {
     void (*append)(ProtobufCBuffer     *buffer,
                    size_t               len,
                    const unsigned char *data);
   };
]]></programlisting>
  ProtobufCBuffer subclasses are often defined on the stack.
</para>

<para>
For example, to write to a <type>FILE</type> you could make:
<programlisting><![CDATA[
   #include <stdio.h>

   typedef struct
   {
     ProtobufCBuffer base;      /* must be first, so the cast below is valid */
     FILE *fp;
   } BufferAppendToFile;

   static void my_buffer_file_append (ProtobufCBuffer     *buffer,
                                      size_t               len,
                                      const unsigned char *data)
   {
     BufferAppendToFile *file_buf = (BufferAppendToFile *) buffer;
     fwrite (data, len, 1, file_buf->fp);  /* XXX: no error handling! */
   }
]]></programlisting>
</para>

<para>
To use this new type of Buffer, you would do something like:
<programlisting><![CDATA[
     ...
     BufferAppendToFile tmp;
     tmp.base.append = my_buffer_file_append;
     tmp.fp = fp;
     protobuf_c_message_pack_to_buffer (&message, &tmp.base);
     ...
]]></programlisting>
</para>
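<para>
Because <function>append</function> returns <type>void</type>, a subclass
that can fail has to record the failure in its own state.  The sketch below
shows one way to do that with a fixed-capacity memory buffer.  To stay
self-contained it uses a local stand-in, <type>SketchBuffer</type>, with the
same shape as the <type>ProtobufCBuffer</type> structure shown above; all
<literal>Sketch...</literal> and <literal>sketch_...</literal> names are hypothetical.
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in with the same shape as the buffer base type above. */
typedef struct SketchBuffer SketchBuffer;
struct SketchBuffer
{
  void (*append) (SketchBuffer        *buffer,
                  size_t               len,
                  const unsigned char *data);
};

typedef struct
{
  SketchBuffer base;            /* must be first, so the cast below is valid */
  unsigned char *data;          /* caller-provided storage */
  size_t capacity;
  size_t len;                   /* bytes written so far, always <= capacity */
  int overflowed;               /* set once an append did not fit */
} SketchFixedBuffer;

static void
sketch_fixed_append (SketchBuffer        *buffer,
                     size_t               len,
                     const unsigned char *data)
{
  SketchFixedBuffer *fixed = (SketchFixedBuffer *) buffer;
  if (fixed->overflowed || len > fixed->capacity - fixed->len)
    {
      fixed->overflowed = 1;    /* remember the failure; drop the data */
      return;
    }
  memcpy (fixed->data + fixed->len, data, len);
  fixed->len += len;
}

static void
sketch_fixed_init (SketchFixedBuffer *fixed,
                   unsigned char     *storage,
                   size_t             capacity)
{
  assert (fixed != NULL && storage != NULL);
  fixed->base.append = sketch_fixed_append;
  fixed->data = storage;
  fixed->capacity = capacity;
  fixed->len = 0;
  fixed->overflowed = 0;
}
]]></programlisting>
After serializing, the caller checks <structfield>overflowed</structfield>
before trusting <structfield>len</structfield> and <structfield>data</structfield>.
</para>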
<para>
  A commonly used built-in subtype is <type>ProtobufCBufferSimple</type>,
  which is declared on the stack and uses a scratch buffer provided by the user
  for its initial allocation.  It grows by exponential resizing.
  To create one, use code like:
  <programlisting><![CDATA[
    unsigned char pad[128];
    ProtobufCBufferSimple buf = PROTOBUF_C_BUFFER_SIMPLE_INIT (pad);
    ProtobufCBuffer *buffer = (ProtobufCBuffer *) &buf;
    protobuf_c_buffer_append (buffer, 6, (unsigned char *) "hi mom");
  ]]></programlisting>
  You can access the data as buf.len and buf.data. For example,
  <programlisting><![CDATA[
   assert (buf.len == 6);
   assert (memcmp (buf.data, "hi mom", 6) == 0);
 ]]></programlisting>
  To finish up, use:
  <programlisting><![CDATA[
    PROTOBUF_C_BUFFER_SIMPLE_CLEAR (&buf);
  ]]></programlisting>
 </para>
 </section>
 </section>
 <section>
  <title>protobuf-c: packing and unpacking messages</title>

<para>
To pack a message, first compute its packed size,
then provide a buffer of at least that size to pack into.
<programlisting><![CDATA[
    size_t protobuf_c_message_get_packed_size
                                     (ProtobufCMessage *message);
    void   protobuf_c_message_pack   (ProtobufCMessage *message,
                                      unsigned char    *out);
]]></programlisting>
</para>
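
<para>
The two-step pattern can be sketched without the library.  The
<function>sketch_get_packed_size</function> and <function>sketch_pack</function>
functions below are hand-written stand-ins, not protobuf-c code: they encode a
hypothetical message with a single <type>int32</type> field (field number 1)
as a tag byte followed by a varint, which is the standard Protocol Buffers
encoding for such a field.  The point of the example is
<function>sketch_pack_alloc</function>, which sizes the buffer before packing.
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical message with one int32 field, field number 1. */
typedef struct
{
  int32_t val;
} SketchBazBah;

/* Number of bytes a varint needs: 7 payload bits per byte. */
static size_t
sketch_varint_size (uint64_t v)
{
  size_t n = 1;
  while (v >= 0x80)
    {
      v >>= 7;
      n++;
    }
  return n;
}

/* Step 1: compute the packed size (one tag byte plus the varint).
 * Negative int32 values are sign-extended to 64 bits first. */
static size_t
sketch_get_packed_size (const SketchBazBah *message)
{
  return 1 + sketch_varint_size ((uint64_t) (int64_t) message->val);
}

/* Step 2: pack into a buffer that is assumed to be long enough.
 * Returns the number of bytes written. */
static size_t
sketch_pack (const SketchBazBah *message, unsigned char *out)
{
  uint64_t v = (uint64_t) (int64_t) message->val;
  size_t n = 0;
  out[n++] = 0x08;              /* tag: field number 1, wire type varint */
  while (v >= 0x80)
    {
      out[n++] = (unsigned char) (v | 0x80);    /* low 7 bits + "more" bit */
      v >>= 7;
    }
  out[n++] = (unsigned char) v;
  return n;
}

/* Both steps together; the caller frees the result with free(). */
static unsigned char *
sketch_pack_alloc (const SketchBazBah *message, size_t *len_out)
{
  size_t len = sketch_get_packed_size (message);
  unsigned char *out = malloc (len);
  if (out == NULL)
    return NULL;
  *len_out = sketch_pack (message, out);
  assert (*len_out == len);     /* the two passes must agree */
  return out;
}
]]></programlisting>
With the real library, the same shape applies: call
<function>protobuf_c_message_get_packed_size</function>, allocate that many
bytes, then call <function>protobuf_c_message_pack</function>.
</para>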

<para>
Or you can use the "streaming" approach:
<programlisting><![CDATA[
    void   protobuf_c_message_pack_to_buffer
                                     (ProtobufCMessage *message,
                                      ProtobufCBuffer  *buffer);
]]></programlisting>
where <type>ProtobufCBuffer</type> is a base object with an append method.
See <xref linkend="buffers" />.
</para>


<para>
To unpack a message, simply call
<programlisting><![CDATA[
      ProtobufCMessage *
         protobuf_c_message_unpack (const ProtobufCMessageDescriptor *,
                                    ProtobufCAllocator  *allocator,
                                    size_t               len,
                                    const unsigned char *data);
]]></programlisting>
If you pass NULL for <parameter>allocator</parameter>, then
the default allocator will be used.
</para>

<para>
You can cast the result to the type that matches
the descriptor.
</para>

<para>
The result of unpacking should be freed with protobuf_c_message_free().
</para>


 </section>
 <section>
  <title>Author</title>
    <para>Dave Benson.</para>
 </section>
</article>