The overall goal is to keep the code-generator as simple as possible. Hopefully performance isn't sacrificed to that end!
Anyways, we generate very little code: we mostly generate structure definitions (for example enums and structures for messages) and some metadata which is basically reflection-type data.
Serialization and deserialization are implemented in a library, libprotobuf-c, rather than in generated code.
For each enum, we generate a C enum. For each message, we generate a C structure which can be cast to a ProtobufCMessage.
For each enum and message, we generate a descriptor object that allows us to implement a kind of reflection on the structures.
First, some naming conventions:
The name of the type for enums and messages and services is camel case (meaning WordsAreCrammedTogether) except that double-underscores are used to delimit scopes. For example:
package foo.bar; message BazBah { int32 val; }
would generate a C type Foo__Bar__BazBah.
Functions and globals are all lowercase, with camel-case words separated by single underscores. For example:
Foo__Bar__BazBah *foo__bar__baz_bah__unpack (ProtobufCAllocator *allocator, size_t length, const unsigned char *data);
Enum values are all uppercase.
Stuff we add to your symbol names will also be separated by a double-underscore. For example, the unpack function above.
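The naming rules above can be sketched as two small string helpers. This is only an illustration of the convention, not the code generator's actual implementation: the helper names are hypothetical, and the sketch assumes plain lowercase package components and a camel-case message name.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: append c to out if there is room for it and a NUL. */
static void sketch_put(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap)
        out[(*len)++] = c;
}

/* Emit the package part: each "." becomes "__". For type names the first
 * letter of every scope is upper-cased; otherwise everything is lower-cased.
 * A trailing "__" separates the package from the message name. */
static void sketch_put_package(const char *package, int for_type,
                               char *out, size_t cap, size_t *len)
{
    int scope_start = 1;
    for (const char *p = package; *p != '\0'; p++) {
        unsigned char ch = (unsigned char) *p;
        if (ch == '.') {
            sketch_put(out, cap, len, '_');
            sketch_put(out, cap, len, '_');
            scope_start = 1;
            continue;
        }
        if (for_type)
            sketch_put(out, cap, len, scope_start ? (char) toupper(ch) : (char) ch);
        else
            sketch_put(out, cap, len, (char) tolower(ch));
        scope_start = 0;
    }
    if (*len > 0) {
        sketch_put(out, cap, len, '_');
        sketch_put(out, cap, len, '_');
    }
}

/* "foo.bar" + "BazBah" -> "Foo__Bar__BazBah" */
void sketch_c_type_name(const char *package, const char *message,
                        char *out, size_t cap)
{
    size_t len = 0;
    sketch_put_package(package, 1, out, cap, &len);
    for (const char *p = message; *p != '\0'; p++)
        sketch_put(out, cap, &len, *p);
    if (cap > 0)
        out[len] = '\0';
}

/* "foo.bar" + "BazBah" -> "foo__bar__baz_bah" */
void sketch_c_function_prefix(const char *package, const char *message,
                              char *out, size_t cap)
{
    size_t len = 0;
    sketch_put_package(package, 0, out, cap, &len);
    for (const char *p = message; *p != '\0'; p++) {
        unsigned char ch = (unsigned char) *p;
        /* A capital inside the name starts a new camel-case word. */
        if (isupper(ch) && p != message)
            sketch_put(out, cap, &len, '_');
        sketch_put(out, cap, &len, (char) tolower(ch));
    }
    if (cap > 0)
        out[len] = '\0';
}
```

Appending "__unpack" to the second result gives the function name shown above.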
We also generate descriptor objects for messages and enums. These are declared in the .h files:
extern const ProtobufCMessageDescriptor foo__bar__baz_bah__descriptor;
The message structures all begin with ProtobufCMessageDescriptor* which is sufficient to allow them to be cast to ProtobufCMessage.
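To illustrate why a leading descriptor pointer is enough for generic handling, here is a minimal sketch that compiles without libprotobuf-c. The Sketch* types are stand-ins invented for this example, and the layout of the generated structure is an assumption based on the description above.

```c
#include <stdint.h>

/* Stand-in for ProtobufCMessageDescriptor, reduced to one member. */
typedef struct SketchMessageDescriptor {
    const char *name;
} SketchMessageDescriptor;

/* Stand-in for ProtobufCMessage: all it holds is the descriptor. */
typedef struct SketchMessage {
    const SketchMessageDescriptor *descriptor;
} SketchMessage;

/* What a generated structure for BazBah might look like (assumed layout).
 * The base must be the first member so that a pointer to the whole
 * structure is also a valid pointer to the base. */
typedef struct SketchBazBah {
    SketchMessage base;
    int32_t val;
} SketchBazBah;

const SketchMessageDescriptor sketch_baz_bah_descriptor = { "foo.bar.BazBah" };

/* Generic code only sees the base type and reads the descriptor. */
const char *sketch_message_name(const SketchMessage *message)
{
    return message->descriptor->name;
}
```

A caller would initialize a SketchBazBah with its descriptor, cast its address to `const SketchMessage *`, and pass it to sketch_message_name().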
We generate some functions for each message:
unpack(). Unpack data for a particular message-format:
Foo__Bar__BazBah * foo__bar__baz_bah__unpack (ProtobufCAllocator *allocator, size_t length, const unsigned char *data);
Note that allocator may be NULL.
free_unpacked(). Free a message that you obtained with the unpack method:
void foo__bar__baz_bah__free_unpacked (Foo__Bar__BazBah *baz_bah, ProtobufCAllocator *allocator);
get_packed_size(). Find how long the serialized representation of the data will be:
size_t foo__bar__baz_bah__get_packed_size (const Foo__Bar__BazBah *message);
pack(). Pack message into buffer; assumes that buffer is long enough (use get_packed_size first!).
size_t foo__bar__baz_bah__pack (const Foo__Bar__BazBah *message, unsigned char *packed_data_out);
pack_to_buffer(). Pack message into a virtualized buffer.
size_t foo__bar__baz_bah__pack_to_buffer (const Foo__Bar__BazBah *message, ProtobufCBuffer *buffer);
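The "size first, then pack" pattern behind get_packed_size() and pack() can be sketched without the library. The code below hand-encodes a stand-in message with a single int32 field. It assumes the field has number 1 and is always written, which the .proto snippet above does not specify, and all sketch_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in message: one int32 field, assumed to be field number 1. */
typedef struct SketchBazBah {
    int32_t val;
} SketchBazBah;

/* Number of bytes a base-128 varint needs for v. */
static size_t sketch_varint_size(uint64_t v)
{
    size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        n++;
    }
    return n;
}

/* Write v as a varint; returns the number of bytes written. */
static size_t sketch_varint_pack(uint64_t v, unsigned char *out)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (unsigned char) (v | 0x80); /* low 7 bits + continuation */
        v >>= 7;
    }
    out[n++] = (unsigned char) v;
    return n;
}

/* One tag byte plus the varint payload (int32 is sign-extended to 64 bits). */
size_t sketch_baz_bah_get_packed_size(const SketchBazBah *message)
{
    return 1 + sketch_varint_size((uint64_t) (int64_t) message->val);
}

/* Assumes out has room for get_packed_size() bytes. */
size_t sketch_baz_bah_pack(const SketchBazBah *message, unsigned char *out)
{
    out[0] = (1 << 3) | 0; /* field 1, wire type 0 (varint) */
    return 1 + sketch_varint_pack((uint64_t) (int64_t) message->val, out + 1);
}

/* The two-step pattern: compute the size, allocate exactly, then pack.
 * The caller frees the returned buffer with free(). */
unsigned char *sketch_baz_bah_pack_alloc(const SketchBazBah *message,
                                         size_t *len_out)
{
    size_t len = sketch_baz_bah_get_packed_size(message);
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return NULL;
    *len_out = sketch_baz_bah_pack(message, buf);
    return buf;
}
```

With the real generated code, the same two calls would be foo__bar__baz_bah__get_packed_size() followed by foo__bar__baz_bah__pack().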
The libprotobuf-c library is used by the generated code; it includes common structures and enums, as well as functions that most users of the generated code will want.
There are three main components:
the Descriptor structures
helper structures and objects
packing and unpacking code
For example, enums are described in terms of structures:
struct _ProtobufCEnumValue {
    const char *name;
    const char *c_name;
    int value;
};
struct _ProtobufCEnumDescriptor {
    const char *name;
    const char *short_name;
    const char *package_name;
    /* sorted by value */
    unsigned n_values;
    const ProtobufCEnumValue *values;
    /* sorted by name */
    unsigned n_value_names;
    const ProtobufCEnumValue *values_by_name;
};
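Because the enum descriptor keeps one array sorted by value and another sorted by name, both directions of lookup can use binary search. The sketch below shows the idea with reduced stand-in structures. The Color enum, its c_name strings, and the sketch_* functions are hypothetical and are not part of libprotobuf-c.

```c
#include <stddef.h>
#include <string.h>

typedef struct SketchEnumValue {
    const char *name;
    const char *c_name;
    int value;
} SketchEnumValue;

typedef struct SketchEnumDescriptor {
    const char *name;
    unsigned n_values;
    const SketchEnumValue *values;         /* sorted by value */
    unsigned n_value_names;
    const SketchEnumValue *values_by_name; /* sorted by name */
} SketchEnumDescriptor;

/* Binary search over the by-value array; NULL if the value is unknown. */
const SketchEnumValue *sketch_enum_by_value(const SketchEnumDescriptor *desc,
                                            int value)
{
    unsigned lo = 0, hi = desc->n_values;
    while (lo < hi) {
        unsigned mid = lo + (hi - lo) / 2;
        if (desc->values[mid].value == value)
            return &desc->values[mid];
        if (desc->values[mid].value < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Binary search over the by-name array; NULL if the name is unknown. */
const SketchEnumValue *sketch_enum_by_name(const SketchEnumDescriptor *desc,
                                           const char *name)
{
    unsigned lo = 0, hi = desc->n_value_names;
    while (lo < hi) {
        unsigned mid = lo + (hi - lo) / 2;
        int cmp = strcmp(desc->values_by_name[mid].name, name);
        if (cmp == 0)
            return &desc->values_by_name[mid];
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Hypothetical data for: enum Color { RED = 0; GREEN = 1; BLUE = 5; } */
const SketchEnumValue sketch_color_by_value[] = {
    { "RED", "COLOR__RED", 0 },
    { "GREEN", "COLOR__GREEN", 1 },
    { "BLUE", "COLOR__BLUE", 5 },
};
const SketchEnumValue sketch_color_by_name[] = {
    { "BLUE", "COLOR__BLUE", 5 },
    { "GREEN", "COLOR__GREEN", 1 },
    { "RED", "COLOR__RED", 0 },
};
const SketchEnumDescriptor sketch_color_descriptor = {
    "Color", 3, sketch_color_by_value, 3, sketch_color_by_name
};
```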
Likewise, messages are described by:
struct _ProtobufCFieldDescriptor {
    const char *name;
    int id;
    ProtobufCFieldLabel label;
    ProtobufCFieldType type;
    unsigned quantifier_offset;
    unsigned offset;
    void *descriptor; /* for MESSAGE and ENUM types */
};
struct _ProtobufCMessageDescriptor {
    const char *name;
    const char *short_name;
    const char *package_name;
    /* sorted by field-id */
    unsigned n_fields;
    const ProtobufCFieldDescriptor *fields;
};
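The offset member is what makes reflection possible: generic code can reach a field through the descriptor without knowing the structure's C type. The following is a minimal sketch using reduced stand-in types. The label field and every sketch_* or Sketch* name are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { SKETCH_TYPE_INT32, SKETCH_TYPE_STRING } SketchFieldType;

typedef struct SketchFieldDescriptor {
    const char *name;
    int id;
    SketchFieldType type;
    unsigned offset; /* byte offset of the member inside the message struct */
} SketchFieldDescriptor;

/* Stand-in message structure; label is a made-up second field. */
typedef struct SketchBazBah {
    int32_t val;
    const char *label;
} SketchBazBah;

/* Field table, as generated metadata might record it (sorted by field id). */
const SketchFieldDescriptor sketch_baz_bah_fields[] = {
    { "val", 1, SKETCH_TYPE_INT32, (unsigned) offsetof(SketchBazBah, val) },
    { "label", 2, SKETCH_TYPE_STRING, (unsigned) offsetof(SketchBazBah, label) },
};

/* Linear search by field name; NULL if there is no such field. */
const SketchFieldDescriptor *sketch_find_field(const SketchFieldDescriptor *fields,
                                               unsigned n_fields,
                                               const char *name)
{
    for (unsigned i = 0; i < n_fields; i++)
        if (strcmp(fields[i].name, name) == 0)
            return &fields[i];
    return NULL;
}

/* Read an int32 field through its descriptor; returns 0 on a type mismatch. */
int sketch_get_int32(const void *message, const SketchFieldDescriptor *field,
                     int32_t *out)
{
    if (field->type != SKETCH_TYPE_INT32)
        return 0;
    memcpy(out, (const char *) message + field->offset, sizeof *out);
    return 1;
}
```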
And finally services are described by:
struct _ProtobufCMethodDescriptor {
    const char *name;
    const ProtobufCMessageDescriptor *input;
    const ProtobufCMessageDescriptor *output;
};
struct _ProtobufCServiceDescriptor {
    const char *name;
    unsigned n_methods;
    ProtobufCMethodDescriptor *methods; // sorted by name
};
We define typedefs for a few types which are used in .proto files but do not have obvious standard C equivalents:
a boolean type (protobuf_c_boolean)
a binary-data (bytes) type (ProtobufCBinaryData)
the various int types (int32_t, uint32_t, int64_t, uint64_t) are obtained by including inttypes.h
We also define a simple allocator object, ProtobufCAllocator, that lets you control how allocations are done. This is predominantly used for parsing.
There is a virtual buffer facility that only has to implement a method to append binary-data to the buffer. This can be used to serialize messages to different targets (instead of a flat slab of data).
We define a base-type for all messages, for code that handles messages generically. All it has is the descriptor object.
One important helper type is the ProtobufCBuffer, which allows you to abstract the target of serialization. The only thing that a buffer has is an append method:
struct _ProtobufCBuffer {
    void (*append)(ProtobufCBuffer *buffer, size_t len, const unsigned char *data);
};
ProtobufCBuffer subclasses are often defined on the stack.
For example, to write to a FILE you could make:
#include <stdio.h>

typedef struct {
    ProtobufCBuffer base;
    FILE *fp;
} BufferAppendToFile;

static void
my_buffer_file_append (ProtobufCBuffer *buffer, size_t len, const unsigned char *data)
{
    /* Valid because base is the first member of BufferAppendToFile. */
    BufferAppendToFile *file_buf = (BufferAppendToFile *) buffer;
    fwrite (data, len, 1, file_buf->fp); // XXX: no error handling!
}
To use this new type of Buffer, you would do something like:
...
BufferAppendToFile tmp;
tmp.base.append = my_buffer_file_append;
tmp.fp = fp;
protobuf_c_message_pack_to_buffer (&message, &tmp.base);
...
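The same subclassing pattern works for any target. As a self-contained illustration, here is a sketch of a buffer that only counts the bytes it is given. SketchBuffer is a stand-in for ProtobufCBuffer so that the example compiles without the library, and sketch_emit() is a made-up stand-in for a streaming packer.

```c
#include <stddef.h>

/* Stand-in for ProtobufCBuffer: a single append method. */
typedef struct SketchBuffer SketchBuffer;
struct SketchBuffer {
    void (*append)(SketchBuffer *buffer, size_t len, const unsigned char *data);
};

/* "Subclass": the base comes first so a pointer to it can be cast back. */
typedef struct {
    SketchBuffer base;
    size_t total;
} SketchCountingBuffer;

static void sketch_counting_append(SketchBuffer *buffer, size_t len,
                                   const unsigned char *data)
{
    SketchCountingBuffer *counter = (SketchCountingBuffer *) buffer;
    (void) data; /* the bytes themselves are ignored, only counted */
    counter->total += len;
}

void sketch_counting_init(SketchCountingBuffer *counter)
{
    counter->base.append = sketch_counting_append;
    counter->total = 0;
}

/* Stand-in for a streaming packer: it writes its output in two chunks
 * and never needs to know where the bytes end up. */
void sketch_emit(SketchBuffer *buffer)
{
    static const unsigned char tag[] = { 0x08 };
    static const unsigned char payload[] = { 0x96, 0x01 };
    buffer->append(buffer, sizeof tag, tag);
    buffer->append(buffer, sizeof payload, payload);
}
```

After sketch_counting_init(&counter) and sketch_emit(&counter.base), counter.total holds the number of bytes the packer produced.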
A commonly used built-in subtype is the BufferSimple, which is declared on the stack and uses a scratch buffer provided by the user for its initial allocation. It does exponential resizing. To create a BufferSimple, use code like:
unsigned char pad[128];
ProtobufCBufferSimple buf = PROTOBUF_C_BUFFER_SIMPLE_INIT (pad);
ProtobufCBuffer *buffer = (ProtobufCBuffer *) &buf;
protobuf_c_buffer_append (buffer, 6, (unsigned char *) "hi mom");
You can access the data as buf.len and buf.data. For example,
assert (buf.len == 6);
assert (memcmp (buf.data, "hi mom", 6) == 0);
To finish up, use:
PROTOBUF_C_BUFFER_SIMPLE_CLEAR (&buf);
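To make the "scratch buffer first, exponential resizing afterwards" behaviour concrete, here is a minimal sketch of such a buffer. It is not the library's implementation: the structure layout, the must_free_data flag, and the sketch_simple_* functions are assumptions made for this example, and the caller is assumed to pass a non-NULL scratch array.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchSimpleBuffer {
    size_t alloced;       /* capacity of data */
    size_t len;           /* bytes used so far */
    unsigned char *data;
    int must_free_data;   /* 0 while data still points at the caller's scratch */
} SketchSimpleBuffer;

/* Start out on the caller's scratch array, so small outputs never malloc. */
void sketch_simple_init(SketchSimpleBuffer *buf, unsigned char *scratch,
                        size_t scratch_size)
{
    buf->alloced = scratch_size;
    buf->len = 0;
    buf->data = scratch;
    buf->must_free_data = 0;
}

/* Append len bytes, doubling the capacity until they fit.
 * Returns 1 on success, 0 if memory could not be obtained. */
int sketch_simple_append(SketchSimpleBuffer *buf, size_t len,
                         const unsigned char *data)
{
    if (len > buf->alloced - buf->len) {
        size_t new_alloced = buf->alloced ? buf->alloced : 16;
        unsigned char *new_data;
        while (new_alloced - buf->len < len) {
            if (new_alloced > SIZE_MAX / 2)
                return 0;
            new_alloced *= 2;
        }
        new_data = malloc(new_alloced);
        if (new_data == NULL)
            return 0;
        memcpy(new_data, buf->data, buf->len);
        if (buf->must_free_data)
            free(buf->data); /* never free the caller's scratch array */
        buf->data = new_data;
        buf->alloced = new_alloced;
        buf->must_free_data = 1;
    }
    memcpy(buf->data + buf->len, data, len);
    buf->len += len;
    return 1;
}

/* Release heap storage, if the buffer ever outgrew its scratch array. */
void sketch_simple_clear(SketchSimpleBuffer *buf)
{
    if (buf->must_free_data)
        free(buf->data);
    buf->data = NULL;
    buf->alloced = 0;
    buf->len = 0;
    buf->must_free_data = 0;
}
```

With a 4-byte scratch array, appending the 6 bytes of "hi mom" forces one doubling to 8 bytes on the heap, which is why a clear step is needed even though the buffer itself lives on the stack.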
To pack a message, first compute its packed size, then provide a buffer to pack into.
size_t protobuf_c_message_get_packed_size (ProtobufCMessage *message);
void protobuf_c_message_pack (ProtobufCMessage *message, unsigned char *out);
Or you can use the "streaming" approach:
void protobuf_c_message_pack_to_buffer (ProtobufCMessage *message, ProtobufCBuffer *buffer);
where ProtobufCBuffer is a base object with an append method. See the section called “Buffers”.
To unpack messages, you should simply call
ProtobufCMessage * protobuf_c_message_unpack (const ProtobufCMessageDescriptor *, ProtobufCAllocator *allocator, size_t len, const unsigned char *data);
If you pass NULL for allocator, then the default allocator will be used.
You can cast the result to the type that matches the descriptor.
The result of unpacking should be freed with protobuf_c_message_free().
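The "NULL means the default allocator" rule can be sketched as follows. The text above does not show the layout of ProtobufCAllocator, so the SketchAllocator structure here (two function pointers plus a user-data pointer) is an assumption, and the tracking allocator is a hypothetical example of why one might supply a custom allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed allocator shape: callbacks plus opaque user data. */
typedef struct SketchAllocator {
    void *(*alloc)(void *allocator_data, size_t size);
    void (*free)(void *allocator_data, void *pointer);
    void *allocator_data;
} SketchAllocator;

static void *sketch_default_alloc(void *allocator_data, size_t size)
{
    (void) allocator_data;
    return malloc(size);
}

static void sketch_default_free(void *allocator_data, void *pointer)
{
    (void) allocator_data;
    free(pointer);
}

static SketchAllocator sketch_default_allocator = {
    sketch_default_alloc, sketch_default_free, NULL
};

/* Custom allocator: counts live blocks, e.g. to detect leaks in tests. */
static void *sketch_tracking_alloc(void *allocator_data, size_t size)
{
    size_t *live = allocator_data;
    void *pointer = malloc(size);
    if (pointer != NULL)
        ++*live;
    return pointer;
}

static void sketch_tracking_free(void *allocator_data, void *pointer)
{
    size_t *live = allocator_data;
    if (pointer != NULL) {
        --*live;
        free(pointer);
    }
}

void sketch_tracking_allocator_init(SketchAllocator *allocator, size_t *live)
{
    allocator->alloc = sketch_tracking_alloc;
    allocator->free = sketch_tracking_free;
    allocator->allocator_data = live;
}

/* What an unpacker might do internally: fall back when given NULL. */
void *sketch_alloc(SketchAllocator *allocator, size_t size)
{
    if (allocator == NULL)
        allocator = &sketch_default_allocator;
    return allocator->alloc(allocator->allocator_data, size);
}

void sketch_free(SketchAllocator *allocator, void *pointer)
{
    if (allocator == NULL)
        allocator = &sketch_default_allocator;
    allocator->free(allocator->allocator_data, pointer);
}
```

A message unpacked with a particular allocator should be freed with the same one, which is why the free functions take the allocator as well.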