OGDL Binary Specification

Version 1.0, December 2005

Objective

This specification defines a binary equivalent of the original OGDL data specification. In this format, nodes are either text or binary streams. Text is a sequence of bytes in which characters are encoded in UTF-8. Binaries are arbitrary byte arrays. This specification defines no type information beyond the text/binary distinction. Schemas and types, if wanted, should be defined at a higher level.

Grammar

ogdl-binary ::= header ( level node )* 0x00

level is the indentation level (depth) of a node and starts at 1 for the root level. Because a level is never 0, the final 0x00 unambiguously marks the end of the stream.

header ::= 0x01 0x47 0x00
level  ::= variable-length-integer
node   ::= text-node | binary-node

The header itself is a node at the root level with the text 'G'. If the specification changes, the header could contain a version number.

text-node ::= text 0x00
binary-node ::= 0x01 ( length data )* 0x00

Text should be encoded in UTF-8 and cannot begin with 0x01, since that byte marks the start of a binary node.
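As an illustration of the text-node rule, here is a minimal Python sketch. The function name encode_text_node is hypothetical, and rejecting embedded 0x00 bytes is an assumption that follows from 0x00 being the terminator; the grammar does not state it explicitly.

```python
def encode_text_node(text: str) -> bytes:
    """Encode a text node as UTF-8 bytes followed by the 0x00 terminator.

    Hypothetical helper, not part of the specification.
    """
    raw = text.encode("utf-8")
    # 0x01 as the first byte would be read as the binary-node marker.
    if raw.startswith(b"\x01"):
        raise ValueError("text cannot begin with 0x01")
    # Assumption: 0x00 terminates the text, so it cannot appear inside it.
    if b"\x00" in raw:
        raise ValueError("text cannot contain 0x00")
    return raw + b"\x00"
```

Under this sketch, the level byte 0x01 followed by encode_text_node("G") gives the three header bytes 0x01 0x47 0x00.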

length ::= variable-length-integer
data ::= byte[length]
variable-length-integer ::=  
   0x0000 - 0x00007F: 0xxxxxxx
   0x0000 - 0x003FFF: 10xxxxxx xxxxxxxx
   0x0000 - 0x1FFFFF: 110xxxxx xxxxxxxx xxxxxxxx
   ...

The integer encoding is similar to UTF-8. Its objective is to minimize the encoded size while still allowing arbitrarily large integers to be represented.
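The following Python sketch illustrates the three integer forms listed above and how a binary node could use them for its length fields. It makes several assumptions: the function names and the chunk_size parameter are hypothetical, the value bits are taken to be stored most significant byte first (as the bit patterns suggest), and forms longer than three bytes are rejected because the specification elides them.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode n using one of the three forms shown in the grammar."""
    if n < 0:
        raise ValueError("negative integers are not representable")
    if n <= 0x7F:                      # 0xxxxxxx
        return bytes([n])
    if n <= 0x3FFF:                    # 10xxxxxx xxxxxxxx
        return bytes([0x80 | (n >> 8), n & 0xFF])
    if n <= 0x1FFFFF:                  # 110xxxxx xxxxxxxx xxxxxxxx
        return bytes([0xC0 | (n >> 16), (n >> 8) & 0xFF, n & 0xFF])
    raise ValueError("longer forms are elided in the specification")


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_position); raises IndexError on truncated input."""
    first = data[pos]
    if first & 0x80 == 0x00:
        return first, pos + 1
    if first & 0xC0 == 0x80:
        return ((first & 0x3F) << 8) | data[pos + 1], pos + 2
    if first & 0xE0 == 0xC0:
        value = ((first & 0x1F) << 16) | (data[pos + 1] << 8) | data[pos + 2]
        return value, pos + 3
    raise ValueError("longer forms are elided in the specification")


def encode_binary_node(payload: bytes, chunk_size: int = 0x3FFF) -> bytes:
    """Encode payload as 0x01 ( length data )* 0x00, split into chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    out = bytearray([0x01])
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        # A chunk is never empty here, so its length never collides
        # with the 0x00 terminator.
        out += encode_varint(len(chunk)) + chunk
    out.append(0x00)
    return bytes(out)
```

Because a length of 0x00 ends the node, a writer built this way can stream a binary of unknown total size chunk by chunk.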

A basic parser algorithm

1. Read the header.
2. Read one variable-length integer (the level). If it is 0x00, end.
3. Read one byte.
   If it is 0x01, read a binary node (steps 3.1 to 3.3);
   otherwise it is the first byte of a text node: read UTF-8 text until 0x00.
4. Go to 2.

Binary node:

3.1. Read one variable-length integer (=N). If N is 0, the binary node ends.
3.2. Read N data bytes.
3.3. Go to 3.1.
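The steps above can be sketched in Python as follows. This is an illustrative reading of the algorithm, not a reference implementation: the names HEADER, read_varint and parse_ogdl_binary are hypothetical, the integer reader handles only the three forms listed in the grammar (most significant byte first is assumed), and the result is a flat list of (level, value) pairs rather than a tree.

```python
from typing import List, Tuple, Union

HEADER = b"\x01G\x00"  # 0x01 0x47 0x00


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one variable-length integer; return (value, next_position)."""
    if pos >= len(data):
        raise ValueError("truncated input")
    first = data[pos]
    if first & 0x80 == 0x00:
        size, value = 1, first
    elif first & 0xC0 == 0x80:
        size, value = 2, first & 0x3F
    elif first & 0xE0 == 0xC0:
        size, value = 3, first & 0x1F
    else:
        raise ValueError("longer forms are elided in the specification")
    if pos + size > len(data):
        raise ValueError("truncated input")
    for b in data[pos + 1:pos + size]:
        value = (value << 8) | b
    return value, pos + size


def parse_ogdl_binary(data: bytes) -> List[Tuple[int, Union[str, bytes]]]:
    """Return (level, value) pairs; value is str for text, bytes for binary."""
    # Step 1: read the header.
    if not data.startswith(HEADER):
        raise ValueError("missing OGDL binary header")
    pos = len(HEADER)
    nodes: List[Tuple[int, Union[str, bytes]]] = []
    while True:
        # Step 2: read the level; 0 ends the stream.
        level, pos = read_varint(data, pos)
        if level == 0:
            return nodes
        if pos >= len(data):
            raise ValueError("truncated input")
        # Step 3: 0x01 introduces a binary node, anything else is text.
        if data[pos] == 0x01:
            pos += 1
            payload = bytearray()
            while True:
                # Steps 3.1 to 3.3: (length, data) chunks until length 0.
                n, pos = read_varint(data, pos)
                if n == 0:
                    break
                if pos + n > len(data):
                    raise ValueError("truncated binary chunk")
                payload += data[pos:pos + n]
                pos += n
            nodes.append((level, bytes(payload)))
        else:
            end = data.find(b"\x00", pos)
            if end < 0:
                raise ValueError("unterminated text node")
            nodes.append((level, data[pos:end].decode("utf-8")))
            pos = end + 1
```

The header is consumed but not returned as a node in this sketch. Rebuilding the tree from the level values is left to the caller.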

Changes to this document

See the Change list