2008-10-14

Serializing deeply linked data structures

Upon expanding the test cases for my tree classes in ZUZU.LIB.CONTAINERS, I discovered that in one degenerate case involving a pessimally balanced splay tree, attempting to serialize the tree using the default compiler-generated serialization routines resulted in a stack overflow. The problem was that I had a test case that accesses each element in the tree in order before attempting to clone the tree using serialization. For most self-balancing trees, this is not a problem, but for splay trees, this results in a tree that is as unbalanced as possible -- essentially just a long linked list. Because the compiler-generated object-serialize method recursively serializes the classes fields, serializing the tree nodes blows up the stack. This is a potential problem when serializing any linked data structure that may have arbitrarily large depth.

The way around this problem is to implement an explicit non-recursive object-serialize method and object-deserialize constructor for affected classes. The general algorithm is fairly simple:

  1. Iterate non-recursively over the nodes in the datastructure. For each node, temporarily null out its pointers and serialize the node normally. The SerializeOutputStream will remember the objects and will not dump them out again if the same object is serialized later.
  2. If the number of nodes was not known in advance, serialize out a sentinel value to delimit the end of the nodes.
  3. Iterate over the nodes again in the same order and serialize the fields in order.
When deserializing, just reverse this process.

The following example demonstrates this problem for a simple linked list data structure. Note that in the linked list case the algorithm only requires a single
iteration because the next pointer is always just the next element to be serialized. To see the stack overflow, comment out the object-serialize and object-deserialize members.


To see this, you need to install the Curl RTE.


Note how I used the class version as an optimization to avoid serializing an extra null for each instance.

Fixing this for my tree classes was a little bit more complicated but the principle is the same. You can see my changes here.

No comments: