2008-10-14

Serializing deeply linked data structures

Upon expanding the test cases for my tree classes in ZUZU.LIB.CONTAINERS, I discovered that in one degenerate case involving a pessimally balanced splay tree, attempting to serialize the tree using the default compiler-generated serialization routines resulted in a stack overflow. The problem was that I had a test case that accesses each element in the tree in order before attempting to clone the tree using serialization. For most self-balancing trees, this is not a problem, but for splay trees, this results in a tree that is as unbalanced as possible -- essentially just a long linked list. Because the compiler-generated object-serialize method recursively serializes the classes fields, serializing the tree nodes blows up the stack. This is a potential problem when serializing any linked data structure that may have arbitrarily large depth.

The way around this problem is to implement an explicit non-recursive object-serialize method and object-deserialize constructor for affected classes. The general algorithm is fairly simple:

  1. Iterate non-recursively over the nodes in the datastructure. For each node, temporarily null out its pointers and serialize the node normally. The SerializeOutputStream will remember the objects and will not dump them out again if the same object is serialized later.
  2. If the number of nodes was not known in advance, serialize out a sentinel value to delimit the end of the nodes.
  3. Iterate over the nodes again in the same order and serialize the fields in order.
When deserializing, just reverse this process.

The following example demonstrates this problem for a simple linked list data structure. Note that in the linked list case the algorithm only requires a single
iteration because the next pointer is always just the next element to be serialized. To see the stack overflow, comment out the object-serialize and object-deserialize members.


To see this, you need to install the Curl RTE.


Note how I used the class version as an optimization to avoid serializing an extra null for each instance.

Fixing this for my tree classes was a little bit more complicated but the principle is the same. You can see my changes here.

2008-10-01

An 'unimplemented' syntax

Frequently I find that I want to quickly sketch out the interface of a function or class method and compile it without actually implementing its body. If the function does not return any arguments, I can simply leave the body empty, but if it does return something, I might need to write a fake return statement to make the compiler happy. In either case, I usually want to leave myself a reminder that the code still needs to be implemented. In Curl, this can easily be done using an exception:

{define-proc {foo}:String
{error "not implemented yet"}
}

The compiler knows that the 'error' function will always throw an exception and will therefore not complain that the function lacks a return statement. To create your own function like 'error', you only need to make a procedure that always throws an exception and that has a declared return type of 'never-returns':

{define-proc {unimplemented}:never-returns
{error "not implemented yet"}
}

I have done one better than this by creating an 'unimplemented' syntax in the ZUZU.LIB.SYNTAX package that uses Curl's 'this-function' syntax to add the name of the unimplemented function to the error message. For example:


To see this, you need to install the Curl RTE.


You can find the source of this macro here.

The ability to extend the syntax like this makes Curl a much more expressive language than most widely used languages today.