2010-02-08

In defense of the 80-column rule

Early in my career, most computer terminals were low-resolution affairs capable of displaying only a limited number of characters, and many monitors, such as the famous VT100 series, were restricted to 80 columns of text. It thus made perfect sense for programming projects to adopt the convention that all code should be formatted to fit within 80 columns, or even fewer. This convention made sense not only because of the limitations of the display technology, but also because it helped code look good when printed out, a frequent occurrence in an era of small displays and slow compilers.

I have followed this convention for many years on the projects I have been involved in, and have long taken it for granted. Lately, however, I have been spending time looking at the Java source code of various Eclipse plug-ins and have been surprised to find that it does not appear to follow any line-length restriction whatsoever. I have even seen code with lines of more than 160 characters! Clearly there are developers who feel that the size and resolution of modern computer displays remove the need for any restriction, and who do not realize that there are still many legitimate reasons for limiting the width of their code. So I feel compelled to rant a little about why I believe this convention is still justified.

Here are my reasons, in no particular order:
  • Limits eye-movement while reading code.

    While the full field of human vision is fairly wide, the area in which words can be read is restricted to a small region near the center of vision, so the eye must move in order to read text. The farther the eye has to travel, the more effort reading requires. Text with excessively long lines generally yields lower comprehension and greater eye strain, which is why newspapers and magazines format text in narrow columns. The same principle applies to code.

  • Avoids need for horizontal scrolling when viewing/editing code.

    If the code is wider than your editor, you must scroll horizontally to read or edit it, which is obviously tedious. Developers are forced to choose between sacrificing screen real estate for wide editor panels and constant horizontal scrolling.

  • Allows use of side-by-side text editors.

    If the code is restricted to a reasonable width, then developers with large enough monitors can routinely place editor panels side-by-side. With an 80-column restriction and a 9-point font (I prefer Proggy Tiny with bold punctuation), I can fit two editors side-by-side on my 17" laptop with horizontal room to spare for other Eclipse views. On my large 30" display at work I sometimes show three side-by-side editors.

  • Facilitates use of graphical diff viewers.

    A related issue is the use of side-by-side graphical diff tools such as kdiff3 (which I favor because of its support for comparing directories). When comparing overly wide source files in such tools, you are forced to scroll both files horizontally in order to fully visualize the differences. The problem is even worse when performing a 3-way merge.

  • Allows developers to standardize the editor width in their development environments.

    If developers know that all the code they work with will be restricted to a reasonable width, they can configure the views in their development environment accordingly and will not be forced to resize editor views to accommodate differing code widths.

  • Ensures code will print out nicely.

    Although printing code is much less common than it once was, it can still be useful from time to time. Code that is too wide will not print nicely: depending on your printer settings, the excess width may be automatically wrapped, truncated, or printed on extra sheets of paper, none of which is desirable, and all of which make the printout hard to read. I find that code following an 80-column restriction can be printed legibly in a small font with two pages per sheet of paper (e.g. on Linux, I usually use the command 'enscript -2r -G' to print code).

  • Discourages developers from using overly complex nested code.

    Deeply nested code is generally considered more complex and harder to understand than shallowly nested code. This is not a hard-and-fast rule, but generally speaking deeply nested code should be avoided. A restriction on the width of the code, together with reasonable indentation and code-formatting rules, naturally discourages excessive nesting.

2009-07-03

Zuzu project now uses Mercurial

I switched the ZUZU project to use Mercurial as its source repository in place of Subversion. This gives me fast access to the entire repository on my laptop and lets me make individual commits when the network is unavailable (such as when I am on vacation later this summer).
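
A typical offline session is then just a matter of committing locally and pushing later (ordinary Mercurial commands; the commit message is only an example):

hg commit -m "fix tree rebalancing"   # recorded in the local clone; no network needed
hg push                               # later, once the network is back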

The Subversion repository still exists on Google Code, but I do not plan to make any further check-ins to it.

I am using TortoiseHg on my Vista machine and have also installed the Mercurial Eclipse plugin. [Update 2010-02-07: I have since switched to the HgEclipse plugin, a fork of the earlier project that appears to be better maintained and more mature than the original.]

I also recommend "Mercurial: The Definitive Guide", which came out just this week and is available online as well.

2009-06-27

Configuring debuggability in Curl

Debuggability is an attribute of Curl processes that controls how code is compiled and whether debugger features are enabled for the applet. It governs the following features:

  • The ability to set breakpoints and step through code in the IDE debugger.
  • The ability to measure code coverage of running code.
  • In debuggable applets, syntax errors are reported in the Curl IDE, where the developer can easily click on an error and navigate to the offending code; in non-debuggable applets they are reported only in the applet's main window.
  • Lazy compilation is disabled in debuggable applets. In non-debuggable applets, functions are not compiled (and errors not detected) until they are first used, which may not happen until well after the application has been started.
  • Compiler optimizations like function inlining and register allocation are disabled in debuggable applets.
  • The Curl performance profiler can report information about source lines in debuggable applets; otherwise only function-level information is available. However, because optimizations are disabled, profiling debuggable applets may produce significantly different results than non-debuggable ones.

The principal disadvantage of making your applets debuggable is that the combination of disabled optimizations and extra noop instructions inserted to support debugging results in slower code, and in some cases debuggable applets may be dramatically slower. In such cases, the developer may want to be able to run the same applet as debuggable when using debugger or coverage features, but otherwise run a non-debuggable version.

At least up through version 7.0 of the RTE, the debuggability of Curl applets is controlled solely by the list of directories in the Debugger pane of the Curl RTE Control Panel. When the RTE starts a new applet, it consults this list to see whether the applet should be made debuggable: if the applet's starting URL is under one of the listed directories, it will be.

So for developers to run the same applet with different debuggability settings, they must either add and remove entries in the Control Panel's debuggability settings every time they want to run the applet differently, or find a way to run the same applet from different paths. The latter is obviously preferable. On Linux (and Mac OS X) this is easily accomplished by creating symbolic links with the 'ln' command. For instance, on the Linux machines I use at work, I have started putting all of my Curl projects in a subdirectory named 'ws' (short for workspace) and have made a symbolic link named 'non-debug-ws' pointing to the 'ws' directory. I can then configure my debuggability settings so that paths beginning with "file:///u/cbarber/ws/" load applets with debuggability enabled, while those beginning with "file:///u/cbarber/non-debug-ws/" run non-debuggable versions.
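
Setting this up takes a single command (using the same paths as above):

cd /u/cbarber
ln -s ws non-debug-ws   # 'non-debug-ws/...' now names the same files as 'ws/...'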

On Windows, however, this is not so easy, since there is no equivalent command, nor any way to accomplish the same thing in the Explorer UI. It turns out that it is indeed possible to create the equivalent of a Unix symbolic link on NTFS file systems -- the default file system used by Windows NT and later -- namely an NTFS "junction point", but this ability is only exposed through low-level system programming APIs. Fortunately, there are a number of open-source tools that can create them. The one I favor is a shell extension called NTFS Link, which adds entries for creating NTFS hard links and junction points to the Windows Explorer's "New" submenu. The one gotcha is that you must be careful not to delete a junction point in Explorer until you have unlinked it from its target directory, or else you will end up deleting the target directory's contents as well!

P.S. In case it is not already obvious: the debuggable path should be the same as the path you use in your development environment. The non-debug path does not have to be reflected in your development environment, since it is not expected to trigger breakpoints and so on.

2009-03-28

Poor man's union types

The original design of the Curl language anticipated support for user-defined "union types": that is, the ability to define an abstract data type representing values that may belong to two or more unrelated types. (For an example from another dynamic language, see Dylan's type-union syntax.)

The proposed syntax was:

{one-of T1, T2 [,...]}

In early versions of the language, one-of was defined as a macro that simply resolved to 'any', which is the supertype of all scalar types in Curl, with the intention of implementing a correct version later. Of course, such an implementation does none of the desired compile-time checking and served only as an indication of the programmer's intent. However, when we started looking into implementing this concept for real, it quickly became evident that the effort required was not justified by the relative infrequency of its anticipated use, so we dropped it from our development schedules and removed the syntactic placeholder.

Nevertheless, there are times when such a type would come in handy. One example is Curl's built-in 'evaluate' function, whose first argument is declared as 'any' but which actually accepts one of 'CurlSource', 'StringInterface', or 'Url'. Because the argument is declared as 'any', you can write code that passes an unsupported type and the compiler will not generate an error; the error is not thrown until the function actually executes with the bad value. Had the function been declared with a union type, the compiler could have detected such errors at compile time.
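
For instance, a call like the following compiles cleanly and fails only at run time:

|| Accepted by the compiler because the parameter is declared 'any', even
|| though an int is none of CurlSource, StringInterface, or Url.  The
|| error surfaces only when this call is actually executed.
{evaluate 42}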

Fortunately, it turns out that it is possible to define a class type that, while not implementing the full semantics of a union type, still provides the most important feature you want from one: type checking of assignments. The trick is to define a class with an 'any' field to hold the value and an implicit constructor for each type in the union:

{define-class public IntOrString
    || The wrapped value; constant, so it can only be set by a constructor.
    field public constant value:any

    || Implicit conversion from 'int'.
    {constructor public implicit {from-int i:int}
        set self.value = i
    }

    || Implicit conversion from 'String'.
    {constructor public implicit {from-String s:String}
        set self.value = s
    }
}

Each implicit constructor supports implicit conversion from its argument type when assigning to a variable or argument of the class type, and since the field is constant and can only be initialized by one of the constructors, you can safely assume that the value is a member of one of the specified types (or null, if 'uninitialized-value-for-type' would return null for one of those types).
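
For example, assignments and argument passing are now checked at compile time (a small usage sketch; 'output' is Curl's standard console-output procedure):

{define-proc {show v:IntOrString}:void
    {output v.value}
}

{show 42}          || accepted: implicit conversion via from-int
{show "forty-two"} || accepted: implicit conversion via from-String
|| {show 4.2}      would be rejected at compile time: no conversion from double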

It would be tedious to define such a class every time you needed this pattern, but it is straightforward to define parameterized versions of the class for different numbers of arguments, along with a macro that picks the correct parameterized class based on the number of arguments, and this is exactly what I did last week. I added a new ZUZU.LIB.TYPES package to the ZUZU.LIB project, which contains the macros 'One-of' and 'AnyOne-of' along with the associated parameterized value classes. The 'One-of' type represents the value using a field of type '#Object' and can only be used when all of the types in the union are subtypes of 'Object'; 'AnyOne-of' uses an 'any' field and can be used with any types. Here is a small example:


[Embedded example applet; requires the Curl RTE to view.]
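
For readers without the RTE, the gist is roughly as follows. This is only a sketch: the post does not show the exact usage syntax of 'One-of', so the type expression and the 'value' field below are educated guesses modeled on the proposed {one-of ...} syntax and the hand-written IntOrString class above.

{import * from ZUZU.LIB.TYPES}

{define-proc {describe x:{One-of String, Url}}:void
    {output x.value}
}

{describe "a string"}                 || accepted: implicit conversion from String
{describe {url "http://example.com"}} || accepted: implicit conversion from Url
|| {describe 7}                       would be a compile-time error: int is not in the union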

2008-10-14

Serializing deeply linked data structures

Upon expanding the test cases for my tree classes in ZUZU.LIB.CONTAINERS, I discovered that in one degenerate case involving a pessimally balanced splay tree, attempting to serialize the tree using the default compiler-generated serialization routines resulted in a stack overflow. The problem was a test case that accesses each element of the tree in order before attempting to clone the tree using serialization. For most self-balancing trees this is not a problem, but for splay trees it results in a tree that is as unbalanced as possible -- essentially just a long linked list. Because the compiler-generated object-serialize method recursively serializes a class's fields, serializing the tree nodes blows up the stack. This is a potential problem when serializing any linked data structure that may have arbitrarily large depth.

The way around this problem is to implement an explicit non-recursive object-serialize method and object-deserialize constructor for affected classes. The general algorithm is fairly simple:

  1. Iterate non-recursively over the nodes in the data structure. For each node, temporarily null out its pointers and serialize the node normally. The SerializeOutputStream will remember the objects and will not dump them again if the same object is serialized later.
  2. If the number of nodes was not known in advance, serialize out a sentinel value to delimit the end of the nodes.
  3. Iterate over the nodes again in the same order, this time serializing each node's link fields in order.
When deserializing, just reverse this process.

The following example demonstrates this approach for a simple linked-list data structure. Note that in the linked-list case the algorithm requires only a single iteration, because the next pointer always refers to the very next element to be serialized. To see the stack overflow, comment out the object-serialize and object-deserialize members.


[Embedded example applet; requires the Curl RTE to view.]
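
For readers without the RTE, here is a rough sketch of the idea for a singly linked list. This is my sketch, not the applet's code: the stream methods 'write-object' and 'read-object' are assumed names that should be checked against the RTE documentation, the sketch assumes element values are never null, and the class-version optimization mentioned below is omitted.

{define-class public Node
    field public value:any
    field public next:#Node = null

    {constructor public {default value:any}
        set self.value = value
    }

    || Iterative serialization: write this node's value, then walk the
    || chain writing each successor's value, ending with a null sentinel.
    || No recursion, so an arbitrarily long list cannot blow the stack.
    || NOTE: 'write-object' is an assumed stream method name.
    {method public {object-serialize out:SerializeOutputStream}:void
        {out.write-object self.value}
        {let node:#Node = self.next}
        {while node != null do
            {let n:Node = {non-null node}}
            {out.write-object n.value}
            set node = n.next
        }
        {out.write-object null}   || sentinel: end of list
    }

    || The reverse: read values until the sentinel, rebuilding the
    || 'next' links as we go.  NOTE: 'read-object' is an assumed name.
    {constructor public {object-deserialize in:SerializeInputStream}
        set self.value = {in.read-object}
        {let tail:Node = self}
        {let v:any = {in.read-object}}
        {while v != null do
            {let n:Node = {Node v}}
            set tail.next = n
            set tail = n
            set v = {in.read-object}
        }
    }
}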


Note how I used the class version as an optimization to avoid serializing an extra null for each instance.

Fixing this for my tree classes was a little more complicated, but the principle is the same. You can see my changes here.

2008-10-01

An 'unimplemented' syntax

Frequently I find that I want to quickly sketch out the interface of a function or class method and compile it without actually implementing its body. If the function does not return a value, I can simply leave the body empty, but if it does return something, I might need to write a fake return statement to make the compiler happy. In either case, I usually want to leave myself a reminder that the code still needs to be implemented. In Curl, this is easily done using an exception:

{define-proc {foo}:String
    {error "not implemented yet"}
}

The compiler knows that the 'error' function will always throw an exception and will therefore not complain that the function lacks a return statement. To create your own function like 'error', you need only write a procedure that always throws an exception and that has the declared return type 'never-returns':

{define-proc {unimplemented}:never-returns
    {error "not implemented yet"}
}
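
Any stubbed-out function can then use it (the function below is just for illustration):

{define-proc {lookup-user id:int}:String
    || Compiles despite the missing return: 'unimplemented' never returns.
    {unimplemented}
}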

I have gone one better by creating an 'unimplemented' syntax in the ZUZU.LIB.SYNTAX package that uses Curl's 'this-function' syntax to include the name of the unimplemented function in the error message. For example:


[Embedded example applet; requires the Curl RTE to view.]


You can find the source of this macro here.

The ability to extend the syntax like this makes Curl a much more expressive language than most widely used languages today.

2008-09-17

Running Applets directly from Google Code

One thing I have always liked about Curl is the lack of an independent compile/link step. You can run Curl applets directly from source code just using the Curl RTE, which will compile and link the code dynamically as needed. This gives Curl the immediacy and flexibility of scripting languages like JavaScript while retaining the performance of a compiled language. It also means that you can run Curl applets directly from a source code repository with a web interface that can be configured to return the appropriate Curl applet MIME type (text/vnd.curl). Luckily for me, Google Code provides such a repository, so I am able to configure applets in my ZUZU libraries to be run directly from the repository.

Here is an example:

[Embedded example applet; requires the Curl RTE to view.]

The above applet is located at the URL:

http://zuzu-curl.googlecode.com/svn/trunk/zuzu-curl/LIB/applets/example.curl

This example applet takes arguments in the "query" portion of the URL to set the title of the example and to load the initial contents of the example either from another file or from the query itself (as in this case). This allows me to use the same example applet to show different editable examples in my blog. The embedded example applet used in the training section of the Curl Developer's Site uses the same trick; for example, see here.

Look here for instructions on how to configure your Google Code repository to serve Curl applets. This trick may work on other Subversion-based code hosting services such as SourceForge, but I have not tried it.

UPDATE: Unfortunately, there does not seem to be any comparable support for Mercurial-based repositories. See Google Project Hosting issues 2815 and 2920.