Tuesday, October 25, 2011

We're dealing with a pretty critical element of the new Falcon engine: the serialization process. Serialization is extremely important in the new engine because it serves many purposes:

  • Storage of built-in type values on static streams.

  • Saving and restoring of pre-compiled source code modules.

  • Assistance in the creation of stateful programs across multiple sessions (game save/load, session data in web-based applications).

  • Cooperative programming and live data sharing across network nodes.

Most notably, the serialization process must be able to restore the status of a program at a different time, at least for those items that have been serialized. For instance, if an object belonging to a certain foreign class (coded in C++) is serialized and must be deserialized in another program, the system must be able to dynamically verify the presence of the foreign code on which the class is based or, if it is not available, try to load it and make it locally visible to all the entities that must interact with that object. This is even more critical in the case of hyper classes, where Falcon classes and foreign classes are seamlessly merged and may come from different Falcon modules and native dynamic libraries.

Of course, once such a complex system is set up, it is worth using it as pervasively as possible. To be really useful in serializing any generic object, it must also provide a transparent mechanism to flatten cycles and multiple references to the same object. Without this feature, the serialization mechanism would be unsafe or partial even on simple arrays and dictionaries.
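To make the "flatten cycles and multiple references" requirement concrete, here is a minimal sketch in Python. It is not the Falcon implementation; the function names `flatten` and `unflatten` and the record layout are assumptions made up for illustration. The idea is simply that every container is stored once in a flat table and referred to by index, so a cycle or a shared reference becomes a repeated index instead of infinite recursion or a duplicated copy.

```python
def flatten(root):
    """Turn a graph of lists/dicts into a flat table (hypothetical format).

    Each container is stored once and referenced by its table index, so
    shared references and cycles survive the round trip.
    """
    table = []   # one record per container, in discovery order
    seen = {}    # id(container) -> index in table

    def encode(value):
        if isinstance(value, (list, dict)):
            key = id(value)
            if key not in seen:
                seen[key] = len(table)
                table.append(None)  # reserve the slot before recursing
                if isinstance(value, list):
                    record = ("list", [encode(v) for v in value])
                else:
                    record = ("dict", [(k, encode(v)) for k, v in value.items()])
                table[seen[key]] = record
            return ("ref", seen[key])
        return ("val", value)

    return encode(root), table


def unflatten(root_entry, table):
    # First pass: create empty containers so every reference has a target.
    objects = [[] if kind == "list" else {} for kind, _ in table]

    def decode(entry):
        tag, payload = entry
        return objects[payload] if tag == "ref" else payload

    # Second pass: fill the containers, resolving references by index.
    for obj, (kind, body) in zip(objects, table):
        if kind == "list":
            obj.extend(decode(e) for e in body)
        else:
            for k, v in body:
                obj[k] = decode(v)
    return decode(root_entry)
```

Restoring in two passes (allocate first, fill later) is what lets an array contain itself: by the time the contents are decoded, the target of every reference already exists.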

And, since modules have items to serialize, and some of them may be as complex as object instances, it's worth integrating this mechanism into the pre-compiled module save/load process (fam). Unlike what happens in other programming languages, fam files are totally standalone modules (similar to Java .class files, but with an internal link-time resolution process that allows compiling them in a place where not all the libraries they rely on are actually available).

In the old engine, module serialization was implemented by sidestepping the problem: a fam module didn't contain the items, just a set of instructions on how to generate them at link time. As a result, only a limited set of standard items could be stored in the module. For instance, even declaring a dictionary of fully known static data required the VM to construct the dictionary at runtime. This is how it's done in every scripting language I know of, but the fact that we did have a way to serialize dictionaries, and that I could not apply it to module pre-compilation, bugged me. The problem became more evident in the case of attributes (static data about symbols that can be queried from generic code without running the module through a VM in advance). The data you could store in an attribute was limited to what the fam module generator was able to understand. At times, having an array of strings in an attribute would have been useful, but that was not possible.

So, the ability to serialize any kind of value that the Falcon parser is able to understand, or even that the virtual machine is able to generate, was too important to be overlooked and left out of the fam module loading process. And, since we had items in modules, and ways to restore their values by associating their class with them, it was worth seeing whether the mechanism could be extended to the module code.

In Falcon, data and code are different, even if they can be merged pretty seamlessly. The code is bound to be a directed acyclic graph, while data in items need not be. Also, the types of code entities are enumerable, while those of items are not. This might have suggested different approaches for serializing code and data. But there was a detail that ruled out this naive approach: in the not-too-distant future, we want to introduce self-modifying code or, in other words, reflect statements into language items. This means that items might contain code and, more specifically, might point directly to a code entity they are part of. While the code itself stays directed and acyclic, once the border of the values in a code leaf has been crossed, the entity they point to may be the very code that contains them.

With this in mind, extending the same serialization of the items to the fam processing ceased to be an option and became a requirement.

Serialization of items happens with the help of their class, as items are opaque to the VM and the Falcon Class entity acts as the item handler. This means that every single grammar entity must be represented by a Class that the Falcon VM can use. For instance, if we have a child of PStep representing the While construct, we must have a class derived from Class, called ClassWhile, that knows how to serialize a While instance. In a Falcon program, a While instance would be represented by an item containing a While C++ instance and a ClassWhile instance providing the scripting interface to the while construct.
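The relationship between an item, its instance data and its handler class can be sketched roughly as follows. This is an illustration in Python, not the engine's C++ code: only the names Class, While and ClassWhile come from the text above, while the `store`/`restore` methods, the `Item` wrapper and the record layout are assumptions.

```python
class Class:
    """Hypothetical stand-in for the engine's Class: the handler of an opaque item."""
    name = "Class"

    def store(self, instance):
        raise NotImplementedError

    def restore(self, data):
        raise NotImplementedError


class While:
    """Stand-in for the grammar entity (a child of PStep in the engine)."""
    def __init__(self, condition, body):
        self.condition = condition
        self.body = body


class ClassWhile(Class):
    """Knows how to store and restore While instances."""
    name = "While"

    def store(self, instance):
        return {"condition": instance.condition, "body": list(instance.body)}

    def restore(self, data):
        return While(data["condition"], list(data["body"]))


class Item:
    """Pairs opaque instance data with the class that handles it."""
    def __init__(self, handler, instance):
        self.handler = handler
        self.instance = instance


def serialize(item):
    # The serializer never inspects the instance; it only asks the handler.
    return {"class": item.handler.name, "data": item.handler.store(item.instance)}


def deserialize(record, handlers):
    # The class name stored in the record selects the handler on the way back.
    handler = handlers[record["class"]]
    return Item(handler, handler.restore(record["data"]))
```

The serializer only ever deals with the handler, so adding a new kind of item means adding a new handler class rather than touching the serializer itself.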

It is not necessary to fully expose the interface of all the grammar constructs to the scripts for now, but all the classes handling the serialization of those constructs need to be put in place. While this is surely a lot of work, it's not much longer than writing a single class that knows how to deserialize any grammar item stored in a file. And it has an interesting advantage: the grammar structures no longer need to be a closed enumeration, and it becomes pretty simple to create new grammar constructs dynamically from third-party modules. As long as those modules are available in the target environment, it is no longer necessary for all the code entities to be known by and declared in the central engine. While the parser is still not extensible (but it is easy to make it so, or even to derive a new parser and use that one instead), this means that new modules could bring in new processing modes, and even whole new programming paradigms, as the grammar elements (the PSteps) are exactly what the VM understands and processes.
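As a rough illustration of what an open set of constructs implies for deserialization, the sketch below looks handlers up by module and class name, and asks a loader to bring the module in when the class is not known yet. Everything here is hypothetical (the `ClassRegistry` name, the `load_module` hook, the record fields); it only shows the shape of the lookup, not how Falcon actually resolves modules.

```python
class ClassRegistry:
    """Hypothetical open registry of handlers, keyed by (module, class name)."""

    def __init__(self, load_module=None):
        self._classes = {}
        # Assumed hook: called as load_module(module_name, registry); the
        # loaded module is expected to register its own classes.
        self._load_module = load_module

    def register(self, module, name, handler):
        self._classes[(module, name)] = handler

    def find(self, module, name):
        key = (module, name)
        if key not in self._classes and self._load_module is not None:
            # Class not visible locally: try to load the module providing it.
            self._load_module(module, self)
        if key not in self._classes:
            raise LookupError(
                f"class {name!r} from module {module!r} is not available")
        return self._classes[key]


def restore_record(record, registry):
    # The record names its own handler, so the engine needs no central
    # list of every construct that may ever appear in a stored module.
    handler = registry.find(record["module"], record["class"])
    return handler(record["data"])
```

A third-party module that defines a new construct would register its handler when loaded; a target environment that lacks the module gets an explicit error instead of a silently broken program.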

The serializer is already able to query the classes so that their instances can have a say in the process. In other words, instances can provide serialization handlers that must be processed by the virtual machine. This means that, unlike in the old engine, a VM in place and ready to run is now a necessary part of fam generation and restore. However, in the most common cases the VM won't be involved, and the overhead with respect to a simple serialization is marginal (even more so when compared to disk write times). Of course, deserializing a tree of instructions, instead of a flat table of code, is far more complex and CPU- and memory-intensive. On the other hand, deserializing the code table is not the only thing that a scripting language must do in order to restore a pre-compiled module (and then run it), so I am pretty sure that the net cost of this serialization method is not as high as it may seem.

