
Tuesday, July 19, 2011

Today I completed the OOP model constructs in the new Falcon engine. Some details (for instance class states, or a direct way to generate a sub-classed prototype with a single call) are yet to be dealt with, but they are largely things that I can complete later or leave to the other developers in the project.

What I want to talk about in this entry is 1) what I have found out about the engine, and 2) the short-term plan.
About the findings, I am increasingly pleased with how well the new engine adapts to the new constructs. Two ideas in particular are paying off: having the PSteps control the virtual machine on one side, and exposing the VMContext structure to the classes on the other.

Since I have delegated the retrieval of properties to the classes, which act as agents on behalf of the virtual machine, it is now much easier to integrate foreign, unknown data into the engine. It is no longer necessary to create an intermediate communication layer between foreign data and the engine via a standardized interface; you have your data, and you just write the glue code that lets the VM deal with it.

And that "foreign" data can be anything, even data coming from the engine itself: although it is built into the engine, having the VM talk to a C++ structure is now much easier, safer and more consistent than before.
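To make the "class as agent" idea concrete, here is a minimal sketch of what such glue code could look like. Every name in it (`Class`, `Item`, `FileInfo`, `FileInfoClass`, `vmGetProperty`, the `getProperty` signature) is a hypothetical stand-in invented for illustration, not the actual Falcon API; the only point is that the VM holds an opaque pointer and asks the handler to interpret it.

```cpp
#include <cassert>
#include <string>

// Hypothetical type handler: the VM asks it to act on opaque instance data.
class Class {
public:
    virtual ~Class() = default;
    // Stores the property in `out` and returns true, or returns false if unknown.
    virtual bool getProperty(void* instance, const std::string& name,
                             long& out) const = 0;
};

// "Foreign" data: a plain C++ struct that knows nothing about the engine.
struct FileInfo {
    long size;
    long mtime;
};

// The glue code: the only place that understands the layout of FileInfo.
class FileInfoClass : public Class {
public:
    bool getProperty(void* instance, const std::string& name,
                     long& out) const override {
        const FileInfo* info = static_cast<const FileInfo*>(instance);
        if (name == "size")  { out = info->size;  return true; }
        if (name == "mtime") { out = info->mtime; return true; }
        return false;
    }
};

// What the VM sees (assumed shape): a handler plus an opaque pointer.
struct Item {
    const Class* handler;
    void* data;
};

// The VM never inspects `data`; it delegates the lookup to the handler.
bool vmGetProperty(const Item& item, const std::string& name, long& out) {
    assert(item.handler != nullptr);
    return item.handler->getProperty(item.data, name, out);
}
```

Wrapping a `FileInfo` in an `Item` whose handler is a `FileInfoClass` is all it takes for `vmGetProperty(item, "size", value)` to work; no intermediate interface has to be implemented by `FileInfo` itself.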

One thing I am particularly satisfied with is the new way the GC treats data. ATM the Garbage Collector is still not active in the collect step, but I have already activated accounting, marking and so on, just to be sure that there won't be surprises later on. In the old engine, the GC had to know the entities it collected; more specifically, it had to know 1) how to get their alive mark and 2) how to destroy them when they were found expired.

The new engine delegates the decision about the survival of entities to the class. For instance, let's take the object internally used to represent Prototype instances: the FlexyDict. It has an STL map of strings->items (yep, we'll find a more powerful structure, like a trie, before the release) representing the structure of the topmost object, and a simple ItemArray of the items considered the base of the structure. Now, suppose you want to take the base out of the host item and expose it to the language as a standard item array.

Under the old engine, this would have been a serious problem: the structure could not have been destroyed while its owner (the FlexyDict) was still alive, and the other way around. Suppose that you keep the array, for any reason, and drop the Prototype (holding the FlexyDict). The owned FlexyDict gets destroyed, and the ItemArray with it. Bang, you crash.

You had to back-track the FlexyDict from the base array, so that if someone had a reference to the base array somewhere, your owning entity (the Prototype, and the FlexyDict in it) stayed alive.

But as a simple ItemArray cannot know that it is part of another, more complex structure, you were required to write a wrapper class around ItemArray that was able to do the trick. And then, it would have been complex to publish the base data as a mere, simple, featureless language-bound Array. You would have had to intercept the square-bracket accessor and replicate the behavior of the ItemArray... well, it would have been rather tricky.

And things can get even trickier in apparently simple classes of the engine. For instance, the FStat data held references to three instances of the Falcon TimeStamp class (modify, create and access times). Keeping track of those was not exactly simple.

But in the new engine all the GC accounting and the decision about the liveness of things is left to the Class, that is, to the type handler. In the case of our FlexyDict, the Prototype class KNOWS that it has an Array that might be held somewhere. When giving this array away, to be seen out in the wild, it will just instruct the GC to MARK the array item when needed, but it won't GIVE this array to the GC for separate disposal. Conversely, the Prototype class will check the status of the FlexyDict AND the status of the ItemArray it holds before authorizing the GC to kill this instance. All the complexity of multiple tracking goes away: the engine just marks the ItemArray when it's left alone in the wild, but has no means to destroy it directly, as it always stays under the control of the original Prototype class that created it, and that will authorize its destruction only when the whole entity is considered obsolete.
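The following is a rough sketch of that division of labor, under simplifying assumptions of my own: marks are plain generation numbers, the collector is a single `sweep` call, and all the names (`Class`, `PrototypeClass`, `BaseArrayClass`, `Collector`, `gcMark`, `gcIsAlive`, `dispose`) are hypothetical rather than the engine's real interface. The `FlexyDict` and `ItemArray` structs are reduced to the bare minimum needed to show the ownership relation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Mark = unsigned;  // GC generation number (assumed marking scheme)

struct ItemArray {
    std::vector<long> items;
    Mark mark = 0;
};

struct FlexyDict {
    std::map<std::string, long> props;
    ItemArray base;  // owned by value: it can only die with the FlexyDict
    Mark mark = 0;
};

// Hypothetical type handler: it, not the collector, decides about liveness.
class Class {
public:
    virtual ~Class() = default;
    virtual void gcMark(void* instance, Mark generation) const = 0;
    virtual bool gcIsAlive(void* instance, Mark generation) const = 0;
    virtual void dispose(void* instance) const = 0;
};

class PrototypeClass : public Class {
public:
    void gcMark(void* instance, Mark generation) const override {
        static_cast<FlexyDict*>(instance)->mark = generation;
    }
    // Alive if either the dictionary or the exposed base array was marked.
    bool gcIsAlive(void* instance, Mark generation) const override {
        const FlexyDict* dict = static_cast<const FlexyDict*>(instance);
        return dict->mark >= generation || dict->base.mark >= generation;
    }
    void dispose(void* instance) const override {
        delete static_cast<FlexyDict*>(instance);
    }
};

// Handler for the base array seen "in the wild": markable, never disposed.
class BaseArrayClass : public Class {
public:
    void gcMark(void* instance, Mark generation) const override {
        static_cast<ItemArray*>(instance)->mark = generation;
    }
    bool gcIsAlive(void* instance, Mark generation) const override {
        return static_cast<const ItemArray*>(instance)->mark >= generation;
    }
    void dispose(void*) const override {}  // the owning FlexyDict frees it
};

class Collector {
    struct Entry { const Class* handler; void* instance; };
    std::vector<Entry> entries_;
public:
    void track(const Class* handler, void* instance) {
        entries_.push_back({handler, instance});
    }
    std::size_t tracked() const { return entries_.size(); }
    // Destroys every tracked instance whose handler reports it as expired.
    std::size_t sweep(Mark generation) {
        std::size_t destroyed = 0;
        std::vector<Entry> survivors;
        for (const Entry& e : entries_) {
            if (e.handler->gcIsAlive(e.instance, generation)) {
                survivors.push_back(e);
            } else {
                e.handler->dispose(e.instance);
                ++destroyed;
            }
        }
        entries_.swap(survivors);
        return destroyed;
    }
};
```

Only the `FlexyDict` is ever handed to `Collector::track`. If a script keeps the base array and drops the prototype, marking the array through `BaseArrayClass` is enough to make `PrototypeClass::gcIsAlive` veto the destruction of the whole entity; once nothing marks either part, a single `dispose` releases both.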

In this way, the mayhem of "carrier" (wrapper) classes, back-and-forth references, weak references and circular references dies away, simply by moving control of the crucial aspects of the GC away from the engine and into the user code. True, this requires the user code to know a few things about the place where it is going to be put, but except for trivial cases, that is needed under any circumstance. The only difference here is that, in exchange for these extra details you have to know, you get more power, performance, and overall design simplicity. Conversely, in the previous model (and in any model I know) where the details are taken away from you, anything except the "school case" gets pretty tricky, if not impossible, to get right.

Other than this, I am very happy about the way the "operators" work. Operators are the code plugs a Class offers to the engine (the methods named op_*), which act directly on the context (that is, the VM at work) and can inject PSteps in the VM to "complete" lengthy operations at a later moment. For instance, the only way a HyperClass can initialize the instances of all its components is to invoke their op_create as any script code would do, and be called back when the VM has completed the creation of the target item.

Through this model, the VM always stays in control and we never lose the ability to interrupt it and inspect it at any stage, while the operators share a common "protocol" to exchange data through the VM itself, a protocol that is shared with the VM and with the rest of the language.
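As an illustration of that protocol, here is a toy model of the scheme, not the engine's code. It assumes a context with a data stack and a code stack, a `PStep` that is essentially a function pointer, and an operator-like function (`opSum`, a made-up name) that schedules work and a completion step instead of computing the result itself. `PushValue` and `AddTop` are likewise invented for the example.

```cpp
#include <cassert>
#include <vector>

struct VMContext;

// A PStep here is little more than a function pointer the VM calls.
struct PStep {
    using ApplyFn = void (*)(const PStep* self, VMContext& ctx);
    ApplyFn apply;
    explicit PStep(ApplyFn fn) : apply(fn) {}
};

struct VMContext {
    std::vector<long> data;          // data stack shared by all steps
    std::vector<const PStep*> code;  // code stack: the top runs next

    void pushCode(const PStep* step) { code.push_back(step); }

    // Runs exactly one step; returns false when nothing is left to run.
    bool step() {
        if (code.empty()) return false;
        const PStep* next = code.back();
        code.pop_back();
        next->apply(next, *this);
        return true;
    }
};

// Pushes a constant on the data stack.
struct PushValue : PStep {
    long value;
    explicit PushValue(long v) : PStep(&PushValue::run), value(v) {}
    static void run(const PStep* self, VMContext& ctx) {
        ctx.data.push_back(static_cast<const PushValue*>(self)->value);
    }
};

// Completion step: replaces the two topmost values with their sum.
struct AddTop : PStep {
    AddTop() : PStep(&AddTop::run) {}
    static void run(const PStep*, VMContext& ctx) {
        assert(ctx.data.size() >= 2);
        long rhs = ctx.data.back(); ctx.data.pop_back();
        long lhs = ctx.data.back(); ctx.data.pop_back();
        ctx.data.push_back(lhs + rhs);
    }
};

// Operator-style entry point: it computes nothing itself. It schedules the
// operands and a completion step, then hands control back to the VM.
void opSum(VMContext& ctx, const PStep& lhs, const PStep& rhs,
           const PStep& completion) {
    ctx.pushCode(&completion);  // pushed first, so it runs last
    ctx.pushCode(&rhs);
    ctx.pushCode(&lhs);
}
```

Because the work is split into steps that the context runs one at a time, a caller can stop after any `step()` and look at the data stack, which is the "interrupt and inspect at any stage" property in miniature.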

It's easy to make mistakes (e.g. pushing too much and popping too little), but it's easy to make mistakes under any programming model. OTOH, we'll work out some tools to help third-party developers fix these problems as they first test their applications (e.g. via stack-depth guard PSteps to be inserted in debug builds... or in release Falcon builds, but in the testing steps of final Falcon applications). Thanks to the total control we constantly have on the VM, nailing these bugs (with the proper tools, e.g. a Falcon debugger and a third-party code test tool) seems to be way easier than other bugs that plague software for years.
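A stack-depth guard of that kind does not exist yet, so the following is only a guess at how one might look. The miniature `VMContext` and `PStep` are simplified stand-ins, and `DepthGuard`, `PushMany`, `runGuarded` and the `guardFailures` counter are all invented names. The guard is pushed under the code it protects, so it runs right after that code and compares the data-stack depth with what was promised.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct VMContext;

struct PStep {
    using ApplyFn = void (*)(const PStep* self, VMContext& ctx);
    ApplyFn apply;
    explicit PStep(ApplyFn fn) : apply(fn) {}
};

struct VMContext {
    std::vector<long> data;          // data stack
    std::vector<const PStep*> code;  // code stack: the top runs next
    std::size_t guardFailures = 0;   // how many guards found a wrong depth

    void run() {
        while (!code.empty()) {
            const PStep* next = code.back();
            code.pop_back();
            assert(next != nullptr);
            next->apply(next, *this);
        }
    }
};

// Debug-only step: records a failure if the data stack depth is unexpected.
struct DepthGuard : PStep {
    std::size_t expectedDepth;
    explicit DepthGuard(std::size_t depth)
        : PStep(&DepthGuard::check), expectedDepth(depth) {}
    static void check(const PStep* self, VMContext& ctx) {
        const DepthGuard* guard = static_cast<const DepthGuard*>(self);
        if (ctx.data.size() != guard->expectedDepth) ++ctx.guardFailures;
    }
};

// Stand-in for third-party code: pushes `count` values on the data stack.
struct PushMany : PStep {
    std::size_t count;
    explicit PushMany(std::size_t n) : PStep(&PushMany::run), count(n) {}
    static void run(const PStep* self, VMContext& ctx) {
        std::size_t n = static_cast<const PushMany*>(self)->count;
        for (std::size_t i = 0; i < n; ++i) ctx.data.push_back(0);
    }
};

// Runs `body`, which promises to leave `promised` extra values on the stack,
// with a guard underneath it. Returns the failures recorded so far.
std::size_t runGuarded(VMContext& ctx, const PStep& body,
                       std::size_t promised) {
    DepthGuard guard(ctx.data.size() + promised);
    ctx.code.push_back(&guard);  // runs after `body`
    ctx.code.push_back(&body);
    ctx.run();
    return ctx.guardFailures;
}
```

A body that pushes exactly what it promised passes silently, while one that pushes too much (or pops too little) bumps `guardFailures` at the exact point where the imbalance was introduced, rather than many steps later when the stack is finally found corrupted.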

Also, one may object that the PStep model forces too many "jumps" in the code, and that this breaks a flow that might be kept together for higher performance. That's true, but a PStep call usually costs just a few CPU clock cycles around a function pointer call. The fact that a PStep implementation can program the virtual machine DIRECTLY, and so doesn't require wrappers, can justify, even in terms of performance, the extra effort needed to push the steps and account for them.

In short, I am very, very happy about the overall status of the new engine. It's not only more powerful than the previous one, it's also more elegant, simple, linear and helpful towards foreign code.


About the development plan, we can now start to write the final compiler -> module -> vm link round. We still have a limited set of constructs, but the engine structure is now relatively mature. The compiler seems to be standing, and although it could be better, it does a decent job. In order, we have to 1) provide compile-time error control and recovery, 2) compile to modules, 3) complete the serialization-deserialization of modules and 4) complete the link step.

After that, it will be time to re-activate the GC in full.

Then, the next step will be to fill in the gaps of missing constructs: functional sequences, switch, try/catch, function closures, select, ranges, references and so on.

Finally, we'll have to complete the core library, which will include some advanced I/O.

Well, there's still some work to do to get to 1.0, but the hardest part is behind us.

(Note to me: Must deal with the last vestiges of the old REGA thing in function frames. Want to nail it tomorrow).

