
Saturday, December 31, 2011

Wow, looking at the dates I see that more than 2 months have passed since my last article.

The effort to start up my new business was quite stressful, and the complexity of what we're doing in Falcon required more effort than I expected. In particular, I've been coding 12-16 hours a day since last Tuesday (it's Saturday) to complete the reflective syntax support. The basics and the skeleton of the system are now done, but I need some help to fill in the gaps, so I am describing the system here. This text will also go into the VM/Falcon specifications when we organize them.

A quick note: as preliminary work for this step, I removed the PCode system. For those who didn't follow the development, PCode was a set of pre-ordered expression PSteps to be executed specially by the VM. In the beginning, that seemed an optimization, since the PCode could run multiple expression PSteps before returning control to the VM for new code to run. However, it required expressions to behave differently from all the other PSteps (by not removing themselves when their execution was complete, and by considering their host CodeFrame untouchable), and it would also have required recompiling the host PCode whenever its expressions were changed. Meanwhile, I found a PStep execution pattern that doesn't require the explicit intervention of the VM and is even more efficient than the original PCode idea. So we removed the PCode, and now expressions, statements and, in general, all PSteps share exactly the same behavior and have very similar execution patterns.


Syntax Tree Reflection



The Falcon VM directly executes source code represented as a tree, through a set of minimal code units called "PStep" that are executed in turn. However, not all PSteps represent a concrete entity of a source program; some of them are just generic instructions telling the VM to perform some task.
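The "executed in turn" model can be pictured with a minimal sketch. It assumes a simplified code stack where each frame pairs a PStep with a progress counter (`seqId`); `PStep`, `Context`, `Frame`, `PushValue` and `Add` are hypothetical stand-ins, not the engine's real classes or signatures. A step that has finished pops itself, while a step that needs sub-results pushes them and waits to be called again.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of the VM code stack; not the real
// Falcon PStep/VMContext API.
struct Context;

struct PStep {
  virtual ~PStep() = default;
  // Invoked by the VM loop while this step is on top of the code stack.
  virtual void apply(Context& ctx) const = 0;
};

struct Frame {
  const PStep* step;
  int seqId;  // how far this step has progressed
};

struct Context {
  std::vector<Frame> code;  // code stack, top = back()
  std::vector<int> data;    // data stack holding intermediate results

  void pushCode(const PStep* s) { code.push_back({s, 0}); }
  void popCode() { code.pop_back(); }

  // The loop does not know what the steps do: it just runs the topmost
  // one until no code is left.
  void run() {
    while (!code.empty()) code.back().step->apply(*this);
  }
};

// Leaf step: produces a value and removes itself.
struct PushValue : PStep {
  int value;
  explicit PushValue(int v) : value(v) {}
  void apply(Context& ctx) const override {
    ctx.popCode();
    ctx.data.push_back(value);
  }
};

// Binary step: first schedules its operands, then combines their results.
struct Add : PStep {
  const PStep* lhs;
  const PStep* rhs;
  Add(const PStep* l, const PStep* r) : lhs(l), rhs(r) {}
  void apply(Context& ctx) const override {
    if (ctx.code.back().seqId == 0) {
      ctx.code.back().seqId = 1;  // come back here when operands are ready
      ctx.pushCode(rhs);          // pushed first, so it runs second
      ctx.pushCode(lhs);
      return;
    }
    ctx.popCode();  // done: remove self
    int b = ctx.data.back();
    ctx.data.pop_back();
    int a = ctx.data.back();
    ctx.data.pop_back();
    ctx.data.push_back(a + b);
  }
};
```

Evaluating `(2 + 3) + 4` leaves a single 9 on the data stack and an empty code stack, without the loop ever inspecting what each step does.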

Generics



A source program written in Falcon is represented by three categories of specialized PSteps:

  • Statement: A statement is roughly a line of code in a Falcon source, possibly comprising a list of statements to be conditionally executed. Branches, loops and even single instructions are statements.

  • Expression: An expression is a tree of simpler expressions bound by operators, which evaluates to a single result. Many statements use an expression to obtain a value that is then used to configure their behavior; for instance, the while statement repeats all the sub-statements it holds while the expression it bears evaluates to "true". Those expressions are called "selectors". An expression found alone on a line forms a special statement called AutoExpression, whose sole purpose is to evaluate it and then discard its value or use it in some specific way.

  • SynTree: A SynTree is a collection of sequentially ordered Statements. It represents a block of code that must either be executed or skipped. Some of the statements in a SynTree may control its execution status. SynTrees can also carry a selector expression, which might be used for special purposes depending on the statement they are included in, and an optional target symbol that is bound to receive the result of the selector. For instance, the if statement holds a series of SynTrees, which are interpreted as alternative branches if they hold a selector. In a try/catch statement, the selector indicates the kind of data to be caught, and the target, if present, is used as the variable in which to store the caught value.
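The relationship among the three categories can be sketched as follows. All class names besides the three category names are hypothetical, and evaluation is done recursively for brevity, whereas the real VM runs each element as a PStep through its code stack.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplification of Falcon's three TreeStep categories.
using Env = std::map<std::string, int>;

struct Expression {
  virtual ~Expression() = default;
  virtual int evaluate(Env& env) const = 0;  // every expression yields one value
};

struct Statement {
  virtual ~Statement() = default;
  virtual void execute(Env& env) const = 0;
};

// An ordered block of statements with an optional selector and target.
struct SynTree {
  std::vector<std::unique_ptr<Statement>> statements;
  std::unique_ptr<Expression> selector;  // null when absent
  std::string target;                    // empty when absent
  void execute(Env& env) const {
    for (const auto& st : statements) st->execute(env);
  }
};

struct Const : Expression {
  int value;
  explicit Const(int v) : value(v) {}
  int evaluate(Env&) const override { return value; }
};

struct Var : Expression {
  std::string name;
  explicit Var(std::string n) : name(std::move(n)) {}
  int evaluate(Env& env) const override { return env[name]; }
};

struct PreInc : Expression {  // ++name
  std::string name;
  explicit PreInc(std::string n) : name(std::move(n)) {}
  int evaluate(Env& env) const override { return ++env[name]; }
};

struct Less : Expression {  // lhs < rhs: two sub-expressions bound by an operator
  std::unique_ptr<Expression> lhs, rhs;
  Less(std::unique_ptr<Expression> l, std::unique_ptr<Expression> r)
      : lhs(std::move(l)), rhs(std::move(r)) {}
  int evaluate(Env& env) const override {
    return lhs->evaluate(env) < rhs->evaluate(env);
  }
};

// An expression alone on a line: evaluate it and discard the value.
struct AutoExpression : Statement {
  std::unique_ptr<Expression> expr;
  explicit AutoExpression(std::unique_ptr<Expression> e) : expr(std::move(e)) {}
  void execute(Env& env) const override { expr->evaluate(env); }
};

// while <selector>: repeat the body while the selector is true.
struct While : Statement {
  std::unique_ptr<Expression> selector;
  SynTree body;
  void execute(Env& env) const override {
    while (selector->evaluate(env)) body.execute(env);
  }
};
```

Building `while i < 3: ++i` out of these pieces and running it with `i = 0` leaves `i` at 3.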



These entities fully represent a Falcon program source file, and they all descend from a common class called TreeStep. A TreeStep is an entity that can be represented as a Falcon source instruction or set of instructions; seen the other way around, it's the representation of a Falcon source token as an executable VM PStep element. The VM is not bound to execute just TreeSteps, as there are PSteps that can be injected in the VM as complements to the direct code tree representation (for instance, logic expression shortcut gates are PSteps that are owned by an Expression entity and injected in the VM when needed). However, all the code that can be represented as a Falcon source program is derived from the TreeStep class.

This distinction is very important because the TreeStep class has an important feature that not all PSteps provide: reflection. Each TreeStep must expose a Class, an entity derived from the Falcon::Class base class, which represents a handler through which the Falcon VM and other PSteps are empowered to manipulate otherwise unknown data. In other words, TreeSteps are data known by the Falcon VM. As is known, the Class handler allows the VM to expose methods and properties to the user, to create new instances of the entity, to manage its serialization and deserialization, and to handle its lifetime through the GC marking system.
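A minimal sketch of the distinction: a TreeStep declares its handler, while a helper PStep owned by an expression has none. The shapes of `Class`, `cls()` and `createInstance()` are hypothetical simplifications (the real Class interface also covers properties, GC marking and serialization), and `AndGate` is an invented stand-in for a shortcut gate.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified shapes; not Falcon's real interfaces.
struct PStep {
  virtual ~PStep() = default;
};

struct TreeStep;

// Handler through which the VM manipulates data it does not know directly.
struct Class {
  std::string name;
  explicit Class(std::string n) : name(std::move(n)) {}
  virtual ~Class() = default;
  virtual TreeStep* createInstance() const = 0;  // "virtual constructor"
};

// Anything representable as Falcon source: it declares its handler.
struct TreeStep : PStep {
  virtual const Class* cls() const = 0;
};

// Helper step owned by an expression; it has no handler and never appears
// in the source (e.g. the short-circuit gate of a logic "and").
struct AndGate : PStep {};

struct ClassAnd : Class {
  ClassAnd() : Class("And") {}
  TreeStep* createInstance() const override;
};

const ClassAnd andClass;  // one handler instance per TreeStep type

struct ExprAnd : TreeStep {
  AndGate gate;  // pushed on the VM only when the shortcut is needed
  const Class* cls() const override { return &andClass; }
};

TreeStep* ClassAnd::createInstance() const { return new ExprAnd; }

// A PStep is reflectable only if it is a TreeStep.
bool isReflectable(const PStep* step) {
  return dynamic_cast<const TreeStep*>(step) != nullptr;
}
```

The expression is reflectable and can be re-created through its handler; its gate is just an executable step.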

Internals



The reflection is controlled by a set of files named engine/synclasses*, while the TreeSteps are under engine/psteps/. The reason to centralize the Class handlers for the PSteps, instead of spreading them out near their respective psteps, is twofold. First, a PStep should not care about whether it is handled by a Class or not: other than declaring which class it is supported by, a TreeStep should totally ignore its handler. Second, 90% of the handler classes can be written by simply deriving from a common base, ClassTreeStep, and adding the "virtual constructor" semantic needed to create an entity of the correct type on VM request. Also, many of the Class handlers that require a specific behavior (nearly all the statements and some expressions) can inherit 50% to 90% of their behavior from the base ClassTreeStep handler.

Accordingly, a dictionary of class handlers is provided in include/falcon/synclasses_list.h, and some preprocessor macros expand it to generate the classes. An include/falcon/synclasses_id.h file stores some special class IDs used by the lexer and the parser to determine the context as they compile the source or, in some cases, at runtime by the interactive compiler.
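Expanding one list through several macros is the classic "X-macro" technique. The sketch below uses a hypothetical list (`SYNCLASS_LIST`) and hypothetical macro names; it does not reproduce the real content of synclasses_list.h. Each entry becomes a handler that derives from a common base and adds only the virtual constructor. The second expansion merely shows that the same list could also yield an ID enumeration; it is not a description of synclasses_id.h.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for real TreeStep subclasses.
struct TreeStep {
  virtual ~TreeStep() = default;
  virtual std::string kind() const = 0;
};
struct StmtWhile : TreeStep { std::string kind() const override { return "While"; } };
struct StmtIf : TreeStep { std::string kind() const override { return "If"; } };
struct ExprArray : TreeStep { std::string kind() const override { return "GenArray"; } };

// Common base: most of the handler behavior would live here.
struct ClassTreeStep {
  virtual ~ClassTreeStep() = default;
  virtual std::string name() const = 0;
  virtual std::unique_ptr<TreeStep> createInstance() const = 0;
};

// The "dictionary": script-visible name and concrete TreeStep type.
#define SYNCLASS_LIST(X) \
  X(While, StmtWhile)    \
  X(If, StmtIf)          \
  X(GenArray, ExprArray)

// Expansion 1: one handler per entry, adding only the virtual constructor.
// The generated names follow the ClassName convention (ClassWhile...).
#define DECLARE_HANDLER(Name, Type)                             \
  struct Class##Name : ClassTreeStep {                          \
    std::string name() const override { return #Name; }         \
    std::unique_ptr<TreeStep> createInstance() const override { \
      return std::make_unique<Type>();                          \
    }                                                           \
  };
SYNCLASS_LIST(DECLARE_HANDLER)
#undef DECLARE_HANDLER

// Expansion 2: an ID enumeration generated from the same list.
#define DECLARE_ID(Name, Type) id_##Name,
enum SynClassId { SYNCLASS_LIST(DECLARE_ID) id_count };
#undef DECLARE_ID
```

Adding a new TreeStep then means adding one line to the list; every expansion picks it up.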

Script interface



The ClassTreeStep base class exposes some methods to the scripts that are available for all the TreeStep elements.

Note: By convention in the Falcon Class protocol, a class exposed to scripts as "Name" is implemented by a handler named ClassName. So, ClassWhile is seen by Falcon sources as a class named "While".


  • arity: the number of elements that can be directly accessed.

  • Operator[]: the index operator can be used to set or get the nth element. Some statements providing fixed but optional blocks allow some elements to be nil. For instance, the for statements expose the main block and the forfirst, formiddle and forlast blocks at indexes 0, 1, 2 and 3 respectively. Setting one block to nil means removing it.

  • insert(pos, element): inserts a new element. The element must be of the kind accepted by the parent element (SynTrees accept Statements, Statements accept SynTrees, Expressions accept other Expressions). If the receiving element has a fixed arity, an exception is raised. The position (pos) has the same semantics as the [] accessor (negative indexes count from the end), and if it is out of range, the element is appended at the end.

  • remove(pos): removes the nth element. Elements having a fixed arity will raise an exception.

  • selector: a property returning or accepting an Expression. Setting it to nil removes the selector. Some elements do not accept a selector; others require one and can't accept nil.

  • parent: the parent TreeStep. It's a read-only property; can be nil if the TreeStep is currently unparented.
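The index semantics listed above can be sketched on a toy node. The `Node` class and its string children are hypothetical, standard C++ exceptions stand in for the Falcon errors the engine would raise, and treating an out-of-range position in `nth`/`remove` as an error is an assumption (the text only specifies the out-of-range behavior of `insert`).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the script-level interface of a TreeStep;
// children are plain strings for brevity.
class Node {
 public:
  explicit Node(bool fixedArity = false, std::vector<std::string> initial = {})
      : fixed_(fixedArity), children_(std::move(initial)) {}

  int arity() const { return static_cast<int>(children_.size()); }

  // Operator[] semantics: negative positions count from the end.
  const std::string& nth(int pos) const {
    int idx = normalize(pos);
    if (idx < 0 || idx >= arity()) throw std::out_of_range("invalid index");
    return children_[idx];
  }

  // insert(pos, element): an out-of-range position appends at the end.
  void insert(int pos, const std::string& element) {
    if (fixed_) throw std::logic_error("fixed arity");
    int idx = normalize(pos);
    if (idx < 0 || idx >= arity()) {
      children_.push_back(element);
    } else {
      children_.insert(children_.begin() + idx, element);
    }
  }

  // remove(pos): not allowed on fixed-arity elements.
  void remove(int pos) {
    if (fixed_) throw std::logic_error("fixed arity");
    int idx = normalize(pos);
    if (idx < 0 || idx >= arity()) throw std::out_of_range("invalid index");
    children_.erase(children_.begin() + idx);
  }

 private:
  int normalize(int pos) const { return pos < 0 ? arity() + pos : pos; }

  bool fixed_;
  std::vector<std::string> children_;
};
```

For example, inserting "a" at 0, "c" at 99 and "b" at 1 yields `a b c`, and `nth(-1)` returns "c"; a fixed-arity binary node refuses any insertion.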



The ClassSynTree handler adds a target property that exposes a Symbol (internally handled by ClassSymbol) and accepts nil when the symbol is absent.

Each TreeStep element has a parent, which is either 0 (no parent) or a valid TreeStep. A TreeStep can accept another TreeStep as a sub-element only if the latter has no parent: reparenting is not allowed. However, it is possible to take an element that currently has a parent and clone() it. The cloning process creates an exactly equal subtree, but the cloned element is unparented and becomes the root of the new tree, so it can be stored into any TreeStep accepting it.
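The parenting rule and the clone() escape hatch can be sketched as follows. `TreeNode`, `append` and the labels are hypothetical; raw pointers mimic script-level references to GC-managed nodes, while the parent owns its children.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a node adopts only unparented children, never
// reparents, and clone() yields a fresh unparented deep copy.
class TreeNode {
 public:
  explicit TreeNode(std::string label) : label_(std::move(label)) {}

  TreeNode* parent() const { return parent_; }
  const std::string& label() const { return label_; }
  std::size_t arity() const { return children_.size(); }
  TreeNode* nth(std::size_t i) const { return children_[i].get(); }

  // Adopts an unparented node, taking ownership; parented nodes are rejected
  // before anything changes.
  void append(TreeNode* child) {
    if (child->parent_ != nullptr) throw std::logic_error("reparenting not allowed");
    children_.emplace_back(child);
    child->parent_ = this;
  }

  // Deep copy of the whole subtree; the copy's root has no parent.
  std::unique_ptr<TreeNode> clone() const {
    auto copy = std::make_unique<TreeNode>(label_);
    for (const auto& c : children_) copy->append(c->clone().release());
    return copy;
  }

 private:
  std::string label_;
  TreeNode* parent_ = nullptr;
  std::vector<std::unique_ptr<TreeNode>> children_;
};
```

Moving a branch from one tree to another thus fails, while appending its clone succeeds and leaves the original untouched.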

Garbage collecting



Parenting is very important for GC marking. When a TreeStep entity is found in a variable, it is marked; however, the mark is not applied directly to it, but escalates up to the topmost parent. Since a TreeStep cannot be unparented once it has a parent, marking the topmost node of a tree (the unparented one) has the same meaning as marking the whole tree. The liveness check is performed on the topmost node, which alone can keep its whole tree alive. Values stored in the tree (for instance, items and symbols) are separately GC-locked, and cannot be disposed of until the tree hosting them is killed. This grants the topmost parent of a tree full ownership of its sub-elements and of every entity they may relate to, so it's possible to plainly delete the children of a node when the node is destroyed or substituted with another unparented node. Functions have a RootTreeStep, never exposed to source files (it has no handler Class), which can parent the topmost TreeStep and propagates its marking to the host function; the function might in turn propagate the marking to its host class (if it's a method) and/or module.
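The escalation of marks can be sketched like this. `MarkTarget`, `Node`, `owner` and `Function` are hypothetical names; a root node with an `owner` plays the role of a function's RootTreeStep, and the numeric mark stands for the current GC round.

```cpp
#include <cassert>

// Hypothetical model: marking any node marks the topmost parent only,
// which may forward the mark to a non-tree host.
struct MarkTarget {
  virtual ~MarkTarget() = default;
  virtual void gcMark(unsigned mark) = 0;
};

struct Node : MarkTarget {
  Node* parent = nullptr;
  MarkTarget* owner = nullptr;  // host of the topmost node, if any
  unsigned mark = 0;            // meaningful on the topmost node only

  Node* topmost() {
    Node* n = this;
    while (n->parent != nullptr) n = n->parent;
    return n;
  }

  void gcMark(unsigned m) override {
    Node* top = topmost();
    if (top->mark == m) return;  // already marked in this GC round
    top->mark = m;
    if (top->owner != nullptr) top->owner->gcMark(m);
  }

  // The whole tree is alive if its root received the current mark.
  bool isAlive(unsigned currentMark) { return topmost()->mark == currentMark; }
};

// Stand-in for the function hosting a tree; a fuller model would forward
// the mark to its class and/or module.
struct Function : MarkTarget {
  unsigned mark = 0;
  void gcMark(unsigned m) override { mark = m; }
};
```

Marking a deeply nested expression updates only the root and the hosting function; inner nodes are never marked individually.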

Serialization



Since full reflection is granted to each TreeStep element, the task of saving pre-compiled modules is fully delegated to the Store/Restore system. The flatten and unflatten methods of the TreeStep class automatically store all the elements that can be accessed through the TreeStep::selector, TreeStep::arity and TreeStep::nth virtual methods.

The Store system automatically finds the generator class of an item, so the only requirement a TreeStep subclass must meet in order to be serialized is to offer a public empty constructor.

Note: The Storer system performs serialization in two steps: first, entities are flattened, declaring whether some parts of them are subject to separate serialization (i.e. whether they hold other "items" that could be serialized); then they are stored. In the Class::flatten method, the class handler stores the sub-elements that have their own class handler taking care of their serialization, while in the Class::store method, the handler saves the low-level data specific to that entity, if there is any. On restore, the Class::restore method is called to let the handler create a blank entity and possibly fill it with the specific data saved on the stream; finally, the Class::unflatten method is called to let the entity hook all the items that were separately restored.
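The four phases can be sketched on a toy tree. Everything here is hypothetical (`Record`, `save`, `load`, the factory map); it only mirrors the described order: flatten and store on the way out, then restore (blank entity from an empty constructor) and unflatten (hook the separately restored sub-items) on the way back.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical model of the Storer/Restorer phases.
struct TreeStep {
  virtual ~TreeStep() = default;
  virtual std::string className() const = 0;
  std::vector<std::shared_ptr<TreeStep>> children;  // what arity()/nth() expose
};

struct StmtBlock : TreeStep { std::string className() const override { return "Block"; } };
struct ExprNil : TreeStep { std::string className() const override { return "Nil"; } };

// Restoring needs only an empty constructor, looked up by the saved class.
const std::map<std::string, std::function<std::shared_ptr<TreeStep>()>> factory = {
    {"Block", [] { return std::make_shared<StmtBlock>(); }},
    {"Nil", [] { return std::make_shared<ExprNil>(); }},
};

// One serialized entity: its class, its own low-level data (nothing
// specific for these types) and the indexes of its flattened sub-items.
struct Record {
  std::string cls;
  std::string data;
  std::vector<std::size_t> subItems;
};

// flatten + store: children first, so a record refers only to earlier ones.
std::size_t save(const TreeStep& step, std::vector<Record>& out) {
  Record rec{step.className(), "", {}};
  for (const auto& child : step.children) rec.subItems.push_back(save(*child, out));
  out.push_back(rec);
  return out.size() - 1;
}

// restore + unflatten: create blank entities, then hook their sub-items.
std::shared_ptr<TreeStep> load(const std::vector<Record>& in) {
  std::vector<std::shared_ptr<TreeStep>> made;
  for (const Record& rec : in) {
    auto blank = factory.at(rec.cls)();                           // restore
    for (std::size_t i : rec.subItems) blank->children.push_back(made[i]);  // unflatten
    made.push_back(blank);
  }
  return made.back();  // the root is saved last
}
```

Saving a block with two children produces three records, and loading them rebuilds the same shape.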

The vast majority of expressions and statements are fully described through their class, selector and elements, so it's unnecessary to provide them with specific flatten/store methods. For entities having special attributes (for instance, the return statement, which can have a doubt clause) and/or multiple selectors (for instance, the for/to statement, which has two or possibly three expressions defining the loop bounds and step, plus a target symbol), it is necessary to reimplement store/restore and possibly flatten/unflatten. Of course, it is possible to reuse the base TreeStep::flatten/unflatten methods by invoking them directly, but their code is pretty simple anyway (it just iterates over TreeStep::nth() arity() times, using the child step's TreeStep::cls() to compose the item in the flattened array).
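For an entity with a special attribute, only the low-level part needs custom code. This sketch assumes a hypothetical `StmtReturn` reduced to its doubt flag and a hypothetical `ClassReturn` whose `store`/`restore` write and read that flag on a standard stream; the returned expression would still travel through the generic flatten path.

```cpp
#include <cassert>
#include <memory>
#include <sstream>

// Hypothetical model of a return statement with an optional doubt flag;
// its selector and other state are omitted from this sketch.
struct StmtReturn {
  bool hasDoubt = false;
};

struct ClassReturn {
  // store: write the entity-specific data that flatten does not cover.
  void store(const StmtReturn& ret, std::ostream& out) const {
    out << (ret.hasDoubt ? 1 : 0) << ' ';
  }

  // restore: create a blank entity with its empty constructor, then fill it.
  std::unique_ptr<StmtReturn> restore(std::istream& in) const {
    auto ret = std::make_unique<StmtReturn>();
    int flag = 0;
    in >> flag;
    ret->hasDoubt = (flag != 0);
    return ret;
  }
};
```

A store/restore round trip preserves the flag in both states.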


Constructors and other details


To expose TreeSteps to source scripts, so that programs can be manipulated at runtime, it is useful to provide a script-level constructor in the Class handling the TreeStep. This is done by implementing the op_create method, which must use the parameters to create and, if useful or necessary, configure the TreeStep before returning it to the script via VMContext::stackResult. As the entity is newly created, it should be delegated to the GC via the usual FALCON_GC_STORE, where the handler class is the same class creating the entity ("this" is fine) and the stored entity is the newly created TreeStep.
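The create, configure, hand to the GC and return sequence can be sketched as follows. The `VMContext`, `Item`, `GC::store` and `stackResult` shapes here are hypothetical simplifications loosely modeled on the names in the text, and `ExprConst`/`ClassConst` are invented examples; the real signatures and the FALCON_GC_STORE macro differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical model; the real VMContext, Item and FALCON_GC_STORE differ.
struct TreeStep { virtual ~TreeStep() = default; };
struct Class { virtual ~Class() = default; };

// A script-visible value: handler class plus the data it handles.
struct Item {
  const Class* cls = nullptr;
  TreeStep* data = nullptr;
  int number = 0;  // used for plain integer parameters
};

// Stand-in for the collector: takes ownership of what it is given.
struct GC {
  std::vector<std::unique_ptr<TreeStep>> owned;
  Item store(const Class* cls, TreeStep* data) {
    owned.emplace_back(data);
    return Item{cls, data, 0};
  }
};

struct VMContext {
  std::vector<Item> stack;  // [..., called class, param1, ..., paramN]
  GC gc;
  // Replace the call frame (class + parameters) with a single result.
  void stackResult(std::size_t count, const Item& result) {
    stack.resize(stack.size() - count);
    stack.push_back(result);
  }
};

// Hypothetical expression holding a constant, created from script as Const(N).
struct ExprConst : TreeStep { int value = 0; };

struct ClassConst : Class {
  void op_create(VMContext& ctx, int pcount) const {
    if (pcount != 1) throw std::invalid_argument("Const(N) expects one parameter");
    auto* expr = new ExprConst;              // create...
    expr->value = ctx.stack.back().number;   // ...and configure it
    Item result = ctx.gc.store(this, expr);  // the GC now owns the entity
    ctx.stackResult(static_cast<std::size_t>(pcount) + 1, result);
  }
};
```

After the call, the class and its parameter are replaced on the stack by one item whose handler is the creating class itself.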

As Falcon can accept more complex structures and variable parameters as function (and constructor) parameters, the constructors exposed to Falcon need not mimic those available in C++. For instance, the GenArray constructor accepts all the array element expressions as parameters, whereas in C++ they must be added later on.

Note: The Array class is the handler for array items; GenArray is the handler for the ExprArray TreeStep, which GENERATES an array item. Similarly, all the expressions meant to generate a language item are exposed with a Gen prefix.

For some basic cases (fixed n-ary expressions, including parameterless zero-ary ones, and variable-length lists of similar elements, as in the GenArray constructor), support is included in the SynClasses system, which can generate standardized Class::op_create code. Specific behaviors require custom implementations.
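The two standardized cases can be sketched with templates. `FixedArityCreator`, `VariadicCreator` and the aliases are hypothetical, and the parameters are assumed to be already-built expressions that simply become the new node's children; the real SynClasses code generation may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical model of standardized constructors.
struct Expression {
  virtual ~Expression() = default;
  std::vector<std::shared_ptr<Expression>> children;
};
struct ExprNil : Expression {};    // zero-ary
struct ExprPlus : Expression {};   // binary
struct ExprArray : Expression {};  // variadic: one child per array element

using Params = std::vector<std::shared_ptr<Expression>>;

// Fixed arity N (N may be 0): the parameter count must match exactly.
template <typename T, std::size_t N>
struct FixedArityCreator {
  std::shared_ptr<Expression> create(const Params& params) const {
    if (params.size() != N) throw std::invalid_argument("wrong parameter count");
    auto expr = std::make_shared<T>();
    expr->children = params;
    return expr;
  }
};

// Variable arity (as for GenArray): every parameter becomes an element.
template <typename T>
struct VariadicCreator {
  std::shared_ptr<Expression> create(const Params& params) const {
    auto expr = std::make_shared<T>();
    expr->children = params;
    return expr;
  }
};

using NilCtor = FixedArityCreator<ExprNil, 0>;
using PlusCtor = FixedArityCreator<ExprPlus, 2>;
using GenArrayCtor = VariadicCreator<ExprArray>;
```

A binary constructor rejects a single argument, while the array constructor takes any number of element expressions in one call.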

Some specialized TreeSteps offer elements that cannot be easily captured through the selector/arity abstractions that the basic ClassTreeStep offers. For instance, the for statements have targets (the for/in may have multiple ones) and expressions that are not seen as selectors. These special behaviors are to be exposed by reimplementing Class::op_setProperty and op_getProperty (which often also involves hasProperty, enumerateProperties and enumeratePV).
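A sketch of property-based exposure for a for/in statement. `StmtForIn`, `ClassForIn` and the use of string vectors as property values are hypothetical; a real handler would work on Falcon items and defer unknown names to its base class instead of throwing.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical for/in statement: several targets and a sequence expression
// (a string stand-in) that fit neither the selector nor the arity model.
struct StmtForIn {
  std::vector<std::string> targets;  // for a, b in ...
  std::string sequence;
};

struct ClassForIn {
  std::vector<std::string> enumerateProperties() const { return {"targets", "sequence"}; }

  bool hasProperty(const std::string& name) const {
    for (const auto& p : enumerateProperties()) {
      if (p == name) return true;
    }
    return false;
  }

  std::vector<std::string> op_getProperty(const StmtForIn& s, const std::string& name) const {
    if (name == "targets") return s.targets;
    if (name == "sequence") return {s.sequence};
    throw std::out_of_range("no property " + name);
  }

  void op_setProperty(StmtForIn& s, const std::string& name,
                      const std::vector<std::string>& value) const {
    if (name == "targets") {
      if (value.empty()) throw std::invalid_argument("for/in needs at least one target");
      s.targets = value;
    } else if (name == "sequence") {
      if (value.size() != 1) throw std::invalid_argument("exactly one sequence expected");
      s.sequence = value.front();
    } else {
      throw std::out_of_range("no property " + name);
    }
  }
};
```

Setting `targets` to `{"k", "v"}` turns `for k in dict` into `for k, v in dict`, with validation kept inside the handler.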

Note: All the classes in the ClassTreeStep hierarchy are derived from DerivedFrom. This class abstracts the behavior of a single-parent Class exposed to scripts. To create a class that scripts see as derived from TreeStep, you don't derive from ClassTreeStep; instead, you derive from DerivedFrom, passing the concrete instance of the ClassTreeStep handler to it. Like any crucial class, ClassTreeStep is published by the engine, and it can be accessed through Engine::instance()->treeStepClass(). All the other TreeStep handler classes (at least for the elements that are part of the core Falcon language) are members of SynClasses and are instantiated when the SynClasses singleton is constructed.
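The idea of a script-visible parent given as a handler instance, rather than as a C++ base class, can be sketched as follows. `Class`, `DerivedFrom`, `parentClass` and `isDerivedFrom` are simplified hypothetical shapes, and `treeStepClass()` is only a local stand-in for `Engine::instance()->treeStepClass()`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical model of script-level single inheritance between handlers.
struct Class {
  std::string name;
  explicit Class(std::string n) : name(std::move(n)) {}
  virtual ~Class() = default;
  virtual const Class* parentClass() const { return nullptr; }

  // True if this handler is, or descends from, the given one.
  bool isDerivedFrom(const Class* other) const {
    for (const Class* c = this; c != nullptr; c = c->parentClass()) {
      if (c == other) return true;
    }
    return false;
  }
};

// The script-visible parent is the handler instance passed at construction.
struct DerivedFrom : Class {
  DerivedFrom(const Class* parent, std::string n)
      : Class(std::move(n)), parent_(parent) {}
  const Class* parentClass() const override { return parent_; }

 private:
  const Class* parent_;
};

// Local stand-in for the engine-published TreeStep handler.
const Class& treeStepClass() {
  static const Class instance("TreeStep");
  return instance;
}

struct ClassWhile : DerivedFrom {
  ClassWhile() : DerivedFrom(&treeStepClass(), "While") {}
};
```

Scripts would then see "While" as a subclass of "TreeStep", even though ClassWhile never inherits from the TreeStep handler in C++.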


