Skip to main content.

Saturday, June 16, 2012

I came out with a good plan to implement pointer-to-item symbols in the Falcon new engine, so I am writing them down to implement them in the next day.

First a bit of context: falcon always used a special item, containing a pointer to another item, to implement the reference-to-item construct. This construct is visible at script level through the $ operator (explicit reference), and is used internally for assignment expansion (a,b,c = vector_of_three_data), pass-by-reference, variable closure and a few more things. The presence of this construct requires every operation to "dereference" its input, causing an extra if-and-ptr-deref practically at each single variable fetch. Performance killing apart, this construct is making me sweat in the implementation of the new dynsyms and varying code constructs; so I decided to get rid of it once and for all.

This requires that every symbol points to a value (an item), instead of being "calculated" as a physical location in a special vector (the globals vector or the local stack); a value that, in some case, can change depending on the context (the same local symbol will point to different items depending on the frame stack where it is accessed).

Traditional solution to this problem was lazy binding of symbols, but this requires a map scan at each access, in the most naive implementations; better solutions have been developed, but they are all sub-optimal with respect to the access by vector base + displacement we use for local variables.

My solution is that to have an hybrid model with early binding for pre-compiled code (static code from modules) and lazy binding for dynamic code (created via ^= literal operator, the "literal" keyword or by invoking the reflected class constructors, as While(), If(), ForIn(), Plus(), Times() and so on). This means I have to drop the plan of making each code dynamically inspectable and variable: it won't be possible to call function.syntree to get the syntactic tree of an arbitrarily compiled function. OTOH, there are plenty of ways to create dynamic code in a program, and this allow the final programmer to decide whether it needs flexibility or speed, deciding its own performance-flexibility trade off.

So, the dynsyms are the only ones that can are created in dynamic code, while global, extern and local symbols are only created during static compilation. You can also dynamically create static code (typically invoking a compile(" some code") pattern), and that static code can be created on the fly (i.e. composing the source string), but the difference with dynamic code is that the latter can be reflected, inspected and modified as any piece of data, while the static code, once created, must be used as-is or discarded.

Globals and externs point to a static memory location which is initialized at module link time.

Local symbols are initialized at function call frame. During the call frame creation, space for local variables and for un-filled parameters is reserved on the local stack. A parallel stack of pointers is also initialized and filled with the pointers to the items allocated in the stack.

When a local variable is referenced externally, for instance by a dynsymbol or a closure symbol, a new item is created in memory and both the local symbol and the closing one point to the same new location.

Call frames have also a local stack of dynsymbols; it's actually a set of pairs [dynsym_ptr, item_ptr]. The item_ptr can point to the same item held by another, possibly non-dynsym variable (i.e. a local or global in the upper frames), or to an newly created variable.

This brings in two problems. 1) the stack must never be invalidated; if we trust the pointers we assign to the local symbols, then those pointers must not change, at least as long as the symbols stay around. 2) there must be a way to garbage collect the newly created item slots that are created outside the local stack. This also involves the global variables: we could easily keep their module around as long as the variables are alive, but it would be a waste, and would create the same problems in case the data comes from another module (external symbols):.

About 1), the solution is that of creating a large stack with a critical area. Periodically, the VM checks if the top data stack pointer is in the critical area; if it is, it starts a "disaster management", where the whole stack is relocated in a larger space, and all the pointers held in the call frames of that context are updated. Or simply raise an exception, requiring the programmer to explicitly take actions to reduce the call depths or provide a large stack space.

About 2) the solution is to check if the pointer held by a symbol falls in the range of its "original" data. A method of the symbol class, accepting the VMContext, can return 0 if the variable held by the symbol is not in need of being garbage-marked, or return a pointer to a memory area that must be marked. In this way, a global symbol can return the head of the global variables array, an extern symbol can return the head of the target module's global variables array, a local symbol can return either 0 (if it still points to the VMContext stack) or the memory where the dynamic item they point to is created, and closed and dynsyms can always return a pointer to that dynamic memory. In this way, the garbage collector can get rid of items that are not anymore used by any symbol.


No comments yet

Add Comment

This item is closed, it's not possible to add new comments to it or to vote on it