Monday, July 25, 2011

The problem with symbols in a programming language is that they represent a value, but are NOT a value. You can think of them as flexible pointers to values, which means that the real values must be stored somewhere else.

There is also another problem: symbols are themselves data, so they too must reside somewhere. Since values and symbols are never in a 1:1 relationship, you need a way to ensure all of the following conditions at the same time:

  • The value of a variable (named by a symbol) must stay alive at least as long as the symbol is accessible.

  • Symbols must stay alive as long as there is some grammar element or code referencing them.

  • Values, symbols and their container must be collected as soon as possible when they are not used anymore. In short, they must not leak.

Of course, cross-referencing between values, symbols and grammar is possible, but it is error prone and so CPU intensive that you wouldn't want to use it. Using the garbage collector for that would be theoretically possible, but again extremely complex and CPU intensive (even if less so than the cross-referencing option).

Storing the symbols in the module where they are declared, or in the function where they are declared in the case of local symbols, is one solution. The data generated by the module, or the choice between static and dynamic modules, will keep the module alive, and its functions and symbols with it.
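To make the ownership idea concrete, here is a minimal sketch in Python. It is not the engine's code: the names `Symbol`, `SymbolOwner`, `Function` and `Module` are invented for illustration, and I am assuming that a symbol is just a name plus a slot index into storage held by its owner.

```python
class Symbol:
    """A name plus a slot index; the value itself lives in the owner."""

    def __init__(self, name, slot):
        self.name = name
        self.slot = slot


class SymbolOwner:
    """Hypothetical base for anything that stores symbols and their values."""

    def __init__(self, name):
        self.name = name
        self.symbols = {}   # symbol name -> Symbol
        self.values = []    # slot index -> current value

    def declare(self, name, value=None):
        # Declaring the same name twice returns the existing symbol.
        sym = self.symbols.get(name)
        if sym is None:
            sym = Symbol(name, len(self.values))
            self.symbols[name] = sym
            self.values.append(value)
        return sym

    def get(self, sym):
        return self.values[sym.slot]

    def set(self, sym, value):
        self.values[sym.slot] = value


class Function(SymbolOwner):
    """Owns its local symbols."""


class Module(SymbolOwner):
    """Owns its global symbols and its functions."""

    def __init__(self, name):
        super().__init__(name)
        self.functions = {}

    def add_function(self, name):
        fn = Function(name)
        self.functions[name] = fn
        return fn
```

As long as something holds the module, its functions, symbols and values stay reachable; once the module is dropped, everything it owns goes away with it.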

There are two problems with this scheme. The first is symbol importing from remote modules (i.e. in the case of an explicit import directive). The second is dynamic code generation, which can create code snippets that are not functions: for instance on-the-fly code compilation, auto-generation or script-based self-modification. In the first case, the host module importing a symbol from the remote module will also want to reference the remote module and keep it (and its code) alive for as long as the host itself is alive; if the host is a static module, possibly for as long as the engine runs.

In the second case, we have a real issue. We need variable names to access data created locally, but we can't store these names (symbols) anywhere other than along with the code they serve.
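The import case can be sketched as follows, again in Python and again with invented names (`import_symbol`, `lookup`). The only point is that the host records a reference to the source module next to the imported name, so the source cannot go away before the host does. The `weakref` probe in the usage below observes this only because CPython frees objects as soon as their last reference is dropped; that is an assumption about the runtime, not about the engine.

```python
import weakref


class Module:
    """Hypothetical module that keeps imported modules alive."""

    def __init__(self, name):
        self.name = name
        self.symbols = {}   # name -> value, owned by this module
        self.imports = {}   # local name -> (source module, remote name)

    def declare(self, name, value):
        self.symbols[name] = value

    def import_symbol(self, source, name, alias=None):
        if name not in source.symbols:
            raise KeyError(f"{source.name} does not export {name!r}")
        # Holding `source` here is what keeps the remote module alive.
        self.imports[alias or name] = (source, name)

    def lookup(self, name):
        if name in self.symbols:
            return self.symbols[name]
        source, remote_name = self.imports[name]
        return source.symbols[remote_name]
```

A possible usage:

```python
remote = Module("remote")
remote.declare("answer", 42)
host = Module("host")
host.import_symbol(remote, "answer")

probe = weakref.ref(remote)
del remote                      # the host still references the module
print(probe() is not None)      # True
print(host.lookup("answer"))    # 42
```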

I think the correct solution for symbols generated by dynamic code is to put them in the SynTree where they belong. This means adding an optional Symbol Table to the SynTree class; and since a SynTree represents a code block at grammar level, this means that we would gain a long-wished-for feature (by some users, actually, not by me) for free: variable scoping.
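A rough sketch of what an optional symbol table on a syntax tree could look like, written in Python for brevity. Only the name `SynTree` comes from the engine; the `parent` link and the `define` and `lookup` methods are my assumptions for the sake of the example, not its actual interface.

```python
class SynTree:
    """A code block that may carry its own symbol table."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.symbols = None          # optional: created on first define()
        if parent is not None:
            parent.children.append(self)

    def define(self, name, value):
        # The table is allocated only for blocks that declare something.
        if self.symbols is None:
            self.symbols = {}
        self.symbols[name] = value

    def lookup(self, name):
        # Walk outwards: the innermost block that defines the name wins.
        node = self
        while node is not None:
            if node.symbols is not None and name in node.symbols:
                return node.symbols[name]
            node = node.parent
        raise NameError(name)
```

A possible usage:

```python
root = SynTree()
root.define("x", 1)
block = SynTree(parent=root)
block.define("x", 2)            # shadows the outer x inside the block

print(block.lookup("x"))        # 2
print(root.lookup("x"))         # 1
```

Blocks that declare nothing pay only for an empty pointer, and the symbols die with the tree that holds them. In a reference-counted engine the parent link would have to be non-owning, otherwise parent and child would keep each other alive.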

