
Saturday, July 30, 2011

I have been enlightened by a Great Truth(TM) today. Programming (without adjectives) may be something else, but Good Programming (TM(tm)) is all about giving names to things, or, I may say, "dubbing".

I've been struggling to reintroduce auto-operators (+=, ++, postfix ++ & family) in the Organic Virtual Machine for a couple of days; and a couple of "my" days, so something like 16 work hours per day.

The problem was not making the thing happen, but making it general enough not to occupy half the size of the VM by itself. Auto-operators are not exactly a cakewalk to implement (in fact, many modern languages have dropped them, or just macro-expand them). Postfix increment is especially tricky, since it must retain the previous value of an item and then store the new value in the same place where it came from. Consider:

number = array[nextPos()]++

If nextPos() returns a different index at each invocation, expanding ++ to this:

temp = array[nextPos()]
number = temp
array[nextPos()] = temp + 1

would lead to a disaster.
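To make the disaster concrete, here is a minimal C++ sketch. It assumes a hypothetical nextPos() that simply returns 0, 1, 2, ... on successive calls, and a three-element array; neither comes from the actual VM code. It compares the naive expansion with an evaluation that computes the index only once.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for nextPos(): returns 0, 1, 2, ... on successive calls.
struct Counter {
    int next = 0;
    int nextPos() { return next++; }
};

// Naive macro-like expansion: the index expression is evaluated twice,
// so the read and the write hit two different slots.
inline std::array<int, 3> naiveExpansion(int& number) {
    std::array<int, 3> array{10, 20, 30};
    Counter c;
    int temp = array[c.nextPos()];   // reads array[0]
    number = temp;
    array[c.nextPos()] = temp + 1;   // writes array[1]: wrong slot
    return array;
}

// Intended semantics: the index expression is evaluated exactly once,
// and the same slot is used for both the read and the write.
inline std::array<int, 3> singleEvaluation(int& number) {
    std::array<int, 3> array{10, 20, 30};
    Counter c;
    int& slot = array[c.nextPos()];  // evaluated once: array[0]
    number = slot;                   // postfix: the result is the old value
    slot = number + 1;               // write back to the same slot
    return array;
}
```

With these assumptions, the naive version leaves array[0] untouched and overwrites array[1], while the single-evaluation version increments array[0] as intended. In both cases number receives the old value, which is exactly why the bug is easy to miss.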

What you have to do is:

  1. Retrieve the coordinates of the entity to be accessed (in our example, call nextPos() and obtain the value of the 'array' symbol)

  2. Save the coordinates for later use

  3. Apply the coordinates in read-mode, to obtain the desired value.

  4. If needed (in the case of a postfix operator), save the obtained value.

  5. Apply the operator (increment, addition, multiplication etc).

  6. Restore the access coordinates.

  7. Apply the access coordinates in write-mode (which involves the obtained results).
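The seven steps above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the Coordinates struct, the Mode enum and the applyAutoOp function are hypothetical names, the operator is hard-wired to addition, and plain local variables stand in for the VM's data stack. In a real stack machine the intermediate steps would consume the coordinates, which is why they have to be saved and restored; here that is just a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "coordinates" of an array element: the container plus the index.
struct Coordinates {
    std::vector<int>* container;
    std::size_t index;
};

// The three kinds of access mentioned in the post (names are assumptions).
enum class Mode { AutoOp, Prefix, Postfix };

// Applies "+= operand" to array[indexExpr()], returning the expression value.
template <typename IndexFn>
int applyAutoOp(std::vector<int>& array, IndexFn indexExpr, Mode mode, int operand) {
    // 1. Retrieve the coordinates: the index expression runs exactly once.
    Coordinates coords{&array, static_cast<std::size_t>(indexExpr())};
    // 2. Save the coordinates for later use.
    const Coordinates saved = coords;
    // 3. Apply the coordinates in read-mode.
    int value = (*coords.container)[coords.index];
    // 4. Keep the obtained value (only needed by the postfix form).
    const int old = value;
    // 5. Apply the operator (addition in this sketch).
    value = value + operand;
    // 6. Restore the access coordinates.
    coords = saved;
    // 7. Apply the coordinates in write-mode, storing the result.
    (*coords.container)[coords.index] = value;
    // Postfix yields the old value; prefix and auto-op yield the new one.
    return mode == Mode::Postfix ? old : value;
}
```

Given a vector holding 10, 20, 30 and an index function that returns 0 on its first call, calling applyAutoOp in Postfix mode with operand 1 returns 10 and leaves 11 in the first slot, having called the index function once.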

And that's not all. Depending on the context, you may need to add extra space before starting the process, or remove extra data created by the process after it has completed.

A complete discussion of the topic is not in the scope of this entry. Or, I may say, I don't feel like explaining all the details; but one thing should be noted: we have three kinds of accessors that might require auto operator application (dot-access, array-index-access and direct symbol access), and three kinds of accesses requiring different setups and in-between operations (auto-op, prefix inc/dec and postfix inc/dec).

After some non-elementary study and some failed tries, a "brute force" attack began to seem an honorable way out. A total of 9 different procedures, one for each possible combination, was still manageable, and had the advantage of minimizing the steps the VM would be required to perform. But, on one side, the mechanism would have been extremely rigid and inelegant, and on the other, it would have been hell to test and to maintain.

So I studied more, and found a better solution, which consisted of two additions:

  • A "pre-compile" method specific for auto-operators, with parameters allowing the auto-operators to specify what they require the pre-compiled expression to perform.

  • More PSteps in the expressions that expose the ability to accept l-values (we might say, assignment requests), taking care of doing what is required by auto-operators.

The idea stood, and was relatively elegant once fine-tuned. However, I had consistent but apparently inexplicable crashes in the destructors of the PSteps. PSteps are virtual classes, and Expressions are PSteps. I vaguely remember some warning about having virtually destroyed classes held statically inside other virtually destroyed ones, but this case didn't seem to match. After some hours of debugging, I definitively excluded both double-delete and memory corruption due to, e.g., buffer overruns or underruns. The damn thing just crashed, independently of the operations actually performed; it varied only with how many PSteps were declared as members of other PSteps.

However, I am glad that happened. I was bugged by the size of each expression. An expression takes about 32 bytes, but adding 5-6 PSteps to handle all the possible phases meant it would have grown to about 128 bytes. Still acceptable, considering that you create them once and run them millions of times, but ... I was also bugged by the 6 (5 + 1) destructors that were called in order to get rid of the items. The destructors themselves are empty, but they must exist to ensure proper memory collection across DLL boundaries. The fact that there was something crashy in the process, which I couldn't track down, gave me the final motivation to change the plan.

After all, those PSteps were not even referencing the expression they came from; they were just doing things to the context at the right place and at the right moment. They were so general, so ... standard...

And so, I was enlightened. PSteps are meant to be general instructions, similar to the PCODE of traditional VMs, or to the machine code interpreted by silicon CPUs. The fact that up to now all the PSteps had a specific purpose was just incidental, not programmatic. All the auto-operators could make use of standard PSteps that know nothing about the context or the purpose they are used for. As such, we could expose those standard PSteps in the engine, without any need for them to be related to the expression where they "belong".

So, I added a StdPSteps class that acts as storage for the PSteps doing generic tasks, such as pushing, popping, duplicating, swapping or saving items in the data stack. This class resides in the engine and is created and destroyed with it. The size of the involved expressions shrinks, the need for extra creation and destruction steps is obviated, and the resulting code is both more elegant and more expressive.
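A rough sketch of the arrangement, in C++, might look like this. Only the names PStep and StdPSteps come from the post; the apply() signature, the DataStack alias, the Engine and Expression classes and the concrete step classes are assumptions made up for illustration, and no bounds checking is done on the stack.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using DataStack = std::vector<int>;  // assumed: a stack of plain ints

// Hypothetical minimal PStep interface; the real one is not shown in the post.
class PStep {
public:
    virtual ~PStep() {}
    virtual void apply(DataStack& stack) const = 0;
};

// Generic steps that know nothing about the expression using them.
// They assume the stack holds enough items; no checks are made.
class PStepDup : public PStep {
public:
    void apply(DataStack& s) const override { s.push_back(s.back()); }
};
class PStepSwap : public PStep {
public:
    void apply(DataStack& s) const override {
        std::swap(s[s.size() - 1], s[s.size() - 2]);
    }
};
class PStepPop : public PStep {
public:
    void apply(DataStack& s) const override { s.pop_back(); }
};

// Storage for the generic steps.
class StdPSteps {
public:
    PStepDup dup;
    PStepSwap swap;
    PStepPop pop;
};

// The engine owns the single StdPSteps instance, created and destroyed with it.
class Engine {
public:
    const StdPSteps& stdSteps() const { return m_stdSteps; }
private:
    StdPSteps m_stdSteps;
};

// An expression only points at the shared steps, so it stays small
// and has no extra PStep members to construct or destroy.
class Expression {
public:
    explicit Expression(const Engine& e)
        : m_sequence{&e.stdSteps().swap, &e.stdSteps().dup} {}
    void run(DataStack& s) const {
        for (const PStep* step : m_sequence) step->apply(s);
    }
private:
    std::vector<const PStep*> m_sequence;
};
```

With a stack holding 1 and 2, running this Expression swaps the two items and then duplicates the top, leaving 2, 1, 1. Any number of expressions can point at the same three step objects, as long as the engine outlives them.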

But that was not the enlightenment, even if finding this solution was a pleasurable side effect of it. The real thing that I discovered is the mental process through which I reached this solution, and even the previous solution of having a method specific for the pre-compilation of auto-ops, and many, many other "brilliant" solutions that up to that moment I never knew how I had been able to reach.

I give names to things.

Giving a name to a thing is more than defining it. It's mapping it in your knowledge space, making room in your mind to deal with that entity, or we may say, creating an entity to deal with. If the name is powerful enough, if the name is the right name, then every problem seems to solve itself. Every relation with the surroundings comes together, and both the data and the operations fall into a natural layout.

Somehow, that reminded me of Earthsea, a fantasy saga written by Ursula Le Guin, where the most powerful magic was calling things by their True Names.

The moment I understood those steps were "standard", when I gave a name to those PSteps, dubbing them in a significant way, everything came down to a simple model where I could present those general entities to their users. Similarly (but at the time I hadn't yet noticed this mechanism), when I was able to find a name for the process of accepting auto-op requirements (and requests), dubbing it precompileAutoLValue, I was able to relate that process to how the rest of the system wanted to use it and expected it to work. Even if it was a "process" and not just a "thing", a process is still a "thing", and can be named.

The solution I found is so clear and powerful that I have been able to redesign some common parts, making them simpler, more efficient and more elegant.

But the most precious treasure I have been able to dig up today is the understanding of my own solution-finding scheme. I hope it can be useful to you as well.

