
Friday, August 17, 2007

I took a couple of days to experiment in the field of optimization. I wanted to clear up some doubts I had about different implementation choices for the final 1.0 VM, and since I had a couple of free days, I decided to do those checks now.

First, I put back the "big switch" the VM had in the early days. Currently, every VM opcode is handled by a direct call through a function vector (the opcode being the index into the vector). I moved from the traditional switch approach to that one just for debugging reasons; in the early days of the VM it was easier to follow the program flow by getting a C stack trace of the performed operations. However, this model forces the opcodes to be cached in VM variables and the VM to be passed around as a parameter of each call, so it was planned to be removed.
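For readers who haven't looked at the VM internals, here is a minimal sketch of what function-vector dispatch looks like in general. The names (VMachine, opHandlers, the toy opcodes) are illustrative assumptions of mine, not Falcon's actual source:

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct VMachine;                       // forward declaration
    typedef void (*OpHandler)(VMachine&);  // one handler function per opcode

    struct VMachine {
        std::vector<uint8_t> code;   // bytecode stream
        size_t pc = 0;               // program counter
        int64_t acc = 0;             // toy accumulator register
        bool running = true;
    };

    // A couple of toy handlers; the real VM has one per opcode.
    static void op_inc(VMachine& vm) { ++vm.acc; }
    static void op_end(VMachine& vm) { vm.running = false; }

    // The opcode value is used directly as an index into this table.
    static const OpHandler opHandlers[] = { op_inc, op_end };

    void run(VMachine& vm)
    {
        while (vm.running) {
            uint8_t opcode = vm.code[vm.pc++];  // fetch
            opHandlers[opcode](vm);             // one indirect call per opcode
        }
    }

    int main()
    {
        VMachine vm;
        vm.code = { 0, 0, 0, 1 };   // three INC opcodes, then END
        run(vm);
        std::printf("acc = %lld\n", (long long)vm.acc);  // prints 3
    }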

Stunningly enough, switching back to the big-switch model turned out to degrade overall performance by roughly 5%. I was nearly shocked. The code was visibly optimized with the big switch, the VM was smaller, the need to access variables referenced through the VM all around was completely gone, yet everything was slower. It must be because of the size of that switch, which should be in the range of 200-300 KB, and it seems the compiler isn't quite able to handle it as it should. So, the big-switch loop is definitely not going to be used: besides being clumsy and ugly, it is even slower...
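For comparison, here is the same toy interpreter written in the big-switch style. In the real VM the switch covers a few hundred opcodes, which is where it hurts; again, every name here is illustrative, not Falcon's actual code:

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct VMachine {
        std::vector<uint8_t> code;   // bytecode stream
        size_t pc = 0;               // program counter
        int64_t acc = 0;             // toy accumulator
        bool running = true;
    };

    void run(VMachine& vm)
    {
        while (vm.running) {
            switch (vm.code[vm.pc++]) {           // fetch and dispatch in one place
            case 0:  ++vm.acc;           break;   // INC
            case 1:  vm.running = false; break;   // END
            default: vm.running = false; break;   // unknown opcode: just stop
            }
        }
    }

    int main()
    {
        VMachine vm;
        vm.code = { 0, 0, 0, 1 };   // three INCs, then END
        run(vm);
        std::printf("acc = %lld\n", (long long)vm.acc);  // prints 3
    }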

Then I turned to the parameter decoding issue. The Falcon VM supports opcodes with up to three operands; opcode parameter decoding is performed in three big switches right before the main opcode processing. Building on the previous experience, I thought I could get a performance increase by calling a parameter decoding routine directly, only when needed. The vast majority of Falcon opcodes are unary or binary (many unary), with about 10 opcodes taking no parameters, so skipping the decoding switch when it isn't needed seemed like a good idea. Also, the decoding routines made it possible to dereference the parameters only when needed, and gave simpler access to the fixed parameters of some opcodes, such as the jumps.
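To make the idea concrete, here is a rough sketch of per-opcode parameter decoding, where each handler pulls exactly the operands it needs. Item, decodeOperand and the handlers are made-up names, and real decoding is of course far richer than a register lookup:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Item { int64_t value = 0; };   // toy stand-in for a VM item

    struct VMachine {
        std::vector<int64_t> code;        // toy bytecode: raw operands only
        size_t pc = 0;
        std::vector<Item> registers = std::vector<Item>(16);
    };

    // One small decoding routine, called only when an operand is actually present.
    static Item& decodeOperand(VMachine& vm)
    {
        size_t regIndex = static_cast<size_t>(vm.code[vm.pc++]);  // here an operand is just a register index
        return vm.registers[regIndex];
    }

    // A binary opcode decodes exactly two operands...
    static void op_add(VMachine& vm)
    {
        Item& a = decodeOperand(vm);
        Item& b = decodeOperand(vm);
        a.value += b.value;
    }

    // ...while a parameterless opcode decodes nothing at all,
    // which is the whole point of skipping the up-front switches.
    static void op_nop(VMachine&) {}

    int main()
    {
        VMachine vm;
        vm.registers[0].value = 2;
        vm.registers[1].value = 3;
        vm.code = { 0, 1 };     // operands of a toy ADD: registers 0 and 1
        op_add(vm);             // register 0 now holds 5
        op_nop(vm);
        return vm.registers[0].value == 5 ? 0 : 1;
    }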

Well, that was another unpleasant surprise. Again, performance was lost, with a slowdown of around 3%; so for now we're keeping the three big switches. However, I kept the code in another branch: the performance problem may be solved in the future with better decoding code, and per-opcode parameter decoding definitely seems more elegant and simpler to understand and maintain (once we've decided to keep the opcode-function-handler model).

The thing that really worked, with a performance boost of 15-17%, is the removal of the horrible StackFrame code. Using the standard VM stack as storage for function stack frames, instead of a separate stack frame array that requires dynamic allocation, made a big difference. The code is currently in another branch, but I will merge it as soon as possible; it will go into one of the next releases, which should also include the new build model and the SVN repository.
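The idea, roughly, is something like the following sketch: the call frame record is pushed onto the same flat stack the VM already uses, so no per-call allocation is needed. All names (VMachine, callFunction, returnFrame) are invented here for illustration and don't mirror the actual Falcon classes:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct VMachine {
        std::vector<int64_t> stack;   // one flat stack for both values and frame records
        size_t stackBase = 0;         // start of the current function's frame
        size_t pc = 0;
    };

    // Entering a function: the frame record goes onto the same stack,
    // so there is no per-call allocation of a separate frame object.
    void callFunction(VMachine& vm, size_t targetPC)
    {
        vm.stack.push_back(static_cast<int64_t>(vm.pc));         // saved return PC
        vm.stack.push_back(static_cast<int64_t>(vm.stackBase));  // saved stack base
        vm.stackBase = vm.stack.size();                          // new frame starts here
        vm.pc = targetPC;
    }

    // Returning: restore the saved frame and drop the callee's locals,
    // again without touching the allocator.
    void returnFrame(VMachine& vm)
    {
        size_t frameStart = vm.stackBase - 2;                    // the two saved slots
        vm.pc = static_cast<size_t>(vm.stack[frameStart]);
        vm.stackBase = static_cast<size_t>(vm.stack[frameStart + 1]);
        vm.stack.resize(frameStart);
    }

    int main()
    {
        VMachine vm;
        vm.pc = 100;              // pretend the caller is executing at pc 100
        callFunction(vm, 200);    // call into code starting at pc 200
        vm.stack.push_back(42);   // the callee pushes a local value
        returnFrame(vm);          // back at pc 100, the local is discarded
        return (vm.pc == 100 && vm.stack.empty()) ? 0 : 1;
    }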

There are many other performance tweaks around. For example, I noticed that the FORI opcode, initially thought to be faster than a while loop, is actually slower by about 5-7%; so I may remove the opcode and reproduce it with simpler VM operations. The UNPK and UNPS opcodes are crying out for vengeance, and they will definitely be replaced with highly optimized code.
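Just as a hint of the direction for the unpack opcodes: an optimized unpack could check sizes once and then copy the items straight across, along these lines. This is only a guess at the shape of the code, with illustrative names, not the actual implementation:

    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    struct Item { long value = 0; };   // toy stand-in for a VM item

    // Unpack `count` items from an array into consecutive target slots:
    // one size check up front, then a plain tight copy, instead of going
    // through the generic per-item opcode machinery.
    void unpackArray(const std::vector<Item>& source,
                     std::vector<Item>& targets,
                     size_t targetStart,
                     size_t count)
    {
        if (source.size() != count || targetStart + count > targets.size())
            throw std::runtime_error("unpack: size mismatch");

        for (size_t i = 0; i < count; ++i)
            targets[targetStart + i] = source[i];
    }

    int main()
    {
        std::vector<Item> src(3), dst(8);
        src[0].value = 1; src[1].value = 2; src[2].value = 3;
        unpackArray(src, dst, 2, 3);    // dst[2..4] now hold 1, 2, 3
        return dst[4].value == 3 ? 0 : 1;
    }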

One area we need to work on is the small arrays used to pass and return parameters. Some common usage patterns could easily be optimized without the need to create garbage-collectible arrays to send around.
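One plausible approach, sketched below with my own naming, is the classic small-buffer trick: parameters live in a fixed inline array and only spill to a heap-allocated (and eventually collectible) structure when a call really needs more slots. Nothing here is decided yet; it's just an illustration of the pattern:

    #include <cstddef>
    #include <vector>

    struct Item { long value = 0; };   // toy stand-in for a VM item

    // Parameters live in a fixed inline array; only calls that need more than
    // InlineSlots values fall back to a heap-allocated vector.
    class ParamBuffer {
    public:
        void push(const Item& item)
        {
            if (m_size < InlineSlots) {
                m_inline[m_size++] = item;       // common case: no allocation at all
            } else {
                if (m_overflow.empty())          // first spill: copy the inline part over
                    m_overflow.assign(m_inline, m_inline + InlineSlots);
                m_overflow.push_back(item);
                ++m_size;
            }
        }

        size_t size() const { return m_size; }

        const Item& operator[](size_t i) const
        {
            return m_overflow.empty() ? m_inline[i] : m_overflow[i];
        }

    private:
        static const size_t InlineSlots = 4;     // most calls pass only a few parameters
        Item m_inline[InlineSlots];
        std::vector<Item> m_overflow;
        size_t m_size = 0;
    };

    int main()
    {
        ParamBuffer params;
        for (long i = 0; i < 6; ++i) {   // six pushes: four inline, two spilled
            Item it;
            it.value = i;
            params.push(it);
        }
        return (params.size() == 6 && params[5].value == 5) ? 0 : 1;
    }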

Finally, we must begin to reshape some common opcodes, such as the compare-and-jump operations that are currently performed in two steps, even though there's rarely a need for that.
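The gain would come from fusing the two steps into a single opcode, something along these lines; the opcode names and the toy encoding are assumptions made up for the example:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct VMachine {
        std::vector<int64_t> code;    // toy bytecode: operands only
        std::vector<int64_t> regs = std::vector<int64_t>(8);
        size_t pc = 0;
        bool flag = false;            // comparison flag used by the two-step form
    };

    // Read the next operand as a register index.
    static size_t nextReg(VMachine& vm)
    {
        return static_cast<size_t>(vm.code[vm.pc++]);
    }

    // Two-step form: a compare opcode sets the flag, a separate jump reads it.
    void op_cmp_lt(VMachine& vm)
    {
        int64_t a = vm.regs[nextReg(vm)];
        int64_t b = vm.regs[nextReg(vm)];
        vm.flag = a < b;
    }

    void op_jmp_true(VMachine& vm)
    {
        size_t target = static_cast<size_t>(vm.code[vm.pc++]);
        if (vm.flag)
            vm.pc = target;
    }

    // Fused form: compare and jump in one handler, one dispatch instead of two.
    void op_jmp_lt(VMachine& vm)
    {
        int64_t a = vm.regs[nextReg(vm)];
        int64_t b = vm.regs[nextReg(vm)];
        size_t target = static_cast<size_t>(vm.code[vm.pc++]);
        if (a < b)
            vm.pc = target;
    }

    int main()
    {
        VMachine vm;
        vm.regs[0] = 1;
        vm.regs[1] = 2;
        vm.code = { 0, 1, 9 };    // operands for the fused handler: r0, r1, jump target 9
        op_jmp_lt(vm);            // 1 < 2, so pc jumps to 9
        return vm.pc == 9 ? 0 : 1;
    }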

But that's enough optimization for now. There's room to get better, much better, VM performance when we decide to pursue it. What's more important now is to get more libraries, so that it becomes possible to do something serious with Falcon.

By the way, an Apache integration module should be ready by this weekend.
