Cleaning Up After Last Episode

In the previous update, I detailed how the entire middle and back end of the compiler fell apart. I had managed to get pointers working, but doing so exposed all the weaknesses in my naive first implementations of everything. Since then I have fixed up the IR to a much more capable form, reworked the entire code generator, and implemented proper register allocation.

A Real IR

Previously, I had taken the lazy approach of treating the IR as a single linear block of three-address code. This worked fine for the simplistic register allocator I threw together, but with no notion of branching it wasn’t very powerful. The main thing it lacked was the concept of lifetime differences within different branches of code. The new IR is a much more traditional system, where each function is linearized into a sequence of “basic blocks”, each of which contains some code. It differs from a typical basic block implementation in one critical way though - normally a basic block is a section of straight-line code. This would mean that control flows linearly from the start of a block to the end. In my representation, it made more sense to allow branching within blocks, but jumps and calls will only ever target the beginning of a basic block.

Linear Scan Register Allocation

The need for (and subsequent implementation of) the register allocator went hand-in-hand with the IR improvements. In order to get things back to a working state, the lazy first-come-first-serve allocator with no spilling capabilities had to go. In its place is an implementation of linear scan allocation, with similar state save and restore functionality as before. The allocator itself isn’t anything too special, but it took a lot of machinery under the hood to get it to work properly. The pseudocode in the paper I referenced last article really doesn’t consider much by way of details, so a lot of the development time was spent simply trying and tweaking many iterations of supporting code. Now that the allocator properly supports spilling variables, the stack-based state tracking system I’ve been using for branch control has been beefed up a lot too. Now that variables can actually be spilled, it has a lot more work to do in terms of shuffling values around to make sure state remains consistent between different branch exits.

Code Generation

After reworking the IR and register allocator, putting them to use in the code generator was much easier than before. Overall it’s now a lot easier to see what’s going on and to tweak and build functionality on top of what’s already there. It’s much less of a mess than how I was doing it before, and adding new features should be much more elegant to build up on top of what’s already there.

Show me Some Results!

Admittedly, a lot of what’s discussed in this post is pretty hand-wavy and tough to get a mental model of just from reading. I wanted to throw together some diagrams to include, but it’s already been so long since last post that I decided it’d just be better to get this out there. Now that the whole compilation pipeline is really starting to gel together, I’m going to be able to broaden my focus a bit. I want to cut down on the wordiness of these posts and move towards a more engaging and easily digestible format. There are still a lot of big features on the horizon for the compiler, but I want to branch out into some visualization and profiling tools. The main things I’m looking at right now are some sort of visualiztion for the AST and IR. Keep an eye out for those as a follow-up to the text explanations from this post!