06 Oct

Adding branching (MPC -> LLVM for the Neil Language #4)

This is part of a series, the first three parts of the series can be found at:

  1. Hooking up MPC & LLVM
  2. Cleaning up the parser
  3. Adding type identifiers

In this post, we’ll cover how to add branching support to our little toy language I’m calling Neil – Not Exactly an Intermediate Language.

To keep things simple, I’ve decided to add branching of the form:

if (<expression> <comparison operator> <expression>) {
  <statement>*
}

With the following caveats:

  • we will not support else branches
  • we will only support <. <=. >, >=, == and != comparison operators

With that in mind, lets get adding it to Neil!

Grammar Changes

We need a new type of expression for our grammar – a boolean expression. This is an expression that evaluates to boolean (using a comparison operator).

bexp : <lexp>
       ('>' | '<' | \">=\" | \"<=\" | \"!=\" | \"==\")
       <lexp> ;                                               

Our boolean expression (bexp in the grammar) consists of a left expression (lexp), followed by one of the possible six supported comparison operators, followed by another lexp.

Now, we can modify statements in the grammar to add a new statement type for if statements.

stmt : \"return\" <lexp>? ';' 
     | <ident> '(' <ident>? (',' <ident>)* ')' ';' 
     | <typeident> ('=' <lexp>)? ';' 
     | <ident> '=' <lexp> ';' 
     | \"if\" '(' <bexp> ')' '{' <stmt>* '}' ;

And that is all the changes we need to the grammar.

How to Handle Branches

When handling branches, we are going to follow a really simple approach.

For every if statement, we’ll create two new basic blocks. Whatever basic block we are currently in will then conditionally branch between both blocks, in the true case it will branch to the ‘if_true’ block, and otherwise to the ‘if_merge’ block. Within the ‘if_true’ block, when it is has completed its conditional statements, it will always branch to the ‘if_merge’ block on exit. This has the really nice property that at the end of every branching sequence we always converge to exactly one active basic block for future statements.

Changing the Symbol Table

One thing of note is that in our Neil language, identifiers can be declared at any place a statement could be. This means that we are allowed to create variables within the if statement. The problem is that at present our symbol table assumes that all symbols declared within a function will be active for the duration of the function. We need to change our symbol table to be aware of the scope that a symbol currently inhabits. A really simple way to do this is to track which scope level a symbol was declared within.

  • when we enter a new function or if statement, we need to increment the scope
  • when we exit the function or if statement, we need to decrement the scope, and remove all symbols declared associated with that scope
  • when inserting symbols into the symbol table, they will be inserted at the current scope level

For simplicity, I use a std::vector of std::map’s, each map in the vector corresponding to a scope level. Then, when we are looking for a symbol we first look in the last element of the std::vector for the symbol, before iterating backwards through the vector. This allows us to reference symbols in higher scope levels too.

Handling Ifs

To handle if statements:

  • we get an LLVMValueRef for the boolean expression – using LLVMBuildICmp or LLVMBuildFCmp to do so
  • once we have our expression, we increment the scope as all symbols need to be in the new scope level
  • we create two new basic blocks, one for ‘if_true’ and one for ‘if_merge’
  • we use LLVMBuildCondBr to branch, based on the LLVMValueRef for the condition, to either ‘if_true’ or ‘if_merge’
  • we then set the LLVMBuilderRef that we are using to build in the ‘if_true’ basic block
  • then we lower the statements in the if statement (which will all be placed within the ‘if_true’ basic block)
  • and after all statements in the if statement have been processed, we use LLVMBuildBr to unconditionally branch to ‘if_merge’
  • and lastly set the LLVMBuilderRef to add any new statements into the ‘if_merge’ basic block

And that’s it! It’s actually quite a simple set of steps to get branches working when you break it down into the constituent parts.

Result

Now our simple example looks like so:

i32 foo(i32 x) {
  i32 y = x * 5;
  if (y < 4) { i32 z = x; y = z; }
  y = y + 42;
  return y;
}
i32 main() {
  return foo(13);
}

And turns into the following LLVM IR:

define i32 @foo(i32 %x) {
entry:
  %y = alloca i32
  %0 = mul i32 %x, 5
  store i32 %0, i32* %y
  %1 = load i32, i32* %y
  %2 = icmp slt i32 %1, 4
  br i1 %2, label %if_true, label %if_merge

if_true:                                    ; preds = %entry
  %z = alloca i32
  store i32 %x, i32* %z
  %3 = load i32, i32* %z
  store i32 %3, i32* %y
  br label %if_merge

if_merge:                                   ; preds = %if_true, %entry
  %4 = load i32, i32* %y
  %5 = add i32 %4, 42
  store i32 %5, i32* %y
  %6 = load i32, i32* %y
  ret i32 %6
}

define i32 @main() {
entry:
  %0 = call i32 @foo(i32 13)
  ret i32 %0
}

You can check out the full GitHub pull request for the feature here.

In the next post, we’ll look into how we can add the other form of useful branching to the language – loops! Stay tuned.

Leave a Reply

Your email address will not be published. Required fields are marked *