16 Oct

Adding loops (MPC -> LLVM for the Neil Language #5)

This is part of a series, the first four parts of the series can be found at:

  1. Hooking up MPC & LLVM
  2. Cleaning up the parser
  3. Adding type identifiers
  4. Adding branching

In this post, we’ll cover how to add loops to our little toy language I’m calling Neil – Not Exactly an Intermediate Language.

To keep things simple, I’ve decided to add loops of the form:

while (<expression> <comparison operator> <expression>) {
  <statement>*
}

Grammar Changes

We need to add a new kind of statement to the grammar, one for our while loops:

stmt : \"return\" <lexp>? ';' 
     | <ident> '(' <ident>? (',' <ident>)* ')' ';' 
     | <typeident> ('=' <lexp>)? ';' 
     | <ident> '=' <lexp> ';' 
     | \"if\" '(' <bexp> ')' '{' <stmt>* '}'
     | \"while\" '(' <bexp> ')' '{' <stmt>* '}' ;

And with this one change, because we have already handled boolean expression in the additions for branching, we can handle our loops.

How to Handle Loops

Loops are basically branching – the only caveat is that we are going to branch backwards to previous, already executed, basic blocks.

loops

For every while statement we create two new basic blocks. Whatever basic block we are in (in the above example one called ‘entry’) will then conditionally enter the loop by branching either to the ‘while_body’ block (that will contain any statements within the while loop), or by branching to the ‘while_merge’ basic block. Within the body of the loop, the ‘while_body’ basic block will then conditionally (based on the bexp part of the grammar change) loop back to itself, or to the ‘while_merge’. This means that all loops converge as the loop finishes – they will always execute ‘while_merge’ whether the loop is entered or not.

Handling Whiles

To handle while statements:

  • we get an LLVMValueRef for the boolean expression – using LLVMBuildICmp or LLVMBuildFCmp to do so
  • once we have our expression, we increment the scope as all symbols need to be in the new scope level
  • we create two new basic blocks, one for ‘while_body’ and one for ‘while_merge’
  • we use LLVMBuildCondBr to branch, based on the LLVMValueRef for the condition, to either ‘while_body’ or ‘while_merge’
  • we then set the LLVMBuilderRef that we are using to build in the ‘while_body’ basic block
  • then we lower the statements in the while statement (which will all be placed within the ‘while_body’ basic block)
  • and after all statements in the while statement have been processed, we re-evaluate the boolean expression for the while loop, then use LLVMBuildCondBr to conditionally branch to ‘while_merge’, or back to ‘while_body’ if the while loop had more iterations required
  • and lastly set the LLVMBuilderRef to add any new statements into the ‘while_merge’ basic block

And it really is that simple! All the changes we made previously to handle if statements meant that this was a really easy change to add to the language.

Result

Now our simple example looks like so:

i32 foo(i32 x) {
  i32 y = x * 5;
  while (y > 13) {
    if (y < 4) { i32 z = x; y = z; }
    y = y + 42;
  }
  return y;
}
i32 main() {
  return foo(13);
}

And turns into the following LLVM IR:

define i32 @foo(i32 %x) {
entry:
  %y = alloca i32
  %0 = mul i32 %x, 5
  store i32 %0, i32* %y
  %1 = load i32, i32* %y
  %2 = icmp sgt i32 %1, 13
  br i1 %2, label %while_body, label %while_merge

while_body:                     ; preds = %if_merge, %entry
  %3 = load i32, i32* %y
  %4 = icmp slt i32 %3, 4
  br i1 %4, label %if_true, label %if_merge

while_merge:                    ; preds = %if_merge, %entry
  %5 = load i32, i32* %y
  ret i32 %5

if_true:                        ; preds = %while_body
  %z = alloca i32
  store i32 %x, i32* %z
  %6 = load i32, i32* %z
  store i32 %6, i32* %y
  br label %if_merge

if_merge:                       ; preds = %if_true, %while_body
  %7 = load i32, i32* %y
  %8 = add i32 %7, 42
  store i32 %8, i32* %y
  %9 = load i32, i32* %y
  %10 = icmp sgt i32 %9, 13
  br i1 %10, label %while_body, label %while_merge
}

define i32 @main() {
entry:
  %0 = call i32 @foo(i32 13)
  ret i32 %0
}

You can check out the full GitHub pull request for the feature here.

In the next post, we’ll look into how we can add support for pointers to the language, stay tuned!

 

06 Oct

Adding branching (MPC -> LLVM for the Neil Language #4)

This is part of a series, the first three parts of the series can be found at:

  1. Hooking up MPC & LLVM
  2. Cleaning up the parser
  3. Adding type identifiers

In this post, we’ll cover how to add branching support to our little toy language I’m calling Neil – Not Exactly an Intermediate Language.

To keep things simple, I’ve decided to add branching of the form:

if (<expression> <comparison operator> <expression>) {
  <statement>*
}

With the following caveats:

  • we will not support else branches
  • we will only support <. <=. >, >=, == and != comparison operators

With that in mind, lets get adding it to Neil!

Grammar Changes

We need a new type of expression for our grammar – a boolean expression. This is an expression that evaluates to boolean (using a comparison operator).

bexp : <lexp>
       ('>' | '<' | \">=\" | \"<=\" | \"!=\" | \"==\")
       <lexp> ;                                               

Our boolean expression (bexp in the grammar) consists of a left expression (lexp), followed by one of the possible six supported comparison operators, followed by another lexp.

Now, we can modify statements in the grammar to add a new statement type for if statements.

stmt : \"return\" <lexp>? ';' 
     | <ident> '(' <ident>? (',' <ident>)* ')' ';' 
     | <typeident> ('=' <lexp>)? ';' 
     | <ident> '=' <lexp> ';' 
     | \"if\" '(' <bexp> ')' '{' <stmt>* '}' ;

And that is all the changes we need to the grammar.

How to Handle Branches

When handling branches, we are going to follow a really simple approach.

For every if statement, we’ll create two new basic blocks. Whatever basic block we are currently in will then conditionally branch between both blocks, in the true case it will branch to the ‘if_true’ block, and otherwise to the ‘if_merge’ block. Within the ‘if_true’ block, when it is has completed its conditional statements, it will always branch to the ‘if_merge’ block on exit. This has the really nice property that at the end of every branching sequence we always converge to exactly one active basic block for future statements.

Changing the Symbol Table

One thing of note is that in our Neil language, identifiers can be declared at any place a statement could be. This means that we are allowed to create variables within the if statement. The problem is that at present our symbol table assumes that all symbols declared within a function will be active for the duration of the function. We need to change our symbol table to be aware of the scope that a symbol currently inhabits. A really simple way to do this is to track which scope level a symbol was declared within.

  • when we enter a new function or if statement, we need to increment the scope
  • when we exit the function or if statement, we need to decrement the scope, and remove all symbols declared associated with that scope
  • when inserting symbols into the symbol table, they will be inserted at the current scope level

For simplicity, I use a std::vector of std::map’s, each map in the vector corresponding to a scope level. Then, when we are looking for a symbol we first look in the last element of the std::vector for the symbol, before iterating backwards through the vector. This allows us to reference symbols in higher scope levels too.

Handling Ifs

To handle if statements:

  • we get an LLVMValueRef for the boolean expression – using LLVMBuildICmp or LLVMBuildFCmp to do so
  • once we have our expression, we increment the scope as all symbols need to be in the new scope level
  • we create two new basic blocks, one for ‘if_true’ and one for ‘if_merge’
  • we use LLVMBuildCondBr to branch, based on the LLVMValueRef for the condition, to either ‘if_true’ or ‘if_merge’
  • we then set the LLVMBuilderRef that we are using to build in the ‘if_true’ basic block
  • then we lower the statements in the if statement (which will all be placed within the ‘if_true’ basic block)
  • and after all statements in the if statement have been processed, we use LLVMBuildBr to unconditionally branch to ‘if_merge’
  • and lastly set the LLVMBuilderRef to add any new statements into the ‘if_merge’ basic block

And that’s it! It’s actually quite a simple set of steps to get branches working when you break it down into the constituent parts.

Result

Now our simple example looks like so:

i32 foo(i32 x) {
  i32 y = x * 5;
  if (y < 4) { i32 z = x; y = z; }
  y = y + 42;
  return y;
}
i32 main() {
  return foo(13);
}

And turns into the following LLVM IR:

define i32 @foo(i32 %x) {
entry:
  %y = alloca i32
  %0 = mul i32 %x, 5
  store i32 %0, i32* %y
  %1 = load i32, i32* %y
  %2 = icmp slt i32 %1, 4
  br i1 %2, label %if_true, label %if_merge

if_true:                                    ; preds = %entry
  %z = alloca i32
  store i32 %x, i32* %z
  %3 = load i32, i32* %z
  store i32 %3, i32* %y
  br label %if_merge

if_merge:                                   ; preds = %if_true, %entry
  %4 = load i32, i32* %y
  %5 = add i32 %4, 42
  store i32 %5, i32* %y
  %6 = load i32, i32* %y
  ret i32 %6
}

define i32 @main() {
entry:
  %0 = call i32 @foo(i32 13)
  ret i32 %0
}

You can check out the full GitHub pull request for the feature here.

In the next post, we’ll look into how we can add the other form of useful branching to the language – loops! Stay tuned.