16 Oct

Adding loops (MPC -> LLVM for the Neil Language #5)

This is part of a series, the first four parts of the series can be found at:

  1. Hooking up MPC & LLVM
  2. Cleaning up the parser
  3. Adding type identifiers
  4. Adding branching

In this post, we’ll cover how to add loops to our little toy language I’m calling Neil – Not Exactly an Intermediate Language.

To keep things simple, I’ve decided to add loops of the form:

while (<expression> <comparison operator> <expression>) {

Grammar Changes

We need to add a new kind of statement to the grammar, one for our while loops:

stmt : \"return\" <lexp>? ';' 
     | <ident> '(' <ident>? (',' <ident>)* ')' ';' 
     | <typeident> ('=' <lexp>)? ';' 
     | <ident> '=' <lexp> ';' 
     | \"if\" '(' <bexp> ')' '{' <stmt>* '}'
     | \"while\" '(' <bexp> ')' '{' <stmt>* '}' ;

And with this one change, because we have already handled boolean expression in the additions for branching, we can handle our loops.

How to Handle Loops

Loops are basically branching – the only caveat is that we are going to branch backwards to previous, already executed, basic blocks.


For every while statement we create two new basic blocks. Whatever basic block we are in (in the above example one called ‘entry’) will then conditionally enter the loop by branching either to the ‘while_body’ block (that will contain any statements within the while loop), or by branching to the ‘while_merge’ basic block. Within the body of the loop, the ‘while_body’ basic block will then conditionally (based on the bexp part of the grammar change) loop back to itself, or to the ‘while_merge’. This means that all loops converge as the loop finishes – they will always execute ‘while_merge’ whether the loop is entered or not.

Handling Whiles

To handle while statements:

  • we get an LLVMValueRef for the boolean expression – using LLVMBuildICmp or LLVMBuildFCmp to do so
  • once we have our expression, we increment the scope as all symbols need to be in the new scope level
  • we create two new basic blocks, one for ‘while_body’ and one for ‘while_merge’
  • we use LLVMBuildCondBr to branch, based on the LLVMValueRef for the condition, to either ‘while_body’ or ‘while_merge’
  • we then set the LLVMBuilderRef that we are using to build in the ‘while_body’ basic block
  • then we lower the statements in the while statement (which will all be placed within the ‘while_body’ basic block)
  • and after all statements in the while statement have been processed, we re-evaluate the boolean expression for the while loop, then use LLVMBuildCondBr to conditionally branch to ‘while_merge’, or back to ‘while_body’ if the while loop had more iterations required
  • and lastly set the LLVMBuilderRef to add any new statements into the ‘while_merge’ basic block

And it really is that simple! All the changes we made previously to handle if statements meant that this was a really easy change to add to the language.


Now our simple example looks like so:

i32 foo(i32 x) {
  i32 y = x * 5;
  while (y > 13) {
    if (y < 4) { i32 z = x; y = z; }
    y = y + 42;
  return y;
i32 main() {
  return foo(13);

And turns into the following LLVM IR:

define i32 @foo(i32 %x) {
  %y = alloca i32
  %0 = mul i32 %x, 5
  store i32 %0, i32* %y
  %1 = load i32, i32* %y
  %2 = icmp sgt i32 %1, 13
  br i1 %2, label %while_body, label %while_merge

while_body:                     ; preds = %if_merge, %entry
  %3 = load i32, i32* %y
  %4 = icmp slt i32 %3, 4
  br i1 %4, label %if_true, label %if_merge

while_merge:                    ; preds = %if_merge, %entry
  %5 = load i32, i32* %y
  ret i32 %5

if_true:                        ; preds = %while_body
  %z = alloca i32
  store i32 %x, i32* %z
  %6 = load i32, i32* %z
  store i32 %6, i32* %y
  br label %if_merge

if_merge:                       ; preds = %if_true, %while_body
  %7 = load i32, i32* %y
  %8 = add i32 %7, 42
  store i32 %8, i32* %y
  %9 = load i32, i32* %y
  %10 = icmp sgt i32 %9, 13
  br i1 %10, label %while_body, label %while_merge

define i32 @main() {
  %0 = call i32 @foo(i32 13)
  ret i32 %0

You can check out the full GitHub pull request for the feature here.

In the next post, we’ll look into how we can add support for pointers to the language, stay tuned!


2 thoughts on “Adding loops (MPC -> LLVM for the Neil Language #5)

  1. First off, thanks for this series. It’s an interesting read and includes lots of useful LLVM info.

    I’m interested as to why you made the decision to use two blocks rather than 3 to implement the `while` loop. Had you considered having three blocks, one which contains the condition, one with the body and one with the merge? You’d then only ever branch to the merge from the condition block and the main body would always unconditionally jump to the condition block.

    From what I understand LLVM would trivially thread the jumps so there shouldn’t be any real performance difference.

    • So I mostly did it just to keep the code as similar as possible to the if branching code – I’ve been striving as much as possible to minimise the diff from the previous code to keep it as understandable as possible to non-compiler people.

      Also I tend to not fixate on things that I know LLVM can trivially optimise – you’ll notice that I also use alloca for every stack variable allocation, where phi nodes would totally work (and be more optimal). LLVM can handle this stuff with ease so I’ll just let LLVM get on doing what it does best!

Leave a Reply

Your email address will not be published. Required fields are marked *