01 Jun

So long Codeplay, thanks for all the fish!

After 11 years since my first internship, and 9 years full time, I’m leaving Codeplay to join AMD.

A Brief History of Codeplay and I

I was a nineteen year old ned when I started my first internship at Codeplay, and was given the code-monkey job of hand optimizing shaders for an OpenGL ES 2.0 shader compiler, so that Codeplay’s compiler engineers could prove the capability of the compiler. Slated to take 10 weeks, I finished the task in 2 weeks and to stave off boredom in the remaining 8 weeks I ended up proving 100% code coverage of the compiler and writing a test generator to break it.

From these humble beginnings I came back for a second internship where I rewrote the Testplay system so that it’d work on Windows & Linux.

When I joined full time, it was with the promise that I’d be doing PlayStation 3 development. I shipped my first title (NASCAR The Game: 2011) and got to make some SPUs very happy in the process. I felt at home doing games technology, but Codeplay’s market drifted from primarily in games to doing driver development.

For 5 years I shipped ~7 OpenCL implementations in the mobile & embedded space, becoming an expert in integrating and optimization in LLVM in the process.

Then Vulkan came around, and I knew here was the thing I had been waiting for. Having worked for so long on OpenCL, the deep-seated flaws in the API were very clear to me, and I wanted to be sure that Vulkan did not make the same mistakes. After joining Khronos, I proceeded to get Vulkan 1.0 out the door, then focused primarily on getting the Vulkan 1.1 subgroup functionality shipped and into developers hands.

AMD’s Enticing Offer

With my history in Vulkan & SPIR-V, and my LLVM & driver experience – AMD enticed me with an awesome job. I’d get to spend 100% of my time doing Vulkan & SPIR-V, I’d get to work on the Vulkan driver, I’d get to use my skills in LLVM, and I’d still be able to work within Khronos to get extensions and functionality that I know developers will love.

The entire brief was summed up by Rys as ‘Make things fast anywhere you want!’.

The icing on the cake was the opportunity to work for Rys Sommefeldt and closely with Timothy Lottes – two people that are legends in the industry to me.

What more was there to say other than ‘Sign me the fuck up!’

TL;DR

So today, the 1st of June 2018 is my last day at Codeplay. I leave behind many friends and co-workers that I’ll naturally miss, I wish you all the best in your continuing careers!

And on the 18th June 2018 I’ll start the next chapter in my life at AMD. Super excited doesn’t begin to cover it.

07 Sep

I’m speaking at the Munich Khronos Chapter Meeting 13th October 2017

Previously I had begun a series of blog posts detailing how to port applications from OpenCL -> Vulkan.

  1. OpenCL -> Vulkan: A Porting Guide (#1)
  2. OpenCL -> Vulkan: A Porting Guide (#2)
  3. OpenCL -> Vulkan: A Porting Guide (#3)

Instead of continuing this blog series, I’m converting the entire contents into a slide deck, and will be presenting it at the Munich Khronos Chapter meeting on the 13th of October 2017.

So please come along and watch myself, and the other great speakers, talk about some fun things you can do with Vulkan!

Look forward to seeing y’all there.

16 Oct

Adding loops (MPC -> LLVM for the Neil Language #5)

This is part of a series, the first four parts of the series can be found at:

  1. Hooking up MPC & LLVM
  2. Cleaning up the parser
  3. Adding type identifiers
  4. Adding branching

In this post, we’ll cover how to add loops to our little toy language I’m calling Neil – Not Exactly an Intermediate Language.

To keep things simple, I’ve decided to add loops of the form:

while (<expression> <comparison operator> <expression>) {
  <statement>*
}

Grammar Changes

We need to add a new kind of statement to the grammar, one for our while loops:

stmt : \"return\" <lexp>? ';' 
     | <ident> '(' <ident>? (',' <ident>)* ')' ';' 
     | <typeident> ('=' <lexp>)? ';' 
     | <ident> '=' <lexp> ';' 
     | \"if\" '(' <bexp> ')' '{' <stmt>* '}'
     | \"while\" '(' <bexp> ')' '{' <stmt>* '}' ;

And with this one change, because we have already handled boolean expression in the additions for branching, we can handle our loops.

How to Handle Loops

Loops are basically branching – the only caveat is that we are going to branch backwards to previous, already executed, basic blocks.

loops

For every while statement we create two new basic blocks. Whatever basic block we are in (in the above example one called ‘entry’) will then conditionally enter the loop by branching either to the ‘while_body’ block (that will contain any statements within the while loop), or by branching to the ‘while_merge’ basic block. Within the body of the loop, the ‘while_body’ basic block will then conditionally (based on the bexp part of the grammar change) loop back to itself, or to the ‘while_merge’. This means that all loops converge as the loop finishes – they will always execute ‘while_merge’ whether the loop is entered or not.

Handling Whiles

To handle while statements:

  • we get an LLVMValueRef for the boolean expression – using LLVMBuildICmp or LLVMBuildFCmp to do so
  • once we have our expression, we increment the scope as all symbols need to be in the new scope level
  • we create two new basic blocks, one for ‘while_body’ and one for ‘while_merge’
  • we use LLVMBuildCondBr to branch, based on the LLVMValueRef for the condition, to either ‘while_body’ or ‘while_merge’
  • we then set the LLVMBuilderRef that we are using to build in the ‘while_body’ basic block
  • then we lower the statements in the while statement (which will all be placed within the ‘while_body’ basic block)
  • and after all statements in the while statement have been processed, we re-evaluate the boolean expression for the while loop, then use LLVMBuildCondBr to conditionally branch to ‘while_merge’, or back to ‘while_body’ if the while loop had more iterations required
  • and lastly set the LLVMBuilderRef to add any new statements into the ‘while_merge’ basic block

And it really is that simple! All the changes we made previously to handle if statements meant that this was a really easy change to add to the language.

Result

Now our simple example looks like so:

i32 foo(i32 x) {
  i32 y = x * 5;
  while (y > 13) {
    if (y < 4) { i32 z = x; y = z; }
    y = y + 42;
  }
  return y;
}
i32 main() {
  return foo(13);
}

And turns into the following LLVM IR:

define i32 @foo(i32 %x) {
entry:
  %y = alloca i32
  %0 = mul i32 %x, 5
  store i32 %0, i32* %y
  %1 = load i32, i32* %y
  %2 = icmp sgt i32 %1, 13
  br i1 %2, label %while_body, label %while_merge

while_body:                     ; preds = %if_merge, %entry
  %3 = load i32, i32* %y
  %4 = icmp slt i32 %3, 4
  br i1 %4, label %if_true, label %if_merge

while_merge:                    ; preds = %if_merge, %entry
  %5 = load i32, i32* %y
  ret i32 %5

if_true:                        ; preds = %while_body
  %z = alloca i32
  store i32 %x, i32* %z
  %6 = load i32, i32* %z
  store i32 %6, i32* %y
  br label %if_merge

if_merge:                       ; preds = %if_true, %while_body
  %7 = load i32, i32* %y
  %8 = add i32 %7, 42
  store i32 %8, i32* %y
  %9 = load i32, i32* %y
  %10 = icmp sgt i32 %9, 13
  br i1 %10, label %while_body, label %while_merge
}

define i32 @main() {
entry:
  %0 = call i32 @foo(i32 13)
  ret i32 %0
}

You can check out the full GitHub pull request for the feature here.

In the next post, we’ll look into how we can add support for pointers to the language, stay tuned!

 

06 Oct

Adding branching (MPC -> LLVM for the Neil Language #4)

This is part of a series, the first three parts of the series can be found at:

  1. Hooking up MPC & LLVM
  2. Cleaning up the parser
  3. Adding type identifiers

In this post, we’ll cover how to add branching support to our little toy language I’m calling Neil – Not Exactly an Intermediate Language.

To keep things simple, I’ve decided to add branching of the form:

if (<expression> <comparison operator> <expression>) {
  <statement>*
}

With the following caveats:

  • we will not support else branches
  • we will only support <. <=. >, >=, == and != comparison operators

With that in mind, lets get adding it to Neil!

Grammar Changes

We need a new type of expression for our grammar – a boolean expression. This is an expression that evaluates to boolean (using a comparison operator).

bexp : <lexp>
       ('>' | '<' | \">=\" | \"<=\" | \"!=\" | \"==\")
       <lexp> ;                                               

Our boolean expression (bexp in the grammar) consists of a left expression (lexp), followed by one of the possible six supported comparison operators, followed by another lexp.

Now, we can modify statements in the grammar to add a new statement type for if statements.

stmt : \"return\" <lexp>? ';' 
     | <ident> '(' <ident>? (',' <ident>)* ')' ';' 
     | <typeident> ('=' <lexp>)? ';' 
     | <ident> '=' <lexp> ';' 
     | \"if\" '(' <bexp> ')' '{' <stmt>* '}' ;

And that is all the changes we need to the grammar.

How to Handle Branches

When handling branches, we are going to follow a really simple approach.

For every if statement, we’ll create two new basic blocks. Whatever basic block we are currently in will then conditionally branch between both blocks, in the true case it will branch to the ‘if_true’ block, and otherwise to the ‘if_merge’ block. Within the ‘if_true’ block, when it is has completed its conditional statements, it will always branch to the ‘if_merge’ block on exit. This has the really nice property that at the end of every branching sequence we always converge to exactly one active basic block for future statements.

Changing the Symbol Table

One thing of note is that in our Neil language, identifiers can be declared at any place a statement could be. This means that we are allowed to create variables within the if statement. The problem is that at present our symbol table assumes that all symbols declared within a function will be active for the duration of the function. We need to change our symbol table to be aware of the scope that a symbol currently inhabits. A really simple way to do this is to track which scope level a symbol was declared within.

  • when we enter a new function or if statement, we need to increment the scope
  • when we exit the function or if statement, we need to decrement the scope, and remove all symbols declared associated with that scope
  • when inserting symbols into the symbol table, they will be inserted at the current scope level

For simplicity, I use a std::vector of std::map’s, each map in the vector corresponding to a scope level. Then, when we are looking for a symbol we first look in the last element of the std::vector for the symbol, before iterating backwards through the vector. This allows us to reference symbols in higher scope levels too.

Handling Ifs

To handle if statements:

  • we get an LLVMValueRef for the boolean expression – using LLVMBuildICmp or LLVMBuildFCmp to do so
  • once we have our expression, we increment the scope as all symbols need to be in the new scope level
  • we create two new basic blocks, one for ‘if_true’ and one for ‘if_merge’
  • we use LLVMBuildCondBr to branch, based on the LLVMValueRef for the condition, to either ‘if_true’ or ‘if_merge’
  • we then set the LLVMBuilderRef that we are using to build in the ‘if_true’ basic block
  • then we lower the statements in the if statement (which will all be placed within the ‘if_true’ basic block)
  • and after all statements in the if statement have been processed, we use LLVMBuildBr to unconditionally branch to ‘if_merge’
  • and lastly set the LLVMBuilderRef to add any new statements into the ‘if_merge’ basic block

And that’s it! It’s actually quite a simple set of steps to get branches working when you break it down into the constituent parts.

Result

Now our simple example looks like so:

i32 foo(i32 x) {
  i32 y = x * 5;
  if (y < 4) { i32 z = x; y = z; }
  y = y + 42;
  return y;
}
i32 main() {
  return foo(13);
}

And turns into the following LLVM IR:

define i32 @foo(i32 %x) {
entry:
  %y = alloca i32
  %0 = mul i32 %x, 5
  store i32 %0, i32* %y
  %1 = load i32, i32* %y
  %2 = icmp slt i32 %1, 4
  br i1 %2, label %if_true, label %if_merge

if_true:                                    ; preds = %entry
  %z = alloca i32
  store i32 %x, i32* %z
  %3 = load i32, i32* %z
  store i32 %3, i32* %y
  br label %if_merge

if_merge:                                   ; preds = %if_true, %entry
  %4 = load i32, i32* %y
  %5 = add i32 %4, 42
  store i32 %5, i32* %y
  %6 = load i32, i32* %y
  ret i32 %6
}

define i32 @main() {
entry:
  %0 = call i32 @foo(i32 13)
  ret i32 %0
}

You can check out the full GitHub pull request for the feature here.

In the next post, we’ll look into how we can add the other form of useful branching to the language – loops! Stay tuned.

25 Sep

spirv-stats update – exposing more information

In a previous post I introduced a little command line tool spirv-stats I’ve been working on. Since doing the initial version, I’ve extended the information the tool will give you based on some queries I had on the SPIR-V binaries we were using – in the GitHub pull request here.

For the most commonly used opcodes, I’ve tried to break them down to understand a little more about the shape of the opcodes.

OpLoad & OpStore

For OpLoad and OpStore – they have an optional additional parameter for a memory access. So I wondered, given the SPIR-V shaders we have as input, how many of the OpLoad’s and OpStore’s in the SPIR-V have the optional memory access literal? It turns out none of them do!

OpDecorate & OpMemberDecorate

For OpDecorate the first thing I wanted to know was how many of the decorations used had any additional literals. It turns out that 70% of OpDecorate’s have no additional literal, and the remaining 30% has one additional literal. The next query I had was what kind of decorations were mostly used in the SPIR-V shaders? It turns out the most used decoration was the RelaxedPrecision decoration with 66% of the uses of OpDecorate just for this one. The next most used was the Location decoration with 11%. I then extended these checks over to OpMemberDecorate, and it turns out that 90% of decorations on OpMemberDecorate have one literal! The reason is because a cool 84% of the decorations used on OpMemberDecorate are for encoding the Offset of struct members.

OpAccessChain

OpAccessChain can have an arbitrarily long set of IDs used to index into the pointer object. So I wondered how many of these were using a small number of indices? It turns out that 78% of the uses of OpAccessChain had only one index, 19% have two indices, and a mere 2% have three indices.

OpVariable

I wondered how many variables as used in our SPIR-V shaders had initializers (they have an initial value). None of them do! Of all 19041 uses of OpVariable not one had an initializer.

OpConstant

Of the constants used in the SPIR-V shaders, all of them use exactly one literal. This is unsurprising because int64/double types are not widely supported or used in shaders, but I wanted to be sure.

OpVectorShuffle

The first thing I wanted to know about OpVectorShuffle was how many literals were being used when shuffling the vectors – remember that the number of literals corresponds to the width of the output vector. It turns out that 31% of shuffles have two literals (a common case when extracting from a vec4 the indices into an image sample), 45% of shuffles have three literals, and 23% have four literals. The next question I had was to do with the undef literal that can be used in shuffle. 0xFFFFFFFFu (-1 in signed) can be used to signify that that element of the resulting vector is undefined. I wondered if the SPIR-V shaders we had were using this? It turns out none of them are (currently at least). The next question I had was how many shuffles were using literals lower than 4, and lower than 8. These two numbers would be if you were shuffling an individual vec4, or shuffling two vec4. 82% of the shuffles are using literals lower than 4 – so this could either be shuffling two vec2’s together, or one vec4. The next question then is how many OpShuffle’s are using the same vector ID in both vector components. This pattern is used when you actually only want to shuffle elements from the one vector. Well it turns out exactly 82% of shuffles were using both vectors the same!

OpCompositeExtract & OpCompositeConstruct

The last two opcodes that I have looked at currently were OpCompositeExtract and OpCompositeConstruct. For both I wondered how many what were the common number of literals being used? For extract, 97% were using exactly one literal. and 3% were using two literals. For construct, 17% used one literal, 41% used two literals, 41% used three literals. Also, for extract, I wondered how many of the extracts were being used to pull a single element out of a vector. So I checked how many times the literal was zero to three. Roughly 26% were accessing the zeroth, first or second, and 20% the third.

Sample Output

Below is a sample output run over the shaders that smol-v uses for testing. The changes in the latest version from the previous are highlighted in red.

Totals: 286932 hits 4985312 bytes
                        OpLoad[ 61] =  49915 hits (17.40%) 798640 bytes (16.02%)
                                    0 hits with memory access  0.00%
                       OpStore[ 62] =  29233 hits (10.19%) 350796 bytes ( 7.04%)
                                    0 hits with memory access  0.00%
                    OpDecorate[ 71] =  23770 hits ( 8.28%) 312964 bytes ( 6.28%)
                                16839 hits with no literals   70.84%
                                 6931 hits with 1 literal     29.16%
                                    0 hits with 2+ literals    0.00%
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                15742 hits of decoration  0   66.23%
                                 1036 hits of decoration  2    4.36%
                                    1 hits of decoration  3    0.00%
                                  971 hits of decoration  6    4.08%
                                  206 hits of decoration 11    0.87%
                                    6 hits of decoration 14    0.03%
                                   17 hits of decoration 15    0.07%
                                   37 hits of decoration 18    0.16%
                                 2648 hits of decoration 30   11.14%
                                 1517 hits of decoration 33    6.38%
                                 1589 hits of decoration 34    6.68%
                 OpAccessChain[ 65] =  20116 hits ( 7.01%) 421496 bytes ( 8.45%)
                                    0 hits with 0 indices      0.00%
                                15768 hits with 1 index       78.39%
                                 3904 hits with 2 indices     19.41%
                                  442 hits with 3 indices      2.20%
                                    2 hits with 4 indices      0.01%
                                    0 hits with 5+ indices     0.00%
                    OpVariable[ 59] =  19041 hits ( 6.64%) 304656 bytes ( 6.11%)
                                    0 hits with initializer    0.00%
              OpMemberDecorate[ 72] =  14332 hits ( 4.99%) 280916 bytes ( 5.63%)
                                 1431 hits with no literals    9.98%
                                12901 hits with 1 literal     90.02%
                                    0 hits with 2+ literals    0.00%
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                  850 hits of decoration  0    5.93%
                                  579 hits of decoration  4    4.04%
                                  579 hits of decoration  7    4.04%
                                  180 hits of decoration 11    1.26%
                                    2 hits of decoration 24    0.01%
                                12142 hits of decoration 35   84.72%
                    OpConstant[ 43] =  10823 hits ( 3.77%) 173168 bytes ( 3.47%)
                                10823 hits have 1 literal    100.00%
                                    0 hits have 2+ literals    0.00%
                       OpLabel[248] =   9915 hits ( 3.46%)  79320 bytes ( 1.59%)
               OpVectorShuffle[ 79] =   9732 hits ( 3.39%) 308372 bytes ( 6.19%)
                                    0 hits with 0 literals     0.00%
                                    0 hits with 1 literal      0.00%
                                 3045 hits with 2 literals    31.29%
                                 4405 hits with 3 literals    45.26%
                                 2282 hits with 4 literals    23.45%
                                    0 hits with 5+ literals    0.00%
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    0 hits with undef literal  0.00%
                                 7980 hits with literals < 4  82.00%
                                 1752 hits with literals < 8  18.00%
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                 7980 hits with same vector   82.00%
            OpCompositeExtract[ 81] =   9595 hits ( 3.34%) 193220 bytes ( 3.88%)
                                    0 hits with 0 literals     0.00%
                                 9265 hits with 1 literal     96.56%
                                  330 hits with 2 literals     3.44%
                                    0 hits with 3+ literals   0.00%
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                 2547 hits with literal =  0  26.55%
                                 2414 hits with literal =  1  25.16%
                                 2398 hits with literal =  2  24.99%
                                 1906 hits with literal =  3  19.86%
                                    0 hits with literal =  4   0.00%
                        OpName[  5] =   9233 hits ( 3.22%) 164092 bytes ( 3.29%)
                        OpFMul[133] =   8532 hits ( 2.97%) 170640 bytes ( 3.42%)
          OpCompositeConstruct[ 80] =   6678 hits ( 2.33%) 166680 bytes ( 3.34%)
                                    0 hits with 0 literals     0.00%
                                 1161 hits with 1 literal     17.39%
                                 2754 hits with 2 literals    41.24%
                                 2763 hits with 3 literals    41.37%
                                    0 hits with 4 literals     0.00%
                                    0 hits with 5+ literals    0.00%
                        OpFAdd[129] =   5922 hits ( 2.06%) 118440 bytes ( 2.38%)
                 OpTypePointer[ 32] =   5486 hits ( 1.91%)  87776 bytes ( 1.76%)
                     OpExtInst[ 12] =   5257 hits ( 1.83%) 145980 bytes ( 2.93%)
                      OpBranch[249] =   5229 hits ( 1.82%)  41832 bytes ( 0.84%)
           OpBranchConditional[250] =   3193 hits ( 1.11%)  51088 bytes ( 1.02%)
              OpSelectionMerge[247] =   3109 hits ( 1.08%)  37308 bytes ( 0.75%)
                        OpFSub[131] =   2668 hits ( 0.93%)  53360 bytes ( 1.07%)
                  OpMemberName[  6] =   2507 hits ( 0.87%)  78044 bytes ( 1.57%)
                OpFunctionCall[ 57] =   2198 hits ( 0.77%)  58520 bytes ( 1.17%)
           OpConstantComposite[ 44] =   2155 hits ( 0.75%)  50928 bytes ( 1.02%)
           OpFunctionParameter[ 55] =   2117 hits ( 0.74%)  25404 bytes ( 0.51%)
                         OpDot[148] =   1911 hits ( 0.67%)  38220 bytes ( 0.77%)
           OpVectorTimesScalar[142] =   1488 hits ( 0.52%)  29760 bytes ( 0.60%)
                    OpFunction[ 54] =   1398 hits ( 0.49%)  27960 bytes ( 0.56%)
                 OpFunctionEnd[ 56] =   1398 hits ( 0.49%)   5592 bytes ( 0.11%)
                OpTypeFunction[ 33] =   1175 hits ( 0.41%)  21076 bytes ( 0.42%)
                  OpTypeVector[ 23] =   1110 hits ( 0.39%)  17760 bytes ( 0.36%)
                  OpTypeStruct[ 30] =   1065 hits ( 0.37%)  58496 bytes ( 1.17%)
                   OpTypeArray[ 28] =   1038 hits ( 0.36%)  16608 bytes ( 0.33%)
                     OpFNegate[127] =   1038 hits ( 0.36%)  16608 bytes ( 0.33%)
      OpImageSampleImplicitLod[ 87] =    969 hits ( 0.34%)  19660 bytes ( 0.39%)
                        OpFDiv[136] =    961 hits ( 0.33%)  19220 bytes ( 0.39%)
                 OpReturnValue[254] =    928 hits ( 0.32%)   7424 bytes ( 0.15%)
                   OpFOrdEqual[180] =    722 hits ( 0.25%)  14440 bytes ( 0.29%)
                     OpTypeInt[ 21] =    661 hits ( 0.23%)  10576 bytes ( 0.21%)
           OpFOrdLessThanEqual[188] =    595 hits ( 0.21%)  11900 bytes ( 0.24%)
                OpFOrdLessThan[184] =    588 hits ( 0.20%)  11760 bytes ( 0.24%)
                      OpIEqual[170] =    586 hits ( 0.20%)  11720 bytes ( 0.24%)
                      OpReturn[253] =    525 hits ( 0.18%)   2100 bytes ( 0.04%)
                        OpIAdd[128] =    465 hits ( 0.16%)   9300 bytes ( 0.19%)
                   OpTypeImage[ 25] =    437 hits ( 0.15%)  15732 bytes ( 0.32%)
            OpTypeSampledImage[ 27] =    437 hits ( 0.15%)   5244 bytes ( 0.11%)
        OpFOrdGreaterThanEqual[190] =    412 hits ( 0.14%)   8240 bytes ( 0.17%)
             OpFOrdGreaterThan[186] =    391 hits ( 0.14%)   7820 bytes ( 0.16%)
      OpImageSampleExplicitLod[ 88] =    376 hits ( 0.13%)  11128 bytes ( 0.22%)
                  OpCapability[ 17] =    372 hits ( 0.13%)   2976 bytes ( 0.06%)
                 OpMemoryModel[ 14] =    341 hits ( 0.12%)   4092 bytes ( 0.08%)
                  OpEntryPoint[ 15] =    341 hits ( 0.12%)  17808 bytes ( 0.36%)
                    OpTypeVoid[ 19] =    341 hits ( 0.12%)   2728 bytes ( 0.05%)
               OpExtInstImport[ 11] =    341 hits ( 0.12%)   8184 bytes ( 0.16%)
                   OpTypeFloat[ 22] =    341 hits ( 0.12%)   4092 bytes ( 0.08%)
                  OpLogicalAnd[167] =    331 hits ( 0.12%)   6620 bytes ( 0.13%)
                         OpPhi[245] =    281 hits ( 0.10%)   7868 bytes ( 0.16%)
                  OpTypeMatrix[ 24] =    255 hits ( 0.09%)   4080 bytes ( 0.08%)
               OpExecutionMode[ 16] =    235 hits ( 0.08%)   2852 bytes ( 0.06%)
  OpImageSampleDrefExplicitLod[ 90] =    226 hits ( 0.08%)   7232 bytes ( 0.15%)
                    OpTypeBool[ 20] =    212 hits ( 0.07%)   1696 bytes ( 0.03%)
           OpVectorTimesMatrix[144] =    194 hits ( 0.07%)   3880 bytes ( 0.08%)
             OpSourceExtension[  4] =    167 hits ( 0.06%)   5732 bytes ( 0.11%)
                  OpLogicalNot[168] =    160 hits ( 0.06%)   2560 bytes ( 0.05%)
                      OpSource[  3] =    141 hits ( 0.05%)   1692 bytes ( 0.03%)
                        OpIMul[132] =    135 hits ( 0.05%)   2700 bytes ( 0.05%)
                 OpConvertSToF[111] =    116 hits ( 0.04%)   1856 bytes ( 0.04%)
                        OpFMod[141] =    114 hits ( 0.04%)   2280 bytes ( 0.05%)
                   OpLogicalOr[166] =     93 hits ( 0.03%)   1860 bytes ( 0.04%)
                   OpSLessThan[177] =     92 hits ( 0.03%)   1840 bytes ( 0.04%)
                   OpLoopMerge[246] =     84 hits ( 0.03%)   1344 bytes ( 0.03%)
                      OpSelect[169] =     68 hits ( 0.02%)   1632 bytes ( 0.03%)
                 OpConvertFToS[110] =     67 hits ( 0.02%)   1072 bytes ( 0.02%)
                     OpBitcast[124] =     66 hits ( 0.02%)   1056 bytes ( 0.02%)
           OpMatrixTimesVector[145] =     48 hits ( 0.02%)    960 bytes ( 0.02%)
            OpShiftLeftLogical[196] =     46 hits ( 0.02%)    920 bytes ( 0.02%)
                OpFOrdNotEqual[182] =     43 hits ( 0.01%)    860 bytes ( 0.02%)
                        OpKill[252] =     40 hits ( 0.01%)    160 bytes ( 0.00%)
           OpSGreaterThanEqual[175] =     39 hits ( 0.01%)    780 bytes ( 0.02%)
           OpMatrixTimesScalar[143] =     32 hits ( 0.01%)    640 bytes ( 0.01%)
              OpSLessThanEqual[179] =     21 hits ( 0.01%)    420 bytes ( 0.01%)
                        OpISub[130] =     21 hits ( 0.01%)    420 bytes ( 0.01%)
           OpMatrixTimesMatrix[146] =     15 hits ( 0.01%)    300 bytes ( 0.01%)
                        OpSDiv[135] =     12 hits ( 0.00%)    240 bytes ( 0.00%)
                      OpFwidth[209] =      8 hits ( 0.00%)    128 bytes ( 0.00%)
                 OpConvertUToF[112] =      6 hits ( 0.00%)     96 bytes ( 0.00%)
                   OpTranspose[ 84] =      6 hits ( 0.00%)     96 bytes ( 0.00%)
                  OpEmitVertex[218] =      5 hits ( 0.00%)     20 bytes ( 0.00%)
                   OpINotEqual[171] =      5 hits ( 0.00%)    100 bytes ( 0.00%)
               OpConstantFalse[ 42] =      5 hits ( 0.00%)     60 bytes ( 0.00%)
                OpConstantTrue[ 41] =      5 hits ( 0.00%)     60 bytes ( 0.00%)
                         OpAny[154] =      4 hits ( 0.00%)     64 bytes ( 0.00%)
              OpControlBarrier[224] =      4 hits ( 0.00%)     64 bytes ( 0.00%)
             OpLogicalNotEqual[165] =      3 hits ( 0.00%)     60 bytes ( 0.00%)
                     OpSNegate[126] =      3 hits ( 0.00%)     48 bytes ( 0.00%)
        OpVectorExtractDynamic[ 77] =      3 hits ( 0.00%)     60 bytes ( 0.00%)
                OpEndPrimitive[219] =      2 hits ( 0.00%)      8 bytes ( 0.00%)
                        OpDPdy[208] =      2 hits ( 0.00%)     32 bytes ( 0.00%)
                        OpDPdx[207] =      2 hits ( 0.00%)     32 bytes ( 0.00%)
                   OpULessThan[176] =      2 hits ( 0.00%)     40 bytes ( 0.00%)
                        OpUMod[137] =      2 hits ( 0.00%)     40 bytes ( 0.00%)
                OpSGreaterThan[173] =      1 hits ( 0.00%)     20 bytes ( 0.00%)
             OpCompositeInsert[ 82] =      1 hits ( 0.00%)     24 bytes ( 0.00%)
            OpTypeRuntimeArray[ 29] =      1 hits ( 0.00%)     12 bytes ( 0.00%)
                       OpUndef[  1] =      1 hits ( 0.00%)     12 bytes ( 0.00%)
21 Sep

spirv-stats – a tool to output statistics of your SPIR-V shader modules

I’ve just released a small one C++ file tool called spirv-stats. It will take one or more SPIR-V input files, and calculate the composition of the SPIR-V shader modules like so:

Totals: 35 hits 564 bytes
               OpDecorate =      5 hits (14.29%) 80 bytes (14.18%)
            OpTypePointer =      4 hits (11.43%) 64 bytes (11.35%)
               OpVariable =      4 hits (11.43%) 64 bytes (11.35%)
                   OpLoad =      3 hits ( 8.57%) 48 bytes ( 8.51%)
             OpTypeVector =      2 hits ( 5.71%) 32 bytes ( 5.67%)
          OpExtInstImport =      1 hits ( 2.86%) 24 bytes ( 4.26%)
            OpMemoryModel =      1 hits ( 2.86%) 12 bytes ( 2.13%)
             OpEntryPoint =      1 hits ( 2.86%) 32 bytes ( 5.67%)
          OpExecutionMode =      1 hits ( 2.86%) 12 bytes ( 2.13%)
             OpCapability =      1 hits ( 2.86%)  8 bytes ( 1.42%)
               OpTypeVoid =      1 hits ( 2.86%)  8 bytes ( 1.42%)
              OpTypeFloat =      1 hits ( 2.86%) 12 bytes ( 2.13%)
              OpTypeImage =      1 hits ( 2.86%) 36 bytes ( 6.38%)
       OpTypeSampledImage =      1 hits ( 2.86%) 12 bytes ( 2.13%)
           OpTypeFunction =      1 hits ( 2.86%) 12 bytes ( 2.13%)
               OpFunction =      1 hits ( 2.86%) 20 bytes ( 3.55%)
            OpFunctionEnd =      1 hits ( 2.86%)  4 bytes ( 0.71%)
                  OpStore =      1 hits ( 2.86%) 12 bytes ( 2.13%)
 OpImageSampleImplicitLod =      1 hits ( 2.86%) 20 bytes ( 3.55%)
                   OpFMul =      1 hits ( 2.86%) 20 bytes ( 3.55%)
                  OpLabel =      1 hits ( 2.86%)  8 bytes ( 1.42%)
                 OpReturn =      1 hits ( 2.86%)  4 bytes ( 0.71%)

It firstly outputs the total number of hits in the SPIR-V shader module(s) – this is the total number of opcodes found within the module(s). Then it outputs the total byte size of the module(s), followed by a sorted breakdown of the module(s), with the most hit opcodes coming first.

I decided to run this across the various folders of SPIR-V that @aras_p is using in his smol-v tool, with the following results.

dota2

Totals: 57090 hits 1099348 bytes
              OpMemberDecorate =  10907 hits (19.10%) 215824 bytes (19.63%)
                        OpLoad =   9140 hits (16.01%) 146240 bytes (13.30%)
                 OpAccessChain =   6034 hits (10.57%) 127080 bytes (11.56%)
                    OpVariable =   3273 hits ( 5.73%)  52368 bytes ( 4.76%)
                    OpDecorate =   2923 hits ( 5.12%)  44556 bytes ( 4.05%)
                       OpStore =   2701 hits ( 4.73%)  32412 bytes ( 2.95%)
                    OpConstant =   2403 hits ( 4.21%)  38448 bytes ( 3.50%)
                        OpFMul =   2398 hits ( 4.20%)  47960 bytes ( 4.36%)
          OpCompositeConstruct =   2053 hits ( 3.60%)  49788 bytes ( 4.53%)
                 OpTypePointer =   1952 hits ( 3.42%)  31232 bytes ( 2.84%)
                        OpFAdd =   1669 hits ( 2.92%)  33380 bytes ( 3.04%)
               OpVectorShuffle =   1371 hits ( 2.40%)  41800 bytes ( 3.80%)
                     OpExtInst =   1259 hits ( 2.21%)  35060 bytes ( 3.19%)
                        OpFSub =    925 hits ( 1.62%)  18500 bytes ( 1.68%)
                         OpDot =    894 hits ( 1.57%)  17880 bytes ( 1.63%)
           OpConstantComposite =    736 hits ( 1.29%)  18096 bytes ( 1.65%)
                  OpTypeStruct =    573 hits ( 1.00%)  44184 bytes ( 4.02%)
                       OpLabel =    542 hits ( 0.95%)   4336 bytes ( 0.39%)
                  OpTypeVector =    406 hits ( 0.71%)   6496 bytes ( 0.59%)
                   OpTypeArray =    403 hits ( 0.71%)   6448 bytes ( 0.59%)
      OpImageSampleImplicitLod =    316 hits ( 0.55%)   6320 bytes ( 0.57%)
      OpImageSampleExplicitLod =    300 hits ( 0.53%)   9000 bytes ( 0.82%)
            OpCompositeExtract =    257 hits ( 0.45%)   5140 bytes ( 0.47%)
                      OpBranch =    249 hits ( 0.44%)   1992 bytes ( 0.18%)
            OpTypeSampledImage =    225 hits ( 0.39%)   2700 bytes ( 0.25%)
                   OpTypeImage =    225 hits ( 0.39%)   8100 bytes ( 0.74%)
                        OpFDiv =    212 hits ( 0.37%)   4240 bytes ( 0.39%)
                     OpTypeInt =    210 hits ( 0.37%)   3360 bytes ( 0.31%)
  OpImageSampleDrefExplicitLod =    207 hits ( 0.36%)   6624 bytes ( 0.60%)
           OpBranchConditional =    155 hits ( 0.27%)   2480 bytes ( 0.23%)
              OpSelectionMerge =    153 hits ( 0.27%)   1836 bytes ( 0.17%)
                  OpCapability =    131 hits ( 0.23%)   1048 bytes ( 0.10%)
                OpFOrdLessThan =    131 hits ( 0.23%)   2620 bytes ( 0.24%)
                   OpTypeFloat =    110 hits ( 0.19%)   1320 bytes ( 0.12%)
                    OpTypeVoid =    110 hits ( 0.19%)    880 bytes ( 0.08%)
                 OpFunctionEnd =    110 hits ( 0.19%)    440 bytes ( 0.04%)
                  OpEntryPoint =    110 hits ( 0.19%)   6240 bytes ( 0.57%)
                 OpMemoryModel =    110 hits ( 0.19%)   1320 bytes ( 0.12%)
               OpExtInstImport =    110 hits ( 0.19%)   2640 bytes ( 0.24%)
                      OpReturn =    110 hits ( 0.19%)    440 bytes ( 0.04%)
                    OpFunction =    110 hits ( 0.19%)   2200 bytes ( 0.20%)
                OpTypeFunction =    110 hits ( 0.19%)   1320 bytes ( 0.12%)
                  OpTypeMatrix =     92 hits ( 0.16%)   1472 bytes ( 0.13%)
                        OpName =     87 hits ( 0.15%)   1368 bytes ( 0.12%)
                    OpTypeBool =     67 hits ( 0.12%)    536 bytes ( 0.05%)
             OpFOrdGreaterThan =     67 hits ( 0.12%)   1340 bytes ( 0.12%)
               OpExecutionMode =     65 hits ( 0.11%)    780 bytes ( 0.07%)
                      OpSelect =     56 hits ( 0.10%)   1344 bytes ( 0.12%)
                 OpConvertSToF =     48 hits ( 0.08%)    768 bytes ( 0.07%)
                     OpBitcast =     44 hits ( 0.08%)    704 bytes ( 0.06%)
            OpShiftLeftLogical =     44 hits ( 0.08%)    880 bytes ( 0.08%)
                        OpIAdd =     44 hits ( 0.08%)    880 bytes ( 0.08%)
                  OpMemberName =     34 hits ( 0.06%)    832 bytes ( 0.08%)
                        OpKill =     28 hits ( 0.05%)    112 bytes ( 0.01%)
                        OpFMod =     17 hits ( 0.03%)    340 bytes ( 0.03%)
                 OpConvertFToS =     16 hits ( 0.03%)    256 bytes ( 0.02%)
                   OpFOrdEqual =     11 hits ( 0.02%)    220 bytes ( 0.02%)
             OpSourceExtension =     10 hits ( 0.02%)    360 bytes ( 0.03%)
                      OpSource =     10 hits ( 0.02%)    120 bytes ( 0.01%)
           OpVectorTimesScalar =      8 hits ( 0.01%)    160 bytes ( 0.01%)
                     OpFNegate =      7 hits ( 0.01%)    112 bytes ( 0.01%)
        OpVectorExtractDynamic =      3 hits ( 0.01%)     60 bytes ( 0.01%)
                  OpLogicalNot =      2 hits ( 0.00%)     32 bytes ( 0.00%)
                 OpConvertUToF =      2 hits ( 0.00%)     32 bytes ( 0.00%)
                   OpLoopMerge =      2 hits ( 0.00%)     32 bytes ( 0.00%)
           OpFOrdLessThanEqual =      2 hits ( 0.00%)     40 bytes ( 0.00%)
                   OpLogicalOr =      1 hits ( 0.00%)     20 bytes ( 0.00%)

OpMemberDecorate dominates the dota2 shaders – nearly 20% of the module is decorating members of structs! Next we have OpLoad at 16% of the hits, but with 13% of the size of the files, followed by OpAccessChain at 10% of the hits and 12% of the size. Most of the shader module(s) are taken up with decorating struct members, and then loading and storing to various variables.

shadertoy

Totals: 75544 hits 1170540 bytes
                        OpLoad =  13486 hits (17.85%) 215776 bytes (18.43%)
                       OpStore =  10671 hits (14.13%) 128052 bytes (10.94%)
                       OpLabel =   7005 hits ( 9.27%)  56040 bytes ( 4.79%)
                        OpName =   5718 hits ( 7.57%)  93332 bytes ( 7.97%)
                    OpVariable =   5503 hits ( 7.28%)  88048 bytes ( 7.52%)
                      OpBranch =   4010 hits ( 5.31%)  32080 bytes ( 2.74%)
                    OpConstant =   3059 hits ( 4.05%)  48944 bytes ( 4.18%)
           OpBranchConditional =   2523 hits ( 3.34%)  40368 bytes ( 3.45%)
              OpSelectionMerge =   2469 hits ( 3.27%)  29628 bytes ( 2.53%)
                 OpAccessChain =   2171 hits ( 2.87%)  43696 bytes ( 3.73%)
                     OpExtInst =   1736 hits ( 2.30%)  47980 bytes ( 4.10%)
                        OpFMul =   1495 hits ( 1.98%)  29900 bytes ( 2.55%)
                OpFunctionCall =   1308 hits ( 1.73%)  38372 bytes ( 3.28%)
                        OpFAdd =   1230 hits ( 1.63%)  24600 bytes ( 2.10%)
                        OpFSub =   1027 hits ( 1.36%)  20540 bytes ( 1.75%)
           OpFunctionParameter =    960 hits ( 1.27%)  11520 bytes ( 0.98%)
           OpConstantComposite =    918 hits ( 1.22%)  20852 bytes ( 1.78%)
          OpCompositeConstruct =    869 hits ( 1.15%)  20120 bytes ( 1.72%)
                   OpFOrdEqual =    702 hits ( 0.93%)  14040 bytes ( 1.20%)
           OpVectorTimesScalar =    686 hits ( 0.91%)  13720 bytes ( 1.17%)
               OpVectorShuffle =    593 hits ( 0.78%)  17828 bytes ( 1.52%)
           OpFOrdLessThanEqual =    591 hits ( 0.78%)  11820 bytes ( 1.01%)
                      OpIEqual =    582 hits ( 0.77%)  11640 bytes ( 0.99%)
            OpCompositeExtract =    427 hits ( 0.57%)   8540 bytes ( 0.73%)
                    OpFunction =    415 hits ( 0.55%)   8300 bytes ( 0.71%)
                 OpFunctionEnd =    415 hits ( 0.55%)   1660 bytes ( 0.14%)
                        OpFDiv =    380 hits ( 0.50%)   7600 bytes ( 0.65%)
                 OpReturnValue =    342 hits ( 0.45%)   2736 bytes ( 0.23%)
                  OpLogicalAnd =    330 hits ( 0.44%)   6600 bytes ( 0.56%)
        OpFOrdGreaterThanEqual =    325 hits ( 0.43%)   6500 bytes ( 0.56%)
                OpFOrdLessThan =    289 hits ( 0.38%)   5780 bytes ( 0.49%)
                         OpPhi =    278 hits ( 0.37%)   7784 bytes ( 0.66%)
                OpTypeFunction =    268 hits ( 0.35%)   5808 bytes ( 0.50%)
             OpFOrdGreaterThan =    246 hits ( 0.33%)   4920 bytes ( 0.42%)
                        OpIAdd =    244 hits ( 0.32%)   4880 bytes ( 0.42%)
                 OpTypePointer =    244 hits ( 0.32%)   3904 bytes ( 0.33%)
              OpMemberDecorate =    180 hits ( 0.24%)   3600 bytes ( 0.31%)
                  OpMemberName =    180 hits ( 0.24%)   4320 bytes ( 0.37%)
                    OpDecorate =    162 hits ( 0.21%)   2520 bytes ( 0.22%)
                      OpReturn =    127 hits ( 0.17%)    508 bytes ( 0.04%)
                  OpLogicalNot =    122 hits ( 0.16%)   1952 bytes ( 0.17%)
                         OpDot =    103 hits ( 0.14%)   2060 bytes ( 0.18%)
                        OpFMod =     97 hits ( 0.13%)   1940 bytes ( 0.17%)
      OpImageSampleImplicitLod =     96 hits ( 0.13%)   2200 bytes ( 0.19%)
                   OpLogicalOr =     88 hits ( 0.12%)   1760 bytes ( 0.15%)
                     OpFNegate =     83 hits ( 0.11%)   1328 bytes ( 0.11%)
                   OpSLessThan =     74 hits ( 0.10%)   1480 bytes ( 0.13%)
                 OpConvertSToF =     67 hits ( 0.09%)   1072 bytes ( 0.09%)
                  OpTypeVector =     58 hits ( 0.08%)    928 bytes ( 0.08%)
                   OpLoopMerge =     54 hits ( 0.07%)    864 bytes ( 0.07%)
                 OpConvertFToS =     43 hits ( 0.06%)    688 bytes ( 0.06%)
           OpSGreaterThanEqual =     39 hits ( 0.05%)    780 bytes ( 0.07%)
                   OpTypeArray =     38 hits ( 0.05%)    608 bytes ( 0.05%)
                     OpTypeInt =     36 hits ( 0.05%)    576 bytes ( 0.05%)
           OpMatrixTimesVector =     32 hits ( 0.04%)    640 bytes ( 0.05%)
              OpSLessThanEqual =     21 hits ( 0.03%)    420 bytes ( 0.04%)
                        OpISub =     21 hits ( 0.03%)    420 bytes ( 0.04%)
            OpTypeSampledImage =     19 hits ( 0.03%)    228 bytes ( 0.02%)
                   OpTypeImage =     19 hits ( 0.03%)    684 bytes ( 0.06%)
                    OpTypeBool =     18 hits ( 0.02%)    144 bytes ( 0.01%)
                   OpTypeFloat =     18 hits ( 0.02%)    216 bytes ( 0.02%)
                  OpTypeStruct =     18 hits ( 0.02%)    864 bytes ( 0.07%)
                    OpTypeVoid =     18 hits ( 0.02%)    144 bytes ( 0.01%)
                      OpSource =     18 hits ( 0.02%)    216 bytes ( 0.02%)
               OpExtInstImport =     18 hits ( 0.02%)    432 bytes ( 0.04%)
                 OpMemoryModel =     18 hits ( 0.02%)    216 bytes ( 0.02%)
                  OpEntryPoint =     18 hits ( 0.02%)    504 bytes ( 0.04%)
               OpExecutionMode =     18 hits ( 0.02%)    216 bytes ( 0.02%)
                  OpCapability =     18 hits ( 0.02%)    144 bytes ( 0.01%)
                OpFOrdNotEqual =     16 hits ( 0.02%)    320 bytes ( 0.03%)
                  OpTypeMatrix =     13 hits ( 0.02%)    208 bytes ( 0.02%)
                        OpIMul =     13 hits ( 0.02%)    260 bytes ( 0.02%)
                        OpSDiv =     12 hits ( 0.02%)    240 bytes ( 0.02%)
                      OpSelect =      7 hits ( 0.01%)    168 bytes ( 0.01%)
                OpConstantTrue =      5 hits ( 0.01%)     60 bytes ( 0.01%)
               OpConstantFalse =      5 hits ( 0.01%)     60 bytes ( 0.01%)
                         OpAny =      4 hits ( 0.01%)     64 bytes ( 0.01%)
           OpVectorTimesMatrix =      3 hits ( 0.00%)     60 bytes ( 0.01%)
                        OpKill =      3 hits ( 0.00%)     12 bytes ( 0.00%)
                        OpDPdx =      2 hits ( 0.00%)     32 bytes ( 0.00%)
                        OpDPdy =      2 hits ( 0.00%)     32 bytes ( 0.00%)
                     OpSNegate =      2 hits ( 0.00%)     32 bytes ( 0.00%)
             OpLogicalNotEqual =      1 hits ( 0.00%)     20 bytes ( 0.00%)
                       OpUndef =      1 hits ( 0.00%)     12 bytes ( 0.00%)
                OpSGreaterThan =      1 hits ( 0.00%)     20 bytes ( 0.00%)

The shadertoy folder is dominated by loads and stores. Then curiously OpLabel – this indicates that there is a heavy amount of branching/looping occurring in the source shaders, as an OpLabel signifies a new basic block has been declared. OpBranch is the sixth most used opcode, which also backs up the view that these shaders make heavy use of branching/looping.

talos

Totals: 76515 hits 1369056 bytes
                        OpLoad =  14158 hits (18.50%) 226528 bytes (16.55%)
                       OpStore =   8912 hits (11.65%) 106944 bytes ( 7.81%)
            OpCompositeExtract =   8642 hits (11.29%) 174160 bytes (12.72%)
                    OpVariable =   7203 hits ( 9.41%) 115248 bytes ( 8.42%)
                 OpAccessChain =   6349 hits ( 8.30%) 135404 bytes ( 9.89%)
               OpVectorShuffle =   3885 hits ( 5.08%) 122764 bytes ( 8.97%)
                    OpConstant =   3577 hits ( 4.67%)  57232 bytes ( 4.18%)
          OpCompositeConstruct =   3158 hits ( 4.13%)  82616 bytes ( 6.03%)
                    OpDecorate =   1921 hits ( 2.51%)  30188 bytes ( 2.21%)
                       OpLabel =   1642 hits ( 2.15%)  13136 bytes ( 0.96%)
                        OpFMul =   1622 hits ( 2.12%)  32440 bytes ( 2.37%)
                 OpTypePointer =   1548 hits ( 2.02%)  24768 bytes ( 1.81%)
                     OpExtInst =   1346 hits ( 1.76%)  38388 bytes ( 2.80%)
           OpFunctionParameter =   1142 hits ( 1.49%)  13704 bytes ( 1.00%)
                        OpFAdd =   1046 hits ( 1.37%)  20920 bytes ( 1.53%)
                OpFunctionCall =    863 hits ( 1.13%)  19616 bytes ( 1.43%)
           OpVectorTimesScalar =    794 hits ( 1.04%)  15880 bytes ( 1.16%)
                    OpFunction =    743 hits ( 0.97%)  14860 bytes ( 1.09%)
                 OpFunctionEnd =    743 hits ( 0.97%)   2972 bytes ( 0.22%)
                        OpFSub =    716 hits ( 0.94%)  14320 bytes ( 1.05%)
                OpTypeFunction =    680 hits ( 0.89%)  12528 bytes ( 0.92%)
                      OpBranch =    589 hits ( 0.77%)   4712 bytes ( 0.34%)
                 OpReturnValue =    586 hits ( 0.77%)   4688 bytes ( 0.34%)
      OpImageSampleImplicitLod =    316 hits ( 0.41%)   6320 bytes ( 0.46%)
           OpBranchConditional =    305 hits ( 0.40%)   4880 bytes ( 0.36%)
                  OpTypeVector =    299 hits ( 0.39%)   4784 bytes ( 0.35%)
              OpSelectionMerge =    281 hits ( 0.37%)   3372 bytes ( 0.25%)
           OpConstantComposite =    269 hits ( 0.35%)   6432 bytes ( 0.47%)
                         OpDot =    228 hits ( 0.30%)   4560 bytes ( 0.33%)
                     OpTypeInt =    200 hits ( 0.26%)   3200 bytes ( 0.23%)
           OpVectorTimesMatrix =    191 hits ( 0.25%)   3820 bytes ( 0.28%)
                      OpReturn =    158 hits ( 0.21%)    632 bytes ( 0.05%)
                        OpIAdd =    154 hits ( 0.20%)   3080 bytes ( 0.22%)
                        OpFDiv =    152 hits ( 0.20%)   3040 bytes ( 0.22%)
                  OpTypeMatrix =    150 hits ( 0.20%)   2400 bytes ( 0.18%)
                        OpIMul =    117 hits ( 0.15%)   2340 bytes ( 0.17%)
                     OpFNegate =    117 hits ( 0.15%)   1872 bytes ( 0.14%)
                  OpTypeStruct =    107 hits ( 0.14%)   1340 bytes ( 0.10%)
            OpTypeSampledImage =    102 hits ( 0.13%)   1224 bytes ( 0.09%)
                   OpTypeImage =    102 hits ( 0.13%)   3672 bytes ( 0.27%)
                   OpTypeArray =    101 hits ( 0.13%)   1616 bytes ( 0.12%)
                  OpEntryPoint =    100 hits ( 0.13%)   5352 bytes ( 0.39%)
              OpMemberDecorate =    100 hits ( 0.13%)   2000 bytes ( 0.15%)
                  OpCapability =    100 hits ( 0.13%)    800 bytes ( 0.06%)
                    OpTypeVoid =    100 hits ( 0.13%)    800 bytes ( 0.06%)
               OpExtInstImport =    100 hits ( 0.13%)   2400 bytes ( 0.18%)
                 OpMemoryModel =    100 hits ( 0.13%)   1200 bytes ( 0.09%)
                   OpTypeFloat =    100 hits ( 0.13%)   1200 bytes ( 0.09%)
        OpFOrdGreaterThanEqual =     84 hits ( 0.11%)   1680 bytes ( 0.12%)
             OpFOrdGreaterThan =     78 hits ( 0.10%)   1560 bytes ( 0.11%)
                    OpTypeBool =     75 hits ( 0.10%)    600 bytes ( 0.04%)
                OpFOrdLessThan =     73 hits ( 0.10%)   1460 bytes ( 0.11%)
               OpExecutionMode =     63 hits ( 0.08%)    756 bytes ( 0.06%)
                  OpLogicalNot =     36 hits ( 0.05%)    576 bytes ( 0.04%)
      OpImageSampleExplicitLod =     36 hits ( 0.05%)   1008 bytes ( 0.07%)
           OpMatrixTimesScalar =     32 hits ( 0.04%)    640 bytes ( 0.05%)
                   OpLoopMerge =     24 hits ( 0.03%)    384 bytes ( 0.03%)
           OpMatrixTimesVector =     16 hits ( 0.02%)    320 bytes ( 0.02%)
           OpMatrixTimesMatrix =     15 hits ( 0.02%)    300 bytes ( 0.02%)
  OpImageSampleDrefExplicitLod =     14 hits ( 0.02%)    448 bytes ( 0.03%)
                   OpSLessThan =     14 hits ( 0.02%)    280 bytes ( 0.02%)
                 OpConvertFToS =      8 hits ( 0.01%)    128 bytes ( 0.01%)
                      OpFwidth =      8 hits ( 0.01%)    128 bytes ( 0.01%)
                   OpTranspose =      6 hits ( 0.01%)     96 bytes ( 0.01%)
                   OpLogicalOr =      4 hits ( 0.01%)     80 bytes ( 0.01%)
                OpFOrdNotEqual =      4 hits ( 0.01%)     80 bytes ( 0.01%)
                        OpKill =      4 hits ( 0.01%)     16 bytes ( 0.00%)
                         OpPhi =      3 hits ( 0.00%)     84 bytes ( 0.01%)
           OpFOrdLessThanEqual =      2 hits ( 0.00%)     40 bytes ( 0.00%)
             OpLogicalNotEqual =      2 hits ( 0.00%)     40 bytes ( 0.00%)

The talos folder is dominated once again by loads and stores. Next is OpCompositeExtract – which is extracting an element from a composite (aggregate, matrix or vector). I’d take a guess that there is a lot of vector math going on in these shaders, as the sixth most used opcode is OpVectorShuffle.

unity

Totals: 77783 hits 1346368 bytes
                    OpDecorate =  18764 hits (24.12%) 235700 bytes (17.51%)
                        OpLoad =  13131 hits (16.88%) 210096 bytes (15.60%)
                       OpStore =   6949 hits ( 8.93%)  83388 bytes ( 6.19%)
                 OpAccessChain =   5562 hits ( 7.15%) 115316 bytes ( 8.56%)
               OpVectorShuffle =   3883 hits ( 4.99%) 125980 bytes ( 9.36%)
                        OpName =   3428 hits ( 4.41%)  69392 bytes ( 5.15%)
              OpMemberDecorate =   3145 hits ( 4.04%)  59492 bytes ( 4.42%)
                    OpVariable =   3062 hits ( 3.94%)  48992 bytes ( 3.64%)
                        OpFMul =   3017 hits ( 3.88%)  60340 bytes ( 4.48%)
                  OpMemberName =   2293 hits ( 2.95%)  72892 bytes ( 5.41%)
                        OpFAdd =   1977 hits ( 2.54%)  39540 bytes ( 2.94%)
                    OpConstant =   1784 hits ( 2.29%)  28544 bytes ( 2.12%)
                 OpTypePointer =   1742 hits ( 2.24%)  27872 bytes ( 2.07%)
                     OpExtInst =    916 hits ( 1.18%)  24552 bytes ( 1.82%)
                     OpFNegate =    831 hits ( 1.07%)  13296 bytes ( 0.99%)
                       OpLabel =    726 hits ( 0.93%)   5808 bytes ( 0.43%)
                         OpDot =    686 hits ( 0.88%)  13720 bytes ( 1.02%)
          OpCompositeConstruct =    598 hits ( 0.77%)  14156 bytes ( 1.05%)
                   OpTypeArray =    496 hits ( 0.64%)   7936 bytes ( 0.59%)
                      OpBranch =    381 hits ( 0.49%)   3048 bytes ( 0.23%)
                  OpTypeStruct =    367 hits ( 0.47%)  12108 bytes ( 0.90%)
                  OpTypeVector =    347 hits ( 0.45%)   5552 bytes ( 0.41%)
            OpCompositeExtract =    269 hits ( 0.35%)   5380 bytes ( 0.40%)
      OpImageSampleImplicitLod =    241 hits ( 0.31%)   4820 bytes ( 0.36%)
           OpConstantComposite =    232 hits ( 0.30%)   5548 bytes ( 0.41%)
                        OpFDiv =    217 hits ( 0.28%)   4340 bytes ( 0.32%)
                     OpTypeInt =    215 hits ( 0.28%)   3440 bytes ( 0.26%)
           OpBranchConditional =    210 hits ( 0.27%)   3360 bytes ( 0.25%)
              OpSelectionMerge =    206 hits ( 0.26%)   2472 bytes ( 0.18%)
             OpSourceExtension =    157 hits ( 0.20%)   5372 bytes ( 0.40%)
                 OpFunctionEnd =    130 hits ( 0.17%)    520 bytes ( 0.04%)
                      OpReturn =    130 hits ( 0.17%)    520 bytes ( 0.04%)
                    OpFunction =    130 hits ( 0.17%)   2600 bytes ( 0.19%)
                  OpCapability =    123 hits ( 0.16%)    984 bytes ( 0.07%)
                OpTypeFunction =    117 hits ( 0.15%)   1420 bytes ( 0.11%)
                    OpTypeVoid =    113 hits ( 0.15%)    904 bytes ( 0.07%)
                      OpSource =    113 hits ( 0.15%)   1356 bytes ( 0.10%)
                  OpEntryPoint =    113 hits ( 0.15%)   5712 bytes ( 0.42%)
               OpExtInstImport =    113 hits ( 0.15%)   2712 bytes ( 0.20%)
                   OpTypeFloat =    113 hits ( 0.15%)   1356 bytes ( 0.10%)
                 OpMemoryModel =    113 hits ( 0.15%)   1356 bytes ( 0.10%)
                OpFOrdLessThan =     95 hits ( 0.12%)   1900 bytes ( 0.14%)
            OpTypeSampledImage =     91 hits ( 0.12%)   1092 bytes ( 0.08%)
                   OpTypeImage =     91 hits ( 0.12%)   3276 bytes ( 0.24%)
               OpExecutionMode =     89 hits ( 0.11%)   1100 bytes ( 0.08%)
                    OpTypeBool =     52 hits ( 0.07%)    416 bytes ( 0.03%)
      OpImageSampleExplicitLod =     40 hits ( 0.05%)   1120 bytes ( 0.08%)
                OpFunctionCall =     27 hits ( 0.03%)    532 bytes ( 0.04%)
                OpFOrdNotEqual =     23 hits ( 0.03%)    460 bytes ( 0.03%)
                        OpIAdd =     23 hits ( 0.03%)    460 bytes ( 0.03%)
                     OpBitcast =     22 hits ( 0.03%)    352 bytes ( 0.03%)
           OpFunctionParameter =     15 hits ( 0.02%)    180 bytes ( 0.01%)
                   OpFOrdEqual =      9 hits ( 0.01%)    180 bytes ( 0.01%)
                      OpSelect =      5 hits ( 0.01%)    120 bytes ( 0.01%)
                   OpINotEqual =      5 hits ( 0.01%)    100 bytes ( 0.01%)
                  OpEmitVertex =      5 hits ( 0.01%)     20 bytes ( 0.00%)
                        OpIMul =      5 hits ( 0.01%)    100 bytes ( 0.01%)
  OpImageSampleDrefExplicitLod =      5 hits ( 0.01%)    160 bytes ( 0.01%)
                        OpKill =      5 hits ( 0.01%)     20 bytes ( 0.00%)
                 OpConvertUToF =      4 hits ( 0.01%)     64 bytes ( 0.00%)
                   OpSLessThan =      4 hits ( 0.01%)     80 bytes ( 0.01%)
              OpControlBarrier =      4 hits ( 0.01%)     64 bytes ( 0.00%)
                   OpLoopMerge =      4 hits ( 0.01%)     64 bytes ( 0.00%)
                      OpIEqual =      4 hits ( 0.01%)     80 bytes ( 0.01%)
        OpFOrdGreaterThanEqual =      3 hits ( 0.00%)     60 bytes ( 0.00%)
                        OpUMod =      2 hits ( 0.00%)     40 bytes ( 0.00%)
                   OpULessThan =      2 hits ( 0.00%)     40 bytes ( 0.00%)
            OpShiftLeftLogical =      2 hits ( 0.00%)     40 bytes ( 0.00%)
                OpEndPrimitive =      2 hits ( 0.00%)      8 bytes ( 0.00%)
                 OpConvertSToF =      1 hits ( 0.00%)     16 bytes ( 0.00%)
                  OpLogicalAnd =      1 hits ( 0.00%)     20 bytes ( 0.00%)
                     OpSNegate =      1 hits ( 0.00%)     16 bytes ( 0.00%)
             OpCompositeInsert =      1 hits ( 0.00%)     24 bytes ( 0.00%)
            OpTypeRuntimeArray =      1 hits ( 0.00%)     12 bytes ( 0.00%)

And lastly the unity folder. These shader modules are dominated by OpDecorate. The next three most used opcodes are OpLoad, OpStore and OpAccessChain – so loading and storing to variables is taking up a sizeable amount of the shader modules.

All Together

If we look at all the folders above as one output from  spirv-stats instead:

Totals: 286932 hits 4985312 bytes
                        OpLoad =  49915 hits (17.40%) 798640 bytes (16.02%)
                       OpStore =  29233 hits (10.19%) 350796 bytes ( 7.04%)
                    OpDecorate =  23770 hits ( 8.28%) 312964 bytes ( 6.28%)
                 OpAccessChain =  20116 hits ( 7.01%) 421496 bytes ( 8.45%)
                    OpVariable =  19041 hits ( 6.64%) 304656 bytes ( 6.11%)
              OpMemberDecorate =  14332 hits ( 4.99%) 280916 bytes ( 5.63%)
                    OpConstant =  10823 hits ( 3.77%) 173168 bytes ( 3.47%)
                       OpLabel =   9915 hits ( 3.46%)  79320 bytes ( 1.59%)
               OpVectorShuffle =   9732 hits ( 3.39%) 308372 bytes ( 6.19%)
            OpCompositeExtract =   9595 hits ( 3.34%) 193220 bytes ( 3.88%)
                        OpName =   9233 hits ( 3.22%) 164092 bytes ( 3.29%)
                        OpFMul =   8532 hits ( 2.97%) 170640 bytes ( 3.42%)
          OpCompositeConstruct =   6678 hits ( 2.33%) 166680 bytes ( 3.34%)
                        OpFAdd =   5922 hits ( 2.06%) 118440 bytes ( 2.38%)
                 OpTypePointer =   5486 hits ( 1.91%)  87776 bytes ( 1.76%)
                     OpExtInst =   5257 hits ( 1.83%) 145980 bytes ( 2.93%)
                      OpBranch =   5229 hits ( 1.82%)  41832 bytes ( 0.84%)
           OpBranchConditional =   3193 hits ( 1.11%)  51088 bytes ( 1.02%)
              OpSelectionMerge =   3109 hits ( 1.08%)  37308 bytes ( 0.75%)
                        OpFSub =   2668 hits ( 0.93%)  53360 bytes ( 1.07%)
                  OpMemberName =   2507 hits ( 0.87%)  78044 bytes ( 1.57%)
                OpFunctionCall =   2198 hits ( 0.77%)  58520 bytes ( 1.17%)
           OpConstantComposite =   2155 hits ( 0.75%)  50928 bytes ( 1.02%)
           OpFunctionParameter =   2117 hits ( 0.74%)  25404 bytes ( 0.51%)
                         OpDot =   1911 hits ( 0.67%)  38220 bytes ( 0.77%)
           OpVectorTimesScalar =   1488 hits ( 0.52%)  29760 bytes ( 0.60%)
                    OpFunction =   1398 hits ( 0.49%)  27960 bytes ( 0.56%)
                 OpFunctionEnd =   1398 hits ( 0.49%)   5592 bytes ( 0.11%)
                OpTypeFunction =   1175 hits ( 0.41%)  21076 bytes ( 0.42%)
                  OpTypeVector =   1110 hits ( 0.39%)  17760 bytes ( 0.36%)
                  OpTypeStruct =   1065 hits ( 0.37%)  58496 bytes ( 1.17%)
                     OpFNegate =   1038 hits ( 0.36%)  16608 bytes ( 0.33%)
                   OpTypeArray =   1038 hits ( 0.36%)  16608 bytes ( 0.33%)
      OpImageSampleImplicitLod =    969 hits ( 0.34%)  19660 bytes ( 0.39%)
                        OpFDiv =    961 hits ( 0.33%)  19220 bytes ( 0.39%)
                 OpReturnValue =    928 hits ( 0.32%)   7424 bytes ( 0.15%)
                   OpFOrdEqual =    722 hits ( 0.25%)  14440 bytes ( 0.29%)
                     OpTypeInt =    661 hits ( 0.23%)  10576 bytes ( 0.21%)
           OpFOrdLessThanEqual =    595 hits ( 0.21%)  11900 bytes ( 0.24%)
                OpFOrdLessThan =    588 hits ( 0.20%)  11760 bytes ( 0.24%)
                      OpIEqual =    586 hits ( 0.20%)  11720 bytes ( 0.24%)
                      OpReturn =    525 hits ( 0.18%)   2100 bytes ( 0.04%)
                        OpIAdd =    465 hits ( 0.16%)   9300 bytes ( 0.19%)
                   OpTypeImage =    437 hits ( 0.15%)  15732 bytes ( 0.32%)
            OpTypeSampledImage =    437 hits ( 0.15%)   5244 bytes ( 0.11%)
        OpFOrdGreaterThanEqual =    412 hits ( 0.14%)   8240 bytes ( 0.17%)
             OpFOrdGreaterThan =    391 hits ( 0.14%)   7820 bytes ( 0.16%)
      OpImageSampleExplicitLod =    376 hits ( 0.13%)  11128 bytes ( 0.22%)
                  OpCapability =    372 hits ( 0.13%)   2976 bytes ( 0.06%)
               OpExtInstImport =    341 hits ( 0.12%)   8184 bytes ( 0.16%)
                 OpMemoryModel =    341 hits ( 0.12%)   4092 bytes ( 0.08%)
                  OpEntryPoint =    341 hits ( 0.12%)  17808 bytes ( 0.36%)
                    OpTypeVoid =    341 hits ( 0.12%)   2728 bytes ( 0.05%)
                   OpTypeFloat =    341 hits ( 0.12%)   4092 bytes ( 0.08%)
                  OpLogicalAnd =    331 hits ( 0.12%)   6620 bytes ( 0.13%)
                         OpPhi =    281 hits ( 0.10%)   7868 bytes ( 0.16%)
                  OpTypeMatrix =    255 hits ( 0.09%)   4080 bytes ( 0.08%)
               OpExecutionMode =    235 hits ( 0.08%)   2852 bytes ( 0.06%)
  OpImageSampleDrefExplicitLod =    226 hits ( 0.08%)   7232 bytes ( 0.15%)
                    OpTypeBool =    212 hits ( 0.07%)   1696 bytes ( 0.03%)
           OpVectorTimesMatrix =    194 hits ( 0.07%)   3880 bytes ( 0.08%)
             OpSourceExtension =    167 hits ( 0.06%)   5732 bytes ( 0.11%)
                  OpLogicalNot =    160 hits ( 0.06%)   2560 bytes ( 0.05%)
                      OpSource =    141 hits ( 0.05%)   1692 bytes ( 0.03%)
                        OpIMul =    135 hits ( 0.05%)   2700 bytes ( 0.05%)
                 OpConvertSToF =    116 hits ( 0.04%)   1856 bytes ( 0.04%)
                        OpFMod =    114 hits ( 0.04%)   2280 bytes ( 0.05%)
                   OpLogicalOr =     93 hits ( 0.03%)   1860 bytes ( 0.04%)
                   OpSLessThan =     92 hits ( 0.03%)   1840 bytes ( 0.04%)
                   OpLoopMerge =     84 hits ( 0.03%)   1344 bytes ( 0.03%)
                      OpSelect =     68 hits ( 0.02%)   1632 bytes ( 0.03%)
                 OpConvertFToS =     67 hits ( 0.02%)   1072 bytes ( 0.02%)
                     OpBitcast =     66 hits ( 0.02%)   1056 bytes ( 0.02%)
           OpMatrixTimesVector =     48 hits ( 0.02%)    960 bytes ( 0.02%)
            OpShiftLeftLogical =     46 hits ( 0.02%)    920 bytes ( 0.02%)
                OpFOrdNotEqual =     43 hits ( 0.01%)    860 bytes ( 0.02%)
                        OpKill =     40 hits ( 0.01%)    160 bytes ( 0.00%)
           OpSGreaterThanEqual =     39 hits ( 0.01%)    780 bytes ( 0.02%)
           OpMatrixTimesScalar =     32 hits ( 0.01%)    640 bytes ( 0.01%)
                        OpISub =     21 hits ( 0.01%)    420 bytes ( 0.01%)
              OpSLessThanEqual =     21 hits ( 0.01%)    420 bytes ( 0.01%)
           OpMatrixTimesMatrix =     15 hits ( 0.01%)    300 bytes ( 0.01%)
                        OpSDiv =     12 hits ( 0.00%)    240 bytes ( 0.00%)
                      OpFwidth =      8 hits ( 0.00%)    128 bytes ( 0.00%)
                 OpConvertUToF =      6 hits ( 0.00%)     96 bytes ( 0.00%)
                   OpTranspose =      6 hits ( 0.00%)     96 bytes ( 0.00%)
                OpConstantTrue =      5 hits ( 0.00%)     60 bytes ( 0.00%)
                   OpINotEqual =      5 hits ( 0.00%)    100 bytes ( 0.00%)
                  OpEmitVertex =      5 hits ( 0.00%)     20 bytes ( 0.00%)
               OpConstantFalse =      5 hits ( 0.00%)     60 bytes ( 0.00%)
              OpControlBarrier =      4 hits ( 0.00%)     64 bytes ( 0.00%)
                         OpAny =      4 hits ( 0.00%)     64 bytes ( 0.00%)
                     OpSNegate =      3 hits ( 0.00%)     48 bytes ( 0.00%)
             OpLogicalNotEqual =      3 hits ( 0.00%)     60 bytes ( 0.00%)
        OpVectorExtractDynamic =      3 hits ( 0.00%)     60 bytes ( 0.00%)
                   OpULessThan =      2 hits ( 0.00%)     40 bytes ( 0.00%)
                        OpUMod =      2 hits ( 0.00%)     40 bytes ( 0.00%)
                        OpDPdx =      2 hits ( 0.00%)     32 bytes ( 0.00%)
                        OpDPdy =      2 hits ( 0.00%)     32 bytes ( 0.00%)
                OpEndPrimitive =      2 hits ( 0.00%)      8 bytes ( 0.00%)
             OpCompositeInsert =      1 hits ( 0.00%)     24 bytes ( 0.00%)
            OpTypeRuntimeArray =      1 hits ( 0.00%)     12 bytes ( 0.00%)
                       OpUndef =      1 hits ( 0.00%)     12 bytes ( 0.00%)
                OpSGreaterThan =      1 hits ( 0.00%)     20 bytes ( 0.00%)

We can see that loading and storing dominates our shader modules at 28% of the opcodes and 25% of the binary size.

Summary

The tool showed us some interesting divergent trends across each of the providers of the SPIR-V shader modules. Thanks to Valve, Shadertoy, Croteam and Unity for allowing @aras_p to use their SPIR-V shaders when he wrote his smol-v tool. I wouldn’t have had such interesting source material otherwise to run my tool against!

The  spirv-stats tool can be got via its GitHub repository – hope it is useful to someone!

18 Sep

Adding type identifiers (MPC -> LLVM for the Neil Language #3)

In my previous post cleaning up the parser, I had rationalised the parser for my custom language I’m calling Neil – Not Exactly an Intermediate Language.

The next step in making my language more useful and expression is to add the ability to arbitrarily create variables within a function, store an initial value to the variable, and then be able to update the variable.

Previously, our language could handle:

i32 foo(i32 x) {
  return x * 5 + 42;
}
i32 main() {
  return foo(13);
}

Whereas now we want it to be able to handle:

i32 foo(i32 x) {
  i32 y = x * 5;
  y = y + 42;
  return y;
}
i32 main() {
  return foo(13);
}

In GitHub Pull Request #1 I’ve submitted the patch of changes required to get this to work, but lets cover the steps that were required to do this!

MPC Grammar Changes

We firstly need to modify our MPC grammar to accept type declarations within the body of a function. Our functions (procedure in the grammar) are made up of zero or more statements. We modify the stmt definition in the grammar to be:

 stmt : \"return\" <lexp>? ';' 
      | <ident> '(' <ident>? (',' <ident>)* ')' ';'
      | <typeident> ('=' <lexp>)? ';' 
      | <ident> '=' <lexp> ';' ;

As shown in red, we’ve added two new statement variants:

  • a type identifier followed by an optional initializer
  • assigning into an existing identifier

With these two changes, we can represent all the fun variable shenanigans we could imagine.

AST Lowering

We need to handle two extra cases in our lower_statement method now – lowering type definitions within functions, and assignments into already defined identifiers.

To detect the type identifier case, we simply:

  • add some logic to check if the first child of a statement AST node is a typeident
  • the first child of a type identifier is the type, which we lookup in the type table
  • the second child of a type identifier is the name of the variable, which we’ll use to assign an entry in the symbol table
  • we then use LLVM’s stack allocation instruction alloca to create memory for our variable to reside within
  • and lastly store into the symbol table, using the variable name we already parsed, that the name maps to the alloca’ed variable

We then need to check whether the variable had an initializer or not:

  • if the statement began with a type identifier AST node, and had exactly two children, we know that there was no initializer (so can skip the following logic)
  • otherwise, the statement must have exactly four children (a type identifier, an ‘=’ character, an expression, and lastly a ‘;’ character)
  • we use the lower_ast_node method to lower the expression, passing in the type we deduced from the type identifier
  • and lastly use LLVM’s store instruction to assign the value of the expression into the memory backing the variable

To detect when we want to assign into an existing variable:

  • we know a variable declaration is an identifier, followed by an ‘=’ character, followed by an expression, and lastly a ‘;’ character
  • we then lookup the variable in the symbol table
  • we use the lower_ast_node method to lower the expression on the right-hand-side of the ‘=’ character, passing in the type of the type identifier (we use LLVMTypeOf on the value to deduce the type)
  • and lastly we use LLVM’s store instruction to assign the value of the expression into the memory backing the variable

Easy!

Modifying Expression Lowering

We need to slightly modify expression lowering to. Previously all values in the symbol table were directly correlated to the type used to declare the value in the source language. For example, a function parameter with type i32 would have the LLVM type i32 to match it. Now that we have function local variables that have a stack allocation backing them, we need to deduce when we find an identifier, if it was an identifier that came from a function local variable, or an identifier that came from a function parameter. In lower_ident, we now have an extra check to see if the value in the symbol table is an alloca instruction, EG. it requires to be loaded before we can use it in a further expression. If we find that it is an alloca, we simply use LLVM’s load instruction on it, and return this new LLVM value instead of the original value in the symbol table.

Remembering the Symbols to Murder

When we close a function, we would loop through the named parameters of the function and remove them from the symbol table. These symbols were only usable within the body of the function so we had to ensure we didn’t accidently extend their lifetime beyond the function close. This also now applies to variables declared within a function too! Now, to add a little complexity, since variables can be declared in any statement of a function, we need to keep a list of symbols that need to be destroyed when the function closes. We use a new NeilVector (at present just a wrapper around std::vector) to keep track of which symbols we need to destroy when we close the function. When calling lower_statement and having detected a new type identifier definition, we push back the name of the symbol that we need to cleanup onto our symbol cleanup vector, and then when the function closes we just loop through this vector, and erase the corresponding symbol in the symbol table.

What is Next?

So now we’ve added function local variables to the language, it is about time that we tackled a big lacking feature – branching! In the next post we’ll cover how we handle adding if statements to the language. Stay tuned!

15 Sep

json.h update – fixing bugs found by nativejson-benchmark

I was recently made aware of the awesome JSON parser/writer benchmark suite nativejson-benchmark  (thanks to the awesome @chadaustin).

The suite takes over 40 different JSON libraries and compares how fast the can parse/write JSON, but also has a set of test cases that cover some really obtuse corner cases that a lot of parsers missed. One of the parsers that failed some of these tests was my own one json.h, so in pull request #42 I’ve merged in the fixes for the failures uncovered by the test suite.

The main difference when users of json.h migrate to the latest code is the deprecation of the json_parse_flags_allow_string_simplification option. My reading of the JSON specifications was wrong in that I thought I had to preserve control characters like ‘\n’ into the end result. The json_parse_flags_allow_string_simplification option was my ‘solution’ to what I thought would be non-conforming behaviour. Now, with the tip code, effectively json_parse_flags_allow_string_simplification is always on by default, with no option to turn it off.

Another change is that any malformed characters that occurred after a valid JSON sequence in source would have previously been ignored. Say you had provided the following JSON:

{“a” : true, “b” : false} heyo this really shouldn’t be here

Previously, the parser would just parse the object and terminate when the object closed – EG. when the closing ‘}’ was detected. This was dangerous behaviour that nativejson-benchmark correctly failed my library on, so I’ve changed it such that it will fail if:

  • Any non-whitespace character is found after the close of the parent value
  • Or, if json_parse_flags_allow_c_style_comments is enabled, any non-whitespace character occurs outwith any comments that occur after the close of the parent value

json.h should now pass 100% of the tests provided in the nativejson-benchmark test suite, and be more useful to users going forward. Happy hunting folks!

09 Sep

Cleaning up the parser (MPC -> LLVM for the Neil Language #2)

In my previous post Hooking up MPC & LLVM I had started hooking up MPC to LLVM, to start developing a custom language I’m calling Neil – Not Exactly an Intermediate Language.

Since then, I’ve tried to rationalise the various hacks I had to do on the original version just to get something working, and actually try and have a clean codebase for generating LLVM from our MPC grammar. Check out the GitHub repository here if you want to follow the progress.

I decided to wrap the logic into a C++ class called ASTLowering – this will encapsulate an entire invocation of MPC -> LLVM, and hold some structures for the symbol table, type table, the LLVM IRBuilder we are using, and the current function we are generating a body for. I did this just to try and keep down the number of parameters I’m passing down between the recursive function calls we’ll need to use to walk the AST and output our LLVM IR.

So next up – let us cover a bunch of the core concepts we need to parse our language to LLVM IR.

LLVM Values & Types

Everything (near enough) in LLVM IR derives from the LLVM Value base class – in the C API this becomes LLVMValueRef. Functions, function Arguments, Instructions, Constants – they all derive from the LLVM Value class. It’s important to note that most of the things we’ll be dealing with are these values.

And everything that is an LLVM Value has an corresponding LLVM Type – Function Types, Integer Types, Pointer Types, etc, etc – we can query any value in LLVM to get its corresponding type.

Symbol Table

First up is the symbol table, specifically what is a symbol table in the context of our parser?

The symbol table is just a map of an identifier (the name of the variable/function) to the LLVM Value that represents the variable/function in the IR. For instance, lets say we had this function:

i32 foo(i32 a) {
  return a;
}

As we are parsing, when we encounter the function ‘foo’ in our AST, we add an entry in our symbol table to record that there is an LLVM Value (in this case, the LLVM Function) whose name is ‘foo’. We also add in, only within the scope of the body of the function, that there is a symbol called ‘a’, whose LLVM Value is the first LLVM Argument of the function ‘foo’.

And that is it!

Type Table

We keep a type table – a table of all known types that are being used and are valid in the program. At present, we just hardcode all the supported types we have – i1, i8, i16, i32, i64, u8, u16, u32, u64, f16, f32 and f64. The identifiers of each of these types in the Neil language (for example ‘u8’) is mapped to the corresponding LLVM Type – in the C API called LLVMTypeRef. Then, when we are parsing a type identifier in the AST (for example, the return type of a function definition), we simply look up the string name of the type in the type table, and use the corresponding LLVM Type when creating the value.

How our Parsing Works

Let’s take a simple example of our Neil language:

i32 foo(i32 x) {
  return x * 5 + 42;
}
i32 main() {
  return foo(13);
}

We feed this to our mpc parser for the Neil language, which produces us the following AST:

> 
  regex 
  procedure|> 
    type|regex:1:1 'i32'
    ident|regex:1:5 'foo'
    char:1:8 '('
    args|typeident|> 
      type|regex:1:9 'i32'
      ident|regex:1:13 'x'
    char:1:14 ')'
    body|> 
      char:1:16 '{'
      stmt|> 
        string:2:3 'return'
        lexp|> 
          term|> 
            factor|ident|regex:2:10 'x'
            char:2:12 '*'
            factor|literal|regex:2:14 '5'
          char:2:16 '+'
          term|factor|literal|regex:2:18 '42'
        char:2:20 ';'
      char:3:1 '}'
  procedure|> 
    type|regex:4:1 'i32'
    ident|regex:4:5 'main'
    char:4:9 '('
    char:4:10 ')'
    body|> 
      char:4:12 '{'
      stmt|> 
        string:5:3 'return'
        lexp|term|factor|> 
          ident|regex:5:10 'foo'
          char:5:13 '('
          lexp|term|factor|literal|regex:5:14 '13'
          char:5:16 ')'
        char:5:17 ';'
      char:6:1 '}'
  regex

Our parser starts at the root MPC node (always called ‘>’), and then looks for any functions defined in the source file. If we find a function, we’ll have a corresponding ‘procedure’ node in the AST.

We find the first procedure, which has the declaration:

procedure|>
  type|regex:1:1 'i32'
  ident|regex:1:5 'foo'
  char:1:8 '('
  args|typeident|> 
    type|regex:1:9 'i32'
    ident|regex:1:13 'x'
  char:1:14 ')'

The steps we follow to parse this are:

  • A procedure always starts with a type. We lookup the contents of the first child of the procedure in our type table, which returns us the LLVM type for a 32 bit integer.
  • The function has a identifier – the name of the function. We remember the second child’s contents is the name of the function.
  • The third child is the opening ‘(‘ for any arguments to the function.
  • While the next child is not the closing ‘)’ for the arguments to the function, we look for a type-identifier.
    • The first child of a type-identifier is the type, which we again lookup in our type table, and store this into a vector.
    • The second child of a type-identifier is the name for the argument of the type. We store this name into a vector.
  • Once we reach the closing ‘)’ for the arguments, we can now create the LLVM Function to populate the symbol table with.

To create an LLVM Function we first need the type of the function – we need the return type, and an array of the argument types. With these (which we just parsed above), we can create the function type. Once we have the function type, we combine this with the name of the function we discovered in the second child of our procedure, and create the new LLVM Function. The last thing we need to do to finalise the declaration of our function is name each of the arguments to the function. We run through the arguments of the function calling LLVMSetValueName() using the vector of names we remembered when parsing our type-identifiers earlier.

And huzzah! We have our function declaration.

Next up, we parse the body of the function:

body|> 
  char:1:16 '{'
  stmt|> 
    string:2:3 'return'
    lexp|> 
      term|> 
        factor|ident|regex:2:10 'x'
        char:2:12 '*'
        factor|literal|regex:2:14 '5'
      char:2:16 '+'
      term|factor|literal|regex:2:18 '42'
    char:2:20 ';'
  char:3:1 '}'

The function body consists of:

  • The first child is the opening ‘{‘ for the body of the function.
  • While the next child is not the closing ‘}’ for the body of the function, we loop through all of the statement AST nodes for the body of the function.
  • Within the statement AST node, we support two kinds of statements at present – return statements, and function call statements. In the example showing we find that:
    • We have a ‘return’ as the first child to the statement (thus it is a return statement type).
    • If the second child is not ‘;’, we are returning a value from the function, in the form of an lexp AST node.
      • The first child of an lexp AST node is always a term AST node.
        • The first child of a term AST node is always a factor. In the example above we can see we have a ‘factor|ident|regex’ named node – this highlights a cool feature of MPC I’ll talk about later*.
          • We parse the factor-that-is-an-identifier, which is the identifier ‘x’. We look this up in the symbol table, finding the argument to our function called ‘x’, and use that LLVM Value as the value for the factor.
        • If there is a second child of a term AST node, it is either multiply ‘*’, divide ‘/’ or remainder ‘%’, in our example it is a multiply.
        • We then parse the third child (if there was a second, there must be a third node according to the rules of our grammar), which is also a factor AST node.
          • We parse this factor-that-is-a-literal, which is the constant ‘5’. Now, in our Neil language, constants don’t inherently have a type – we infer the type from the closest typed expression. So in this instance, we know that the type of the left-hand-side of our multiplication operation is the type ‘i32’, so our constant gets parsed into an LLVM Value that is the constant ‘5’ that is a 32 bit integer.
      • if there is a second child of a lexp AST node, it is either addition ‘+’ or subtraction ‘-‘, in our example it is an addition.
      • We then parse the third child of the lexp AST node (if there was a second child, there must be a third child), which is a term AST node:
        • The term in this case is a term-that-is-a-factor-that-is-a-literal, which is the constant ’42’. Again, our constant is un-typed, so we infer that the result of the left-hand-side of our addition has the type ‘i32’, and use that as the type for this constant.
    • The last child of the statement must be a ‘;’ to close the statement.
  • And the last child of the procedure must be a ‘}’ to close the function.

And that is it! We have parsed our function, and we get the result in LLVM IR of:

define i32 @foo(i32 %x) {
  %1 = mul i32 %x, 5
  %2 = add i32 %1, 42
  ret i32 %2
}

Cool MPC Feature*

When going through how we parse, I eluded to a cool MPC feature that we’d have to come back to – I like to call them ‘multi-type-AST-nodes’. If a node in the AST has exactly one child, MPC will fold the nodes together. A great example in the above AST is:

term|factor|literal|regex:2:18 '42'

Which is a node that is a term -> factor -> literal -> regex! This means that we have to do a little more complex parsing when we look at the ‘tag’ of each AST node (see ‘lower_ast_node’ in the GitHub for Neil) – we can’t just say ‘this node is a term’, we have to look at the entire tag, and work backwards to work out how we should be parsing the value. While it increases the complexity of the parsing marginally, you can see why this would be a huge win if we had a super large input source file being parsed by MPC – we are using one node in place of four, a 25% saving in memory usage for the MPC format!

What is Next?

So we’ve got basic parsing support into our language, and we can make simple programs. But we don’t have some really basic features that make languages actually usable yet. In the next post, we’ll add type identifiers, so we can declare variables within our function bodies. Stay tuned!

28 Aug

Hooking up MPC & LLVM

I’ve been tinkering around with mpc and llvm recently – just to satisfy a few simple questions:

  • How easy is it to use mpc?
  • How easy is the LLVM c api to use?
  • How easy is it to connect a parser generator to an LLVM Module?

So given the above aims, I’ve embarked on creating my own stupid little language, that I’m calling ‘neil‘  – Not Exactly an Intermediate Language. It is utterly by chance that the name ‘neil‘ happens to be my own name, if you’ve believe that gratuitous lie!

A ‘neil‘ program currently consists of one allowed type – a 32 bit integer named i32, function definitions, function calls, and being able to return the results of one function from another. In short, it is (currently, and most likely permanently) a really stupid language.

An example legal ‘neil’ program is:

 i32 foo() { return 42; }

 i32 main() {
   return foo();
 }

So lets get right into how we construct this grammar in mpc. The mpc grammar for our ‘neil‘ language is as follows:

ident : /[a-zA-Z_][a-zA-Z0-9_]*/ ;

literal : /[0-9]+(\\.[0-9]+)?((e|E)(-|\\+)?[0-9]+)?/ ;

factor: <literal> | <ident> '(' <lexp>? (',' <lexp>)* ')' ;

term: <factor> ;

lexp: <term> ;

stmt: \"return\" <lexp>? ';'
    | <ident> '(' <ident>? (',' <ident>)* ')' ';' ;

type: (\"i1\" | /(i|u)(8|16|32|64)/ | /f(16|32|64)/) ;

typeident : <type> <ident> ;

args: <typeident>? (',' <typeident>)* ;

body: '{' <stmt>* '}' ;

procedure : <type> <ident> '(' <args> ')' <body> ;

lang: /^/ <procedure>* /$/ ;

It’s a very simple grammar, I looked at the mpc – smallc example to work out what to do. To parse using mpc, I have the following function:

int parse(
 const char* filename,
 const char* source,
 mpc_result_t* out_result) {
 mpc_parser_t* Ident = mpc_new("ident");
 mpc_parser_t* Number = mpc_new("literal");
 mpc_parser_t* Factor = mpc_new("factor");
 mpc_parser_t* Term = mpc_new("term");
 mpc_parser_t* Lexp = mpc_new("lexp");
 mpc_parser_t* Stmt = mpc_new("stmt");
 mpc_parser_t* Type = mpc_new("type");
 mpc_parser_t* Decls = mpc_new("decls");
 mpc_parser_t* Args = mpc_new("args");
 mpc_parser_t* Body = mpc_new("body");
 mpc_parser_t* Procedure = mpc_new("procedure");
 mpc_parser_t* Lang = mpc_new("lang");

 mpc_err_t* err = mpca_lang(MPCA_LANG_DEFAULT,
  " ident : /[a-zA-Z_][a-zA-Z0-9_]*/ ;                     \n"
  " literal : /[0-9]+(\\.[0-9]+)?((e|E)(-|\\+)?[0-9]+)?/ ; \n"
  " factor : <literal>                                     \n"
  " | <ident> '(' <lexp>? (',' <lexp>)* ')' ;              \n"
  " term : <factor> ;                                      \n"
  " lexp : <term> ;                                        \n"
  " stmt : \"return\" <lexp>? ';'                          \n"
  " | <ident> '(' <ident>? (',' <ident>)* ')' ';' ;        \n"
  " type : (\"i1\" | /(i|u)(8|16|32|64)/ | /f(16|32|64)/) ;\n"
  " typeident : <type> <ident> ;                           \n"
  " args : <typeident>? (',' <typeident>)* ;               \n"
  " body : '{' <stmt>* '}' ;                               \n"
  " procedure : <type> <ident> '(' <args> ')' <body> ;     \n"
  " lang : /^/ <procedure>* /$/ ;                         \n",
 Ident, Number, Factor, Term, Lexp, Stmt, 
 Type, Args, Body, Procedure, Lang, NULL);
 
 if (err != NULL) {
  mpc_err_print(err);
  mpc_err_delete(err);
  exit(1);
 }

 const int result = mpc_parse(filename, source, Lang, out_result);

 mpc_cleanup(11, Ident, Number, Factor, Term, Lexp, Stmt, 
 Type, Args, Body, Procedure, Lang);

 return result;
}

With this, we can then iterate through a successfully parsed ‘neil‘ input using the:

typedef struct mpc_ast_t {
 char *tag;
 char *contents;
 mpc_state_t state;
 int children_num;
 struct mpc_ast_t** children;
} mpc_ast_t;

type that mpc provides. The struct elements are as follows:

  • tag – the string name of the ast node’s type
  • contents – what the ast node is pointing at in the original source input
  • state – the line information for the ast node
  • children_num – the length of the children array
  • children – an array of AST node’s that are children of the current node

Next up, we need to create what we need from LLVM to produce a module for the current file. I’m using the LLVM C API because I’ve never used it before, having only ever used the C++ API in my day to day stuff at work, so I thought it would be useful to have a look at it.

First up we need an LLVM Module – this is an encapsulation of a ‘neil‘ input file for our uses. I use:

LLVMModuleRef module = LLVMModuleCreateWithName(filename);

To get a module to work with. Then, for each AST node that is a procedure, I create a corresponding LLVM function with:

mpc_ast_t *name = procedure->children[1];

LLVMTypeRef funcType = LLVMFunctionType(
 LLVMInt32Type(), 0, 0, false);

LLVMValueRef func = LLVMAddFunction(
 module, name->contents, funcType);

(Note I’m cheating for now because I know my functions return i32, and take no params, don’t do this in production code!).

Then, since I have no control flow within my functions, I can create one basic block to hold the body of the function, and an IR builder to help us make the instructions within the basic block, with:

LLVMBasicBlockRef block = LLVMAppendBasicBlock(func, "");

LLVMBuilderRef builder = LLVMCreateBuilder();

LLVMPositionBuilderAtEnd(builder, block);

And with this we can begin to parse the body of the functions!

I parse the return statement (the only allowed statement within our functions), and check if it returns a literal or the result of a call to a function:

if (weFoundALiteral) {
 LLVMValueRef literal = LLVMConstIntOfString(
  LLVMInt32Type(), stmt->children[1]->contents, 10);

 LLVMBuildRet(builder, literal);
} else {
 LLVMValueRef call = LLVMGetNamedFunction(module, ident_name);

 LLVMValueRef ret = LLVMBuildCall(builder, call, 0, 0, "");

 LLVMBuildRet(builder, ret);
}

And the result is:

; ModuleID = 'foo.neil'

define i32 @foo() {
 ret i32 42
}

define i32 @main() {
 %1 = call i32 @foo()
 ret i32 %1
}

From the example ‘neil‘ file I gave above.

TL;DR mpc is pretty cool, the LLVM C API is very easy to use, and I’ll be fleshing out my ‘neil‘ language in future blog posts once I try handling a more complicated input grammar.

PS the full source for the example is below:

// This is free and unencumbered software released into the public domain.
//
// Anyone is free to copy, modify, publish, use, compile, sell, or
// distribute this software, either in source code form or as a compiled
// binary, for any purpose, commercial or non-commercial, and by any
// means.
//
// In jurisdictions that recognize copyright laws, the author or authors
// of this software dedicate any and all copyright interest in the
// software to the public domain. We make this dedication for the benefit
// of the public at large and to the detriment of our heirs and
// successors. We intend this dedication to be an overt act of
// relinquishment in perpetuity of all present and future rights to this
// software under copyright law.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.
//
// For more information, please refer to <http://unlicense.org/>

#include <assert.h>

#define __STDC_CONSTANT_MACROS
#define __STDC_LIMIT_MACROS
#include <llvm-c/Core.h>

extern "C" {
#include <mpc/mpc.h>
}

const char *src =
 "i32 foo() { return 42; }\n"
 "i32 main() {\n"
 " return foo();\n"
 "}\n";

int parse(const char* filename, const char* source, mpc_result_t* out_result) {
 mpc_parser_t* Ident = mpc_new("ident");
 mpc_parser_t* Number = mpc_new("literal");
 mpc_parser_t* Factor = mpc_new("factor");
 mpc_parser_t* Term = mpc_new("term");
 mpc_parser_t* Lexp = mpc_new("lexp");
 mpc_parser_t* Stmt = mpc_new("stmt");
 mpc_parser_t* Type = mpc_new("type");
 mpc_parser_t* Decls = mpc_new("decls");
 mpc_parser_t* Args = mpc_new("args");
 mpc_parser_t* Body = mpc_new("body");
 mpc_parser_t* Procedure = mpc_new("procedure");
 mpc_parser_t* Lang = mpc_new("lang");

 mpc_err_t* err = mpca_lang(MPCA_LANG_DEFAULT,
 " ident : /[a-zA-Z_][a-zA-Z0-9_]*/ ; \n"
 " literal : /[0-9]+(\\.[0-9]+)?((e|E)(-|\\+)?[0-9]+)?/ ; \n"
 " \n"
 " factor : <literal> \n"
 " | <ident> '(' <lexp>? (',' <lexp>)* ')' ; \n"
 " \n"
 " term : <factor> ; \n"
 " lexp : <term> ; \n"
 " \n"
 " stmt : \"return\" <lexp>? ';' \n"
 " | <ident> '(' <ident>? (',' <ident>)* ')' ';' ; \n"
 " \n"
 " type : (\"i1\" | /(i|u)(8|16|32|64)/ | /f(16|32|64)/) ;\n"
 " typeident : <type> <ident> ; \n"
 " args : <typeident>? (',' <typeident>)* ; \n"
 " body : '{' <stmt>* '}' ; \n"
 " procedure : <type> <ident> '(' <args> ')' <body> ; \n"
 " lang : /^/ <procedure>* /$/ ; \n",
 Ident, Number, Factor, Term, Lexp, Stmt, 
 Type, Args, Body, Procedure, Lang, NULL);
 
 if (err != NULL) {
 mpc_err_print(err);
 mpc_err_delete(err);
 exit(1);
 }

 const int result = mpc_parse(filename, source, Lang, out_result);

 mpc_cleanup(11, Ident, Number, Factor, Term, Lexp, Stmt, 
 Type, Args, Body, Procedure, Lang);

 return result;
}

int mpc2llvm(const char* filename, mpc_ast_t *ast) {
 LLVMModuleRef module = LLVMModuleCreateWithName(filename);

 // the root is always called '>'
 assert('>' == ast->tag[0]);

 // the first child is a regex for '^'
 assert(0 == strcmp("regex", ast->children[0]->tag));

 // the last child is a regex for '$'
 assert(0 == strcmp("regex", ast->children[ast->children_num - 1]->tag));

 for (int i = 1; i < (ast->children_num - 1); i++) {
 mpc_ast_t *procedure = ast->children[i];

 // our root nodes are procedures
 assert(0 == strcmp("procedure|>", procedure->tag));

 // we need to have at least 5 children in a procedure
 assert(5 <= procedure->children_num);

 mpc_ast_t *return_type = procedure->children[0];

 assert(0 == strcmp("type|regex", return_type->tag));

 mpc_ast_t *name = procedure->children[1];

 assert(0 == strcmp("ident|regex", name->tag));

 assert(0 == strcmp("char", procedure->children[2]->tag));
 assert('(' == procedure->children[2]->contents[0]);
 assert(0 == strcmp("char", procedure->children[3]->tag));
 assert(')' == procedure->children[3]->contents[0]);

 LLVMTypeRef funcType = LLVMFunctionType(
 LLVMInt32Type(), 0, 0, false);

 LLVMValueRef func = LLVMAddFunction(
 module, name->contents, funcType);

 LLVMBasicBlockRef block = LLVMAppendBasicBlock(func, "");

 LLVMBuilderRef builder = LLVMCreateBuilder();

 LLVMPositionBuilderAtEnd(builder, block);

 mpc_ast_t *body = procedure->children[4];

 assert(0 == strcmp("body|>", body->tag));

 assert(0 == strcmp("char", body->children[0]->tag));
 assert('{' == body->children[0]->contents[0]);
 assert(0 == strcmp("char", body->children[body->children_num - 1]->tag));
 assert('}' == body->children[body->children_num - 1]->contents[0]);

 for (int k = 1; k < (body->children_num - 1); k++) {
 mpc_ast_t *stmt = body->children[k];

 assert(0 == strcmp("stmt|>", stmt->tag));

 assert(0 < stmt->children_num);

 if ((0 == strcmp("string", stmt->children[0]->tag)) &&
 (0 == strcmp("return", stmt->children[0]->contents))) {
 // we found a return statement

 if (0 == strcmp("lexp|term|factor|literal|regex", stmt->children[1]->tag)) {
 LLVMValueRef literal = LLVMConstIntOfString(
 typeMap[return_type->contents],
 stmt->children[1]->contents, 10);

 LLVMBuildRet(builder, literal);
 } else if (0 == strcmp("lexp|term|factor|>", stmt->children[1]->tag)) {
 mpc_ast_t *factor = stmt->children[1];

 if (0 == strcmp("ident|regex", factor->children[0]->tag)) {
 const char * ident_name = factor->children[0]->contents;

 if ('(' == factor->children[1]->contents[0]) {
 // we have a function call!
 assert(')' == factor->children[2]->contents[0]);

 LLVMValueRef called_func = LLVMGetNamedFunction(
 module, ident_name);

 LLVMValueRef ret = LLVMBuildCall(builder, called_func, 0, 0, "");

 LLVMBuildRet(builder, ret);
 }
 }
 }
 }
 }
 }

 LLVMDumpModule(module);

 return 0;
}

int main() {
 const char * filename = "foo.neil";
 mpc_result_t result;

 if (parse(filename, src, &result)) {
 const int r = mpc2llvm(filename, (mpc_ast_t *)result.output);

 mpc_ast_print((mpc_ast_t *)result.output);
 mpc_ast_delete((mpc_ast_t *)result.output);

 if (r) {
 printf("mpc -> llvm error!\n");
 return -1;
 }

 return 0;
 } else {
 mpc_err_print(result.error);
 mpc_err_delete(result.error);

 return -1;
 }
}