I’ve been tinkering around with mpc and LLVM recently - just to answer a few simple questions:

  • How easy is it to use mpc?
  • How easy is the LLVM C API to use?
  • How easy is it to connect a parser generator to an LLVM Module?

So given the above aims, I’ve embarked on creating my own stupid little language, which I’m calling ‘neil’ - Not Exactly an Intermediate Language. It is utterly by chance that the name ‘neil’ happens to be my own name, if you believe that gratuitous lie!

A ‘neil’ program currently consists of one allowed type (a 32-bit integer named i32), function definitions, function calls, and the ability to return the result of one function from another. In short, it is (currently, and most likely permanently) a really stupid language.

An example legal ‘neil’ program is:

So let’s get right into how we construct this grammar in mpc. The mpc grammar for our ‘neil’ language is as follows:

It’s a very simple grammar; I looked at mpc’s smallc example to work out what to do. To parse using mpc, I have the following function:

With this, we can then iterate through a successfully parsed ‘neil’ input using the mpc_ast_t type that mpc provides. The struct elements are as follows:

  • tag - the string name of the ast node’s type
  • contents - what the ast node is pointing at in the original source input
  • state - the line information for the ast node
  • children_num - the length of the children array
  • children - an array of AST nodes that are children of the current node
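To make the fields above concrete, here is a small sketch of a recursive walk. So that it compiles without mpc, it uses a stand-in struct called `ast_node` (my own name) that mirrors the fields listed above, minus state; with the real library you would use `mpc_ast_t` instead. The helper `count_tagged` is hypothetical too. As far as I recall, mpc composes tags like `procedure|>` or `literal|regex`, so the sketch matches with `strstr` rather than `strcmp`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for mpc_ast_t, reduced to the fields this walk needs. */
typedef struct ast_node {
    const char *tag;             /* name of the node's type */
    const char *contents;        /* matched source text */
    int children_num;            /* length of children */
    struct ast_node **children;  /* child nodes */
} ast_node;

/* Print the tree with indentation and count nodes whose tag contains `needle`. */
static int count_tagged(const ast_node *node, const char *needle, int depth) {
    assert(node != NULL);
    printf("%*s%s '%s'\n", depth * 2, "", node->tag, node->contents);

    int count = strstr(node->tag, needle) != NULL ? 1 : 0;
    for (int i = 0; i < node->children_num; i++) {
        count += count_tagged(node->children[i], needle, depth + 1);
    }
    return count;
}
```

The same loop over `children_num` and `children` is all that is needed to find the procedure nodes later on.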

Next up, we need to create what we need from LLVM to produce a module for the current file. I’m using the LLVM C API because I’ve never used it before, having only ever used the C++ API in my day to day stuff at work, so I thought it would be useful to have a look at it.

First up we need an LLVM Module - this is an encapsulation of a ‘neil’ input file for our uses. I use:

To get a module to work with. Then, for each AST node that is a procedure, I create a corresponding LLVM function with:

(Note I’m cheating for now because I know my functions return i32 and take no params; don’t do this in production code!)

Then, since I have no control flow within my functions, I can create one basic block to hold the body of the function, and an IR builder to help us make the instructions within the basic block, with:

And with this we can begin to parse the body of the functions!

I parse the return statement (the only allowed statement within our functions), and check if it returns a literal or the result of a call to a function:

And the result is:

From the example ‘neil’ file I gave above.

TL;DR: mpc is pretty cool, the LLVM C API is very easy to use, and I’ll be fleshing out my ‘neil’ language in future blog posts once I try handling a more complicated input grammar.

PS the full source for the example is below: