I’ve been feeling a little overwhelmed by the languages I generally code in (C#, C++, Rust) and wanted to get my teeth into something a bit simpler again (obviously C!). One thing I’ve never been that happy with in C was the lack of a good command line argument parser. There is a great list of option parsers that @nothings has curated over the years, but most of them are C++, and the ones for C are just a bit underwhelming to me.

Since the thing I’m working on requires a command line application I thought ‘Why not write my own?’ and herein lies the fun little trip I took.

The Requirements

So I wanted something that:

  • Is written entirely in C.
  • Does not dynamically allocate any memory to store the arguments.
  • Can automatically generate the --help output for the supported options.

To keep things simple I decided that short form options and long form options would have a defined structure:

  • They either take no argument, in which case they look like -h and --help.
  • Or they take an argument, in which case they look like -o <arg> or --output=<arg>.
  • They can only be specified zero or one times.

I wasn’t aiming to make some library that I could parachute into any project, more that I just wanted something simple that I could own myself.

Abusing Include Files

I really dislike anything that causes me to copy and paste code around. My distaste stems from the fact that you regularly want to then update all N places where you copied something, and it becomes onerous to keep these in sync. Concretely, for a command line parser, I didn’t want to copy and paste code any time I added a new command line option.

Instead I wanted some way I could define all my options in a standard format and the rest of the tooling would pick this up and turn it into what I wanted. The only real way to do this in C is with the preprocessor. Since I knew that I’d want to use the same list of options for multiple purposes (e.g. to parse argv but also to print their descriptions for --help), I decided to use an include file, a .inc. The general notion with .inc files is that their body uses some preprocessor macro that the includer defines before including the file. The includer can then include the .inc file multiple times, changing what the macro does each time. This means that, unlike most includes, .inc files do not have include guards or #pragma once to stop multiple inclusion, because including the file multiple times is exactly what we want to do.

#ifndef OPTION
#error OPTION(short-name, long-name, number-of-trailing-args, description) was not defined!
#endif

OPTION(o, output, 1, "the output file to write to")
OPTION(h, help, 0, "print all the options and exit")

#undef OPTION

Here is the really simple options.inc that I used. It starts with a small #ifndef check to ensure that the required OPTION macro has been defined, with a description of how the macro is intended to be used. Then it contains the list of options that I have. At present there are just two: one to specify where the output of the tool will go, and the other to display the help message and exit. Note the minor quality-of-life touch whereby the .inc file undefines the OPTION macro at the end. This saves you either doing it yourself after each include of the .inc file, or having the compiler complain about a redefined macro!
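To see the "one list, many expansions" idea in a single file, here is a rough sketch that swaps the .inc file for the closely related list-macro form of the same trick, purely so that it compiles on its own. OPTION_LIST, option_count, option_long_names, option_arities and print_option_table are names made up for this illustration and do not appear in the parser itself.

```c
#include <stdio.h>

/* Hypothetical single-file stand-in for options.inc: the list is itself a
   macro, and the per-option macro is passed in as X. */
#define OPTION_LIST(X)                                                         \
  X(o, output, 1, "the output file to write to")                               \
  X(h, help, 0, "print all the options and exit")

/* Expansion 1: each option contributes "+1", so the enum counts them. */
#define COUNT_OPTION(s, l, n, d) +1
enum { option_count = 0 OPTION_LIST(COUNT_OPTION) };
#undef COUNT_OPTION

/* Expansion 2: the same list becomes a table of long names. */
#define NAME_OPTION(s, l, n, d) #l,
static const char *const option_long_names[] = {OPTION_LIST(NAME_OPTION)};
#undef NAME_OPTION

/* Expansion 3: and a parallel table of trailing-argument counts. */
#define ARITY_OPTION(s, l, n, d) n,
static const int option_arities[] = {OPTION_LIST(ARITY_OPTION)};
#undef ARITY_OPTION

/* Walks the generated tables; adding a line to OPTION_LIST updates all of
   them without touching this function. */
static void print_option_table(FILE *out) {
  for (int i = 0; i < option_count; i++) {
    fprintf(out, "--%s takes %d argument(s)\n", option_long_names[i],
            option_arities[i]);
  }
}
```

The .inc approach does the same job, with the advantage that the list lives in its own file and needs no line-continuation backslashes.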

Now that I have this I can use it to actually implement the parsing.

Parsing Argv

So the first thing I wanted to ensure was that my OPTION macros were not malformed - specifically that the ‘number-of-trailing-args’ was either 0 or 1. Even though I’m the only one working on this and I hopefully shouldn’t be dumb, I’ve found over the years that any notion that you won’t be dumb in some way is generally disproven. We are all stupid meatbags at times and so while you are hyper aware of the constraints it is best to be as defensive a coder as possible. Your future self will thank you!

#define OPTION(s, l, n, d)                                                     \
  bootstrap_assert(n <= 1,                                                     \
                   "option " #s "/" #l " must have 0 or 1 trailing arguments");
#include "options.inc"

So you can see that I just #define the OPTION macro and in this instance make it call my custom assert. Then I just have to include the .inc file and it’ll do this for every option I’ve defined.
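Since the trailing argument count is a literal in the option list, the same check could probably be moved to compile time. The following is only a sketch of that idea, assuming a C11 compiler (for _Static_assert) and using a hypothetical OPTION_LIST macro as a single-file stand-in for options.inc. CHECK_OPTION is likewise a made-up name.

```c
/* Hypothetical single-file stand-in for options.inc. */
#define OPTION_LIST(X)                                                         \
  X(o, output, 1, "the output file to write to")                               \
  X(h, help, 0, "print all the options and exit")

/* Each option expands to one C11 static assertion. The adjacent string
   literals are concatenated by the compiler into a single message. */
#define CHECK_OPTION(s, l, n, d)                                               \
  _Static_assert((n) == 0 || (n) == 1,                                         \
                 "option " #s "/" #l " must have 0 or 1 trailing arguments");
OPTION_LIST(CHECK_OPTION)
#undef CHECK_OPTION
```

With this in place, changing the 1 on the output line to a 2 should stop the build with the message above, rather than waiting for the program to run.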

  const char *const arg_is_set = "set";
#define OPTION(s, l, n, d) const char *arg_##s = 0;
#include "options.inc"

Next I needed somewhere to store the result for each of my options. Using the same pattern again I can define variables of the form arg_h for the help option, arg_o for the output option, etc. For options that do not take an argument I added a single arg_is_set string here too, which gets stored later to record that the option was set.

Now to the parser. This one is a bit of a mouthful, I’ll warn you up front. It shows one of the big downsides of code generation with the preprocessor in C/C++: macro-generated code is very difficult to debug, both when deciphering compiler errors and when stepping through it in a debugger. Remember that I didn’t want any runtime memory allocation. The one place a parser would generally need it is for positional arguments. Think of things like the list of files you’d pass to a compiler: you can have N of these and they don’t require options to be passed in. To get this to work I’m going to reuse argv to store the list of positional arguments. Since each positional argument arrived in argv in the first place, it stands to reason that they can definitely fit!

  int positional_arguments = 0;

  for (int i = 1; i < argc; i++) {
#define OPTION(s, l, n, d)                                                     \
  {                                                                            \
    if (0 == strcmp("-" #s, argv[i])) {                                        \
      bootstrap_assert(arg_##s == 0, "option " #s "/" #l                       \
                                     " cannot be specified multiple times");   \
      if (n == 0) {                                                            \
        arg_##s = arg_is_set;                                                  \
      } else {                                                                 \
        bootstrap_assert((i + 1) < argc, "trailing argument for option " #s    \
                                         "/" #l " was not provided");          \
        arg_##s = argv[i + 1];                                                 \
        i++;                                                                   \
      }                                                                        \
      continue;                                                                \
    } else if (0 == strncmp((n == 0) ? "--" #l : "--" #l "=", argv[i],         \
                            strlen("--" #l "="))) {                            \
      bootstrap_assert(arg_##s == 0, "option " #s "/" #l                       \
                                     " cannot be specified multiple times");   \
      if (n == 0) {                                                            \
        arg_##s = arg_is_set;                                                  \
      } else {                                                                 \
        arg_##s = argv[i] + strlen("--" #l "=");                               \
      }                                                                        \
      continue;                                                                \
    }                                                                          \
  }
#include "options.inc"

    // Otherwise, if we didn't match any option, this is a positional argument!
    argv[positional_arguments++] = argv[i];
  }

So let’s break down the above:

  • First I check for the short form s, by string comparing "-" #s against the argument.
    • I then check whether the option has already been specified, which I don’t allow.
    • If the option has zero trailing arguments, I just set the arg_##s variable to the arg_is_set helper I defined before.
    • If the option has one trailing argument, I check that an argument has actually been provided (e.g. someone didn’t put -o at the end of the argument list!).
    • I then store the value in argv[i + 1] into the variable for the option.
    • I need to bump i to skip the next argument because we’ve consumed it as the trailing argument of this one.
    • And then continue, which gives us a nice easy way to skip the fallback code that stores the argument as a positional one.
  • Next I check for the long form l, which requires a slightly different string compare. Long form options that take an argument look like --output=<arg>, so I use strncmp to match just the --output= prefix. For options without an argument the same strncmp call compares one character more than the length of --help, which takes in the terminating NUL and so only matches --help exactly.
    • The only other difference is that when there is a trailing argument I must not bump i this time, because the argument is part of the option itself. To get at the argument we just take the pointer to the first character after --output=, using strlen.
  • And then the fallback case is to just store the argument at the front of argv as a positional argument.
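The long form comparison can be puzzling at first glance, so here is the same matching rule pulled out into two plain functions. This is only a sketch: is_flag and value_of are hypothetical helper names that are not part of the parser above, and the lengths are computed at run time rather than from preprocessor-built literals.

```c
#include <string.h>

/* Hypothetical helper: exact match for an option without an argument.
   Comparing strlen(flag) + 1 characters includes the terminating NUL, so
   "--helpx" and "--help=1" are both rejected. */
static int is_flag(const char *arg, const char *flag) {
  return 0 == strncmp(flag, arg, strlen(flag) + 1);
}

/* Hypothetical helper: prefix match for an option with an argument.
   `prefix` includes the '=' (for example "--output="). Returns a pointer to
   the first character after the prefix, or NULL if arg does not match. */
static const char *value_of(const char *arg, const char *prefix) {
  size_t length = strlen(prefix);
  return (0 == strncmp(prefix, arg, length)) ? arg + length : NULL;
}
```

Note that value_of returns a pointer into the original argument, so, as in the macro version, nothing is copied or allocated.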

Using the Parsed Options

So the first obvious way to use the options is to print the help:

  if (arg_h) {
    printf("usage bootstrap");
#define OPTION(s, l, n, d)                                                     \
  if (n == 0) {                                                                \
    printf(" [ %s | %s ]", "-" #s, "--" #l);                                   \
  } else {                                                                     \
    printf(" [ %s <arg> | %s=<arg> ]", "-" #s, "--" #l);                       \
  }
#include "options.inc"
    printf(" <args>\n\n");

#define OPTION(s, l, n, d)                                                     \
  if (n == 0) {                                                                \
    printf(" %s | %s %s\n", "-" #s, "--" #l, d);                               \
  } else {                                                                     \
    printf(" %s <arg> | %s=<arg> %s\n", "-" #s, "--" #l, d);                   \
  }
#include "options.inc"

    return 0;
  }

Breaking this down I:

  • Check that the arg_h variable is non-null - which means the user requested help.
  • Use an OPTION macro to print the short usage guide - something like usage bootstrap [ -o <arg> | --output=<arg> ] [ -h | --help ] <args>.
  • Then I use a second macro to print the longer usage guide with the descriptions.

In the end this produces a help like:

usage bootstrap [ -o <arg> | --output=<arg> ] [ -h | --help ] <args>

 -o <arg> | --output=<arg> the output file to write to
 -h | --help print all the options and exit

The only thing that’d make this nicer would be to align the descriptions. But for now it’s good enough for me.
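Aligning the descriptions would probably not take much. One possible sketch, assuming the usage strings are available as ordinary C strings: printf’s %-*s conversion left-justifies a string and pads it to a width passed as an extra int argument. The helper names longest_usage and format_help_line are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length of the longest usage string, which becomes
   the column width for every line. */
static int longest_usage(const char *const *usages, int count) {
  int width = 0;
  for (int i = 0; i < count; i++) {
    int length = (int)strlen(usages[i]);
    if (length > width) {
      width = length;
    }
  }
  return width;
}

/* Hypothetical helper: writes one help line into `out`, padding `usage` to
   `width` columns so that every description starts in the same column.
   Returns whatever snprintf returns. */
static int format_help_line(char *out, size_t out_size, int width,
                            const char *usage, const char *description) {
  return snprintf(out, out_size, " %-*s  %s\n", width, usage, description);
}
```

In the macro-driven version the width would need a first pass over the options to find the longest usage string, followed by a second pass that prints with that width.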

Just to test that all the options were set and the positional arguments were stored like I expected, I added another macro during debugging:

#define OPTION(s, l, n, d)                                                     \
  fprintf(stderr, "%s = %s\n", #s, (arg_##s != 0) ? arg_##s : "not set");
#include "options.inc"

  fprintf(stderr, "Positional arguments:\n");
  for (int i = 0; i < positional_arguments; i++) {
    fprintf(stderr, "%s\n", argv[i]);
  }

All this does is print whether each option was set or not, and then the list of positional arguments we recovered from the command line.
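The positional argument trick, reusing argv as its own output buffer, can be seen in isolation in a sketch like the one below. It makes a loud simplifying assumption: anything starting with '-' is treated as an option that takes no argument, whereas the real parser decides that through the OPTION macro. keep_positionals is a hypothetical name.

```c
/* Hypothetical helper: moves every argument that does not start with '-'
   to the front of argv and returns how many were kept. Like the parser in
   this post, it starts writing at index 0 and so overwrites the program
   name. The write index `kept` is always smaller than the read index `i`,
   so no unread entry is ever overwritten. */
static int keep_positionals(int argc, char **argv) {
  int kept = 0;
  for (int i = 1; i < argc; i++) {
    if (argv[i][0] != '-') {
      argv[kept++] = argv[i];
    }
  }
  return kept;
}
```

For a command line such as bootstrap -h a.c -x b.c this should leave a.c and b.c in argv[0] and argv[1] and return 2, without allocating anything.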

In total this is around 100 lines of C and does what I want in a very simple and easily controlled manner. I’m very happy to be able to go back to basics and write some fun C code like this from time to time!