One of the things I was going to talk about in my COVID-cancelled EuroLLVM 2020 talk was a neat little trick we used to get the LLVM legacy pass manager to be significantly faster. So just in time for the legacy pass manager to finally sail off into the coding afterlife, I thought I’d share the trick.

TL;DR: running fewer passes makes things compile faster.

The longer version requires a bit more nuance though.

LLVM’s PassManagerBuilder

LLVM has a default pass optimization pipeline that you create with the PassManagerBuilder class:

llvm::legacy::PassManager passManager;

llvm::PassManagerBuilder passManagerBuilder;

// We can set the optimization level (-O0 .. -O3)
passManagerBuilder.OptLevel = 3;

// Or the size level (-Os -Oz)
passManagerBuilder.SizeLevel = 0;

// We can add passes at certain extension points.
passManagerBuilder.addExtension(
  llvm::PassManagerBuilder::EP_Peephole,
  [](const llvm::PassManagerBuilder& passManagerBuilder,
     llvm::legacy::PassManagerBase& passManager)
  {
    passManager.add(burst_CreatePeepholePass());
  });

passManagerBuilder.populateModulePassManager(passManager);
passManager.run(*module);

The pros for using the PassManagerBuilder:

  • The best way to start running optimizations on your code
  • Can specify opt/size levels
  • Allows you to inject your own passes via extensions
  • Any LLVM update will have new passes run at the correct times

And the cons:

  • Pass pipeline optimized for C/C++ content (it’s used by Clang)
  • Some passes will probably do nothing for your language as a result
  • TBAA is unconditionally added to the pass pipeline - even though HPC# doesn’t have type-based aliasing guarantees

Overall I’d argue that using the default pass pipeline is what you should do, unless you are willing to sink a decent chunk of time on every major LLVM version bump into checking all the new passes and working out if/where they should fit in your custom pass pipeline.

Reducing The Number of Passes

We wanted to be able to use most of what the PassManagerBuilder provided, but cut out the passes that did nothing on Burst.

To do this we extended the PassManager class and added our own custom add function that would let us do something before actually adding the pass into the pass pipeline.

struct MyPassManager final : public llvm::legacy::PassManager {
  void add(llvm::Pass* const pass) override {
    const llvm::StringRef name = pass->getPassName();
    llvm::errs() << count << " " << name << "\n";
    count++;
    llvm::legacy::PassManager::add(pass);
  }

  unsigned count = 0;
} passManager;

llvm::PassManagerBuilder passManagerBuilder;
passManagerBuilder.populateModulePassManager(passManager);
printf("Total: %u\n", passManager.count);

Which at the time I last ran this (in 2020, whatever LLVM version was active then!) would produce:

0 Target Transform Information
1 Target Pass Configuration
2 Simplify the CFG
...
Total: 84

So we had 84 passes being run in the pass pipeline.

Now with Burst we use golden file generation to ensure we do not regress on the optimizations that the compiler performs. This means that for every test in our test suite, we generate the assembly for it and compare it against the last known good assembly. We then verify any changes to this on each commit, to ensure we maintain good code quality always. There are over 100,000 tests that produce assembly into these gold files, which raised an interesting idea - I could selectively disable each pass, run all the tests, and if there are no gold file changes then the pass did nothing and could safely be removed.
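The gold-file check itself boils down to a byte-for-byte comparison of the freshly generated assembly against the stored known-good copy. Here's a minimal sketch in plain C++ of that idea - the helper names are hypothetical, not Burst's actual test harness:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read an entire file into a string (empty if the file is missing).
static std::string slurp(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::ostringstream out;
  out << in.rdbuf();
  return out.str();
}

// A test regresses if the assembly we just generated no longer matches
// the last known good assembly stored in the gold file.
bool matchesGoldFile(const std::string& generatedAsm,
                     const std::string& goldPath) {
  return generatedAsm == slurp(goldPath);
}
```

Any mismatch means some optimization output changed, which is exactly the signal needed here: if disabling a pass changes zero gold files across the whole suite, that pass observably did nothing.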

To do this I used an environment variable DISABLE_PASS_N, and skipped a given pass like so:

struct MyPassManager final : public llvm::legacy::PassManager {
  void add(llvm::Pass* const pass) override {
    if (count == skip) {
      count++;
      delete pass;
      return;
    }

    count++;
    llvm::legacy::PassManager::add(pass);
  }

  MyPassManager() {
    // getenv returns null if the variable is unset - guard against that.
    if (const char* const env = getenv("DISABLE_PASS_N")) {
      skip = static_cast<unsigned>(atoi(env));
    }
  }

  unsigned count = 0, skip = ~0u; // ~0u: skip nothing if the variable is unset
} passManager;

llvm::PassManagerBuilder passManagerBuilder;
passManagerBuilder.populateModulePassManager(passManager);

I could then re-run the test suite in a loop, running DISABLE_PASS_N=N ./run_tests for each N from 0 to 83, and if the test suite passed then the gold files weren’t changed and pass N did nothing.
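The driver for that sweep is trivial; a hedged sketch in C++ (the ./run_tests name comes from above, the helper functions are hypothetical):

```cpp
#include <cstdlib>
#include <string>

// Build the command line that runs the test suite with pass `n` disabled.
std::string commandForPass(unsigned n) {
  return "DISABLE_PASS_N=" + std::to_string(n) + " ./run_tests";
}

// One full test-suite run per pass. Any run that leaves every gold file
// untouched marks pass `n` as a candidate for removal.
void sweepPasses(unsigned passCount) {
  for (unsigned n = 0; n < passCount; n++) {
    std::system(commandForPass(n).c_str());
  }
}
```

With 84 passes and 100,000+ tests this sweep is expensive to run, but it only needs to happen once per LLVM version bump, not per build.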

How Many Could We Disable?

Turns out that 13 passes did nothing on Burst. And so to disable them we used this lovely hackeroo:

struct MyPassManager final : public llvm::legacy::PassManager {
  llvm::SmallDenseSet<llvm::AnalysisID, 16> skipSet;

  void add(llvm::Pass* const pass) override {
    if (skipSet.count(pass->getPassID()) != 0) {
      delete pass;
      return;
    }

    llvm::legacy::PassManager::add(pass);
  }

  MyPassManager() {
    // We have to create all the passes so we can get their IDs
    const llvm::Pass* const passes[] = {
      llvm::createTypeBasedAAWrapperPass(),
      llvm::createForceFunctionAttrsLegacyPass(),
      llvm::createInferFunctionAttrsLegacyPass(),
      llvm::createCallSiteSplittingPass(),
      llvm::createCalledValuePropagationPass(),
      llvm::createPruneEHPass(),
      llvm::createSpeculativeExecutionPass(),
      llvm::createLibCallsShrinkWrapPass(),
      llvm::createPGOMemOPSizeOptLegacyPass(),
      llvm::createEliminateAvailableExternallyPass(),
      llvm::createGlobalDCEPass(),
      llvm::createAlignmentFromAssumptionsPass(),
      llvm::createStripDeadPrototypesPass(),
    };
    
    for (const llvm::Pass* const pass : passes) {
      // Insert the ID into the skip set, then delete the pass!
      skipSet.insert(pass->getPassID());
      delete pass;
    }
  }
} passManager;

Note: Burst doesn’t have global variable support - you have to use SharedStatics to get access to global data that is shared between managed C# code and HPC# with Burst. So a bunch of the global optimization passes are expensive and useless for us.

How Much Faster Was Compilation?

In the end this change made our test suite 1.64x faster, from 391s to 237s. This translates into happier Burst developers (we’re waiting on testing less), happier CI (it takes less time to run the tests on PRs), and happier Unity users (Burst compiled their code faster).

The Future

Since the fun-filled pre-pandemic days when I wrote this into the talk, we’ve moved away from the above approach in Burst. This was because:

  • We’ve moved to the new pass manager for nearly all of our compilations now, and the new pass manager doesn’t have the same fudgeability as the old one
  • Even with the above reduction in passes, we decided that the cost to the team of maintaining a custom pass pipeline was worth it to improve compile time even more. Our test suite has grown 2.5x in size, and we’ve reduced its run time down to 150s - overall the compiler is about 4x faster than what I was going to present in 2020

Even though this is soon to be dead tech with the death of the legacy LLVM pass manager, I quite liked that this tiny code change could result in such a big win for us and our users.