I’ve been continuing my foray into Rust, and one thing I’ve been interested in is global allocators - Rust’s builtin support for switching the allocator used by an application.
One thing I’ve been wanting to try for a while is to see if a bump allocator used in conjunction with a short running application could result in performance gains. I couldn’t find while perusing crates.io any crate that added a global allocator that just bumps, so I’ve written my own - bump_alloc.
What is a bump allocator?
A bump allocator is a simple allocator where every new allocation is sourced from a ‘bump’ of the previously allocated value.
|Allocation 0||Allocation 1||Allocation 2||…|
When an allocated region is done with, and we dealloc it (free/delete the memory in the language of C/C++), the memory is not actually freed.
This means our bump allocator is effectively lossy with memory - we have no way to retrieve and reuse memory that was not required anymore. But the whole point of a bump allocator is that there is a class of applications that are short lived or constrained with their memory usage that we are happy to trade off not being able to reuse previously allocated memory for a little extra performance in the application.
How to use?
In your Cargo.toml you simply add:
[dependencies] bump_alloc = "0.1.0"
And then in your Rust application or library you simply include the allocator like:
extern crate bump_alloc; #[global_allocator] static A : BumpAlloc = BumpAlloc::new();
By default the bump allocator will reserve one gigabyte of allocatable memory for its use. But crucially we are not actually allocating one gigabyte of memory, we are simply reserving the ability to allocate that much by using mmap on Unix-based systems, and VirtualAlloc on Windows systems.
If you need more memory, you can use:
#[global_allocator] static A: BumpAlloc = BumpAlloc::with_size(1024 * 1024 * 4);
Which will reserve the number of bytes of memory as you require - so in the example above four megabytes will be reserved.
If you go over the amount of memory reserved then handle_alloc_error is called and the application will stop.
How is the performance?
So what started me thinking about the performance characteristics of a bump allocator was LLVM’s FileCheck. For those who don’t know - it is part of LLVM’s testing framework, and basically checks that the compiler’s output matched what was expected. The FileCheck application is short lived, and runs a thousand or more times for a single run of LLVM’s testing. This sounded like a perfect little application for me to use with a bump allocator.
I hooked up LLVM’s FileCheck to build the C++ application via Rust and using the cc crate, routing calls to new/delete into Rust’s global allocator via my cpp_new crate, and finally using my bump_alloc crate to replace the global allocator with my bump. I then compared the performance of this application versus LLVM’s build FileCheck using a torture test of my creation - a 500,000 line random assortment of characters that I’ll input into FileCheck:
And use a regex CHECK: to see if the output matches:
The results of comparing LLVM’s FileCheck versus my own bump allocated variant were taken using hyperfine, and are as follows:
|LLVM FileCheck (s)||bump_alloc FileCheck (s)||difference (%)|
So we’re 1.05x faster for just changing the memory allocation strategy! So now that we know we can go some amount faster by just changing the allocator, we need to understand how much extra memory are we using to achieve this:
|LLVM FileCheck (megabytes)||bump_alloc FileCheck (megabytes)||difference (%)|
So we are using quite a bit more memory to achieve this modest performance increase - 1.75x more!
We can achieve some performance improvement in short running applications by using a bump allocator (as others have found before me, I’m not claiming to have invented the technique), but obviously there are costs to memory for doing this. There are definitely a class of applications that will benefit from using my new bump_alloc crate, and I hope the crate proves useful to a bunch of you. The crate is available under the permissible CC0 1.0 Universal license, so please get downloading!