I asked on Twitter the other day whether anyone had a hashmap that could work with string slices - parts of a string that are not null-terminated and thus need an explicit length to accompany the pointer.
I didn’t get any responses on this so I commented with a follow-up that I had grabbed some code written a few years back by the awesome Pete Warden of Google fame, and morphed it into what I required:
At the moment I'm using https://t.co/sOfvd7HnDR with a bunch of local changes. Might just morph this into a single header hashmap (since it is public domain!), and reference @petewarden as the source of truth?— Neil Henning (@sheredom) June 26, 2020
Much to my surprise Pete was happy not only for me to do these modifications, but also since he was no longer maintaining the hashmap code he’d happily redirect users to any effort I put together:
Please do, the current version is unmaintained by me, I'd be happy to update the readme to point to yours.— Pete Warden (@petewarden) June 26, 2020
So I’ve done the work and I’m now introducing my latest library, nearly entirely not written by me, hashmap.h!
Null-Terminated to Slices⌗
So Pete’s code was pretty solid as is. The main difference was that it relied on null-terminated strings as the key, whereas I wanted to use string slices. My first modification was to change the entry points that used a key to instead take a key and a length. So hashmap_put went from:
extern int hashmap_put(map_t in, char* key, any_t value);
to:
HASHMAP_WEAK int hashmap_put(struct hashmap_s *const hashmap, const char *const key, const unsigned len, void *const value);
You’ll notice that I also went const mad (and again wished that const was the default and mut was required on variables - sigh!), and removed the typedef for any_t. The last point is just a general stylistic thing I have for my libraries - I really dislike APIs that abstract you so far away from the underlying types with all the SHOUTY CASE LPDWORD's and such, so I generally try to have no typedefs if I can get away with it.
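To make the key-plus-length idea concrete, here is a minimal sketch of a slice-keyed table. It is not the code in hashmap.h: the toy_ names, the fixed capacity, the linear probing and the FNV-1a hash are all assumptions I made up for illustration. The only thing it borrows from the real header is the shape of the put call, a key pointer followed by an explicit length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy map, NOT hashmap.h: fixed capacity, linear probing. */
#define TOY_CAPACITY 16u

struct toy_entry {
  const char *key; /* points into the caller's buffer, never copied */
  unsigned len;    /* explicit length, because there is no terminator */
  void *value;
  int in_use;
};

struct toy_map {
  struct toy_entry entries[TOY_CAPACITY];
};

/* FNV-1a over exactly len bytes (an assumption, chosen for brevity). */
static unsigned toy_hash(const char *const key, const unsigned len) {
  unsigned hash = 2166136261u;
  unsigned i;
  for (i = 0; i < len; i++) {
    hash ^= (unsigned char)key[i];
    hash *= 16777619u;
  }
  return hash;
}

/* Inserts or overwrites. Returns 0 on success, 1 if the table is full. */
static int toy_put(struct toy_map *const map, const char *const key,
                   const unsigned len, void *const value) {
  const unsigned start = toy_hash(key, len) % TOY_CAPACITY;
  unsigned i;
  for (i = 0; i < TOY_CAPACITY; i++) {
    struct toy_entry *const e = &map->entries[(start + i) % TOY_CAPACITY];
    if (!e->in_use || (e->len == len && 0 == memcmp(e->key, key, len))) {
      e->key = key;
      e->len = len;
      e->value = value;
      e->in_use = 1;
      return 0;
    }
  }
  return 1;
}

/* Returns the stored value, or NULL if the key is absent. */
static void *toy_get(const struct toy_map *const map, const char *const key,
                     const unsigned len) {
  const unsigned start = toy_hash(key, len) % TOY_CAPACITY;
  unsigned i;
  for (i = 0; i < TOY_CAPACITY; i++) {
    const struct toy_entry *const e = &map->entries[(start + i) % TOY_CAPACITY];
    if (!e->in_use) {
      return NULL; /* nothing is ever deleted, so an empty slot ends the probe */
    }
    if (e->len == len && 0 == memcmp(e->key, key, len)) {
      return e->value;
    }
  }
  return NULL;
}

/* Usage: both keys are slices of one buffer, with no copies and no terminators. */
static int toy_demo(void) {
  static const char text[] = "width=10;height=20";
  static int width = 10, height = 20;
  struct toy_map map;
  const int *w, *h;
  memset(&map, 0, sizeof(map));
  toy_put(&map, text, 5, &width);      /* the slice "width"  */
  toy_put(&map, text + 9, 6, &height); /* the slice "height" */
  w = (const int *)toy_get(&map, "width", 5);
  h = (const int *)toy_get(&map, "height", 6);
  return (w && h) ? *w + *h : -1;
}
```

The point of toy_demo is that the keys handed to toy_put live in the middle of a larger string, which is exactly the case a null-terminated API cannot handle without copying each key out first.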
Supports UTF-8 Keys⌗
Pete’s code also used the string.h ASCII-string functions of C to compare whether the key ever matched. Since I wanted to use this hashmap in conjunction with UTF-8 strings (using my utf8.h library) I instead used memcmp. Now that I have an explicit length for the string slice this became possible.
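As a rough sketch of what a length-aware key comparison can look like (slice_equal is a name I invented here, and the real header may structure this differently), the idea is to check the lengths first and then compare the raw bytes:

```c
#include <string.h>

/* Hypothetical helper: returns 1 when two length-delimited keys hold the
   same bytes, 0 otherwise. Neither key needs a terminator. */
static int slice_equal(const char *const a, const unsigned a_len,
                       const char *const b, const unsigned b_len) {
  /* Keys of different lengths can never match, and checking this first
     keeps memcmp from reading past the end of the shorter key. */
  if (a_len != b_len) {
    return 0;
  }
  /* memcmp compares raw bytes, so multi-byte UTF-8 sequences and even
     embedded zero bytes are handled like any other data. */
  return 0 == memcmp(a, b, a_len);
}
```

For example, slice_equal("a\0b", 3, "a\0c", 3) returns 0, whereas a strcmp on the same buffers would stop at the first zero byte and report a match.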
The last major change I made was to smush the hashmap.c and hashmap.h files together into a single header. I am pretty obsessed with single headers as a way to get round the botched nature of C and C++'s package story (or lack thereof). This meant leveraging some existing hacks to stop the compiler complaining about multiple function definitions (by using weak function references instead).
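For anyone who has not seen the weak-function trick before, here is a minimal sketch of how a single header can carry its own definitions. MYLIB_WEAK and mylib_add are made-up names for illustration, and the compiler checks are my assumption about a reasonable setup rather than a copy of what hashmap.h does:

```c
/* mylib.h: hypothetical single-header layout, all names invented. */
#ifndef MYLIB_H_INCLUDED
#define MYLIB_H_INCLUDED

#if defined(_MSC_VER)
/* MSVC: identical inline definitions from several translation units are
   merged by the linker instead of clashing. */
#define MYLIB_WEAK __inline
#elif defined(__GNUC__) || defined(__clang__)
/* GCC and Clang: a weak symbol may be defined in more than one object
   file, and the linker keeps a single copy. */
#define MYLIB_WEAK __attribute__((weak))
#else
#error "Unsupported compiler: no known way to mark a definition as weak."
#endif

/* The definition lives in the header, so every file that includes it
   gets a copy, and the weak marking stops those copies from colliding. */
MYLIB_WEAK unsigned mylib_add(const unsigned a, const unsigned b) {
  return a + b;
}

#endif /* MYLIB_H_INCLUDED */
```

Without the MYLIB_WEAK marking, including this header from two .c files in the same program would typically fail at link time with a duplicate symbol error.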
Pete’s code was already marked explicitly public domain - do what you want with it. I’ve found over the years that while public domain is all fine and well, having an explicit license like the unlicense or the CC0 can make lawyers happy because there is at least some legal text to reference. It also makes GitHub’s license scraping happier because these licenses are already ones that it knows about.
So I’ve licensed this header under the unlicense - it matches what my existing single-header projects use and is something that my users already favour.
Hashmap All The Things⌗
So I’m pretty happy with the code I mostly did not write - and I hope that my packaging and testing of the header allows it to be more widely useful to some of you fine folks out there.
A big thanks again to Pete Warden for writing this code and being so gracious about me making these changes. I hope this proves useful to some of you out there too.