I asked on Twitter the other day whether anyone had a hashmap that could work with string slices - parts of a string that are not null-terminated and thus need an explicit length to accompany the pointer.

I didn’t get any responses on this so I commented with a follow-up that I had grabbed some code written a few years back by the awesome Pete Warden of Google fame, and morphed it into what I required:

Author’s note: this part used to contain a tweet, but Hellish Tusk / Space Karen / Elon Musk butchered the platform so it is now gone.

Much to my surprise, Pete was not only happy for me to make these modifications but, since he was no longer maintaining the hashmap code, he’d also happily redirect users to any effort I put together:

Author’s note: this part used to contain a tweet, but Hellish Tusk / Space Karen / Elon Musk butchered the platform so it is now gone.

So I’ve done the work and I’m now introducing my latest library, nearly entirely not written by me, hashmap.h!

Null-Terminated to Slices

So Pete’s code was pretty solid as is. The main difference was that it relied on null-terminated strings as the key, whereas I wanted to use string slices. My first modification was to change the entry points that used a key to instead take a key and a length. So hashmap_put went from:

extern int hashmap_put(map_t in, char* key, any_t value);

To:

HASHMAP_WEAK int hashmap_put(struct hashmap_s *const hashmap,
                             const char *const key,
                             const unsigned len,
                             void *const value);

You’ll notice that I also went const mad (and again wished that const was the default and mutable or mut was required on variables - sigh!), and removed the typedef for any_t. The last point is just a general stylistic thing I have for my libraries - I really dislike that APIs like Windows.h abstract you so far away from the underlying types with all the SHOUTY CASE LPDWORDs and such, so I generally try to have no typedefs if I can get away with it.
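To make the pointer-plus-length idea concrete, here is a minimal sketch of hashing a key that is a slice of a larger buffer. The helper names slice_hash and slice_hash_demo are hypothetical, and 32-bit FNV-1a is swapped in purely for brevity; it is not claimed to be the hash that hashmap.h uses internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of hashmap.h. Hashes exactly `len` bytes
   starting at `key` using 32-bit FNV-1a (an assumption for illustration),
   so it never goes looking for a null terminator. */
static unsigned slice_hash(const char *const key, const unsigned len) {
  unsigned hash = 2166136261u; /* FNV-1a offset basis */
  unsigned i;

  assert(key != NULL || len == 0);

  for (i = 0; i < len; i++) {
    hash ^= (unsigned char)key[i]; /* mix in one byte of the slice */
    hash *= 16777619u;             /* FNV-1a prime */
  }

  return hash;
}

/* The key "width" is the first 5 bytes of a longer line: no copy is made
   and no terminator is written. Returns 1 when both hashes agree. */
static int slice_hash_demo(void) {
  const char line[] = "width=640";
  return slice_hash(line, 5) == slice_hash("width", 5);
}
```

Because the length travels with the pointer, a parser can hand out keys that point straight into its input buffer instead of allocating a terminated copy of each one.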

Supports UTF-8 Keys

Pete’s code also used the string.h ASCII-string functions of C to check whether a key matched. Since I wanted to use this hashmap in conjunction with UTF-8 strings (using my utf8.h library) I instead used memcmp, which became possible now that I have an explicit length for the string slice.
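A length-aware key comparison along these lines might look like the sketch below. The helper slice_equal is a hypothetical name for illustration, not a function exported by hashmap.h; the only library call it relies on is the standard memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 when two slices hold identical bytes,
   0 otherwise. Lengths are compared first, so memcmp only ever reads
   bytes that belong to both slices. */
static int slice_equal(const char *const a, const unsigned a_len,
                       const char *const b, const unsigned b_len) {
  assert(a != NULL && b != NULL);

  if (a_len != b_len) {
    return 0; /* different byte counts can never match */
  }

  /* memcmp compares raw bytes, so multi-byte UTF-8 sequences and even
     embedded zero bytes are treated like any other data. */
  return 0 == memcmp(a, b, a_len);
}
```

With this, the first five bytes of "caf\xc3\xa9 au lait" compare equal to the five-byte UTF-8 string "caf\xc3\xa9", and two keys that differ only after an embedded zero byte are correctly reported as different, which a strcmp-based check would miss.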

Single Header

The last major change I made was to smush the hashmap.c and hashmap.h files together into a single header. I am pretty obsessed with single headers as a way to get round the botched nature of C and C++’s package story (or lack thereof). This meant leveraging some existing hacks to stop the compiler complaining about multiple function definitions (by using weak function references instead).
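The weak-definition trick can be sketched roughly as follows. Everything named EXAMPLE_ or example_ here is hypothetical, and the per-compiler choices are an assumption about how such a macro is commonly written rather than a copy of what hashmap.h contains.

```c
#ifndef EXAMPLE_H_INCLUDED
#define EXAMPLE_H_INCLUDED

/* Pick a decoration that lets the same function body appear in every
   translation unit that includes this header without a duplicate
   definition error at link time. */
#if defined(_MSC_VER)
#define EXAMPLE_WEAK __inline
#elif defined(__clang__) || defined(__GNUC__)
#define EXAMPLE_WEAK __attribute__((weak))
#else
#error Non clang, non gcc, non MSVC compiler found!
#endif

/* Declaration, as a normal header would have. */
EXAMPLE_WEAK int example_add(const int a, const int b);

/* Definition, living in the header instead of a separate .c file. With
   a weak symbol the linker keeps one copy and discards the rest. */
EXAMPLE_WEAK int example_add(const int a, const int b) { return a + b; }

#endif /* EXAMPLE_H_INCLUDED */
```

The include guard only protects against double inclusion within one source file; it is the weak decoration that handles the header being included from several source files in the same program.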

The License

Pete’s code was already marked explicitly public domain - do what you want with it. I’ve found over the years that while public domain is all well and good, having an explicit license like the Unlicense or CC0 can make lawyers happy because there is at least some legal text to reference. It also makes GitHub’s license scraping happier because these licenses are already ones that it knows about.

So I’ve licensed this header under the Unlicense - it matches what my existing single-header projects use and is something that my users already favour.

Hashmap All The Things

So I’m pretty happy with the code I mostly did not write - and I hope that my packaging and testing of the header allows it to be more widely useful to some of you fine folks out there.

A big thanks again to Pete Warden for writing this code and being so gracious about me making these changes.