23 Jul

utf8.h progress!

Since my last post introducing utf8.h I’ve been frantically working on fleshing out the core utf8* functions to match the str* ones, and also listening to developer feedback!

Firstly, you can check out the one header C/C++ library here – utf8.h.

  • @daniel_collin suggested adding an ASCII-only utf8casecmp, which has been added. I’m looking into extending this to support more of the characters in Unicode (the most obvious candidates, and the ones I understand best, are accented versions of the ASCII letters).
  • @mcclure111 suggested I actually document the code where appropriate, and I’ve undertaken efforts to remedy this.
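To make "ASCII-only" concrete, here is a minimal sketch of such a comparison. The names sketch_ascii_lower and sketch_casecmp_ascii are made up for illustration and this is not the code in utf8.h. It folds only the bytes 'A' to 'Z', so every byte of a multi-byte sequence is compared exactly as it is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fold 'A'..'Z' to 'a'..'z', leave every other byte
   (including all bytes of multi-byte utf8 sequences) untouched. */
unsigned char sketch_ascii_lower(unsigned char c) {
  return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical name, not the utf8.h function. Returns 0 if the strings are
   equal ignoring ASCII case, a negative value if a sorts first, positive
   otherwise. */
int sketch_casecmp_ascii(const char *a, const char *b) {
  const unsigned char *pa = (const unsigned char *)a;
  const unsigned char *pb = (const unsigned char *)b;
  assert(a != NULL && b != NULL);
  for (;; ++pa, ++pb) {
    unsigned char ca = sketch_ascii_lower(*pa);
    unsigned char cb = sketch_ascii_lower(*pb);
    if (ca != cb) {
      return (ca < cb) ? -1 : 1;
    }
    if (ca == 0) {
      return 0; /* both strings ended at the same point */
    }
  }
}
```

With this sketch "Hello" and "hELLO" compare equal, but "é" and "É" do not, which is exactly the gap that support for accented letters would have to close.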

Next up I plan to tackle the utf8canon that @KmBenzie suggested, to canonicalize poorly formed utf8 sequences into the correct form (for example, an ASCII value can be erroneously encoded as a 4-byte sequence, a so-called overlong encoding, which is regarded as poor form).
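As a rough illustration of the problem utf8canon would have to solve, the sketch below decodes one sequence without rejecting overlong forms and then checks whether a shorter encoding exists. All three function names are hypothetical, and the sketch deliberately ignores surrogates and values above U+10FFFF; it is not how utf8.h does (or will do) it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: decode one sequence at s WITHOUT rejecting overlong forms.
   Stores the code point in *cp and returns the bytes consumed, or 0 if the
   lead byte or a continuation byte is malformed. */
size_t sketch_decode_lenient(const unsigned char *s, unsigned long *cp) {
  size_t n, i;
  assert(s != NULL && cp != NULL);
  if (s[0] < 0x80) {
    *cp = s[0];
    return 1;
  } else if ((s[0] & 0xE0) == 0xC0) {
    *cp = s[0] & 0x1F;
    n = 2;
  } else if ((s[0] & 0xF0) == 0xE0) {
    *cp = s[0] & 0x0F;
    n = 3;
  } else if ((s[0] & 0xF8) == 0xF0) {
    *cp = s[0] & 0x07;
    n = 4;
  } else {
    return 0;
  }
  for (i = 1; i < n; ++i) {
    if ((s[i] & 0xC0) != 0x80) {
      return 0; /* also stops safely at a NUL terminator */
    }
    *cp = (*cp << 6) | (unsigned long)(s[i] & 0x3F);
  }
  return n;
}

/* Hypothetical: number of bytes the shortest encoding of cp needs. */
size_t sketch_shortest_len(unsigned long cp) {
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  return 4;
}

/* Hypothetical: 1 if the first sequence in s is overlong, 0 otherwise
   (malformed input also yields 0 in this sketch). */
int sketch_is_overlong(const char *s) {
  unsigned long cp = 0;
  size_t used = sketch_decode_lenient((const unsigned char *)s, &cp);
  return used != 0 && used > sketch_shortest_len(cp);
}
```

For instance, the bytes F0 80 81 81 decode to 0x41 ('A') but occupy four bytes where one would do, so the sketch flags them; a canonicalizer would then re-emit the code point in its shortest form.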

07 Jul


I’ve been tinkering around with a new one header library, inspired by the awesome stb libraries that @nothings created: a utf8 string library, https://github.com/sheredom/utf8.h.

The utf8.h header adds some new utf8* prefixed functions that match the str* functions you would find in string.h, except that they are all written for utf8 exclusively. For example, utf8len will return the number of utf8 codepoints found in the provided utf8 string.
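To show how counting codepoints differs from counting bytes, here is a minimal sketch. The name sketch_utf8_count is made up, the input is assumed to be valid NUL-terminated utf8, and the real utf8len may well be implemented differently. The idea is that every codepoint has exactly one byte that is not a continuation byte (one of the form 10xxxxxx), so counting those bytes counts codepoints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name, not the utf8.h function. Counts codepoints in a valid,
   NUL-terminated utf8 string by skipping continuation bytes (10xxxxxx). */
size_t sketch_utf8_count(const char *s) {
  size_t count = 0;
  assert(s != NULL);
  for (; *s != '\0'; ++s) {
    if (((unsigned char)*s & 0xC0) != 0x80) {
      ++count; /* a lead byte or a plain ASCII byte starts a codepoint */
    }
  }
  return count;
}
```

With this sketch the string "h\xC3\xA9llo" ("héllo") counts as 5 codepoints, whereas strlen would report 6 bytes.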

The code currently works with gcc and clang (I haven’t had the time to port it to Windows and MSVC), and I would love it if you would check it out!