24 Aug

json.h performance (vs other C/C++ JSON parsers)

One of the first questions I had when I open sourced my json.h library was how fast it parses compared to other commonly used JSON libraries. I’ve gone through all of the JSON libraries in C/C++ that I find tolerable (and I’ll also document which ones I wouldn’t touch with a barge pole, and why!) and run a performance comparison on them.

First off, I tested the following JSON C/C++ libraries:

  • gason – C++ library with MIT license.
  • json.h – my own JSON library, written in C and licensed under the unlicense.
  • minijson_reader – C++11 library with BSD 3-Clause license.
  • rapidjson – C++ library with MIT license.
  • sajson – C++ library with MIT license.
  • ujson4c – C library with BSD license.

And I rejected the following JSON C/C++ libraries (with some comments on why I haven’t used them):

  • jsmn – I initially did add this to the benchmark, but I found that when testing large input files, the time taken to parse grew far faster than the file size (it looked exponential). After around 1 minute of trying to parse a 190MB file I made my benchmark time out, and I have removed this library from my benchmarking.
  • cson – Uses some bizarre SCM called Fossil SCM which was enough of a roadblock to stop any further explorations.
  • frozen – GPL license means it isn’t worth considering any further.
  • jansson – absolutely crazy CMake files required to get it to build.
  • js0n – only allows for searching of arbitrary tokens in a JSON stream, does not do a full parse and exploration of the DOM.
  • jsoncpp – another crazy CMake mess, not touching this library.
  • json++ – requires Flex/Bison, stupid requirement for a simple library.
  • json-c – Absolutely no idea what is going on with this repository. Clusterfuck is too kind an explanation for what is going on.
  • mjson – Requires compile time knowledge of JSON structure (only allows parsing of ‘known’ JSON structures).
  • nxjson – Another GPL licensed library means it is a write-off.
  • vjson – code did not have a license, too risky to explore it further.

I ran two kinds of test: the cost of parsing some JSON files and storing them in the library’s intermediate form, and the cost of parsing and then traversing the JSON to count all the numbers in the JSON structure.
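As a rough sketch of the second kind of test, the traversal can be a recursive walk that counts number nodes. The `node` type and the function names below are hypothetical and only stand in for whatever intermediate form a given library produces; none of this is the API of any library tested here.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical DOM node, for illustration only (not any library's real type). */
enum node_type { NODE_NUMBER, NODE_STRING, NODE_LITERAL, NODE_ARRAY, NODE_OBJECT };

struct node {
  enum node_type type;
  const struct node *children; /* child values of an array or object */
  size_t num_children;         /* zero for numbers, strings and literals */
};

/* Recursively count every number in the tree rooted at n. */
static size_t count_numbers(const struct node *n) {
  size_t count, i;
  assert(n != NULL);
  count = (n->type == NODE_NUMBER) ? 1 : 0;
  for (i = 0; i < n->num_children; i++) {
    count += count_numbers(&n->children[i]);
  }
  return count;
}

/* Time one traversal in seconds of CPU time; the count is written to *out. */
static double time_count_numbers(const struct node *root, size_t *out) {
  const clock_t start = clock();
  *out = count_numbers(root);
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

The parse-only test would time just the parse call in the same way, so the difference between the two timings is the traversal cost.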

[Chart: json-generator]

The first JSON file I tested was a 9KB file. In this test my own json.h library is second worst of the six tested, a full 4x slower than the best performing gason library.

[Chart: AllSets]

Ouch, on the 12.9MB file we perform worst of all. The reason minijson_reader seems to beat us here is that the input file AllSets.json is simply one large JSON array containing identical JSON objects in each element. The deeper the nesting of the JSON (e.g. objects, within arrays, within objects, etc.) the worse minijson_reader performs.

[Chart: sf-city-lots-json]

On the largest of our inputs, the 190MB file, we perform second worst again. This file has many levels of objects and arrays, which is why minijson_reader performs atrociously.

One thing to note across all the examples is the difference between parsing and parsing + traversing. Our library has a much narrower gap between these two than the other libraries, which means that once parsing has completed, iterating through our in-memory representation is much quicker than the alternatives. I just need to work out now how to improve the parsing speed!

So in conclusion, my library is slower than I’d like – the only heartening thing is that its parsing speed stays roughly constant as the input grows (at least I don’t have exponentially bad parsing!). I’ve already started to profile and rewrite the offending part of the parser, something I’ll cover in a future blog post.
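One rough way to check for that kind of blow-up is to compare how much the time grows against how much the input grows between two runs. This is only a sketch; `scaling_ratio` is a hypothetical helper, not part of my benchmark code.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of time growth to size growth between a smaller run (a) and a
   larger run (b). About 1.0 means the time scales linearly with the input
   size; well above 1.0 means it grows faster than linearly. */
static double scaling_ratio(size_t bytes_a, double secs_a,
                            size_t bytes_b, double secs_b) {
  assert(bytes_a > 0 && bytes_b > bytes_a);
  assert(secs_a > 0.0 && secs_b > 0.0);
  return (secs_b / secs_a) / ((double)bytes_b / (double)bytes_a);
}
```

For example, if doubling the input size doubles the time the ratio is 1.0, and if it quadruples the time the ratio is 2.0.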

Please check out the library here – I am more than happy to accept merge requests and feedback (I’ve already changed things based on user feedback!).


18 Aug

json.h

Back in May, the rather awesome imgui creator @ocornut asked the following;

And it got me thinking – why isn’t there a simple JSON reader/writer lib in the same vein as the stb_* libraries that performs a single call to malloc to encode the state? I couldn’t find one, so I decided to write my own.

I’m introducing json.h – my one header/one source JSON library that parses a JSON source into a single allocation buffer, and also has functions to write out a minified version of the JSON and a pretty-printed version (for human readable JSON).

Let’s go through a worked example. Take the following trivial JSON:

{"a" : [123, null, true, false, "alphabet"]}

The above example covers all the core concepts inherent within JSON, so serves as a good coverage tool for our parsing. The above will be parsed (using json_parse) into a single-malloc’ed buffer, with the start of that buffer being a json_value_s* – a pointer to the root value. The Document Object Model (DOM) for this JSON is;

[Diagram: DOM of the example JSON]

And the single allocation in-memory view of the above is;

[Diagram: single allocation in-memory view of the example JSON]
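To make the single-allocation idea concrete, here is a minimal sketch of the general technique on a much simpler structure (a list of strings): one pass to work out the total size, one malloc, then a second pass to carve up the block. The `packed_strings` type and `pack_strings` function are hypothetical and are not json.h’s actual structs or code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout, for illustration only (not json.h's real structs):
   [ header | array of char* | string bytes ... ] in one malloc'ed block. */
struct packed_strings {
  size_t count;
  char **items; /* points into the same allocation as the header */
};

static struct packed_strings *pack_strings(const char *const *src, size_t count) {
  size_t bytes = sizeof(struct packed_strings) + count * sizeof(char *);
  size_t i;
  struct packed_strings *p;
  char *cursor;

  assert(src != NULL || count == 0);

  /* Pass 1: work out how many bytes the whole structure needs. */
  for (i = 0; i < count; i++) {
    bytes += strlen(src[i]) + 1;
  }

  /* The one and only allocation. */
  p = (struct packed_strings *)malloc(bytes);
  if (p == NULL) {
    return NULL;
  }

  /* Pass 2: carve the block up and copy the data in. */
  p->count = count;
  p->items = (char **)(p + 1);         /* pointer table sits after the header */
  cursor = (char *)(p->items + count); /* string bytes sit after the table */
  for (i = 0; i < count; i++) {
    const size_t len = strlen(src[i]) + 1;
    memcpy(cursor, src[i], len);
    p->items[i] = cursor;
    cursor += len;
  }
  return p;
}
```

The nice property of this kind of layout is that a single call to free on the returned pointer releases everything.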

In terms of speed of the library I’ve used these JSON files for reference;

Which produces the following chart;

[Chart: json.h parsing and writing speed]

Currently, parsing is averaging around 55 MB/s, pretty writing around 300 MB/s and minified writing around 500 MB/s on my Intel Core i7-2700K 3.5GHz.
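For reference, a throughput figure like these is just bytes processed divided by elapsed time. The sketch below assumes 1 MB means 1,000,000 bytes (the 1,048,576-byte convention would give slightly lower numbers), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in MB/s, assuming 1 MB = 1,000,000 bytes. */
static double throughput_mb_per_s(size_t bytes, double seconds) {
  assert(seconds > 0.0);
  return ((double)bytes / 1.0e6) / seconds;
}

/* Seconds needed to process `bytes` at a given throughput in MB/s. */
static double seconds_for(size_t bytes, double mb_per_s) {
  assert(mb_per_s > 0.0);
  return ((double)bytes / 1.0e6) / mb_per_s;
}
```

As a made-up example, a 110 MB input parsed at 55 MB/s would take about 2 seconds.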

My next step will be to look into my parsing approach and see if anything can be done to speed up parsing of JSON!

I hope this library is useful, and I’m happy to have any comments/critiques on my approach.