9 unstable releases (4 breaking)

| Version | Release date |
|---|---|
| 0.5.0 | Dec 16, 2025 |
| 0.4.0 | Oct 28, 2025 |
| 0.3.0 | Oct 21, 2025 |
| 0.2.3 | Jul 21, 2025 |
| 0.1.1 | Jan 20, 2023 |
#146 in Parser implementations
193,869 downloads per month
Used in 34 crates (4 directly)
64KB, 1K SLoC
High-fidelity JSON lexer and parser
hifijson is a Rust crate that provides a high-fidelity JSON lexer and parser.
In this context, high-fidelity means that unlike many other parsers,
hifijson aims to preserve input data very faithfully, in particular numbers.
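To illustrate what is lost otherwise, the following sketch simulates a parser that converts every number to f64, which many JSON parsers do by default. It uses only the Rust standard library; RawNumber and via_f64 are hypothetical names chosen for illustration, not part of hifijson's API.

```rust
/// Hypothetical wrapper (not hifijson's API) that stores a JSON number
/// exactly as it appeared in the input.
#[derive(Debug, Clone, PartialEq)]
struct RawNumber(String);

impl RawNumber {
    /// Returns the number's original text, unchanged.
    fn as_str(&self) -> &str {
        &self.0
    }
}

/// Simulates a lossy parser: converts the number text to f64 and back.
/// Returns None if the text is not a valid f64 literal.
fn via_f64(text: &str) -> Option<String> {
    text.parse::<f64>().ok().map(|x| x.to_string())
}
```

Here, via_f64("1.0") yields "1", via_f64("1e2") yields "100", and via_f64("0.10000000000000000001") yields "0.1", whereas a RawNumber still holds the original text in each case.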
- Zero dependencies: Not even alloc is obligatory!
- no_std: Can be used on embedded systems without the standard library.
- Reading from slices and from byte iterators: This is important if you are writing an application that should read from files as well as from standard input, for example.
- Performance
- Portability
- Mostly zero-copy deserialisation: Due to the presence of escaped characters in JSON strings, full zero-copy deserialisation of JSON data is not possible. However, hifijson attempts to minimise allocations in the presence of strings.
- Deserialisation via serde
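The idea behind mostly zero-copy deserialisation can be sketched with Cow<str>: borrow the input when a string contains no escape sequences, and allocate only when something must be decoded. This is a simplified, hypothetical unescape function, not hifijson's implementation. It assumes it receives the string content without the surrounding quotes, decodes surrogate pairs in \u escapes, rejects lone surrogates, and does not check for unescaped control characters.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not hifijson's API): decodes the content of a JSON
/// string without its surrounding quotes. Returns None on invalid escapes.
/// Borrows the input when there is nothing to decode.
fn unescape(s: &str) -> Option<Cow<'_, str>> {
    // Fast path: no escape sequences, so no allocation is needed.
    if !s.contains('\\') {
        return Some(Cow::Borrowed(s));
    }
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        let decoded = match chars.next()? {
            '"' => '"',
            '\\' => '\\',
            '/' => '/',
            'b' => '\u{08}',
            'f' => '\u{0C}',
            'n' => '\n',
            'r' => '\r',
            't' => '\t',
            'u' => {
                let hi = hex4(&mut chars)?;
                if (0xD800..0xDC00).contains(&hi) {
                    // High surrogate: must be followed by a \u low surrogate.
                    if chars.next()? != '\\' || chars.next()? != 'u' {
                        return None;
                    }
                    let lo = hex4(&mut chars)?;
                    if !(0xDC00..0xE000).contains(&lo) {
                        return None;
                    }
                    char::from_u32(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))?
                } else {
                    // char::from_u32 rejects lone low surrogates.
                    char::from_u32(hi)?
                }
            }
            _ => return None,
        };
        out.push(decoded);
    }
    Some(Cow::Owned(out))
}

/// Reads exactly four hexadecimal digits and returns their value.
fn hex4(chars: &mut std::str::Chars<'_>) -> Option<u32> {
    let mut value = 0;
    for _ in 0..4 {
        value = value * 16 + chars.next()?.to_digit(16)?;
    }
    Some(value)
}
```

For example, unescape("hello") returns a borrowed Cow, while unescape(r"a\nb") returns an owned string containing an actual newline.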
Comparison to serde_json
serde_json is currently the most popular JSON parser written in Rust.
However, there are some deficiencies of serde_json:
- Numbers can be parsed with arbitrary precision (via the feature flag arbitrary_precision), but they cannot be deserialised (by implementing the Deserialize trait) to anything other than a serde_json::Value (#896). Instead, one has to deserialise to serde_json::Value, then convert that to something else, which costs time.
- When using arbitrary_precision, serde_json incorrectly parses or rejects certain input; for example, it incorrectly parses {"$serde_json::private::Number": "1.0"} as the number 1.0 and incorrectly rejects {"$serde_json::private::Number": "foo"}. I consider both of these to be bugs, but although they are known, the serde_json maintainers are "fine sticking with this behaviour".
- The behaviour of serde_json can be customised to some degree via feature flags. However, this is a relatively inflexible solution. For example, you can specify whether to preserve the order of keys in objects by using the preserve_order feature flag, but what happens when an object contains the same key several times, for example {"a": 1, "a": 2}? Currently, serde_json parses this as {"a": 2}, silently discarding information. What if you would like to fail in this case? Well, you can just implement Deserialize yourself. Except ... that you cannot, if you are using arbitrary_precision. Ouch.
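When you control the parser, a duplicate-key policy takes only a few lines. The sketch below is hypothetical and independent of both serde_json and hifijson: it assumes the key-value pairs of an object have already been lexed and parsed, and it collects them into a map, either keeping the last value or reporting the first repeated key. It uses BTreeMap for simplicity, so key order is not preserved.

```rust
use std::collections::BTreeMap;

/// Hypothetical policy for objects that contain the same key several times.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DuplicateKeys {
    /// Keep the last value, silently discarding earlier ones.
    KeepLast,
    /// Fail on the first repeated key.
    Reject,
}

/// Collects key-value pairs into a map according to `policy`.
/// Returns Err(key) if `policy` is Reject and `key` occurs more than once.
fn build_object<V>(
    pairs: impl IntoIterator<Item = (String, V)>,
    policy: DuplicateKeys,
) -> Result<BTreeMap<String, V>, String> {
    let mut map = BTreeMap::new();
    for (key, value) in pairs {
        if policy == DuplicateKeys::Reject && map.contains_key(&key) {
            return Err(key);
        }
        map.insert(key, value);
    }
    Ok(map)
}
```

With the pairs of {"a": 1, "a": 2}, KeepLast keeps the value 2 for "a", while Reject returns Err("a").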
You should probably use serde_json if you want to
serialise / deserialise your existing Rust datatypes.
However, if you want to

- process arbitrary JSON coming from the external world,
- require some control over what kind of input you read, or
- just care about fast build times and minimal dependencies,

then hifijson might be for you.
There is also serde-json-core for embedded usage of JSON; however, this crate supports neither arbitrary-precision numbers, reading from byte iterators, nor escape sequences in strings.
Performance
Running cargo run --release --example bench
measures the time that serde_json and hifijson take to
parse large JSON data into their respective Value types.
For better comparability, I enabled serde_json's arbitrary_precision flag,
which parses numbers to strings like hifijson.
Still, this is somewhat of an apples-to-oranges comparison because
a serde_json Value uses String for numbers and strings where
a hifijson Value uses &str for numbers and Cow<str> for strings.
This gives hifijson an advantage for the "pi" and "hello" benchmarks,
but a disadvantage for the "hello-world" benchmark.
| Benchmark | Size | serde_json | hifijson |
|---|---|---|---|
| null | 47 MiB | 241 ms | 317 ms |
| pi | 66 MiB | 1138 ms | 648 ms |
| hello | 76 MiB | 702 ms | 543 ms |
| hello-world | 143 MiB | 816 ms | 1133 ms |
| arr | 28 MiB | 473 ms | 387 ms |
| tree | 39 MiB | 1413 ms | 1601 ms |
The results are mixed: While hifijson
is faster on numbers, strings not containing escape sequences, and deeply nested arrays, it
is slower on keywords (null, true, false) and strings with escape sequences.
Also note that serde_json parses numbers much faster without arbitrary_precision.
Suggestions on how to improve hifijson's performance are welcome. :)
Lexer
Writing a JSON parser is remarkably easy; the hard part is actually lexing.
This is why hifijson provides you first and foremost with a lexer,
which you can then use to build a parser yourself.
Yes, you. You can do it.
hifijson tries to give you some basic abstractions to help you.
For example, the default parser is implemented in fewer than 40 lines of code.
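To illustrate why the parser is the easy part, here is a recursive-descent parser over an already-lexed token stream. Token, Value, parse, and seq are hypothetical stand-ins, not hifijson's types or functions; numbers are kept as their source text, and strings are assumed to be unescaped already.

```rust
use std::iter::Peekable;

/// Hypothetical tokens, as a lexer might produce them.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Null,
    True,
    False,
    Num(String),
    Str(String),
    LSquare,
    RSquare,
    LCurly,
    RCurly,
    Comma,
    Colon,
}

/// Hypothetical value type; numbers keep their exact source text.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Num(String),
    Str(String),
    Arr(Vec<Value>),
    Obj(Vec<(String, Value)>),
}

/// Parses one value from the token stream, or returns None on invalid input.
fn parse<I: Iterator<Item = Token>>(tokens: &mut Peekable<I>) -> Option<Value> {
    Some(match tokens.next()? {
        Token::Null => Value::Null,
        Token::True => Value::Bool(true),
        Token::False => Value::Bool(false),
        Token::Num(n) => Value::Num(n),
        Token::Str(s) => Value::Str(s),
        Token::LSquare => Value::Arr(seq(tokens, Token::RSquare, parse)?),
        Token::LCurly => Value::Obj(seq(tokens, Token::RCurly, |t| {
            // Each object entry is a string key, a colon, and a value.
            let key = match t.next()? {
                Token::Str(k) => k,
                _ => return None,
            };
            if t.next()? != Token::Colon {
                return None;
            }
            Some((key, parse(t)?))
        })?),
        // Closing brackets, commas, and colons cannot start a value.
        _ => return None,
    })
}

/// Parses zero or more comma-separated items, followed by `close`.
fn seq<I, T>(
    tokens: &mut Peekable<I>,
    close: Token,
    mut item: impl FnMut(&mut Peekable<I>) -> Option<T>,
) -> Option<Vec<T>>
where
    I: Iterator<Item = Token>,
{
    let mut items = Vec::new();
    // Empty sequence: the closing bracket follows immediately.
    if tokens.peek() == Some(&close) {
        tokens.next();
        return Some(items);
    }
    loop {
        items.push(item(tokens)?);
        match tokens.next()? {
            Token::Comma => {}
            t if t == close => return Some(items),
            _ => return None,
        }
    }
}
```

For instance, the tokens of [1, {"a": null}] parse to Arr([Num("1"), Obj([("a", Null)])]). Keeping objects as a Vec of pairs preserves key order and repeated keys, leaving such policy decisions to the caller. The sketch does not check for trailing tokens after the value.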
Default parser
Parsing JSON is a minefield, because the JSON standard is underspecified or downright contradictory in certain aspects. For this reason, a parser has to decide which inputs to accept and which to reject.
hifijson comes with a default parser that might be good enough for many use cases.
This parser makes the following choices:
- Validation of strings: The parser validates that strings are valid UTF-8.
- Concatenation of JSON values: Many JSON processing tools accept multiple root JSON values in a JSON file, for example [] 42 true {"a": "b"}. However, formally defining what these tools actually accept or reject is not simple. For example, serde_json accepts []"a", but it rejects 42"a". The default behaviour of this parser is to accept any concatenation of JSON-text (as defined in RFC 8259) that can somehow be reconstructed. This allows for weird-looking things like nulltruefalse or 1.0"a", but some values cannot be reconstructed, such as 1.042.0, because this may be either a concatenation of 1.0 and 42.0 or a concatenation of 1.04 and 2.0. In that sense, hifijson attempts to implement a policy that is as permissive and easily describable as possible.
Furthermore, the parser passes all tests of the JSON parsing test suite.
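The problem with an input like 1.042.0 becomes visible with a greedy number lexer, which consumes the longest prefix matching JSON's number grammar. The sketch below uses a hypothetical helper, number_len, not hifijson's lexer: it returns the length of the number at the start of the input. Lexed greedily, 1.0"a" splits cleanly into 1.0 and "a", whereas 1.042.0 yields 1.042 followed by .0, which does not start any JSON value.

```rust
/// Hypothetical helper: length in bytes of the longest prefix of `s` that
/// matches JSON's number grammar, or None if `s` does not start with a number.
fn number_len(s: &[u8]) -> Option<usize> {
    // Counts consecutive ASCII digits starting at `start` (start <= s.len()).
    let digits = |start: usize| s[start..].iter().take_while(|b| b.is_ascii_digit()).count();
    let mut i = 0;
    if s.get(i) == Some(&b'-') {
        i += 1;
    }
    // Integer part: "0", or a nonzero digit followed by more digits.
    match s.get(i) {
        Some(b'0') => i += 1,
        Some(b'1'..=b'9') => i += digits(i),
        _ => return None,
    }
    // Optional fraction: "." followed by at least one digit.
    if s.get(i) == Some(&b'.') && digits(i + 1) > 0 {
        i += 1 + digits(i + 1);
    }
    // Optional exponent: "e" or "E", an optional sign, at least one digit.
    if matches!(s.get(i), Some(b'e' | b'E')) {
        let mut j = i + 1;
        if matches!(s.get(j), Some(b'+' | b'-')) {
            j += 1;
        }
        if digits(j) > 0 {
            i = j + digits(j);
        }
    }
    Some(i)
}
```

Here, number_len(b"1.0\"a\"") returns Some(3), leaving "a" to be lexed as a string, while number_len(b"1.042.0") returns Some(5), leaving .0, for which number_len returns None.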
Fuzzing
To run the fuzzer, install cargo-fuzz.
Then run the fuzzer with cargo +nightly fuzz run all;
the +nightly toolchain override lets you do this without making the nightly Rust compiler your default.