Parsing JSON in ReScript Part I: Prerequisites and Requirements
There are few things more satisfying than a slick, readable, and safe JSON parser. It's one of the joys of functional programming. Using a good JSON parsing pipeline can feel like magic. This series seeks to lift the veil and empower readers (and, importantly, my future self) to build their own customizable and extensible parsing libraries. This article, the first of several, will be a skimmable introduction to the subject as I see it.
Prerequisites
Unfortunately, I'm probably not yet a skilled enough writer to write this article for a very junior audience. In order to follow this series, you should be fairly familiar with
If you aren't familiar with all of these, feel free to read on, and if you get stuck on something, as always, feel free to open an issue or @ me, and I'll take another swing at it.
Examining the need for a custom parsing solution in ReScript
As always, it's reasonable to ask why I'm reinventing the wheel here. I have a few reasons for wanting a custom solution for my use case which I will enumerate here, but while I'm at it, I'd like to just say: I really love parsers. They're fun, and you may find you like them, too.
Why not just use the built-in Js.Json parsing methods?
Let me say, first of all, that I'm going to use Js.Json, but if you try to build a nontrivial parser out of the default Js.Json module, you'll end up with quite a bit of nesting, and that quickly gets difficult to manage. I would want an additional wrapper around Js.Json if my objects had more than just a couple of properties.
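To illustrate the nesting problem, here is a minimal sketch of a parser for a made-up two-field record written with nothing but Js.Json and Js.Dict. The point type and parsePoint function are hypothetical and exist only for this example.

```rescript
// Hypothetical two-field model, used only for illustration.
type point = {x: float, y: float}

// Every property needs two checks (is it present? is it a number?),
// and each check adds another level of nesting.
let parsePoint = (json: Js.Json.t): option<point> =>
  switch Js.Json.decodeObject(json) {
  | None => None
  | Some(dict) =>
    switch Js.Dict.get(dict, "x") {
    | None => None
    | Some(xJson) =>
      switch Js.Json.decodeNumber(xJson) {
      | None => None
      | Some(x) =>
        switch Js.Dict.get(dict, "y") {
        | None => None
        | Some(yJson) =>
          switch Js.Json.decodeNumber(yJson) {
          | None => None
          | Some(y) => Some({x: x, y: y})
          }
        }
      }
    }
  }
```

That is five levels of switch for two properties, and a failure gives no hint about which step went wrong.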
Why not use an existing solution?
A casual search of npm shows there are plenty of handy JSON parsing helpers in ReScript (formerly BuckleScript/ReasonML). They're all good! If one of them suits your use case, you should use it. However, there may be times when none of them fits. I ran into one of those cases recently:
The API I'm calling uses dates in ISO-8601, and I need them in both ISO-8601 and POSIX time. Libraries tend to pick one or the other.
The API I'm calling uses numeric strings instead of numbers.
Most libraries parse data into option types, but I want fairly granular logging so I can quickly tell what failed if my parser is misconfigured or the API ships a breaking change. For that, I'd like to use a Result type with a string error message.
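To make these requirements a little more concrete, here is a rough sketch of two conversions that return a Result with a string error instead of an option. The names numericFromString and posixFromIso are hypothetical, and the choice of milliseconds for POSIX time is an assumption that simply follows what Js.Date.getTime returns.

```rescript
// Hypothetical helper: a numeric string such as "12.5" becomes a float.
let numericFromString = (raw: string): Belt.Result.t<float, string> =>
  switch Belt.Float.fromString(raw) {
  | Some(n) => Belt.Result.Ok(n)
  | None => Belt.Result.Error("Parse error: not a numeric string: " ++ raw)
  }

// Hypothetical helper: an ISO-8601 string becomes POSIX time.
// Js.Date.getTime returns milliseconds since the epoch (an assumed unit here)
// and NaN when the date could not be parsed.
let posixFromIso = (raw: string): Belt.Result.t<float, string> => {
  let millis = Js.Date.getTime(Js.Date.fromString(raw))
  if Js.Float.isNaN(millis) {
    Belt.Result.Error("Parse error: not a date: " ++ raw)
  } else {
    Belt.Result.Ok(millis)
  }
}
```

Because the error branch carries a message, whoever calls these can log exactly which value was rejected, which an option cannot do.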
The first two of these problems could be resolved if I separated my concerns more. I could have a separate record type that contains all the fields from the API as strings, the way the API presents them, and then define an additional translation layer to go from API models to my data models.
While I understand the argument for doing something like this, I don't think this is always the best route. Parsing date strings and numeric strings into dates and numbers is logic that belongs in the parsing layer, not in some additional, separate business layer.
Defining the requirements of our parsing library
The main point of any parsing library is to use functions to flatten the nested logic required to cover the success and failure cases of decoding each property. Secondly, as I've said above, I want this library to return a Result type with a nice error message I can decide to log if the parse fails.
I also want it to conform to the ReScript convention of a pipe-first structure, and unlike many pipelines, I'd like to start with some defaults and build the pipeline incrementally.
For reference, here's an example of how I'm using my own library. This is just an abbreviated version of Parsing.res which defines parsers for a few models.
open Belt.Float; // for * multiplication and / division

let initializeRollingCaseRate: Models.rawRollingCaseRate = {
  dateStr: "",
  posix: 0.,
  caseRate: 0.
};

let parseRollingCaseRate =
  (json: Js.Json.t): Belt.Result.t<Models.rawRollingCaseRate, string>
  => switch Js.Json.classify(json) {
    | Js.Json.JSONObject(dict) => Belt.Result.Ok(initializeRollingCaseRate)
      -> Decode.req("date", Decode.str, dict, (obj, dateStr) => {
        ...obj,
        dateStr: dateStr |> Js.String.substring(~from=0, ~to_=10)}) // strip time.
      -> Decode.req("date", Decode.posix, dict, (obj, posix) => {
        ...obj, posix: posix })
      -> Decode.req("cases_rate_total", Decode.numeric, dict, (obj, caseRate) => {
        ...obj,
        caseRate: caseRate })
    | _ => Belt.Result.Error("Parse error: not an object. ");
  };

// Keep the successfully parsed items and log the failures.
let flatLog = (accumulator: array<'t>, item: Belt.Result.t<'t, 'error>) => {
  switch item {
    | Belt.Result.Ok(ok) => accumulator |> Js.Array.concat( [ ok ] )
    | Belt.Result.Error(error) => {
      Js.log(error);
      accumulator
    };
  }
};

let parseRawRollingCaseRates = (json: Js.Json.t):
  Belt.Result.t<array<Models.rawRollingCaseRate>, string>
  => switch Js.Json.classify(json) {
    | Js.Json.JSONArray(jsons) => Belt.Result.Ok(jsons
      |> Js.Array.map(parseRollingCaseRate)
      |> Js.Array.reduce(flatLog, [])
    )
    | _ => Belt.Result.Error("Parse issue: root not array. ")
  };
There's a lot going on here (perhaps too much). Just to break it down, the req function takes a Result of a record model, and, if that Result is itself okay, it tries to read (1) the property with the given name, using (2) the given function that turns a JSON value into a properly parsed member, from (3) the given dictionary, and then uses the parsed value to update the record via (4) the given function. It returns another Result which can be piped into another Decode.req, and the game begins again. The result is one call to Decode.req for each property of the Models.rawRollingCaseRate I'm decoding in parseRollingCaseRate.
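For readers who want to see the shape of such a function before the later posts, here is one way a req-style step could be written. This is only a sketch under my own assumptions, not the Decode module itself: the decoder type, the str and req functions, the user record, and the error wording are all hypothetical stand-ins chosen to match the description above.

```rescript
// A decoder takes a JSON value and returns a parsed value or an error message.
type decoder<'a> = Js.Json.t => Belt.Result.t<'a, string>

// Hypothetical string decoder, standing in for Decode.str.
let str: decoder<string> = json =>
  switch Js.Json.decodeString(json) {
  | Some(s) => Belt.Result.Ok(s)
  | None => Belt.Result.Error("not a string. ")
  }

// Hypothetical required-property step, standing in for Decode.req.
// The Result comes first so the function works with the pipe-first operator.
let req = (
  result: Belt.Result.t<'model, string>,
  prop: string,
  decode: decoder<'a>,
  dict: Js.Dict.t<Js.Json.t>,
  update: ('model, 'a) => 'model,
): Belt.Result.t<'model, string> =>
  // flatMap skips this step entirely if an earlier step already failed.
  Belt.Result.flatMap(result, model =>
    switch Js.Dict.get(dict, prop) {
    | None => Belt.Result.Error("Parse error: missing property " ++ prop ++ ". ")
    | Some(json) =>
      switch decode(json) {
      | Belt.Result.Ok(value) => Belt.Result.Ok(update(model, value))
      | Belt.Result.Error(message) =>
        Belt.Result.Error("Parse error in " ++ prop ++ ": " ++ message)
      }
    }
  )

// Hypothetical model and parser showing the pipeline in use.
type user = {name: string, email: string}

let parseUser = (json: Js.Json.t): Belt.Result.t<user, string> =>
  switch Js.Json.classify(json) {
  | Js.Json.JSONObject(dict) =>
    Belt.Result.Ok({name: "", email: ""})
    ->req("name", str, dict, (user, name) => {...user, name: name})
    ->req("email", str, dict, (user, email) => {...user, email: email})
  | _ => Belt.Result.Error("Parse error: not an object. ")
  }
```

The nesting now lives in one place, inside req, and each property costs one line in the parser. The first failing step decides the error message, and every later step passes that error through untouched.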
In Conclusion
I hope this has been a useful introduction to my thinking on parsers to contextualize the code I will introduce in the coming posts. The next post will introduce some underlying utilities that will form the building blocks of our parsing library.