POSTS

Extracting JSON Data With jq

Blog

I recently found a tool that has really improved my life: jq. jq can extract, restructure and change JSON data. Professional programmers often call this operation “munging,” “massaging,” or “frobnicating.” The rest of this post will take you through an example of some frobnication.

Example Scenario

Yesterday I had a 5.2MB JSON file. It was a captured result from a system that I don’t have SQL database console access to. So this output was my most reliable, structured starting point. Using jq, I was then able to change the data so that my team was able to address an operational challenge right now.

I had the input and the jq command, the next thing I needed was to program the transformation. Fortunately, jq’s manual is excellent.

Simple Example

Consider:

cat source |jq '.["batches"] | .[] | {iteration, id, stat_counts}

This means:

Read the JSON in source, extract the "batches" attribute’s value, an Array. For each element in that Array, extract out "iteration" (String), "id" (String), and "stat_counts" (Object) into a new Object with keys that match those values.

Typically I’d have to reach for Python, Ruby, or JavaScript to do this, but here it was on the command line. It was an auspicious start.

Meta-Functions

jq also provides meta functions. In the previous example, I extracted 3 attributes I wanted. But how did I know what attributes were available? One can use the keys function among a host of others to explore the JSON.

Here I pull the first Object of the value pointed to by the "batches" attribute and I ask it its keys.

cat source | jq '.["batches"][0] | keys'

Extended Functions

Building on the previous, let’s do something complex.

Find only the set for whom some criterion matches either of two regex patterns and set null values for one attribute ("students") to 0 instead of null.

cat source |jq '
  .["batches"] |
  .[] |
  {iteration, id, stat_counts} |
  select(
    .iteration |
      test(".*-prefix-[0-9]{4}20$"),
      test(".*-prefix-(10|11|12)[0-9]{2}19$")
  ) |
  {iteration, id, students_count: (.stat_counts["students"] | if . == null then 0 else . end)}'

Outputs:

{
  "iteration": "location-prefix-111819",
  "id": -100,
  "students_count": "Hidden, for demo"
}
{
  "iteration": "location-prefix-120919",
  "id": -200,
  "students_count": "Hidden, for demo"
}

Holy cow! That’s pretty powerful stuff.

It felt a bit like learning a programming language. Honestly, I wasn’t that enthused about learning a new domain specific language for processing JSON; however, after some time investment, I think you get a lot of bang for the buck. A huge part of that win is cutting out the Python and Ruby scaffolding component.

Additional Resources