POSTS
Extracting JSON Data With jq
BlogI recently found a tool that has really improved my life: jq
. jq
can
extract, restructure and change JSON data. Professional programmers often call
this operation “munging,” “massaging,” or “frobnicating.” The rest of this post
will take you through an example of some frobnication.
Example Scenario
Yesterday I had a 5.2MB JSON file. It was a captured result from a system that I
don’t have SQL database console access to. So this output was my most reliable,
structured starting point. Using jq
, I was then able to change the data so
that my team was able to address an operational challenge right now.
I had the input and the jq
command, the next thing I needed was to program the
transformation. Fortunately, jq
’s manual is excellent.
Simple Example
Consider:
cat source |jq '.["batches"] | .[] | {iteration, id, stat_counts}
This means:
Read the JSON in
source
, extract the"batches"
attribute’s value, anArray
. For each element in thatArray,
extract out"iteration"
(String
),"id"
(String
), and"stat_counts"
(Object
) into a newObject
with keys that match those values.
Typically I’d have to reach for Python, Ruby, or JavaScript to do this, but here it was on the command line. It was an auspicious start.
Meta-Functions
jq
also provides meta functions. In the previous example, I extracted 3
attributes I wanted. But how did I know what attributes were available? One can
use the keys
function among a host of others to explore the JSON.
Here I pull the first Object
of the value pointed to by the "batches"
attribute and I ask it its keys.
cat source | jq '.["batches"][0] | keys'
Extended Functions
Building on the previous, let’s do something complex.
Find only the set for whom some criterion matches either of two regex patterns and set null values for one attribute (
"students"
) to0
instead ofnull
.
cat source |jq '
.["batches"] |
.[] |
{iteration, id, stat_counts} |
select(
.iteration |
test(".*-prefix-[0-9]{4}20$"),
test(".*-prefix-(10|11|12)[0-9]{2}19$")
) |
{iteration, id, students_count: (.stat_counts["students"] | if . == null then 0 else . end)}'
Outputs:
{
"iteration": "location-prefix-111819",
"id": -100,
"students_count": "Hidden, for demo"
}
{
"iteration": "location-prefix-120919",
"id": -200,
"students_count": "Hidden, for demo"
}
Holy cow! That’s pretty powerful stuff.
It felt a bit like learning a programming language. Honestly, I wasn’t that enthused about learning a new domain specific language for processing JSON; however, after some time investment, I think you get a lot of bang for the buck. A huge part of that win is cutting out the Python and Ruby scaffolding component.