POSTS

Using dot to Help Visualize Data Structures

Blog

Recently I was practicing building an implementation of a binary search tree (BST), an important data structure often discussed in interviews, in JavaScript. While iterating, I found the ability to draw pretty pictures (“directed graphs”) of the structure helpful. This became even more important when the data collection got big or when the operation got complex e.g. deletion of an interior node with re-homing of “orphans”.

In this post, I’ll show how to get started with visualizing complex structures with directed graphs. At the end of this post, you’ll see how I created a BST from a shuffled Array of Integers and displayed its structure as an image file using Graphviz. While we’ll be working with a BST, this technique is helpful for many data structures, and the odd diagram needed to communicate with your team. Along the way, we’ll see how the approach we’re taking aligns with “The Unix Philosophy.”

With this knowledge, instead of inspecting a BST like this with console.log():

{
  root: Node {
    value: 33,
    left: Node { value: 3, left: null, right: null },
    right: Node { value: 72, left: null, right: null }
  }
}

we can look at:

Directed graph of a balanced search tree of the integers 1 through 100

That seems easier to reason about to me!

The Unix Philosophy

The Unix Philosophy” is some timeless wisdom. It has literally never steered me wrong. The majority of design patterns, the current vogue around functional concepts, strategies for not getting project deliverables wrong all have, at their heart, “The Unix Philosophy.” Written by Doug McIlroy, these four programming admonitions save lives, save time, and unleash potential. I’ve added small annotations in brackets. I admit I am standing on the shoulders of giants.

  1. Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”. [This is currently having a renaissance in the resurgence of functional-style programming — composition of simple, easily-tested pure functions with no mutated state is vastly superior to complex decision code.]
  2. Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input. [This is often seen in the SOLID OOP design maxim: “Open for extension, closed for modification.” Code should be willing to hand-off and share with other code.]
  3. Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them. [Amazing! McIlroy prefigures Agile by about 25 years and says it in one sentence.]
  4. Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them. [Amazing! Don’t throw people at problems, automate for consistent debug cycles. This drives the convention over configuration decisions that we saw in Rails and which the React ecosystem has adopted. It’s also the heart of TDD.]

Source Wikipedia

“The Unix Philosophy” helps us work fast-and-dirty by building on each others’ programs. It encourages “falling forward” and rapid iteration cycles. Discussing how this philosophy should be part of your daily command-line approach is beyond the scope of this post, but is a worthy use of time.

By the logic of “The Unix Philosophy,” then:

If I were to find a program that creates pretty visuals of graph data structures from ASCII, if I could get my program to generate the proper ASCII output, then I could get a pretty image of my tree. I’d contrast this to “The Non-Unix Philosophy” of “I need to download a graphics library and put that in with my BST implementation.” The first approach encourages collaboration, modularity, letting other experts buoy you up. I opted for that one.

Fortunately, such a graphics-generating program already exists, dot on OSX provided by the Graphviz package available via Homebrew.

dot

$ ls -l `which dot`
lrwxr-xr-x  1 stevenharms  admin  33 Nov  9 10:31 /usr/local/bin/dot -> ../Cellar/graphviz/2.42.2/bin/dot

The dot program has a ton of options, but let’s start with the a typical use invocation:

A typical use of dot looks like:

dot -Ttype -oOutputFile inputfile

Common type values are pdf or png. The OutputFile name should accord with the -t you choose.

Here’s how I generated that simple DAG from earlier:

dot -Tpng -o simple.png simple.dot

But wait, what’s in simple.dot? Well, amazingly enough, it’s just a plaintext file where nodes’ values point to each other with simple “skinny arrows” (->).

Digraph G{
  "2" -> "1"
  "2" -> "3"
}

In accordance with the Unix Philosophy, all I needed is for my_program to generate some plain text that I could hand to dot to convert to a PDF or PNG for me.

It’s beyond the scope of this post to discuss all the features of the dot source-file syntax, but these are the resources I have thoroughly understood hackishly copy-and-garble-in-hopes-it-works.

  1. https://graphviz.gitlab.io/_pages/pdf/dotguide.pdf
  2. https://graphviz.herokuapp.com
  3. https://renenyffenegger.ch/notes/tools/Graphviz/examples/index
  4. https://www.graphviz.org/pdf/dotguide.pdf
  5. https://graphs.grevian.org/example

Build a Non-Trivial DAG

We know that we want to generate a DOT source file. I’m not including the code for the BST (try it out yourself!), but I do need to explain two methods of its API:

  1. It features a method called insert() that takes a value, build it into a Node, and then positions that Node according to BST rules
  2. It features a method called printBFS() that uses breadth-first-search to print out all the nodes in the BST in dot source-file-compatible output
Code Showing Shuffle and BST Load
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
'use strict';

const BST = require('./bst')

const shuffle = (src, dest = []) => {
  while ( src.length > 0 ) {
    const randInt = Math.trunc( Math.random() * src.length )
    dest = [ ...dest, ...src.splice( randInt, 1 ) ]
  }
  return dest
}

const myRange = (lowNumber = 0 , highNumber = 1, doInclusive = true) => {
  const nums = Array.from( Array(highNumber + ( doInclusive ? 1 : 0) - lowNumber).keys() )
  return lowNumber === 0 ? nums : nums.map( i => i + lowNumber )
}

const shuffledRange = (x, y, ...args) => shuffle(myRange(x, y, ...args))

const tree = new BST()
shuffledRange(1, 100).forEach( i => tree.insert(i) )
tree.printBFS()

Which outputs:

Digraph G{
  "32" -> "13"
  "32" -> "61"
  "13" -> "8"
  ...
}

We save this output to a file by means of the redirection operator (>) and then use that file as input to the dot command. Once the dot artifact is created, we open that with the Preview application.

node binary_search_tree/runner.js > simple.dot && dot -Tpng -o simple.png simple.dot && open -a "Preview" simple.png

Protip: Consider binding mini-scriptlets like these to keybindings in your editor. This is easily done in vim with remap! and VSCode with the tasks.json file.

Preview opens up something like this:

Directed graph of a balanced search tree of the integers 1 through 100

Conclusion

In this post we’ve learned (or reviewed) “The Unix Philosophy,” an approach for getting complex software collaborating in order to save us work. We’ve built a program in JavaScript that builds a dot-compatible text source file which can be given to dot to create beautiful directed graphs. In the next lesson in this series, we’ll make our DAG source files richer so that we can visually confirm that our BST operation of node-deletion worked.

As a preview of the future, let’s suppose we’ve decided to delete 77 and want to see what happens to its children in another graph. We might start with something like this:

Delete preview DAG