Models of Computer Science's Science: Formal versus Empirical



On my walk home from work I often listen to the Ruby Rogues podcast. In their episode with Glenn Vanderburg on “Complexity,” there’s a fascinating discussion on the nature of science, and how that defines the modus operandi of “computer science.” Dave, a panelist, asserts that: “computer science is heaving with science, but its heaving with formal science as opposed to empirical science.” I believe development is “heaving” with “formal science” because of of an all-too-human reason: vanity. Developers esteem formal reasoning over empirical reasoning out of peer pressure to seem “smart” which is associated more with the formal style of reasoning. I’d like to look at why this is the case.

Formal Correctness Versus Empirical Correctness

Given a set of propositions, ideas, lemmas, rules, etc., one can manipulate the input data, via a series of actions and “prove” that the system is “correct.” Much like logical truth, more helpfully called consistency in logic, this “correct” is not the “correct” that tests or use by end-users illustrates. Rather, this consistency or correctness follows from the form of the argument and the definition of the atoms and their transformative processes. The computer scientist Knuth’s humorous statement is recalled: “Beware of bugs in the following code. I have merely proved it correct. I have not tested it.”

This is the kind of “computer science” that can be done with a pencil and paper. It looks a lot like math or symbolic logic ergo formal science. It feels old-school, punch card, ur-nerd, Rand Corp, programs in 16K of memory badass.

As mentioned above, “correct” for end-users has nothing to do with the aforementioned formal correctness per se but translates to “the app did what I expected to. It didn’t blow up.” Like an 8-year-old discovering the miracle of baking soda and vinegar, empirical correctness is based on “did that blow up in the way I expected or not?”

This is the kind of “computer science” that requires a compiler and a unit test framework. It benefits from more computer tooling that makes the feedback loop between failing test, passing test, beautiful code being sped up. It doesn’t look like that staid horn-rims and tie computer science, it looks like SoMa and people command-tabbing through screens like monkeys undergoing attention defeceit research.

Developers’ Esteeming Formal Versus Empirical Correctness

I find that developers seem to have a natural, per-person bias towards empirical or formal testing of their code and that there is a privileged association with formal reasoning about code. Dave, speaking of his clients said:

Yeah. I’m actually facing with multiple clients recently this resistance to [empirical] testing. And there’s just this innate need to think about code and to logic your way to the facts.

Dave, I believe, used “innate” hyperbolically, I believe he really meant that “developers see empirical testing as less prestigious, B-team and don’t want to admit that they could benefit from it.”

Let’s imagine a bug crops up in a system, who wouldn’t like be the formal adept in this scenario:

You approach debugging from a mathematical point of view: proceeding from inputs what must be the case. You look at the code and fold you hands and, with the help of a single console-logging statement or debugger statement (or two) you find where in the logical exercise that things go the wrong way. You edit an if/else and the bug disappears.

Who wouldn’t like to be thought of like that? It makes you sound wicked smart. Alternatively:

You look at the bug report and see that the inputs were X, Y and 2. You know that the code blew up at line 22 so you put a debugger on line 21. You run the code with those inputs and look around the system. You manually make the deadly code on line 22 run, you evaluate its results. You see the bug. You write a test case to run the deadly input set. You run the test over and over (automatedly) while you fix the bug. You see the bug is fixed (via the test) you check in your test and your patched code and move on.

Doesn’t this sound less ivory tower, less ethereal, more plebeian? So if a consultant came in and told you in a meeting with your peers that your team should embrace automated testing would you be the one who would be brave enough to break the culture rank to say “Y’know, that’s a good idea” when the implicit cultural tenet is: “The best of us are smart enough to always logic our way through?”

In computer science, full of its math envy as it is, we want to look like those early mathematicians turned computer scientist like von Neumann, Turing and Adm. Hopper - formalists.

Out of fear of appearing like the weak one (lower peer/boss review, less pay, less respect, less prestige), the emperically-biased programmer stifles her dissent (something humans have a hard time with anyway) and presents, if not feigns, dislike of empirical methods. Perhaps this behavior is so universal that it seems “innate,” as Dave related.

It’s vanity and peer-pressure: the same forces behind most bad decisions in human history.


I think that all developers should be strong in both approaches: As an empiricist, I’ve worked hard to get better at formal evaluation and at writing code that’s easier to reason about so that I don’t have to step through it or make it blow up several times to figuring it out (empirical methods). However, I think we should all have the humility to admit when an empirical test helps figure out the bug.

As Kernighan said in “Unix for Beginners” (1979):

The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.

Even the masters need visual confirmation sometimes!

Coda: Honest Talk

I feel like I would be dishonest if I were to not explain my own background on this question. In part I hope that if you read the previous section and thought “Wow, thanks for saying that” you might enjoy my being honest with you about how I mix the two disciplines. Here it is:

I didn’t come to development via computer science but by a mongrel polymath background so it’s less (just barely) embarrassing for me to say that I like empirical debugging. It feels good to me to make stuff blow up on command and then to know that when I’m done I’ve left a fix and a test behind, a check on other things blowing up in the future if someone else changes a part of the system. I like the emergent phenomenon that several small tests, in aggregate, might serve to cover more uses and possible error states than we had thought of by following the formal structure of the code.

But that comes with a catch: if you look at code and errors this way you may cover only one case of a whole variety of bug cases, you’re only being guided by the latest disaster. Eventually you’ll stop the bleeding but by layering a thousand bandages versus one perfectly placed suture. What to do? Here are the steps I follow:

Practical Debugging Steps

I found an interesting discussion on this topic by Jeff Vroom who said use the one to focus the other (formal to empirical and vice versa). I think this is good advice. Here’s what I, an empirically-biased developer, try to do to engage that “enhance the other” virtuous cycle:

  1. Something blew up
  2. Where?
  3. Generate a test case where those same inputs cause the blow up to happen. We future-proof against this error. Just as I don’t need accessibility features (big print, ramps, handicap door openers) I’m glad they’re there when I’m tired or my hands are full. Even the most brilliant formal thinkers will eventually be thankful for your empirical test
  4. Why did it blow up? I switch to formal analysis here
  5. Which of the inputs was invalid?
  6. How did it get set incorrectly?
  7. If you have code that adheres to the single responsibility principle and is nicely structured so that it has small parts you should be able to reason through the process easily
  8. If you need to, add a logger or debugger breakpoint to “get” the failure.
  9. If this code is hard to reason about promise to make it easier to reason about after you patch this bug
  10. Fix the failure
  11. Switch back to empiricism: is the test passing?
  12. Fulfill the promise: was it hard to reason about? Can we write the code in a way such that things are easier to reason about? This is refactoring. There are many good books on the topic and you only get better by practice
  13. Commit and move on

These are the steps by which I’m trying to make complexity more graspable for humans of all reasoning biases more quickly. Sometimes I do better than others, but there’s where I’m at today.