Use ack instead of grep to parse text files

You know this guy?

grep needle haystack | grep special_needle

or his inverted cousin:

grep needle haystack | grep -v unspecial_needle

These are common staples of searching through text files.

You should stop using them. Now.

You can do much better by writing more concise regular expressions and using ack or one of its relatives (ack-grep or rak).

The primary virtue of these commands is that they use Perl regular expressions. Most programmers with experience in any of the major scripting languages will find this more comfortable than grep’s default POSIX basic regular expression syntax.

I recently needed to search through many files with a complex regular expression that required lookahead and lookbehind assertions. I have no idea how that would work in grep's regex land where, honestly, I still have a hard time getting simple capture and alternation right. After learning the lookaround syntax, I was glad to find that ack supports it directly.

Given haystack, a text file:

tin needle
silver needle
lead needle
ocelot
monkey

Find the needles (this is where most grep users get to and never leave):

$ ack needle haystack
tin needle
silver needle
lead needle

Look at that: one character shorter than grep and just as easy. Now, if you want the silver needle, the unsophisticated, greppy way of doing this would be:

$ grep needle haystack | grep silver
silver needle

This sucks. Try:

$ ack '(?=silver).*needle' haystack
silver needle

Or, “all things in the haystack that are needles, but not the lead one”:

$ ack '^(?!lead).*needle' haystack
tin needle
silver needle

I know there’s a lot more to the power of the lookaround assertion, but if I can retrain myself out of the double-grep habit I think it’ll be a big win. Granted, for smaller searches the “double-grep” method is probably fine, but any time you’re searching recursively through a directory tree and looking for true needles in the haystack, ack’s approach is superior.

The case I was working on: I needed to search all my Rails models for named scopes that did not use the lambda form, and I wanted five lines of context around each match so that I could understand the behavior.

ack -C5 'scope(?!.*lambda)' app/models