The grep Command
The grep command searches text files looking for strings that match the search patterns you provide on the command line. The power of grep lies in its use of regular expressions. These let you describe what you’re looking for, rather than having to define it explicitly.
The birth of grep pre-dates Linux. It was developed in the early 1970s on Unix. It takes its name from the g/re/p command in the ed line editor (incidentally, pronounced “ee-dee”). This stood for global, regular expression search, print matching lines.
grep is famously, perhaps notoriously, thorough and single-minded. Sometimes it searches files or directories you’d rather it skipped, and the extra results can leave you unable to see the wood for the trees.
Of course, there are ways to rein grep in. You can tell it to ignore patterns, files, and directories so that grep completes its searches faster and you’re not swamped with meaningless false positives.
Excluding Patterns
To search with grep, you can pipe input to it from another process such as cat, or you can provide a filename as the last command-line parameter.
We’re using a short file that contains the text of the poem Jabberwocky, by Lewis Carroll. In these two examples, we’re searching for lines that match the search term “Jabberwock.”
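A minimal sketch of both approaches follows. The article’s own file isn’t reproduced here, so the block creates a hypothetical jabberwocky.txt holding eight lines of the poem in a temporary directory; the file name and its contents are assumptions.

```shell
# Scratch directory plus a hypothetical eight-line sample of the poem.
cd "$(mktemp -d)"
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# Pipe the text into grep from another process...
cat jabberwocky.txt | grep "Jabberwock"

# ...or pass the filename as the last parameter. Both print the same three lines.
grep "Jabberwock" jabberwocky.txt
```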
The lines that contain matches to the search clue are listed for us, with the matching element in each line highlighted in red. That’s straightforward searching. But what if we want to exclude lines that contain the word “Jabberwock” and print the rest?
We can accomplish that with the -v (invert match) option. This lists the lines that don’t match the search term.
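Here is a sketch of an inverted search, run against a hypothetical jabberwocky.txt sample created in a temporary directory. Three of the eight sample lines contain “Jabberwock,” so the other five are printed.

```shell
# Scratch directory plus a hypothetical eight-line sample of the poem.
cd "$(mktemp -d)"
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# -v inverts the match: print only the lines that do NOT contain "Jabberwock".
grep -v "Jabberwock" jabberwocky.txt
```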
The lines that don’t contain “Jabberwock” are printed in the terminal window.
We can exclude as many terms as we wish. Let’s filter out any lines that contain “Jabberwock” and any lines that contain “and.” To achieve this, we’ll use the -e (expression) option, once for each search pattern.
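A sketch using two -e options, against a hypothetical eight-line jabberwocky.txt created in a scratch directory. Combined with -v, a line is dropped if it matches either pattern.

```shell
# Scratch directory plus a hypothetical eight-line sample of the poem.
cd "$(mktemp -d)"
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# One -e per pattern; -v discards any line that matches ANY of them.
grep -v -e "Jabberwock" -e "and" jabberwocky.txt
```

Because the match is case-sensitive and looks for substrings, the line starting with “And” survives, while “He took his vorpal sword in hand;” is dropped because “hand” contains “and.”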
There’s a corresponding drop in the number of lines in the output.
If we use the -E (extended regexes) option, we can combine the search patterns with “|”, which in this context doesn’t indicate a pipe; it’s the logical OR operator.
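The same filter written with -E, as a sketch against a hypothetical sample file. The double quotes matter: they stop the shell from treating “|” as a pipe, so grep receives it as part of the pattern.

```shell
# Scratch directory plus a hypothetical eight-line sample of the poem.
cd "$(mktemp -d)"
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# Inside the quotes, "|" is regex alternation (OR), not a shell pipe.
grep -Ev "Jabberwock|and" jabberwocky.txt
```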
We get exactly the same output as we did with the previous, longer-winded command.
The format of the command is the same if you want to use a regex pattern instead of an explicit search clue. This command will exclude all lines that start with any letter in the set of “ACHT.”
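A sketch of the anchored pattern, using a hypothetical sample file. The caret ^ ties the match to the start of the line, and [ACHT] matches any one of those four letters.

```shell
# Scratch directory plus a hypothetical eight-line sample of the poem.
cd "$(mktemp -d)"
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# Drop every line whose first character is A, C, H, or T.
grep -v '^[ACHT]' jabberwocky.txt
```

Lines that begin with an apostrophe or a quotation mark survive, because their first character isn’t in the set.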
To see lines that contain a pattern but which also don’t contain another pattern, we can pipe grep into grep. We’ll search for all lines that contain the word “Jabberwock” and then filter out any lines that also contain the word “slain.”
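A sketch of the two-stage filter, against a hypothetical sample file built in a scratch directory.

```shell
# Scratch directory plus a hypothetical eight-line sample of the poem.
cd "$(mktemp -d)"
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# The first grep keeps lines containing "Jabberwock"; the second (-v)
# then discards any of those that also contain "slain".
grep "Jabberwock" jabberwocky.txt | grep -v "slain"
```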
Excluding Files
We can ask grep to look for a string or pattern in a collection of files. You could list each file on the command line, but with many files that approach doesn’t scale.
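A sketch that names each file explicitly. The files jabberwocky-1.txt and jabberwocky-2.txt are hypothetical, each holding two lines of the poem.

```shell
# Scratch directory with two hypothetical files, each holding part of the poem.
cd "$(mktemp -d)"
printf '%s\n' 'He took his vorpal sword in hand;' 'So rested he by the Tumtum tree' > jabberwocky-1.txt
printf '%s\n' 'One, two! One, two! And through and through' 'The vorpal blade went snicker-snack!' > jabberwocky-2.txt

# Expected output:
#   jabberwocky-1.txt:He took his vorpal sword in hand;
#   jabberwocky-2.txt:The vorpal blade went snicker-snack!
grep "vorpal" jabberwocky-1.txt jabberwocky-2.txt
```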
Note that the name of the file containing the matching line is displayed at the start of each line of output.
To reduce typing, we can use wildcards, but they can catch more files than we intend. This appears to work.
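A sketch of the wildcard version, assuming a hypothetical directory that holds two poem files and one unrelated log file, vol-log-1.txt. The shell, not grep, expands *.txt into the list of matching files.

```shell
# Scratch directory: two hypothetical poem files plus an unrelated log file.
cd "$(mktemp -d)"
printf '%s\n' 'He took his vorpal sword in hand;' > jabberwocky-1.txt
printf '%s\n' 'The vorpal blade went snicker-snack!' > jabberwocky-2.txt
printf '%s\n' 'user alice: password changed' > vol-log-1.txt

# The shell expands *.txt to all three files before grep starts.
grep "vorpal" *.txt
```

Only the poem files appear, but only because the log file happens not to contain “vorpal.” grep was still handed vol-log-1.txt and searched it.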
However, in this directory there are other TXT files that have nothing to do with the poem. If we search for the word “sword” with the same command structure, we get a lot of false positives.
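A sketch of the problem, using hypothetical pseudo-logfiles named vol-log-1.txt and vol-log-2.txt alongside a poem file. Their contents are invented for illustration.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'He took his vorpal sword in hand;' > jabberwocky-1.txt
# Hypothetical pseudo-logfiles; every line mentions a password.
printf '%s\n' 'user alice: password changed' 'user bob: password expired' > vol-log-1.txt
printf '%s\n' 'user carol: password reset' > vol-log-2.txt

# "sword" is a substring of "password", so all three log lines match too.
grep "sword" *.txt
```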
The results we want are masked by the deluge of false results from the other files that have the TXT extension.
The word “vorpal” didn’t match anything in those other files, but “sword” is part of the word “password,” so it was found many times in some pseudo-logfiles.
We need to exclude these files. To do that, we’ll use the --exclude option. To exclude a single file called “vol-log-1.txt” we’d use this command:
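A sketch of that command, assuming GNU grep (whose --exclude also applies to files named on the command line) and hypothetical files: one poem file and two vol-log files.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'He took his vorpal sword in hand;' > jabberwocky-1.txt
printf '%s\n' 'user alice: password changed' 'user bob: password expired' > vol-log-1.txt
printf '%s\n' 'user carol: password reset' > vol-log-2.txt

# vol-log-1.txt is skipped even though *.txt names it;
# vol-log-2.txt is still searched.
grep --exclude=vol-log-1.txt "sword" *.txt
```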
In this instance, we want to exclude multiple log files with names that start with “vol.” The syntax we need is:
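A sketch, assuming GNU grep and the same kind of hypothetical files. The glob is quoted so that grep, rather than the shell, interprets vol*.txt.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'He took his vorpal sword in hand;' > jabberwocky-1.txt
printf '%s\n' 'user alice: password changed' 'user bob: password expired' > vol-log-1.txt
printf '%s\n' 'user carol: password reset' > vol-log-2.txt

# The quoted glob reaches grep intact; every file matching vol*.txt is skipped.
grep --exclude='vol*.txt' "sword" *.txt
```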
When we use the -R (dereference-recursive) option, grep will search entire directory trees for us. By default, it searches every file in those locations, and there may well be several types of file we want to exclude.
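A minimal sketch of a recursive search, assuming GNU grep and a hypothetical directory tree with a poem file and a log file in separate subdirectories.

```shell
cd "$(mktemp -d)"
mkdir -p poems logs
printf '%s\n' 'He took his vorpal sword in hand;' > poems/jabberwocky.txt
printf '%s\n' 'user alice: password changed' > logs/vol-log-1.txt

# -R walks the whole tree under "." (following symbolic links) and
# searches every file it finds, so both files match.
grep -R "sword" .
```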
Beneath the current directory on this test machine, there are nested directories containing logfiles, CSV files, and MD files. These are all types of text files that we want to exclude. We could use an --exclude option for each file type, but we can achieve what we want more efficiently by grouping the file types.
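A sketch of the grouped form, assuming GNU grep, bash or zsh, and a hypothetical tree of files. The curly-brace grouping is the shell’s brace expansion, not a grep feature: the shell turns each braced option into separate --exclude options before grep runs. The asterisks are quoted so the shell leaves them for grep to match.

```shell
cd "$(mktemp -d)"
mkdir -p notes/data
printf '%s\n' 'He took his vorpal sword in hand;' > notes/jabberwocky.txt
printf '%s\n' 'user alice: password changed' > notes/vol-log-1.txt
printf '%s\n' 'user bob: password expired' > notes/data/log-2.txt
printf '%s\n' 'id,password' > notes/data/users.csv
printf '%s\n' '# Reset your password' > notes/readme.md

# The shell expands the braces into four options before grep runs:
#   --exclude=*.csv --exclude=*.md --exclude=vol*.txt --exclude=log*.txt
# Only notes/jabberwocky.txt is left to match.
grep -R --exclude='*'.{csv,md} --exclude={vol,log}'*.txt' "sword" .
```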
This command excludes all files that have CSV or MD extensions, and all TXT files whose names start with either “vol” or “log.”
Excluding Directories
If the files we want to ignore are contained in directories and there are no files in those directories that we do want to search, we can exclude those entire directories.
The concept is very similar to that of excluding files, except we use the --exclude-dir option and name the directories to ignore.
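A sketch, assuming GNU grep and a hypothetical layout in which the same poem file sits in directories called current, backup, and backup2.

```shell
cd "$(mktemp -d)"
mkdir -p current backup backup2
printf '%s\n' 'He took his vorpal sword in hand;' > current/jabberwocky.txt
cp current/jabberwocky.txt backup/
cp current/jabberwocky.txt backup2/

# Directories named "backup" are skipped; "backup2" is a different name,
# so its copy of the file still matches.
grep -R --exclude-dir=backup "vorpal" .
```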
We’ve excluded the “backup” directory, but we’re still searching through another directory called “backup2.”
It’ll come as no surprise that we can use the --exclude-dir option multiple times in a single command. Note that --exclude-dir matches directory names, not paths: give the name of the directory, such as “backup,” and grep skips any directory with that name wherever it appears in the tree. Don’t use the absolute path from the root of the file system; it won’t match.
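A sketch with two --exclude-dir options, assuming GNU grep and a hypothetical layout in which one poem file is copied into current, backup, and backup2.

```shell
cd "$(mktemp -d)"
mkdir -p current backup backup2
printf '%s\n' 'He took his vorpal sword in hand;' > current/jabberwocky.txt
cp current/jabberwocky.txt backup/
cp current/jabberwocky.txt backup2/

# Each --exclude-dir names one directory to skip; only current/ is searched.
grep -R --exclude-dir=backup --exclude-dir=backup2 "vorpal" .
```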
We can use groupings too. We can achieve the same thing more succinctly with:
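A sketch of the grouped version, assuming GNU grep, bash or zsh (the shell performs the brace expansion), and a hypothetical layout in which one poem file is copied into current, backup, and backup2.

```shell
cd "$(mktemp -d)"
mkdir -p current backup backup2
printf '%s\n' 'He took his vorpal sword in hand;' > current/jabberwocky.txt
cp current/jabberwocky.txt backup/
cp current/jabberwocky.txt backup2/

# The shell rewrites this as --exclude-dir=backup --exclude-dir=backup2.
grep -R --exclude-dir={backup,backup2} "vorpal" .
```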
You can combine file and directory exclusions in the same command. If you want to exclude all files from a directory and exclude certain file types from the directories that are searched, use this syntax:
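A sketch that combines both kinds of exclusion, assuming GNU grep and hypothetical files: it skips the backup directory entirely and ignores CSV files in the directories that are searched.

```shell
cd "$(mktemp -d)"
mkdir -p current backup
printf '%s\n' 'He took his vorpal sword in hand;' > current/jabberwocky.txt
printf '%s\n' 'id,password' > current/users.csv
cp current/jabberwocky.txt backup/

# Skip the backup directory, and skip CSV files everywhere else.
grep -R --exclude-dir=backup --exclude='*.csv' "sword" .
```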
Sometimes It’s What You Leave Out
Sometimes with grep it can feel like you’re trying to find a needle in a haystack. It makes a big difference to remove the haystack.