Grep is one of the tools in the system administrator’s “Swiss Army knife”, and is extremely useful for searching for strings and patterns in a group of files, or even in sub-folders. This article introduces the basics of Grep, provides examples of advanced use and links you to further reading.
Grep (an acronym for “Global Regular Expression Print”) is installed by default on almost every distribution of Linux, BSD and UNIX, and is even available for Windows. GNU and the Free Software Foundation distribute Grep as part of their suite of open source tools. This tutorial focuses primarily on this GNU version, as it is currently the most widely used.
Grep finds a string in a given file or input, quickly and efficiently. While most everyday uses of the command are simple, there is a variety of more advanced uses that many people don’t know about, including regular expressions, which can become quite complicated.
The tool has two well-known early variants. egrep applies an extended regular expression syntax that was added to UNIX after Ken Thompson’s original regular expression implementation, while fgrep searches for any of a list of fixed strings, using the Aho-Corasick algorithm. These variants are embodied in most modern Grep implementations as command-line switches (and standardised as -E and -F in POSIX.2). In such combined implementations, Grep may also behave differently depending on the name by which it is invoked, allowing fgrep, egrep and grep to be links to the same program.
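To see the two switches side by side, here is a minimal sketch assuming GNU Grep; the sample file and its three lines are invented for illustration. With -E the dot in a.c is a meta-character, while with -F the whole pattern is treated as a fixed string.

```shell
# Throwaway sample file with made-up contents
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'a.c' 'abc' 'xyz' > "$f"

# -E: extended regex, so '.' matches any single character
grep -E 'a.c' "$f"               # prints a.c and abc

# -F: fixed string (the old fgrep behaviour), so '.' is a literal dot
grep -F 'a.c' "$f"               # prints a.c only

# -F also accepts several fixed strings, here via repeated -e options
grep -F -e 'abc' -e 'xyz' "$f"   # prints abc and xyz
```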
There are two ways to provide input to Grep, each with its own particular uses. First, Grep can search a given file or files on a system (including a recursive search through sub-folders). Second, Grep accepts input (usually via a pipe) from another command or series of commands.
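Both input styles in one short sketch; the file contents and the alpha/beta/gamma lines are made up for the example:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'error: disk full' 'ok' 'error: timeout' > "$f"

# 1. Search a named file
grep 'error' "$f"                                  # prints both error lines

# 2. Search the output of another command, passed in through a pipe
printf '%s\n' 'alpha' 'beta' 'gamma' | grep 'beta' # prints beta
```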
A regular expression, often shortened to “regex” or “regexp”, is a way of specifying a pattern (a particular set of characters or words) in text that can be applied to variable inputs to find all occurrences that match the pattern. Regexes enhance the ability to meaningfully process text content, especially when combined with other commands.
Usually, regular expressions are included in the Grep command in the following format: grep [options] 'regexp' [file...]
GNU Grep uses the GNU version of regular expressions, which is very similar (but not identical) to POSIX regular expressions. In fact, most varieties of regular expressions are quite similar, but have differences in escapes, meta-characters, or special operators.
GNU Grep has two regular expression feature sets: Basic and Extended. In basic regular expressions, the meta-characters ?, +, {, |, ( and ) lose their special meaning (their uses are described later in this article); GNU Grep instead gives that meaning to the backslashed versions \?, \+, \{, \|, \( and \). To switch to extended regular expressions, in which the unescaped characters are special, add the -E option to the grep command.
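The difference is easiest to see with +. This sketch assumes GNU Grep (the backslashed \+ form in basic syntax is a GNU extension) and uses a throwaway file with invented contents:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'aaa' 'a+b' 'bbb' > "$f"

# Basic syntax: '+' is an ordinary character, so this looks for the text "a+"
grep 'a+' "$f"       # prints a+b

# Basic syntax with GNU's backslashed form: one or more 'a'
grep 'a\+' "$f"      # prints aaa and a+b

# Extended syntax (-E): the unescaped '+' means one or more 'a'
grep -E 'a+' "$f"    # prints aaa and a+b
```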
It is customary to enclose the regular expression in single quotation marks, to prevent the shell (Bash or others) from trying to interpret and expand the expression before launching the grep process. For example, if a regexp containing a pair of back-ticks is not quoted, the shell runs the text between the back-ticks as a command in a sub-shell, and if this happens to be a valid command, its output takes the regular expression’s place in the command-line parameters given to Grep! Not at all what we want.
Again, due to shell behaviour, you can also enclose the regex in double quotes; in this case, you can use environment variables (and back-tick commands) in the regex, and the shell will substitute them before calling Grep. Depending on what you’re trying to do, this can be very useful, or it can turn out to be a nuisance, so remember the difference in behaviour.
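A small sketch of the difference, using a hypothetical shell variable called name and an invented two-line file:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'owner=alice' 'owner=bob' > "$f"

name=alice   # hypothetical shell variable

# Double quotes: the shell expands $name first, so Grep sees owner=alice
grep "owner=$name" "$f"                      # prints owner=alice

# Single quotes: Grep receives the literal text owner=$name
grep 'owner=$name' "$f" || echo 'no match'   # prints no match
```

With single quotes nothing matches: in a basic regular expression, a $ that is not at the end of the pattern is an ordinary character, so Grep looks for the literal text owner=$name.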
Now let’s go on to some practical examples of using Grep. To better understand the results, I’ve created a simple text file on which we will run our Grep searches; the file contains the following lines:
Case-insensitive search (grep -i):
As you can see, the -i flag makes a search for “abcd” also match lines where those letters appear in a different case, such as “ABCD” or “Abcd”.
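A minimal sketch with a made-up sample file (not the author’s test file):

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'abcd' 'ABCD' 'Abcd xyz' 'wxyz' > "$f"

grep 'abcd' "$f"      # case-sensitive: prints abcd only
grep -i 'abcd' "$f"   # case-insensitive: prints abcd, ABCD and Abcd xyz
```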
Whole-word search (grep -w):
This type of search only returns lines where the sought-for string is a whole word and not part of a larger word.
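A short sketch on invented lines: “car” appears on all three, but only two of them contain it as a whole word.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'car' 'carry on' 'my car is red' > "$f"

grep 'car' "$f"      # prints all three lines
grep -w 'car' "$f"   # prints car and my car is red, but not carry on
```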
Recursively search through sub-folders (grep -r <pattern> <path>):
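A sketch that builds a small, hypothetical directory tree in a temporary location and searches it recursively; each match is printed with the path of the file it came from.

```shell
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT
mkdir -p "$d/sub/deeper"
printf 'needle here\n' > "$d/top.txt"
printf 'nothing\n'     > "$d/sub/other.txt"
printf 'a needle\n'    > "$d/sub/deeper/inner.txt"

# Searches every file below $d; output lines look like <path>:<matching line>
grep -r 'needle' "$d"
```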
Inverted search (grep -v):
This prints all the lines in the file, except the line that contains the word “practical”.
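The same idea on an invented three-line file:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'a practical example' 'line two' 'line three' > "$f"

grep -v 'practical' "$f"   # prints line two and line three
```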
An interesting relative is the -L flag (also available as --files-without-match), which outputs the names of files that do NOT contain any match for your search pattern. Only the file names are printed, not lines from the files.
The “opposite” flag to -L is -l (or --files-with-matches), which prints out (only) the names of files that do contain matches for your search pattern.
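A sketch with two hypothetical files, one of which contains the word TODO:

```shell
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT
printf 'TODO: fix this\n' > "$d/a.txt"
printf 'all done\n'       > "$d/b.txt"

grep -l 'TODO' "$d"/*.txt           # prints the path of a.txt
# '|| true' keeps a 'set -e' script running whatever exit status grep returns here
grep -L 'TODO' "$d"/*.txt || true   # prints the path of b.txt
```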
Print additional (trailing) context lines after match (grep -A <NUM>):
For each line that matches the search, Grep prints the matching line, as well as the one line that follows it. Varying the number provided to -A changes how many trailing lines are included in the output.
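A sketch that uses seq to generate numbered lines; the choice of seq and of the number 5 is purely illustrative.

```shell
# Print the line "5" plus the two lines after it
seq 1 10 | grep -A 2 '^5$'   # prints 5, 6 and 7
```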
Print additional (leading) context lines before match (grep -B <NUM>):
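The leading-context counterpart, again on illustrative numbered lines generated with seq:

```shell
# Print the two lines before "5", then the line "5" itself
seq 1 10 | grep -B 2 '^5$'   # prints 3, 4 and 5
```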
Print additional (leading and trailing) context lines before and after the match (grep -C <NUM>):
As you can see, this has printed out two lines before and after the single match found in the file. If there are multiple matches, Grep inserts a line containing -- between groups of lines (each match with its context lines) that are not adjacent; groups that overlap or touch are merged without a separator.
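A sketch of the separator behaviour on illustrative numbered lines, assuming GNU Grep: two matches far apart produce two groups separated by --, while two neighbouring matches produce a single merged group.

```shell
# Matches 5 and 15 are far apart: two groups with "--" between them
seq 1 20 | grep -C 1 -e '^5$' -e '^15$'   # 4 5 6 -- 14 15 16 (one per line)

# Matches 5 and 6 are neighbours: their context overlaps, so one group
seq 1 20 | grep -C 1 -e '^5$' -e '^6$'    # 4 5 6 7 (one per line)
```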
Print the filename for each match (grep -H <pattern> filename):
Now, let’s run the search a bit differently:
When the stream that Grep is asked to search is passed to its standard input via a pipe from a previous command in the chain, grep -H displays (standard input) as the filename.
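A sketch of both cases with made-up input. The last command uses GNU Grep’s --label option, which replaces “(standard input)” with a name of your choosing; it is GNU-specific.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'hello world\n' > "$f"

grep -H 'hello' "$f"                                      # <path>:hello world
printf 'hello pipe\n' | grep -H 'hello'                   # (standard input):hello pipe
printf 'hello pipe\n' | grep -H --label=mystream 'hello'  # mystream:hello pipe
```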
Run in “quiet” mode (grep -q): When run with this flag, Grep does not write anything to standard output, but sets its return value (also known as exit status) to reflect whether a match was found or not. This option is mainly used in scripts that need to check if a given file contains a particular match. A return status of 0 (zero) indicates that a match was found, 1 indicates that no match was found, and 2 indicates an error (such as a missing file); with -q, a match still yields 0 even if an error also occurred.
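A minimal scripting sketch, assuming a hypothetical configuration line in a temporary file (not a real sshd_config):

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'PermitRootLogin no\n' > "$f"   # hypothetical config line

# In an if-statement, only the exit status of grep -q matters
if grep -q '^PermitRootLogin no' "$f"; then
  echo 'root login is disabled'
else
  echo 'root login may be enabled'
fi

# Capturing the status explicitly; the '|| ...' form also works under 'set -e'
status=0
grep -q 'nonexistent' "$f" || status=$?
echo "no match: exit status $status"        # 1

missing=0
grep -q 'anything' "$f.missing" 2>/dev/null || missing=$?
echo "missing file: exit status $missing"   # 2
```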
In the search above, . is used to match any single character, which is why it matches “car” in “carry”. Grep has a powerful regular expression matching engine, which we can’t hope to cover in depth here, but we will include a few important points.
As you can see, preceding . with a backslash has removed its significance as a meta-character.
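A sketch contrasting the bare dot with the escaped one, on invented lines:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'a.c' 'abc' 'a-c' 'ac' > "$f"

grep 'a.c' "$f"    # '.' is any character: prints a.c, abc and a-c (not ac)
grep 'a\.c' "$f"   # '\.' is a literal dot: prints a.c only
```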
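A regular expression may be followed by one of several repetition operators: ? (the preceding item is optional and is matched at most once), * (zero or more times), + (one or more times), {n} (exactly n times), {n,} (n or more times), {,m} (at most m times, a GNU extension) and {n,m} (at least n, but not more than m, times).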
However, except for *, these repetition operators are written this way only in extended regular expression syntax (in basic syntax, GNU Grep expects \?, \+ and \{...\} instead), so to use them as shown, remember to add the -E option to your command.
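A sketch exercising several repetition operators with -E on a made-up word list; the ^ and $ anchors force each pattern to match a whole line.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'color' 'colour' 'colouur' 'bk' 'bok' 'book' 'boook' > "$f"

grep -E 'colou?r' "$f"       # ?     : color, colour
grep -E '^bo*k$' "$f"        # *     : bk, bok, book, boook
grep -E '^bo+k$' "$f"        # +     : bok, book, boook
grep -E '^bo{2}k$' "$f"      # {n}   : book
grep -E '^bo{1,2}k$' "$f"    # {n,m} : bok, book

# Basic syntax equivalent of {2}, with backslashed braces
grep '^bo\{2\}k$' "$f"       # book
```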
Read this tutorial for an introduction to more of Grep’s regular expression features. For more information on regular expression syntax, refer to the Regular Expressions chapter in the Grep manual. Meanwhile, we will present some examples of regular expressions and try to show how they work.
The “character class” tool is one of the more flexible and often-used features of regular expressions. There are two basic ways to use character classes: to specify a list of characters (for example, [aeiou] is a list of vowel characters), or a range (like [m-t], which expands to [mnopqrst] in the C locale). Ranges are a convenience that saves having to type an entire sequence of characters. A character class can also list special characters such as . or *; inside the brackets, these match literally instead of acting as meta-characters.
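A sketch of a list, a range and literal special characters inside a class. It assumes the C locale, so that [m-t] follows plain ASCII order, and uses invented sample lines.

```shell
export LC_ALL=C   # make ranges follow ASCII order
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'sky' 'tree' 'moon' 'abc' '3.14' > "$f"

grep '[aeiou]' "$f"   # list  : tree, moon, abc
grep '[m-t]' "$f"     # range : sky, tree, moon
grep '[.*]' "$f"      # '.' and '*' are literal inside brackets: 3.14
```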
A single character class instance will match only one character; to match multiple occurrences of the class, you need to add a repetition operator, like those mentioned above. For example, to find an eleven-letter string comprising only lower-case letters, the regex would be [a-z]{11}. As mentioned earlier, to use the repetition operators, we need to add the option -E. Let’s run this on our test file:
Here, “expressions” is the only all-lowercase 11-character string in the file, so this is the only line printed as the output. (Because Grep looks for the pattern anywhere in a line, it would also match any run of 11 lower-case letters inside a longer word.)
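A sketch of the same search on a made-up file; -o prints only the matched text rather than the whole line. The C locale is assumed so that [a-z] means exactly the ASCII lower-case letters.

```shell
export LC_ALL=C
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'to carry out few regular expressions' 'short words only' 'Expressions' > "$f"

grep -E '[a-z]{11}' "$f"    # prints the first line only
grep -oE '[a-z]{11}' "$f"   # prints just: expressions
```

“Expressions” on the third line does not match, because after the capital E only ten lower-case letters remain.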
There are quite a few character classes that are very commonly used in regular expressions, and these are provided as named classes. For example, the [a-z] class of lower-case letters that we used above has the named class [:lower:]; note that a named class must itself appear inside a bracket expression, so the complete pattern is [[:lower:]]. Naturally, [:upper:] is upper-case letters A to Z, and [:alpha:] is all alphabetic characters, equivalent to [:lower:] plus [:upper:]. [:digit:] is the digits 0 to 9, and [:alnum:] is alphanumeric characters, a combination of [:alpha:] and [:digit:]. The Grep manual lists more of these named classes.
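A sketch of named classes used inside bracket expressions, assuming GNU Grep, the C locale and an invented sample file:

```shell
export LC_ALL=C
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' '123 456' 'ABCD' 'abcd' 'abc123' > "$f"

grep -E '^[[:digit:] ]+$' "$f"   # digits and spaces only: 123 456
grep -E '^[[:upper:]]+$' "$f"    # upper-case only: ABCD
grep -E '^[[:alnum:]]+$' "$f"    # alphanumeric only: ABCD, abcd, abc123

# Without the outer brackets, GNU Grep rejects the pattern with an error along
# the lines of "character class syntax is [[:space:]], not [:space:]"
grep '[:digit:]' "$f" || true
```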
When a caret (^) is used as the first character in a character class, it negates the class, effectively meaning “none of these characters”.
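A sketch with a negated class on invented lines, assuming the C locale:

```shell
export LC_ALL=C
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'abc' 'abc1' 'ABC' > "$f"

grep '[^a-z]' "$f"      # lines containing any character outside a-z: abc1, ABC
grep -v '[^a-z]' "$f"   # lines made only of a-z: abc
```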
The ^ anchor specifies that the pattern following it should be at the start of the line.
The $ anchor specifies that the pattern before it should be at the end of the line.
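A sketch of both anchors on a made-up file that also contains one empty line:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'this is a test' 'test this' 'a test line' '' > "$f"

grep '^test' "$f"   # starts with "test": test this
grep 'test$' "$f"   # ends with "test": this is a test
grep -n '^$' "$f"   # empty lines, with their line numbers: 4:
```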
The operator \< anchors the pattern to the start of a word.
Similarly, \> anchors the pattern to the end of a word.
The \b (word boundary) anchor can be used in place of \< and \> to signify the beginning or end of a word.
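A sketch of the word anchors, which are GNU extensions, on three invented words:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'car' 'carry' 'scar' > "$f"

grep '\<car' "$f"      # "car" at the start of a word: car, carry
grep 'car\>' "$f"      # "car" at the end of a word: car, scar
grep '\bcar\b' "$f"    # "car" as a whole word: car
```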
Finally, we look at the | (alternation) operator, which is part of the extended regex features. A pattern containing this operator separately matches the parts on either side of it; if either one is found, the line containing it is a match. The parts can themselves be complex regular expressions, so you can check each line in a file for multiple search patterns in one pass.
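A sketch of alternation on a hypothetical log excerpt, assuming GNU Grep for the basic-syntax \| form:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'error: disk full' 'warning: low memory' 'info: all good' > "$f"

grep -E 'error|warning' "$f"    # extended syntax: the error and warning lines
grep 'error\|warning' "$f"      # same result in GNU basic syntax
grep -E '^(error|info):' "$f"   # alternation inside a group: error and info lines
```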
That was pretty simple; so let’s try a more complicated one. Can you reason out why the output lines for this regex are as shown below?
As mentioned earlier, if you don’t single-quote the pattern passed to Grep, the shell could perform shell expansion on the pattern and actually feed a changed pattern to Grep. This can also be done intentionally when you need it; let’s look at a few examples.
Here, we intentionally use double quotes to make the Bash shell replace the environment variable $HOME with the actual value of the variable (in this case, /root). Thus, Grep searches the /etc/passwd file for the text /root, yielding the two lines that match.
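A self-contained sketch of the same idea. Instead of the real /etc/passwd it searches a temporary file with invented passwd-style lines, and it sets HOME to /root only inside a sub-shell to mirror the example.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'alice:x:1000:1000::/home/alice:/bin/sh' > "$f"

# Double quotes: the shell replaces $HOME with /root before Grep starts
( HOME=/root; grep "$HOME" "$f" )     # prints the root line

# Single quotes: Grep looks for the literal text $HOME, which is not there
grep '$HOME' "$f" || echo 'no match'
```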
Here, back-tick expansion is done by the shell, replacing `whoami` with the user name (root) that is returned by the whoami command.
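A sketch of command substitution in a pattern, using a temporary file with made-up “logged in” lines; the $(...) form is the modern, nestable equivalent of back-ticks.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' "$(whoami) logged in" 'someone-else logged in' > "$f"

# The shell runs whoami and puts its output into the pattern
grep "`whoami`" "$f"
grep "$(whoami)" "$f"

# If the substituted text could contain meta-characters such as '.',
# -F makes Grep treat it as a fixed string
grep -F "$(whoami)" "$f"
```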
Well, we hope this has set you on your way to using this very efficient tool.
Answer to the “Can you reason” question:
Expression “^[t-z]+” prints lines that begin with one or more characters from t to z.
Expression “[^a-z]+$” prints lines that end with one or more characters that are NOT in the range a to z.
The overall pattern joins them with |, so it prints the lines that satisfy either or both of the above expressions.
Therefore, the lines “this” and “to carry out few regular expressions” satisfy the first expression, while the lines “123 456” and “ABCD” satisfy the second.
Please explain this command: “grep -a --null-data U-Boot u-boot.img”. What does U-Boot stand for here: is it a file name or an alias? u-boot.img is a file; I can see it.