GREP is a search tool for the command line. It searches through files and/or standard input. That’s it, UNIX-style!
In the ed line editor, searching for a regular expression
and printing out the results meant typing /$SEARCHTERM/p.
To make this search global, across every line, one added the g command
like so: g/$SEARCHTERM/p. This became such a regular
occurrence that someone created a smaller program that only performed
these global regular expression searches. The name GREP merely
describes the search pattern: g/re/p (global/regular
expression/print).
A description of GREP’s flags:
-i case-insensitive search. Searching for ‘word’ also matches ‘WORD’.
-w whole-word search. Searching for ‘is’ would not return ‘this’.
-A <N> prints N lines after the match.
-B <N> prints N lines before the match.
-C <N> prints N lines before and after the match.
-r recursive search that goes through subdirectories. Be wary!
-v displays all the lines that do not match the supplied search term.
-E extended regular expression support (-e, in lower case, instead marks the argument that follows as a search pattern).
-c counts the number of matching lines and returns the number.
-l returns only the names of the files that contain a match.
-o only shows the matched text rather than the whole line.
-o -b returns the byte offset of the matched text.
-n prints the line number of each returned match.
GREP uses regular expressions to complete searches, but let’s start with the wildcard.
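As a quick aside before the wildcard examples, a few of the flags listed above can be tried out on a small file. This is only a sketch: the file name sample.txt and its contents are made up for the demonstration, and -A is assumed to be available (it is in GNU and BSD grep).

```shell
# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp -d)
printf 'this word\nWORD\nit is here\nnothing\n' > "$tmp/sample.txt"

# -i: 'word' matches both 'word' and 'WORD'.
insensitive=$(grep -i 'word' "$tmp/sample.txt")

# -c counts matching lines; adding -w restricts the match to the whole word 'is',
# so the 'is' inside 'this' no longer counts.
plain_count=$(grep -c 'is' "$tmp/sample.txt")
whole_count=$(grep -c -w 'is' "$tmp/sample.txt")

# -n prefixes each match with its line number.
numbered=$(grep -n -w 'is' "$tmp/sample.txt")

# -v keeps only the lines that do NOT match.
inverted=$(grep -v -i 'word' "$tmp/sample.txt")

# -A 1 prints one line of context after each match.
after=$(grep -A 1 'WORD' "$tmp/sample.txt")

printf '%s\n' "$insensitive" "$plain_count" "$whole_count" "$numbered" "$inverted" "$after"
rm -rf "$tmp"
```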
>cat file
big
bad bug
bag
bigger
boogy
>grep b.g file
big
bad bug
bag
bigger
‘boogy’ doesn’t match in this case because the wildcard, a dot, matches precisely one character.
We use the asterisk to match repetitions of characters. Here is the description of how it works:
The expression consisting of a character followed by a star matches any number (possibly zero) of repetitions of that character. In particular, the expression “.*” matches any string, and hence acts as a “wildcard”.
Observe :
>cat file
big
bad bug
bag
bigger
boogy
>grep "b.*g" file
big
bad bug
bag
bigger
boogy
>grep "b.*g." file
bigger
boogy
>grep "ggg*" file
bigger
On its own the repetition character does not behave as a wildcard in GREP; it matches zero or more repetitions of the character before it. The pattern “g*” matches the strings “”, “g”, “gg”, etc. Likewise, the pattern “gg*” matches “g”, “gg”, “ggg”, so “ggg*” matches “gg”, “ggg”, “gggg” and so on.
Wildcards are a start but the idea can be taken further. For example, suppose we want an expression that matches Frederic Smith or Fred Smith. In other words, the letters eric are optional.
First, we introduce the concept of an “escaped” character.
An escaped character is a character preceded by a backslash. The preceding backslash does one of the following: (a) removes an implied special meaning from a character, or (b) adds special meaning to a “non-special” character.
To search for a line containing the text hello.gif, the correct
command is grep 'hello\.gif' file, since
grep 'hello.gif' file will also match lines containing
“hello-gif”, “hello1gif”, “helloagif”, etc.
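The difference between the escaped and unescaped dot can be checked with a short sketch. The file names.txt and its three lines are invented for the demonstration.

```shell
# Hypothetical file with one real "hello.gif" and two look-alikes.
tmp=$(mktemp -d)
printf 'hello.gif\nhello-gif\nhello1gif\n' > "$tmp/names.txt"

# Unescaped dot: matches any single character, so all three lines match.
unescaped=$(grep -c 'hello.gif' "$tmp/names.txt")

# Escaped dot: matches only a literal '.', so one line matches.
escaped=$(grep 'hello\.gif' "$tmp/names.txt")

printf 'unescaped matched %s lines\n' "$unescaped"
printf 'escaped matched: %s\n' "$escaped"
rm -rf "$tmp"
```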
Now we move on to grouping expressions, in order to find a way of making an expression to match Fred or Frederic. First we start with the ? operator.
An expression of a character followed by an escaped question mark matches one or zero instances of that character.
bugg\?y matches all of the following: “bugy”, “buggy”
but not “bugggy”.
We move on to “grouping” expressions. In our example, we
want to make the string “eric” following “Fred” optional; we don’t
just want one optional character.
An expression surrounded by “escaped” parentheses is treated as a single character.
Fred\(eric\)\? Smith matches “Fred Smith” or “Frederic
Smith”. \(abc\)* matches “abc”, “abcabcabc” etc. It’s worth
pointing out at this moment that we need to enclose the search term in
quotes so that the shell doesn’t misinterpret white space or stars.
Unquoted, the previous example would search for “Fred(eric)?” in the file “Smith”.
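A minimal sketch of the optional group follows. It assumes GNU grep, where \? is available in basic regular expressions, and the file people.txt is made up for the demonstration.

```shell
# Hypothetical list of names.
tmp=$(mktemp -d)
printf 'Fred Smith\nFrederic Smith\nFreddy Smith\n' > "$tmp/people.txt"

# The quoted pattern makes the group "eric" optional as a unit.
# (Assumes GNU grep: \? is a GNU extension to basic regular expressions.)
matches=$(grep 'Fred\(eric\)\? Smith' "$tmp/people.txt")
match_count=$(grep -c 'Fred\(eric\)\? Smith' "$tmp/people.txt")

printf '%s\n' "$matches"
rm -rf "$tmp"
```

“Freddy Smith” is left out because after “Fred” the pattern allows only “eric” or nothing before the space.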
To match a selection of characters use [].
[Hh]ello matches lines containing “hello” or “Hello”. Ranges
of characters are also permitted.
[0-3] is the same as [0123]
[a-k] is the same as [abcdefghijk]
[A-C] is the same as [ABC]
[A-Ca-k] is the same as [ABCabcdefghijk]
[[:alpha:]] is the same as [a-zA-Z]
[[:upper:]] is the same as [A-Z]
[[:lower:]] is the same as [a-z]
[[:digit:]] is the same as [0-9]
[[:alnum:]] is the same as [0-9a-zA-Z]
[[:space:]] matches any white space including tabs
The named forms such as [[:alpha:]] are preferable to the explicit ranges. Also note that [] can be negated by putting a caret ^ as the first character.
grep "([^()]*)a" file returns any line containing a pair
of parentheses that are innermost and are followed by the letter “a”. It
would match these lines:
(hello)a
(asdfasdfasdf asdf ffasdfsdf)a
But not :
x=(y+2(x+1))a
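The negated bracket expression can be tried on exactly those lines. This is a sketch only; the file parens.txt is invented to hold them.

```shell
# Hypothetical file holding the three lines discussed above.
tmp=$(mktemp -d)
printf '(hello)a\n(asdfasdfasdf asdf ffasdfsdf)a\nx=(y+2(x+1))a\n' > "$tmp/parens.txt"

# [^()]* matches any run of characters that are neither '(' nor ')',
# so the parentheses matched must be an innermost pair.
matched_count=$(grep -c '([^()]*)a' "$tmp/parens.txt")

# -v shows which line was rejected.
rejected=$(grep -v '([^()]*)a' "$tmp/parens.txt")

printf 'matched %s lines, rejected: %s\n' "$matched_count" "$rejected"
rm -rf "$tmp"
```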
In order to limit the number of repetitions to find in a pattern we
use curly braces {}. To search for a 7 digit phone number you
could try this:
grep "[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}" file. This
will match any three digits that are followed by an optional space or
hyphen and then a further four digits.
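Here is a small sketch of the phone number pattern in use. The file phones.txt and its entries are made up, and GNU grep is assumed because of the \? operator.

```shell
# Hypothetical file of candidate phone numbers.
tmp=$(mktemp -d)
printf '555-1234\n555 1234\n5551234\n55-1234\ncall me\n' > "$tmp/phones.txt"

# Three digits, an optional space or hyphen, then four digits.
# (Assumes GNU grep for \?; the \{n\} interval itself is standard.)
pattern='[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}'
phone_count=$(grep -c "$pattern" "$tmp/phones.txt")
phones=$(grep "$pattern" "$tmp/phones.txt")

printf '%s\n' "$phones"
rm -rf "$tmp"
```

Because the pattern is not anchored, a longer run of digits such as 12345678 would also match; adding -w or the line anchors would tighten it.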
So here’s what we want: we need a line of text with the word ‘hello’ preceded by some whitespace and nothing after it. Let’s look at a simple example :
>cat file
hello
hello world
hhello
>grep hello file
hello
hello world
hhello
What went wrong? GREP simply returned any line with ‘hello’ in it. We need to be more specific to get what we want.
The $ character matches the end of the line. The ^ character matches the beginning of the line.
Let’s change the GREP command above to one which
will work. grep "^[[:space:]]*hello[[:space:]]*$" file will
return one line, based on the previous example, but would also return
‘hello’ without any whitespace at the start. Admittedly this is
confusing: the stated goal makes the whitespace at the start sound
essential, but the ‘*’ makes it optional.
grep "^From.*mscharmi" /var/spool/mail/elflord is another
example that searches the mail folder for headers from a specific
person. Surely one can see how this could be useful?
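The anchored ‘hello’ pattern can be checked with a short sketch. The file greetings.txt is hypothetical, and the stricter second pattern, which insists on at least one leading whitespace character, is an illustration rather than something from the original examples.

```shell
# Hypothetical file; the first line starts with two spaces.
tmp=$(mktemp -d)
printf '  hello\nhello world\nhhello\nhello\n' > "$tmp/greetings.txt"

# Leading whitespace optional (the * allows zero repetitions).
loose_count=$(grep -c '^[[:space:]]*hello[[:space:]]*$' "$tmp/greetings.txt")

# Leading whitespace required: one whitespace character, then zero or more.
strict=$(grep '^[[:space:]][[:space:]]*hello[[:space:]]*$' "$tmp/greetings.txt")

printf 'loose: %s lines, strict: [%s]\n' "$loose_count" "$strict"
rm -rf "$tmp"
```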
The expression consisting of two expressions separated by the or operator \| matches lines containing either of those two expressions.
NB: this must be enclosed within single or double quotes.
grep "cat\|dog" file matches lines containing ‘cat’ or ‘dog’.
grep "I am a \(cat\|dog\)" file matches lines containing
the string “I am a cat” or “I am a dog”.
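A minimal sketch of alternation, with and without grouping. The file pets.txt is invented, and GNU grep is assumed since \| is a GNU extension in basic regular expressions.

```shell
# Hypothetical sample lines.
tmp=$(mktemp -d)
printf 'I am a cat\nI am a dog\nI am a bird\ndogs bark\n' > "$tmp/pets.txt"

# Bare alternation: any line containing "cat" or "dog".
either_count=$(grep -c 'cat\|dog' "$tmp/pets.txt")

# Grouped alternation: the shared prefix must be present as well.
grouped=$(grep 'I am a \(cat\|dog\)' "$tmp/pets.txt")
grouped_count=$(grep -c 'I am a \(cat\|dog\)' "$tmp/pets.txt")

printf 'either: %s lines, grouped: %s lines\n' "$either_count" "$grouped_count"
rm -rf "$tmp"
```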
How would one search for a certain substring that appears in more
than one place? An example is the heading tag in HTML. A search for all
heading tags, H1-6, could be written as
<H[1-6]>.*</H[1-6]>, but this doesn’t work fully as we
might end up matching incorrectly paired headers. To match correctly
paired tags we need to use backreferences.
The expression \n, where n is a number, matches the contents of the n’th set of parentheses in the expression.
<H\([1-6]\).*</H\1> matches what we were trying
to match before. The escaped ‘1’ after the second ‘H’ refers to the
first group of the pattern. Groups are defined by parentheses and in
this case we have captured the number that follows the opening
‘H’. We then reference that group in the closing tag.
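The effect of the backreference can be seen by running both heading patterns over the same input. This is a sketch; the file page.html and its three lines are made up.

```shell
# Hypothetical HTML fragment; the second line has mismatched tags.
tmp=$(mktemp -d)
printf '<H1>Title</H1>\n<H2>Broken</H3>\n<H3>Sub</H3>\n' > "$tmp/page.html"

# Without a backreference the mismatched pair is accepted too.
loose_count=$(grep -c '<H[1-6]>.*</H[1-6]>' "$tmp/page.html")

# \1 must repeat whatever digit the first group captured.
paired=$(grep '<H\([1-6]\).*</H\1>' "$tmp/page.html")
paired_count=$(grep -c '<H\([1-6]\).*</H\1>' "$tmp/page.html")

printf 'loose: %s lines, paired: %s lines\n' "$loose_count" "$paired_count"
rm -rf "$tmp"
```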
Certain characters when used with GREP need to be escaped. It is also worth pointing out at this time that EGREP is a similar tool that utilises extended regular expressions; it is no more functional than GNU GREP, and has a longer list of metacharacters that need escaping. The following characters need to be escaped:
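\ . [ ] ^ $ *
(In basic GREP a plain ? is already literal; it is the escaped form \? that acts as an operator.)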
Single quotes are the safest to use as they protect the regular
expression from the shell. For example grep "!" file
will often produce an error as the shell thinks that “!” is referring
to the shell history command. On the other hand if one wants to use
shell variables in the search then it is necessary to use double quotes
like grep "$HOME" file. Should you try
grep '$HOME' file instead you will search file for the
string ‘$HOME’ rather than the variable value.
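The quoting difference can be sketched with a variable of our own. SEARCHTERM, its value and the file notes.txt are all hypothetical.

```shell
# Hypothetical variable and file, for illustration only.
tmp=$(mktemp -d)
SEARCHTERM='bigvax'
printf 'password for bigvax\nthe literal $SEARCHTERM text\n' > "$tmp/notes.txt"

# Double quotes: the shell expands the variable, so GREP searches for "bigvax".
double=$(grep "$SEARCHTERM" "$tmp/notes.txt")

# Single quotes: GREP receives the characters $SEARCHTERM unchanged.
# A $ that is not at the end of a basic pattern is treated as a literal character.
single=$(grep '$SEARCHTERM' "$tmp/notes.txt")

printf 'double quotes found: %s\n' "$double"
printf 'single quotes found: %s\n' "$single"
rm -rf "$tmp"
```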
We previously mentioned the existence of egrep, which allows extended regular expressions. Funnily enough egrep actually has less functionality as it is designed for compatibility with traditional egrep. A better way to run an extended GREP is to use the ‘-E’ flag.
| grep | grep -E | used in egrep? |
|---|---|---|
| a\+ | a+ | yes |
| a\? | a? | yes |
| expression1\|expression2 | expression1|expression2 | yes |
| \(expression1\) | (expression1) | yes |
| \{m,n\} | {m,n} | no |
| \{,n\} | {,n} | no |
| \{m,\} | {m,} | no |
| \{m\} | {m} | no |
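To see that the two columns describe the same searches, a pattern can be run in both syntaxes. This is a sketch assuming GNU grep (the escaped \+ is a GNU extension in basic mode); the file letters.txt is made up.

```shell
# Hypothetical three-line file.
tmp=$(mktemp -d)
printf 'ab\naab\nb\n' > "$tmp/letters.txt"

# One or more "a" followed by "b": escaped in basic mode, bare with -E.
basic=$(grep 'a\+b' "$tmp/letters.txt")
extended=$(grep -E 'a+b' "$tmp/letters.txt")

# Exactly two "a" followed by "b": \{2\} in basic mode, {2} with -E.
basic_interval=$(grep 'a\{2\}b' "$tmp/letters.txt")
extended_interval=$(grep -E 'a{2}b' "$tmp/letters.txt")

printf 'basic: %s\nextended: %s\n' "$basic_interval" "$extended_interval"
rm -rf "$tmp"
```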
GREP is usually first used to search through the
contents of one’s files. To find the file that contains the password to
another computer you could run grep password *.
The output will contain all files and all lines where the search term is found e.g.
notes : password for the system "bigvax" is "guest", remember to
notes : delete this message, as it is a bad idea to keep passwords
message : Do you know the password for bigvax? I forgot what
The above example found two files that contained the term, and one of those files contained it twice.
The previous search would only match the word password exactly as
written, in lower case. To make the search case-insensitive use the
-i flag like so: grep -i password *.
GREP can be used on standard input as a filter. File
names won’t be output as GREP won’t know what the
name is in this instance. Example:
cat document.txt | grep -i "$SEARCHTERM". This example is
not very useful on its own, since GREP could read document.txt
directly, but nonetheless it pipes the content of
document.txt into GREP and makes a
case-insensitive search for the search term.
GREP does not print the filename if only a single file
argument is specified. For example: grep password message
would output
Do you know the password for bigvax? I forgot what.
The output was not prefixed with the filename as it had been in the
earlier example. To make sure a filename is printed one must provide at
least two files to search. Why would you do this if you only want to
search one specific file? Well you could supply a file that is always
there and always empty. grep password message /dev/null
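The effect of the extra /dev/null argument can be sketched as follows. The file message is recreated in a temporary directory purely for the demonstration.

```shell
# Hypothetical copy of the "message" file from the example above.
tmp=$(mktemp -d)
printf 'Do you know the password for bigvax? I forgot what\n' > "$tmp/message"

# One file argument: the matching line is printed without a filename.
one_file=$(cd "$tmp" && grep password message)

# Two file arguments: every match is prefixed with the file it came from.
# /dev/null is always present and always empty, so it never adds matches.
two_files=$(cd "$tmp" && grep password message /dev/null)

printf '%s\n%s\n' "$one_file" "$two_files"
rm -rf "$tmp"
```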
This command is convenient when writing shell scripts and you do not know how many files you will be told to search. A simple example of such a script that prints the filenames with the results would be:
#!/bin/sh
grep -i "$@" /dev/null
A simple use of GREP is to remove lines that contain
a pattern. To remove all lines that contain the word “junk”, use the
-v option: grep -v junk.
This is typically used as a filter :
grep -i password * | grep -v junk. Another example is to
eliminate excess lines. Suppose one wants to search for the word
“every,” but does not want “everyone,” “everybody,” or “everywhere.” The
following would suffice :
grep every * | grep -v one | grep -v body | grep -v where.
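The chain of filters can be tried on a small word list. This is a sketch; the file words.txt is invented, and the single-command variant with repeated -e options is offered as an equivalent alternative, not as something from the original text.

```shell
# Hypothetical word list.
tmp=$(mktemp -d)
printf 'every day\neveryone knows\neverybody\neverywhere\nnot related\n' > "$tmp/words.txt"

# The pipeline from the text: keep "every", then drop the unwanted variants.
chained=$(grep every "$tmp/words.txt" | grep -v one | grep -v body | grep -v where)

# Alternative: one grep -v with several -e patterns removes the same lines.
combined=$(grep every "$tmp/words.txt" | grep -v -e one -e body -e where)

printf 'chained: %s\ncombined: %s\n' "$chained" "$combined"
rm -rf "$tmp"
```

Note that grep -v one drops any line containing “one” anywhere, not only “everyone”, so this kind of filter is best used on output you can eyeball.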
!! | grep -v ignoreThisWord : this command is handy as
you can repeat the last command and remove lines that contain certain
words.
find . -print | grep -v '\.old$' | grep -v '[%~]$' : This
command lists files but excludes backups, i.e. names ending
in .old, % or ~.
Looking for certain terms can be difficult. How would one search for
‘-i’? We already know that ‘-i’ is an option one can supply to
GREP. When we run grep -i file
GREP treats ‘-i’ as the option and will check for the term ‘file’
on standard input. This means that nothing will happen until one presses
ctrl-d.
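Two standard ways around this are sketched below: the -e option marks the next argument as the pattern, and -- ends option parsing. The file options.txt is made up for the demonstration.

```shell
# Hypothetical file that mentions the -i flag.
tmp=$(mktemp -d)
printf 'use -i for case\nnothing\n' > "$tmp/options.txt"

# -e says "the next argument is the pattern", even if it starts with a dash.
with_e=$(grep -e '-i' "$tmp/options.txt")

# -- says "no more options follow", so '-i' is taken as the pattern.
with_dashes=$(grep -- '-i' "$tmp/options.txt")

printf '%s\n%s\n' "$with_e" "$with_dashes"
rm -rf "$tmp"
```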