Regular expressions
A regular expression provides a compact description of a set of strings, without having to list all of its elements. There are several flavors of regular expressions: basic (BRE), extended (ERE), Perl-compatible (PCRE), and other, less important variants tied to particular programming languages and applications. In BRE the metacharacters ?, +, {, |, ( and ) lose their special meaning; the backslashed versions \?, \+, \{, \|, \( and \) should be used instead. See the reference table below.
Anchors
^ | beginning of line; |
$ | end of line; |
\< | left word boundary; |
\> | right word boundary; |
\b | word boundary; |
\B | not a word boundary; |
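The anchors above can be tried out with a short sketch in Python. This assumes Python's re module, which uses Perl-style syntax and does not provide \< or \>, so \b stands in for both; the sample text is made up for illustration.

```python
import re

text = "cat concat category"

# ^ and $ anchor the match to the start and end of the string
# (or of each line when re.MULTILINE is given).
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_cat = re.search(r"cat$", text) is not None

# \b matches at a word boundary, \B at any position that is not one.
whole_words = re.findall(r"\bcat\b", text)   # "cat" as a standalone word
word_starts = re.findall(r"\bcat", text)     # "cat" at the start of a word
inside_words = re.findall(r"\Bcat", text)    # "cat" not at the start of a word

print(starts_with_cat, ends_with_cat)
print(whole_words, word_starts, inside_words)
```

Here only the first word counts as a whole-word match, while "concat" is the only word that contains "cat" away from its start.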
Quantifiers
? | 0 or 1; |
* | 0 or more; |
+ | 1 or more; |
{n} | exactly n; |
{n,} | n or more; |
{,m} | at most m; |
{n,m} | at least n, but no more than m; |
By default ?, * and + are greedy quantifiers (i.e., they match as much as possible). To make them lazy (matching as little as possible), append ? to get ??, *? and +?.
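The difference between greedy and lazy matching is easiest to see on a string with several candidate end points. The following is a minimal sketch using Python's re module (a Perl-style flavor that supports lazy quantifiers); the sample string is an assumption chosen for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" in the string, so there is one long match.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can, so each tag matches separately.
lazy = re.findall(r"<.+?>", html)

# The same applies to ?: greedy a? takes the "a", lazy a?? takes nothing.
greedy_optional = re.match(r"a?", "a").group()
lazy_optional = re.match(r"a??", "a").group()

print(greedy)
print(lazy)
print(repr(greedy_optional), repr(lazy_optional))
```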
Alternation
| | separates alternative patterns to be matched; |
Most chars are treated as literals (they match only themselves). Any metachar with special meaning may be quoted by preceding it with a backslash.
Matches (Unix-style)
. | any single character; |
[...] | any single character contained within [ ]; |
[^...] | any single character not contained within [ ]; |
\d | any single digit; |
\D | any single non-digit; |
\s | any single whitespace (space, \t, \v, \n, \r, \f); |
\S | any single non-whitespace; |
\w | any single word character (alphanumeric or underscore); |
\W | any single non-word character; |
\c | control character (example: \c[ matches Esc); |
\n | newline; |
\r | carriage return; |
\t | Tab (horizontal Tab); |
\v | vertical Tab; |
\( \) | define a marked subexpression; |
\1 ... \9 | back-reference \n, where n is a digit (1..9); matches what the nth marked subexpression matched; |
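A few of the escapes in this table, including a back-reference, can be sketched in Python. Note the assumption: Python's re module writes marked subexpressions as ( ) rather than the BRE form \( \), and the sample strings are invented for the example.

```python
import re

# \1 refers back to whatever the first group matched,
# so this pattern finds a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
repeated_word = doubled.group(1)

# \d, \s and \w are shorthand classes; \w also accepts the underscore.
digits = re.findall(r"\d+", "10 apples, 25 pears")
is_identifier = re.fullmatch(r"\w+", "snake_case") is not None
fields = re.split(r"\s+", "a  b\tc\nd")

# A metacharacter is matched literally when preceded by a backslash.
literal_dot = re.findall(r"\.", "a.b.c")

print(repeated_word, digits, is_identifier, fields, literal_dot)
```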
Matches (POSIX-style)
[:alnum:] | any alphanumeric character ([0-9A-Za-z]); |
[:alpha:] | any alpha character ([A-Za-z]); |
[:blank:] | space or Tab; |
[:cntrl:] | any control character; |
[:digit:] | any digit ([0-9]); |
[:graph:] | any visible character (printable, excluding space); |
[:lower:] | any lowercase character; |
[:print:] | any printable character; |
[:punct:] | any punctuation character; |
[:space:] | any whitespace (space, \t, \n, \r, \f, \v); |
[:upper:] | any uppercase character; |
[:xdigit:] | any hexadecimal digit; |
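POSIX classes are written inside a bracket expression, for example [[:alpha:]_]. Python's re module does not support them, so the sketch below expands a few class names into the equivalent ASCII ranges before compiling. The POSIX_CLASSES table and the expand_posix_classes helper are hypothetical names introduced for this example, and the ranges assume the C locale.

```python
import re

# Hypothetical lookup table: POSIX class name -> ASCII range (C locale assumed).
POSIX_CLASSES = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "blank": " \\t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def expand_posix_classes(pattern: str) -> str:
    """Replace each [:name:] with its range; unknown names raise KeyError."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

# A letter or underscore followed by letters, digits or underscores.
identifier = expand_posix_classes("[[:alpha:]_][[:alnum:]_]*")
print(identifier)

accepts_var_1 = re.fullmatch(identifier, "var_1") is not None
accepts_1var = re.fullmatch(identifier, "1var") is not None
print(accepts_var_1, accepts_1var)
```

The expansion only covers the classes listed in the table; a fuller version would need entries for the remaining names.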
Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole subexpression may be enclosed in parentheses to override these rules.
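These precedence rules can be checked with a small sketch in Python. It assumes Python's re module, whose precedence for these operators matches the rule stated above; the helper name full is introduced only for this example.

```python
import re

def full(pattern: str, text: str) -> bool:
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Repetition binds tighter than concatenation: in "ab*" the star applies to "b" only.
star_binds_to_b = full(r"ab*", "abbb") and not full(r"ab*", "abab")

# Parentheses make the star apply to the whole of "ab".
group_repeats_ab = full(r"(ab)*", "abab")

# Concatenation binds tighter than alternation: "ab|cd" means "ab" or "cd".
alt_is_loosest = full(r"ab|cd", "cd") and not full(r"ab|cd", "acd")

# Parentheses restrict the alternation to the middle character.
group_limits_alt = full(r"a(b|c)d", "acd") and not full(r"a(b|c)d", "cd")

print(star_binds_to_b, group_repeats_ab, alt_is_loosest, group_limits_alt)
```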
. | [ ] | ^ | $ | \( \) | \{ \} | ? | + | | | ( ) |
awk |
x | x | x | x | x | x | x | x | grep |
x | x | x | x | x | x | egrep |
x | x | x | x | x | x | x | x | x | fgrep |
x | x | x | x | x | sed |
x | x | x | x | x | x | perl |
x | x | x | x | x | x | x | x | x | vi |
x | x | x | x | x |
Examples
/^$/ | an empty line; |
/./ | a line with at least one char; |
/^/ | all lines; |
/thing/ | thing somewhere in the line; |
/^thing/ | thing at the beginning of the line; |
/thing$/ | thing at the end of the line; |
/^thing$/ | a line consisting of thing only; |
/thing.$/ | thing followed by any one character at the end of the line; |
/thing\.$/ | thing. at the end of the line; |
/\/thing\// | /thing/ somewhere in the line; |
/[tT]hing/ | thing or Thing; |
/thing[0-9]/ | thing followed by one digit; |
/thing[^0-9]/ | thing followed by a non-digit; |
/tele(f|ph)one/ | telefone or telephone; |
/thing[0-9][^0-9]/ | thing followed by a digit and a non-digit; |
/thing1.*thing2/ | thing1 followed by some chars, then thing2; |
/^thing1.*thing2$/ | thing1 at the beginning, thing2 at the end; |
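Several of the examples above can be tried against sample lines with a short sketch in Python. The surrounding slashes are the delimiters used by tools such as sed, awk and vi, not part of the expression, so they are dropped here and the slash itself needs no escaping. The sample lines and the matching_lines helper are assumptions made for illustration.

```python
import re

lines = ["", "thing", "thing.", "Thing7x", "thing1 and thing2", "/thing/"]

def matching_lines(pattern: str) -> list[str]:
    """Return the sample lines in which the pattern matches somewhere."""
    return [line for line in lines if re.search(pattern, line)]

empty = matching_lines(r"^$")
only_thing = matching_lines(r"^thing$")
thing_dot = matching_lines(r"thing\.$")
thing_any_char = matching_lines(r"thing.$")
thing_digit = matching_lines(r"[tT]hing[0-9]")
first_and_last = matching_lines(r"^thing1.*thing2$")
with_slashes = matching_lines(r"/thing/")

for name, result in [("empty", empty), ("only_thing", only_thing),
                     ("thing_dot", thing_dot), ("thing_any_char", thing_any_char),
                     ("thing_digit", thing_digit), ("first_and_last", first_and_last),
                     ("with_slashes", with_slashes)]:
    print(name, result)
```

Comparing thing_dot with thing_any_char shows the effect of escaping the dot: the unescaped form accepts any single character before the end of the line.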