Examples of Regular Expressions
- Atomic regexp:
- any non-special character matches exactly the same character
- a dot “.” matches any one character
- “.” → «E»
- “.” → «:»
- “.” → «.»
- a set of characters matches any character from the set:
- “[quack!]” → «a»
- “[quack!]” → «!»
- “[a-z]” → «q» (any small letter)
- “[a-z]” → «z» (any small letter)
- “[a-fA-F0-9]” → «f» (any hexadecimal digit)
- “[a-fA-F0-9]” → «D» (any hexadecimal digit)
- “[abcdefABCDEF0-9]” → «4» (any hexadecimal digit)
- a negative set of characters matches any character not from the set:
- “[^quack!]” → «r»
- “[^quack!]” → «#»
- “[^quack!]” → «A»
- any atomic regexp followed by the “*” repeater matches a continuous sequence of substrings (possibly an empty sequence), each matched by that regexp
- “a*” → «aaa»
- “a*” → «» (empty string)
- “a*” → «a»
- “[0-9]*” → «7»
- “[0-9]*” → «» (empty string)
- “[0-9]*” → «1231234»
- “.*” → any string!
- any complex regexp enclosed in the special grouping parentheses “\(” and “\)” (see below)
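The atomic regexps listed above can be checked with a short script. This is a minimal sketch that assumes Python's standard `re` module, whose syntax for dots, sets, negated sets and `*` coincides with the notation used here; `matches_whole` is a hypothetical helper name introduced only for illustration.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# A dot matches any single character (in Python, any except a newline).
assert matches_whole(r".", "E")
assert matches_whole(r".", ":")
assert matches_whole(r".", ".")

# A set matches exactly one character taken from the set.
assert matches_whole(r"[quack!]", "a")
assert matches_whole(r"[a-z]", "q")
assert matches_whole(r"[a-fA-F0-9]", "D")
assert not matches_whole(r"[a-z]", "Q")

# A negated set matches exactly one character that is not in the set.
assert matches_whole(r"[^quack!]", "r")
assert matches_whole(r"[^quack!]", "#")
assert not matches_whole(r"[^quack!]", "q")

# The "*" repeater accepts zero or more repetitions of the atom.
assert matches_whole(r"a*", "aaa")
assert matches_whole(r"a*", "")
assert matches_whole(r"[0-9]*", "1231234")
assert matches_whole(r".*", "any string!")
```

`re.fullmatch` is used so that each check covers the whole string; `re.search` would also succeed on any string that merely contains a match.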
- Complex regexp
- A sequence of atomic regexps
- Matches a continuous sequence of substrings, each matched by the corresponding atomic regexp
- “boo” → «boo»
- “r....e” → «riddle»
- “r....e” → «r re e»
- “[0-9][0-9]*” → any non-negative integer
- “[A-Za-z_][A-Za-z0-9_]*” → C identifier (alphanumeric sequence with «_», not starting with a digit)
- grouping parentheses can be used to repeat a complex regexp:
- “\([A-Z][a-z]\)*” → «ReGeXp»
- “\([A-Z][a-z]\)*” → «» (empty string)
- “\([A-Z][a-z]\)*” → «Oi»
- Follows the leftmost-longest rule (aka «greedy»):
in a successful match of a complex regexp, the leftmost atomic regexp takes the longest possible match, the second leftmost takes the longest match still possible under that condition, and so on
- “.*.*” → the leftmost “.*” takes the whole string, the second one matches the empty string
- “[a-z]*[0-9]*[a-z0-9]*” → «123b0c0»
- “[a-z]*” → «»
- “[0-9]*” → «123»
- “[a-z0-9]*” → «b0c0»
- “[a-d]*[c-f]*[d-h]*” → «abcdefgh»
- “[a-d]*” → «abcd»
- “[c-f]*” → «ef»
- “[d-h]*” → «gh»
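The sequence, grouping and greedy examples above can be reproduced as follows. This is a sketch under stated assumptions: it uses Python's `re` module, where grouping is written with plain `(` and `)` rather than `\(` and `\)`, and `split_groups` is a hypothetical helper. Python's engine is a backtracking matcher and not a POSIX leftmost-longest one in general, but for these patterns its greedy `*` produces the same split.

```python
import re


def split_groups(pattern: str, text: str):
    """Return the text captured by each group, or None if there is no full match."""
    m = re.fullmatch(pattern, text)
    return m.groups() if m else None


# A sequence of atomic regexps matches consecutive substrings.
assert re.fullmatch(r"r....e", "riddle")
assert re.fullmatch(r"r....e", "r re e")
assert re.fullmatch(r"[0-9][0-9]*", "2024")

# Grouping lets "*" repeat a whole complex regexp
# (written \( ... \)* in the notation of these notes).
assert re.fullmatch(r"([A-Z][a-z])*", "ReGeXp")
assert re.fullmatch(r"([A-Z][a-z])*", "")
assert not re.fullmatch(r"([A-Z][a-z])*", "REgexp")

# Greedy matching: each part takes as much as it can, left to right.
# The extra parentheses only expose what each part consumed.
assert split_groups(r"(.*)(.*)", "hello") == ("hello", "")
assert split_groups(r"([a-z]*)([0-9]*)([a-z0-9]*)", "123b0c0") == ("", "123", "b0c0")
assert split_groups(r"([a-d]*)([c-f]*)([d-h]*)", "abcdefgh") == ("abcd", "ef", "gh")
```

The capture groups are added only for inspection; they do not change which strings the patterns accept.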
- Positioning mark
- “^regexp” matches only substrings located at the beginning of the line
- “regexp$” matches only substrings located at the end of the line
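The effect of the two positioning marks can be seen by comparing match positions. This is a minimal sketch assuming Python's `re` module and a single-line input; `spans` and the sample `line` are hypothetical names chosen for illustration.

```python
import re

line = "boo said the ghost, boo"


def spans(pattern: str, text: str):
    """Return the (start, end) positions of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]


# Without a positioning mark, both occurrences are found.
assert spans(r"boo", line) == [(0, 3), (20, 23)]

# "^" keeps only the match at the beginning of the line.
assert spans(r"^boo", line) == [(0, 3)]

# "$" keeps only the match at the end of the line.
assert spans(r"boo$", line) == [(20, 23)]

# A substring in the middle of the line satisfies neither mark.
assert spans(r"^ghost", line) == []
assert spans(r"ghost$", line) == []
```

For multi-line text, Python needs the `re.MULTILINE` flag for `^` and `$` to apply to every line instead of only the whole string.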