Examples of Regular Expressions
- Atomic regexp:
- any non-special character matches exactly that same character
- a dot “.” matches any one character
- “.” → «E»
- “.” → «:»
- “.” → «.»
- a set of characters matches any one character from the set:
- “[quack!]” → «a»
- “[quack!]” → «!»
- “[a-z]” → «q» (any lowercase letter)
- “[a-z]” → «z» (any lowercase letter)
- “[a-fA-F0-9]” → «f» (any hexadecimal digit)
- “[a-fA-F0-9]” → «D» (any hexadecimal digit)
- “[abcdefABCDEF0-9]” → «4» (any hexadecimal digit)
- a negative set of characters matches any one character not from the set:
- “[^quack!]” → «r»
- “[^quack!]” → «#»
- “[^quack!]” → «A»
- any atomic regexp followed by the “*” repeater matches a continuous sequence of substrings, including the empty sequence, each matched by the regexp
- “a*” → «aaa»
- “a*” → «» (the empty string)
- “a*” → «a»
- “[0-9]*” → «7»
- “[0-9]*” → «» (the empty string)
- “[0-9]*” → «1231234»
- “.*” → any string!
- any complex regexp enclosed in the special grouping parentheses “\(” and “\)” (see below)
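
The atomic cases above can be checked with a minimal Python sketch. It assumes Python's `re` module, whose syntax agrees with the patterns shown here for dots, sets, negative sets and “*”; `whole_match` is a hypothetical helper introduced only for this illustration, and it tests whether the pattern matches the entire string.

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """True if `pattern` matches all of `text` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# Dot: any one character (Python's default mode excludes the newline).
dot_ok = all(whole_match(".", s) for s in ["E", ":", "."])

# Set: exactly one character taken from the set.
set_ok = all(whole_match("[a-fA-F0-9]", s) for s in ["f", "D", "4"])
set_rejects_outsider = not whole_match("[quack!]", "r")

# Negative set: exactly one character that is NOT in the set.
neg_ok = all(whole_match("[^quack!]", s) for s in ["r", "#", "A"])
neg_rejects_member = not whole_match("[^quack!]", "q")

# Star: zero or more repetitions, so the empty string matches too.
star_ok = all(whole_match("a*", s) for s in ["aaa", "", "a"])
digits_ok = all(whole_match("[0-9]*", s) for s in ["7", "", "1231234"])

print(dot_ok, set_ok, set_rejects_outsider,
      neg_ok, neg_rejects_member, star_ok, digits_ok)
```

Every printed value should be `True`. The two `rejects` checks are additions that show a set matching nothing outside itself and a negative set matching nothing inside it.
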
- Complex regexp
- A sequence of atomic regexps
- Matches a continuous sequence of substrings, each matched by the corresponding atomic regexp
- “boo” → «boo»
- “r....e” → «riddle»
- “r....e” → «r re e»
- “[0-9][0-9]*” → any non-negative integer
- “[A-Za-z_][A-Za-z0-9_]*” → C identifier (alphanumeric sequence with «_», not starting with a digit)
- grouping parentheses can be used to repeat a complex regexp:
- “\([A-Z][a-z]\)*” → «ReGeXp»
- “\([A-Z][a-z]\)*” → «» (the empty string)
- “\([A-Z][a-z]\)*” → «Oi»
- Follows the leftmost-longest rule (aka «greedy»): in a successful match of a complex regexp, the leftmost atomic regexp takes the longest possible match, the second leftmost takes the longest match still possible under that condition, and so on
- “.*.*” → the leftmost “.*” takes the whole string, the next one takes the empty string
- “[a-z]*[0-9]*[a-z0-9]*” → «123b0c0», split as:
- “[a-z]*” → «»
- “[0-9]*” → «123»
- “[a-z0-9]*” → «b0c0»
- “[a-d]*[c-f]*[d-h]*” → «abcdefgh», split as:
- “[a-d]*” → «abcd»
- “[c-f]*” → «ef»
- “[d-h]*” → «gh»
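
The sequence, grouping and greedy examples above can be reproduced with a short Python sketch. Two assumptions apply. First, Python's `re` writes grouping as plain `(` and `)` where the notation used here writes “\(” and “\)”. Second, Python's engine is a backtracking one and not a POSIX leftmost-longest matcher, although for the patterns below it yields the same split. The helper `parts` is hypothetical, and the identifier pattern allows «_» after the first character, as C does.

```python
import re

# A sequence of atomic regexps matches the pieces one after another.
seq_ok = (re.fullmatch("r....e", "riddle") is not None
          and re.fullmatch("r....e", "r re e") is not None)

# C identifier: letter or underscore first, then letters, digits, underscores.
ident = "[A-Za-z_][A-Za-z0-9_]*"
ident_ok = re.fullmatch(ident, "_tmp1") is not None
ident_rejects_digit_start = re.fullmatch(ident, "1abc") is None

# Grouping lets the repeater apply to a whole sub-pattern.
pairs = "([A-Z][a-z])*"
pairs_ok = all(re.fullmatch(pairs, s) is not None for s in ["ReGeXp", "", "Oi"])

def parts(pattern: str, text: str):
    """Return what each parenthesised part matched, or None (hypothetical helper)."""
    m = re.fullmatch(pattern, text)
    return m.groups() if m else None

# Each part is wrapped in a group only to expose what it consumed.
split_a = parts("([a-z]*)([0-9]*)([a-z0-9]*)", "123b0c0")
split_b = parts("([a-d]*)([c-f]*)([d-h]*)", "abcdefgh")
split_c = parts("(.*)(.*)", "any string")

print(split_a)  # ('', '123', 'b0c0')
print(split_b)  # ('abcd', 'ef', 'gh')
print(split_c)  # ('any string', '')
```

The three printed tuples show the greedy behaviour: each part takes as much as it can before the next part gets a turn.
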
- Positioning mark
- “^regexp” matches only substrings located at the beginning of the line
- “regexp$” matches only substrings located at the end of the line
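
A small Python sketch of the two positioning marks, assuming Python's `re.search`, which looks for a match anywhere in the line unless a mark pins it down. The helper `found` and the sample lines are hypothetical.

```python
import re

def found(pattern: str, line: str) -> bool:
    """True if `pattern` matches somewhere in `line` (hypothetical helper)."""
    return re.search(pattern, line) is not None

# "^" pins the match to the beginning of the line.
start_hit = found("^boo", "boo hoo")   # "boo" opens the line
start_miss = found("^boo", "taboo")    # "boo" occurs, but not at the start

# "$" pins the match to the end of the line.
end_hit = found("boo$", "taboo")       # "boo" closes the line
end_miss = found("boo$", "boo hoo")    # "boo" occurs, but not at the end

print(start_hit, start_miss, end_hit, end_miss)  # True False True False
```

Without the marks, “boo” would be found in both sample lines; the marks restrict where the match may sit.
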