PySchools Python Quick Reference Guide

Regular Expressions

Regular expressions let you describe a pattern, using a predefined syntax, and then search a string for text that matches it.

Summary of Regular Expression Syntax

The table below lists some basic, commonly used syntax rules for constructing regular expressions (REs).

Syntax, then explanation (examples are listed under each entry; notes [1] and [2] follow the table)
. (dot) any character except a newline
  • ‘a.c’ matches ‘abc’,
  • but not ‘bbc’.
^ the start of the string
  • ‘^abc’ matches ‘abcd’,
  • but not ‘dabc’.
$ the end of the string
  • ‘abc$’ matches ‘dabc’,
  • but not ‘abcd’.
* 0 or more repetitions of preceding RE
  • ‘a*’ matches ‘a’, ‘aa’, ‘aaa’,
  • and also ‘b’ (zero ‘a’).
+ 1 or more repetitions of preceding RE
  • ‘a+’ matches ‘a’, ‘aa’, ‘aaa’,
  • but not ‘b’ (zero ‘a’).
? 0 or 1 repetition of preceding RE
  • ‘a?’ matches ‘a’, and also ‘b’ (zero ‘a’),
  • but consumes at most one ‘a’: in ‘aa’ it matches only the first ‘a’.
{m} exactly m repetitions of preceding RE
  • ‘a{3}’ matches ‘aaa’, ‘baaa’,
  • but not ‘aa’, ‘baab’.
{m,n} m to n repetitions of preceding RE (no spaces inside the braces)
  • ‘a{2,3}’ matches ‘aa’, ‘aaa’,
  • but not ‘a’, ‘bab’.
[ ] any one character from the set between [ and ] (see note [1])
  • ‘[abcd]’ matches a string containing any of ‘a’, ‘b’, ‘c’, or ‘d’.
A | B A or B where A and B are arbitrary REs
  • ‘ab|cd’ matches a string containing ‘ab’ or ‘cd’.
\d any decimal digit 0 to 9
  • ‘\d’ matches ‘a1b’,
  • but not ‘ab’.
\w any alphanumeric character or underscore, i.e. [a-zA-Z0-9_]
  • ‘\w’ matches ‘a1B’,
  • but not ‘%#!’.
( ... ) any RE inside parentheses; the parentheses mark the start and end of a group. \number can be used to match the same text as group number later in the string (see note [2]).
  • r‘(\d)(\d)\2’ matches ‘122’,
  • but not ‘112’.
[1] To match a complement set, use ‘^’ as the first character of the set, e.g. ‘[^abc]’ matches any character except ‘a’, ‘b’, or ‘c’.
[2] An ‘r’ prefix on the RE string marks it as a raw string, so there is no need to escape each ‘\’ with another backslash.
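The rules in the table can be checked interactively. The sketch below is illustrative rather than exhaustive. It uses re.search, so a pattern "matches" when it is found anywhere in the string, and the helper name found is made up for this example.

```python
import re

def found(pattern, string):
    """Return True if pattern is found anywhere in string (hypothetical helper)."""
    return re.search(pattern, string) is not None

# Dot and anchors
print(found(r'a.c', 'abc'), found(r'a.c', 'bbc'))        # True False
print(found(r'^abc', 'abcd'), found(r'^abc', 'dabc'))    # True False
print(found(r'abc$', 'dabc'), found(r'abc$', 'abcd'))    # True False

# Quantifiers
print(found(r'a*', 'b'), found(r'a+', 'b'))              # True False
print(re.search(r'a?', 'aa').group())                    # a
print(found(r'a{3}', 'baaa'), found(r'a{3}', 'baab'))    # True False

# Sets, complement sets (note [1]), and backreferences in raw strings (note [2])
print(found(r'[abcd]', 'xyzc'), found(r'[^abcd]', 'abcd'))       # True False
print(found(r'(\d)(\d)\2', '122'), found(r'(\d)(\d)\2', '112'))  # True False
```

Note that ‘a*’ and ‘a?’ succeed on ‘b’ because they are allowed to match the empty string.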

Introduction to some methods of the re module

search(pattern, string[, flags])

Returns a MatchObject instance.

Scans through string looking for a location where the RE pattern produces a match, and returns a corresponding MatchObject instance.

>>> import re
>>> mobj = re.search(r'[a-z]', 'A1#xc2')
>>> mobj = re.search(r'(.*)@(.*)', 'john@example.com')   # example.com is a placeholder domain
>>> mobj.groups()
('john', 'example.com')

match(pattern, string[, flags])

Returns a MatchObject instance.

match is similar to search, but it matches from the beginning of the string.

>>> import re
>>> re.search(r'\d', 'a123')             # matches digit '1'.
<_sre.SRE_Match object at 0x7ff3d758>
>>> re.match(r'\d', 'a123')              # no match; returns None
>>> re.match(r'\d', '123')               # matches digit '1'.
<_sre.SRE_Match object at 0x7fec4d08>
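Because both methods return None when nothing matches, it is usually worth checking the result before using it. The following is a minimal sketch of that pattern, not the only way to do it; first_digit is a hypothetical helper introduced for illustration.

```python
import re

text = 'a123'

# search scans the whole string, so it finds the first digit at index 1.
s = re.search(r'\d', text)
print(s.group(), s.start())    # 1 1

# match only tries at index 0, where 'a' is not a digit, so it returns None.
m = re.match(r'\d', text)
print(m)                       # None

def first_digit(string):
    """Return the leading digit of string, or None (hypothetical helper)."""
    mobj = re.match(r'\d', string)
    # Guard against None before calling methods on the result.
    return mobj.group() if mobj else None

print(first_digit('123'), first_digit('a123'))   # 1 None
```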

MatchObject Properties

A MatchObject instance is returned by the search and match methods if the RE pattern is matched; otherwise they return None.

>>> import re
>>> mobj = re.search(r'\d+', '====1234567===')
>>> mobj.start() # returns start index of matched substring
4
>>> mobj.end()   # returns index just past the end of matched substring
11
>>> mobj.span()  # returns start and end index
(4, 11)
>>> mobj.string  # returns the string passed in
'====1234567==='
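When the pattern contains groups, the same object also exposes the text of each group. The sketch below uses an input string made up for illustration, and shows that start() and end() can be used directly as slice indexes into the original string.

```python
import re

mobj = re.search(r'(\d+)-(\d+)', 'range: 10-25 units')

print(mobj.group())     # 10-25  (group 0 is the whole match)
print(mobj.group(1))    # 10
print(mobj.groups())    # ('10', '25')
print(mobj.span())      # (7, 12)

# start() and end() index into the original string; end() is exclusive.
print(mobj.string[mobj.start():mobj.end()])   # 10-25
```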

The examples above only scratch the surface of the re module. For the full Python documentation, refer to the Regular Expression HOWTO.

See also

Ready for some practice? Test your understanding at PySchools: Regular Expressions.
