0. Introduction
- Reference: https://www.dataquest.io/wp-content/uploads/2019/03/python-regular-expressions-cheat-sheet.pdf
- Quick example: to match runs of Korean syllables, lowercase English letters, digits, and hyphens
'[가-힣a-z0-9\-]+'
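A minimal sketch of the quick example in use. The sample string and variable names are made up for illustration. Inside a set, `|` is a literal character rather than "or", so the ranges are written back to back; the second call shows what changes when `|` is left in the set.

```python
import re

# Made-up sample text for illustration.
text = "서울-2024 data|set ABC"

# Ranges listed back to back: Korean syllables, lowercase letters, digits, hyphen.
tokens = re.findall(r"[가-힣a-z0-9\-]+", text)

# With `|` inside the set, the pipe itself becomes a matchable character.
tokens_with_pipe = re.findall(r"[가-힣|a-z|0-9|\-]+", text)

print(tokens)            # ['서울-2024', 'data', 'set']
print(tokens_with_pipe)  # ['서울-2024', 'data|set']
```

Uppercase letters are not matched here; adding `A-Z` to the set or passing `re.IGNORECASE` would cover them.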
1. Special Characters
^
| Matches the expression to its right at the start of a string. In multi-line mode (re.MULTILINE), it also matches at the start of each line, that is, right after each \n.
$
| Matches the expression to its left at the end of a string. In multi-line mode, it also matches just before each \n.
.
| Matches any character except line terminators such as \n.
\
| Escapes special characters or denotes character classes.
A|B
| Matches expression A or B. If A matches first, B is left untried.
+
| Greedily matches the expression to its left 1 or more times.
*
| Greedily matches the expression to its left 0 or more times.
?
| Greedily matches the expression to its left 0 or 1 times. When ? is appended to a quantifier (+, *, ?, or {m,n}), that quantifier matches in a non-greedy manner.
{m}
| Matches the expression to its left exactly m times.
{m,n}
| Greedily matches the expression to its left from m to n times.
{m,n}?
| Non-greedy form of {m,n}: matches at least m times and as few additional times (up to n) as possible. See ? above.
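The following sketch illustrates how the anchors and quantifiers above behave in Python's `re` module. The sample strings and variable names are invented for the example.

```python
import re

lines = "one\ntwo\nthree"  # made-up sample text

# By default, ^ anchors only at the very start of the string.
first_only = re.findall(r"^\w+", lines)
# With re.MULTILINE, ^ also anchors right after each \n and $ just before it.
line_starts = re.findall(r"^\w+", lines, re.MULTILINE)
line_ends = re.findall(r"\w+$", lines, re.MULTILINE)

markup = "<b>bold</b><i>italic</i>"
greedy = re.findall(r"<.+>", markup)  # + takes as much as it can
lazy = re.findall(r"<.+?>", markup)   # +? takes as little as it can

digits = "1234567"
greedy_range = re.findall(r"\d{2,3}", digits)   # prefers 3 digits
lazy_range = re.findall(r"\d{2,3}?", digits)    # settles for 2 digits

print(first_only)    # ['one']
print(line_starts)   # ['one', 'two', 'three']
print(line_ends)     # ['one', 'two', 'three']
print(greedy)        # ['<b>bold</b><i>italic</i>']
print(lazy)          # ['<b>', '</b>', '<i>', '</i>']
print(greedy_range)  # ['123', '456']
print(lazy_range)    # ['12', '34', '56']
```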
2. Character Classes (a.k.a. Special Sequences)
\w
| Matches word characters: a-z, A-Z, 0-9, and the underscore _. For str patterns in Python 3 it also matches Unicode letters and digits unless re.ASCII is set.
\d
| Matches digits, which means 0-9 (plus other Unicode decimal digits unless re.ASCII is set).
\D
| Matches any non-digit character.
\s
| Matches whitespace characters, which include \t, \n, \r, and the space character.
\S
| Matches non-whitespace characters.
\b
| Matches the boundary (the empty string) at the start or end of a word, that is, between \w and \W.
\B
| Matches wherever \b does not, that is, at positions that are not word boundaries.
\A
| Matches the expression to its right at the absolute start of a string, whether in single or multi-line mode.
\Z
| Matches the expression to its left at the absolute end of a string, whether in single or multi-line mode.
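As a rough illustration of the special sequences above, the sketch below checks word boundaries by match position and contrasts \A and \Z with ^ in multi-line mode. All sample strings and variable names are made up.

```python
import re

words = "cat concat cats"  # made-up sample text

# Start index of each match shows where the boundary conditions hold.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", words)]   # "cat"
word_suffix = [m.start() for m in re.finditer(r"\Bcat\b", words)]  # end of "concat"
word_prefix = [m.start() for m in re.finditer(r"\bcat\B", words)]  # start of "cats"

numbers = re.findall(r"\d+", "tel: 010-1234")
fields = re.split(r"\s+", "a \t b\nc")

# \w is Unicode-aware for str patterns unless re.ASCII is passed.
unicode_words = re.findall(r"\w+", "한글 abc_1")
ascii_words = re.findall(r"\w+", "한글 abc_1", re.ASCII)

# \A and \Z ignore re.MULTILINE, while ^ does not.
rows = "x1\nx2"
caret = re.findall(r"^x\d", rows, re.MULTILINE)
abs_start = re.findall(r"\Ax\d", rows, re.MULTILINE)
abs_end = re.findall(r"x\d\Z", rows, re.MULTILINE)

print(whole_word, word_suffix, word_prefix)  # [0] [7] [11]
print(numbers, fields)        # ['010', '1234'] ['a', 'b', 'c']
print(unicode_words)          # ['한글', 'abc_1']
print(ascii_words)            # ['abc_1']
print(caret, abs_start, abs_end)  # ['x1', 'x2'] ['x1'] ['x2']
```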
3. Sets
[ ]
| Contains a set of characters to match.
[amk]
| Matches either a, m, or k. It does not match the sequence amk.
[a-z]
| Matches any lowercase letter from a to z.
[a\-z]
| Matches a, -, or z. It matches - because \ escapes it.
[a-]
| Matches a or -, because - is not being used to indicate a range of characters.
[-a]
| As above, matches a or -.
[a-z0-9]
| Matches characters from a to z and also from 0 to 9.
[(+*)]
| Special characters become literal inside a set, so this matches (, +, *, and ).
[^ab5]
| A leading ^ negates the set. Here, it matches any character that is not a, b, or 5.
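A short sketch of the set rules above, using invented sample strings. Each call returns the individual characters (or runs) that the set accepts.

```python
import re

one_of = re.findall(r"[amk]", "amok")              # single characters, not "amk"
escaped_hyphen = re.findall(r"[a\-z]", "a-z b")    # \- is a literal hyphen
trailing_hyphen = re.findall(r"[a-]", "a-z b")     # - at the edge is literal too
ranges = re.findall(r"[a-z0-9]+", "Room 101b")     # lowercase letters and digits only
literals = re.findall(r"[(+*)]", "(1+2)*3")        # specials lose their meaning
negated = re.findall(r"[^ab5]", "ab5c6")           # anything except a, b, 5

print(one_of)           # ['a', 'm', 'k']
print(escaped_hyphen)   # ['a', '-', 'z']
print(trailing_hyphen)  # ['a', '-']
print(ranges)           # ['oom', '101b']
print(literals)         # ['(', '+', ')', '*']
print(negated)          # ['c', '6']
```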
4. Groups
( )
| Matches the expression inside the parentheses and groups it.
(? )
| Inside parentheses like this, ? acts as an extension notation. Its meaning depends on the character immediately to its right.
(?P<name>AB)
| Matches the expression AB, and the match can be accessed with the group name.
(?aiLmsux)
| Here a, i, L, m, s, u, and x are inline flags:
|   a: match ASCII only
|   i: ignore case
|   L: locale dependent
|   m: multi-line
|   s: dot matches all (including \n)
|   u: Unicode matching
|   x: verbose
(?:A)
| Matches the expression represented by A, but unlike (?P<name>AB), the match cannot be retrieved afterwards.
(?#...)
| A comment. Contents are for us to read, not for matching.
A(?=B)
| Lookahead assertion. This matches the expression A only if it is followed by B.
A(?!B)
| Negative lookahead assertion. This matches the expression A only if it is not followed by B.
(?<=B)A
| Positive lookbehind assertion. This matches the expression A only if B is immediately to its left. B must be a fixed-length expression.
(?<!B)A
| Negative lookbehind assertion. This matches the expression A only if B is not immediately to its left. B must be a fixed-length expression.
(?P=name)
| Matches the same text that was matched by an earlier group named "name".
(...)\1
| Backreference. The number 1 refers to the first captured group and matches the same text that group matched. To match that text again, use the group's number instead of writing out the expression again. Groups can be numbered from 1 up to 99.
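The group constructs above can be sketched as follows. The sample strings, group names (`year`, `month`, `word`), and variable names are made up for illustration.

```python
import re

# Named groups: retrieve parts of the match by name.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2019-03")
year, month = m.group("year"), m.group("month")

# Capturing vs. non-capturing: findall returns the group if one is captured.
captured = re.findall(r"(ab)+", "ababab ab")
uncaptured = re.findall(r"(?:ab)+", "ababab ab")

# Lookahead and lookbehind check context without consuming it.
followed = re.findall(r"\d+(?= won)", "100 won, 200 usd")
not_followed = re.findall(r"\b\d+\b(?! won)", "100 won, 200 usd")
preceded = re.findall(r"(?<=\$)\d+", "$30 and 40")
not_preceded = re.findall(r"(?<!\$)\b\d+", "$30 and 40")

# Backreferences match the same text again, by number or by name.
doubled = re.findall(r"\b(\w+) \1\b", "it is is a test test")
first_double = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it is is a test test").group()

# Inline flag: (?i) makes the whole pattern case-insensitive.
any_case = re.findall(r"(?i)python", "Python PYTHON python")

print(year, month)               # 2019 03
print(captured, uncaptured)      # ['ab', 'ab'] ['ababab', 'ab']
print(followed, not_followed)    # ['100'] ['200']
print(preceded, not_preceded)    # ['30'] ['40']
print(doubled, first_double)     # ['is', 'test'] is is
print(any_case)                  # ['Python', 'PYTHON', 'python']
```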
5. Popular Python re Module Functions
re.findall(A, B)
| Matches all instances of an expression A in a string B and returns them in a list.
re.search(A, B)
| Matches the first instance of an expression A in a string B and returns it as a re match object, or None if there is no match.
re.split(A, B)
| Splits a string B into a list using the delimiter A.
re.sub(A, B, C)
| Replaces A with B in the string C.
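A brief sketch of the four functions on one invented sample string; the variable names are illustrative only.

```python
import re

text = "apple, banana;cherry"  # made-up sample text

all_words = re.findall(r"[a-z]+", text)   # every match, as a list of strings

first_b = re.search(r"b\w+", text)        # first match, as a match object
b_word, b_span = first_b.group(), first_b.span()
no_digit = re.search(r"\d", text)         # None when nothing matches

parts = re.split(r"[,;]\s*", text)        # split on comma or semicolon
joined = re.sub(r"[,;]\s*", " | ", text)  # replace each delimiter

print(all_words)       # ['apple', 'banana', 'cherry']
print(b_word, b_span)  # banana (7, 13)
print(no_digit)        # None
print(parts)           # ['apple', 'banana', 'cherry']
print(joined)          # apple | banana | cherry
```

Because re.search can return None, it is usually safer to check the result before calling .group() on it.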