[Independent Disciple] Python-164 | Daily Report, January 16, 2018 - Skill Tree. IT修真院

Posted: 2018-01-16 21:14:17

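Completed today: continued my systematic study of regular expressions in Python.

Regular Expressions (Part 2)

5. Main functions of the re module

The commonly used functions are compile, match, search, split, findall (finditer), and sub (subn).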

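(1) compile()

Syntax: re.compile(pattern[, flags])

Purpose: compiles the regular expression pattern into a compiled pattern object. Its type is shown as SRE_Pattern (an internal re type) in Python 3.6 and earlier, and as re.Pattern from Python 3.7 on. The flags argument selects the matching mode; its values include re.I, re.L, re.M, re.S, re.U, and re.X, which can be combined with | (see "4. Modes" in the previous daily report for details). flags defaults to 0, meaning no mode is applied.

Once the pattern object has been compiled, its own methods, such as search() and findall(), can be called to perform matching, so the same pattern can be reused throughout the code that follows.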

import re
s = """first line
second line
third line"""
regex = re.compile(".+")
print(regex.findall(s))

Output:

['first line', 'second line', 'third line']
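To show how flags and reuse work together, here is a minimal sketch; the variable names (first_word, words, m) are illustrative only. It compiles one pattern with re.M so that ^ matches at the start of every line, then reuses the same compiled object with two different methods.

```python
import re

s = """first line
second line
third line"""

# Compile once with re.M (MULTILINE) so that ^ matches at the start of every line
first_word = re.compile(r"^\w+", re.M)

# The same compiled object is reused with different methods
words = first_word.findall(s)
m = first_word.search(s)

print(words)                               # ['first', 'second', 'third']
print(m.group())                           # first
print(isinstance(first_word, re.Pattern))  # True on Python 3.7+
```

Without re.M, the same pattern would only match at the very start of the string and findall() would return just ['first'].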

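(2) match()

Form 1: re.match(pattern, string[, flags])

Form 2 (method of a compiled pattern regex): regex.match(string[, pos[, endpos]])

Purpose: tries to match pattern at the beginning of string (in Form 2, at index pos, treating the string as if it ended at endpos). If it matches, a match object (SRE_Match, an internal re type; re.Match from Python 3.7) is returned and no further matching is attempted; if the text at that position does not match, None is returned.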

import re
s = """first line
second line
third line"""
print(re.match(".+", s))      # Form 1: call re.match() directly, without compiling
regex = re.compile(".+")
print(regex.match(s, 2, 9))   # Form 2: compile first, then call the pattern's match() method

Output:

<_sre.SRE_Match object; span=(0, 10), match='first line'>

<_sre.SRE_Match object; span=(2, 9), match='rst lin'>
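A subtle point about the pos argument of Form 2: match() anchors at pos, but ^ in the pattern still means the real start of the string. The sketch below (variable names are illustrative) contrasts the two cases with the module-level re.match(), which always starts at index 0.

```python
import re

text = "first line"

plain = re.compile(r"rst")
anchored = re.compile(r"^rst")

# match() tries to match exactly at pos (index 2 here), so "rst" is found
m1 = plain.match(text, 2)
# ^ still means "start of the string", not "start at pos", so this fails
m2 = anchored.match(text, 2)
# re.match() always starts at index 0, where "f" is, so it fails too
m3 = re.match(r"rst", text)

for label, m in (("m1", m1), ("m2", m2), ("m3", m3)):
    # Check for None before using the match object
    print(label, m.group() if m else None)
```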

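(3) search()

Form 1: re.search(pattern, string[, flags])

Form 2 (method of a compiled pattern): regex.search(string[, pos[, endpos]])

Purpose: scans through string for the first location where pattern matches. If it matches, a match object (SRE_Match) is returned and no further matching is attempted; if no position in the string matches, None is returned.

Unlike match(), search() is not restricted to the beginning of the string; it scans the whole string for a match.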

import re
s = """first line
second line
third line"""
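print("1---", re.match(r'i\w+', s))     # match() only tries at the start of the string
print("2---", re.search(r'i\w+', s))    # Form 1: call re.search() directly, without compiling
regex = re.compile(r"i\w+")
print("3---", regex.search(s).group())  # Form 2: compile first, then call search(); .group() assumes a match was found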

Output:

1--- None

2--- <_sre.SRE_Match object; span=(1, 5), match='irst'>

3--- irst
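Two practical details of search() are sketched below, reusing the s string from the example above: the pos argument of Form 2 makes the scan start later in the string, and because search() returns None when nothing matches, calling .group() directly (as in line "3---") would raise AttributeError on a miss. The pattern r"q\w+" is just an illustrative non-matching case.

```python
import re

s = """first line
second line
third line"""

regex = re.compile(r"i\w+")

# search() scans forward from pos; starting at index 5 skips "irst"
m = regex.search(s, 5)
print(m.span(), m.group())  # (7, 10) ine

# No "q" anywhere, so search() returns None; guard before calling .group()
m = re.search(r"q\w+", s)
print(m.group() if m else "no match")  # no match
```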

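(4) split()

Form 1: re.split(pattern, string[, maxsplit=0, flags=0])

Form 2 (method of a compiled pattern): regex.split(string[, maxsplit=0])

Purpose: splits string at every substring that matches the pattern and returns the pieces as a list.

The maxsplit parameter sets the maximum number of splits (0, the default, means no limit). If the pattern matches, the list of split substrings is returned; if it does not match, a list containing only the original string is returned.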

import re
s = """first 111 line
second 222 line
third 333 line"""

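# Split on runs of digits
print("1---", re.split(r'\d+', s))
# r'\.+' matches literal dots; s has none, so a list holding the whole string is returned
print("2---", re.split(r'\.+', s, maxsplit=1))
# maxsplit=1: split at the first match only
print("3---", re.split(r'\d+', s, maxsplit=1))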

Output:

1--- ['first ', ' line\nsecond ', ' line\nthird ', ' line']

2--- ['first 111 line\nsecond 222 line\nthird 333 line']

3--- ['first ', ' line\nsecond 222 line\nthird 333 line']
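One behavior the description above does not cover: if the pattern contains a capturing group, the text matched by the group is kept in the result list alongside the pieces. The sketch below (illustrative input and names) shows this, plus Form 2 with maxsplit passed by keyword.

```python
import re

s = "first 111 line second 222 line"

# A capturing group keeps the separators in the result list
with_groups = re.split(r"(\d+)", s)

# Form 2: the compiled pattern's split() takes maxsplit as its second argument
digits = re.compile(r"\s*\d+\s*")
at_most_one = digits.split(s, maxsplit=1)

print(with_groups)   # ['first ', '111', ' line second ', '222', ' line']
print(at_most_one)   # ['first', 'line second 222 line']
```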

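(5) findall()

Form 1: re.findall(pattern, string[, flags])

Form 2 (method of a compiled pattern): regex.findall(string[, pos[, endpos]])

Purpose: finds all non-overlapping substrings matched by the pattern and returns them as a list. If nothing matches, an empty list is returned.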
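What each list item contains depends on the capturing groups in the pattern. This short sketch (the input string is illustrative) shows the three cases and the no-match case.

```python
import re

s = "first 111 line\nsecond 222 line"

# No groups: each item is the whole match
no_groups = re.findall(r"\d+", s)             # ['111', '222']
# One group: each item is the text captured by that group
one_group = re.findall(r"(\w+) \d+", s)       # ['first', 'second']
# Several groups: each item is a tuple of the captured groups
two_groups = re.findall(r"(\w+) (\d+)", s)    # [('first', '111'), ('second', '222')]
# No match: an empty list, not None
nothing = re.findall(r"x\d", s)               # []

print(no_groups, one_group, two_groups, nothing)
```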

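finditer()

Form 1: re.finditer(pattern, string[, flags])

Form 2 (method of a compiled pattern): regex.finditer(string[, pos[, endpos]])

Purpose: similar to findall(), it finds all matches of the pattern in the string, but instead of a list of strings it returns an iterator that yields a match object for each match.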

import re
s = """first line
second line
third line"""
regex = re.compile(r"\w+")
print("1---", regex.findall(s))
print("2---", regex.finditer(s))

Output:

1--- ['first', 'line', 'second', 'line', 'third', 'line']

2--- <callable_iterator object at 0x032219F0>
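Printing the iterator only shows its repr (the address differs on every run). A minimal sketch of actually consuming it, using the same s and pattern as above: each item is a match object, so both the text and its position are available.

```python
import re

s = """first line
second line
third line"""

regex = re.compile(r"\w+")

# finditer() yields match objects lazily; each carries the text and its span
for m in regex.finditer(s):
    print(m.group(), m.span())

# An iterator can be consumed only once; build a list if the results are needed again
spans = [m.span() for m in regex.finditer(s)]
```

The loop prints first (0, 5), line (6, 10), second (11, 17), and so on. For long strings, finditer() avoids building the whole result list at once.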

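(6) sub()

Form 1: re.sub(pattern, repl, string[, count=0, flags=0])

Form 2 (method of a compiled pattern): regex.sub(repl, string[, count=0])

Purpose: finds every substring of string that matches pattern and replaces it with repl. If nothing matches, string is returned unchanged. repl can be either a string or a function. The count parameter sets the maximum number of replacements (0, the default, replaces all matches).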
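The description mentions that repl can be a string or a function and that count limits replacements. A small sketch of all three, on an illustrative input: a string repl with a \1 / \2 back-reference to captured groups, a function repl that receives each match object, and count=1 (passed by keyword).

```python
import re

s = "first 111 line, second 222 line"

# repl as a string: \1 and \2 refer back to the text captured by groups 1 and 2
swapped = re.sub(r"(\w+) (\d+)", r"\2 \1", s)

# repl as a function: called with each match object, returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), s)

# count limits the number of replacements (0, the default, means replace all)
first_only = re.sub(r"\d+", "#", s, count=1)

print(swapped)     # 111 first line, 222 second line
print(doubled)     # first 222 line, second 444 line
print(first_only)  # first # line, second 222 line
```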

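subn()

Form 1: re.subn(pattern, repl, string[, count=0, flags=0])

Form 2 (method of a compiled pattern): regex.subn(repl, string[, count=0])

Purpose: works the same as sub(), but returns a tuple containing the new string and the number of replacements made.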
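Because subn() returns a (new_string, number_of_substitutions) tuple, the result can be unpacked directly. A minimal sketch on an illustrative string, using both forms:

```python
import re

s = "a1b22c333"

# subn() returns a (new_string, number_of_substitutions) tuple
new_s, n = re.subn(r"\d+", "#", s)
print(new_s, n)  # a#b#c# 3

# Form 2 with count: at most two replacements
digits = re.compile(r"\d+")
limited = digits.subn("#", s, count=2)
print(limited)   # ('a#b#c333', 2)
```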

import re
s = 'https://113.215.20.136:9011/c3pr90ntcya0/C80-E0F67DC5603F9F40.flv'
pattern = 'https://(.*?):9011/'
out1 = re.sub(pattern, 'http://127.0.0.1:9091/', s)
print("1---", out1)
out2 = re.subn(pattern, 'http://127.0.0.1:9091/', s)
print("2---", out2)