Applications of Regular Expressions

2025-08-18

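I. Regular Expression Basics

1. Creating a regular expression

JavaScript offers two ways to create a regular expression:

// Literal form: /pattern/flags
const regex1 = /pattern/gi;

// Constructor form: new RegExp('pattern', 'flags')
const regex2 = new RegExp('pattern', 'gi');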

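2. Common flags

i: case-insensitive matching
g: global matching (find all matches instead of stopping after the first one)
m: multiline matching (^ and $ match at the start and end of each line)
u: Unicode mode (match by Unicode code point rather than by UTF-16 code unit)
s: lets . match newline characters (added in ES2018)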
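
The effect of each flag is easiest to see side by side. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```javascript
// i: case-insensitive
const caseInsensitive = /hello/i.test('HELLO'); // true

// g: collect every match instead of only the first
const allDigits = 'a1b2c3'.match(/\d/g); // ["1", "2", "3"]

// m: ^ and $ also match at line breaks
const lineStarts = 'one\ntwo'.match(/^\w+/gm); // ["one", "two"]

// s: . also matches a newline
const withoutS = /a.b/.test('a\nb'); // false
const withS = /a.b/s.test('a\nb');   // true
```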

3. Regular expression methods

3.1 Methods on RegExp objects

const regex = /hello/;

// test() - checks whether the pattern matches; returns a boolean
regex.test('hello world'); // true

// exec() - runs a search; returns a result array or null
regex.exec('hello world'); // ["hello", index: 0, input: "hello world", groups: undefined]
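
When the g flag is set, test() and exec() become stateful: each call resumes from the regex's lastIndex property. The sketch below (with hypothetical variable names) shows both the useful side of this, looping over all matches, and the surprising side, a repeated test() returning false.

```javascript
// Looping with exec(): each call continues where the previous match ended
const wordRegex = /\w+/g;
const input = 'foo bar';
const found = [];
let m;
while ((m = wordRegex.exec(input)) !== null) {
  found.push(`${m[0]}@${m.index}`);
}
// found is ["foo@0", "bar@4"]

// The same statefulness can surprise when reusing a global regex with test()
const stateful = /a/g;
const first = stateful.test('a');  // true, lastIndex is now 1
const second = stateful.test('a'); // false, the search started at index 1
```

If a regex is only used for yes/no checks, leaving off the g flag avoids this behavior.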

3.2 Methods on strings

const str = 'hello world';

// match() - returns the match result
str.match(/hello/); // ["hello", index: 0, input: "hello world", groups: undefined]

// search() - returns the index of the first match
str.search(/world/); // 6

// replace() - replaces the matched substring
str.replace(/world/, 'JavaScript'); // "hello JavaScript"

// split() - splits the string using a regular expression
str.split(/\s+/); // ["hello", "world"]
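
The string methods behave differently depending on whether the g flag is present. The sketch below uses a made-up sentence to compare them; matchAll() (ES2020) is included because it returns capture groups and indexes for every match.

```javascript
const sentence = 'cat bat rat';

// Without g: only the first match, including its capture groups
const firstOnly = sentence.match(/\w(at)/); // firstOnly[0] is "cat", firstOnly[1] is "at"

// With g: every full match, but no capture groups
const every = sentence.match(/\w(at)/g); // ["cat", "bat", "rat"]

// matchAll(): every match with groups and index (the regex must have g)
const details = [...sentence.matchAll(/(\w)at/g)].map((hit) => [hit[1], hit.index]);
// [["c", 0], ["b", 4], ["r", 8]]

// replace() with g replaces every occurrence, not just the first
const replacedAll = sentence.replace(/at/g, 'og'); // "cog bog rog"

// search() returns -1 when nothing matches
const notFound = sentence.search(/dog/); // -1
```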

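II. Regular Expression Syntax

1. Character classes

\d: a digit (0-9)
\D: a non-digit
\w: a word character (ASCII letter, digit, or underscore)
\W: a non-word character
\s: a whitespace character (space, tab, newline, and so on)
\S: a non-whitespace character
.: any character except a newline (with the s flag, newlines are matched too)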
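
As a quick illustration of the classes above, the following sketch runs several of them against one made-up sample string.

```javascript
const sample = 'Room 42, floor_3\tEast';

const digits = sample.match(/\d+/g);  // ["42", "3"]
const words = sample.match(/\w+/g);   // ["Room", "42", "floor_3", "East"]
const nonWord = sample.match(/\W+/g); // three runs: " ", ", " and "\t"

// \s covers the tab as well as the spaces
const collapsed = sample.replace(/\s+/g, ' '); // "Room 42, floor_3 East"

// Without the s flag, . does not match a newline
const dotSkipsNewline = /^.$/.test('\n'); // false
```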

2. Quantifiers

*: zero or more times
+: one or more times
?: zero or one time
{n}: exactly n times
{n,}: at least n times
{n,m}: between n and m times
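
A short sketch of each quantifier in action, using made-up inputs:

```javascript
const star = 'ac abc abbc'.match(/ab*c/g);         // ["ac", "abc", "abbc"]
const plus = 'ac abc abbc'.match(/ab+c/g);         // ["abc", "abbc"]
const optional = 'color colour'.match(/colou?r/g); // ["color", "colour"]

const exact = /^\d{4}$/.test('2025'); // true: exactly four digits
const atLeast = /^\d{2,}$/.test('7'); // false: fewer than two digits

// Only the numbers that are two or three digits long
const range = '1 12 123 1234'.match(/\b\d{2,3}\b/g); // ["12", "123"]
```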

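3. Boundaries

^: start of the string (start of a line in multiline mode)
$: end of the string (end of a line in multiline mode)
\b: a word boundary
\B: a non-word boundary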
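
Boundaries match positions rather than characters. The sketch below, with made-up inputs, shows how they restrict where a pattern may match.

```javascript
// \b keeps "cat" from matching inside "concat"
const wholeWord = 'cat concat cat.'.match(/\bcat\b/g); // ["cat", "cat"]

// \B does the opposite: only a "cat" that continues a word
const insideIndex = 'cat concat'.search(/\Bcat/); // 7

// ^ and $ together force the whole string to match
const allDigits = /^\d+$/.test('12345');   // true
const partDigits = /^\d+$/.test('123abc'); // false
```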

4. Grouping and capturing

(...): capturing group
(?:...): non-capturing group
(?<name>...): named capturing group (ES2018)
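
The difference between capturing and non-capturing groups shows up in the result array. A minimal sketch with made-up inputs:

```javascript
// Each capturing group adds one entry to the result array
const rangeMatch = /(\d+)-(\d+)/.exec('10-20');
// rangeMatch[0] is "10-20", rangeMatch[1] is "10", rangeMatch[2] is "20"

// (?:...) groups for the quantifier but records nothing,
// so the only captured entry comes from (c)
const repeatMatch = /(?:ab)+(c)/.exec('ababc');
// repeatMatch[0] is "ababc", repeatMatch[1] is "c", repeatMatch.length is 2
```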

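5. Assertions

x(?=y): positive lookahead (x only when followed by y)
x(?!y): negative lookahead (x only when not followed by y)
(?<=y)x: positive lookbehind (x only when preceded by y) (ES2018)
(?<!y)x: negative lookbehind (x only when not preceded by y) (ES2018)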
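
The negative forms need some care, because the engine may backtrack to a shorter match that satisfies the assertion. The sketch below uses made-up inputs and guards against that with an extra \d alternative and a \b.

```javascript
// Numbers NOT followed by "px". The |\d alternative stops the engine from
// settling for "1" out of "12px".
const notPx = '12px 34em 56'.match(/\b\d+(?!px|\d)/g); // ["34", "56"]

// Numbers NOT preceded by "$". The \b stops a match from starting
// in the middle of "100".
const notDollar = 'cost $100 or 200'.match(/(?<!\$)\b\d+/g); // ["200"]
```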

III. Practical Tips and Best Practices

Use named capturing groups for readability

const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = dateRegex.exec('2023-05-15');
console.log(match.groups); // {year: "2023", month: "05", day: "15"}
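
Named groups also combine well with destructuring and with the $<name> syntax in replacement strings. A small sketch, reusing the same date pattern under the hypothetical name isoDate:

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Destructure the named groups directly
const { year, month, day } = isoDate.exec('2023-05-15').groups;
// year is "2023", month is "05", day is "15"

// Refer to groups by name in a replacement string
const reordered = '2023-05-15'.replace(isoDate, '$<day>/$<month>/$<year>');
// "15/05/2023"
```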

Build regular expressions dynamically

function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const searchTerm = 'hello.world';
const regex = new RegExp(escapeRegExp(searchTerm), 'gi');
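
To see why the escaping step matters, the sketch below compares a regex built from the raw search term with one built from the escaped term. The variable names are illustrative; escapeRegExp is the same helper as above, repeated so the example is self-contained.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const term = 'hello.world';
const escapedSource = escapeRegExp(term); // the text hello\.world

// Unescaped: the . is a wildcard, so unrelated text matches
const loose = new RegExp(term, 'i').test('helloXworld'); // true

// Escaped: the . only matches a literal dot
const strict = new RegExp(escapedSource, 'i').test('helloXworld');     // false
const strictHit = new RegExp(escapedSource, 'i').test('Hello.World'); // true
```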

Use lookahead and lookbehind assertions

// Match digits that are followed by px
const pxRegex = /\d+(?=px)/;
'12px'.match(pxRegex); // ["12"]

// Match digits that are preceded by $
const dollarRegex = /(?<=\$)\d+/;
'Price: $100'.match(dollarRegex); // ["100"]

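Performance

Avoid overusing the wildcard .*; prefer a more specific pattern
In an alternation, put the alternative most likely to match first
Avoid unnecessary capturing groups; use non-capturing groups (?:...) instead
Precompile frequently used regular expressions (especially ones used inside loops)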
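
Several of these tips can be combined in one place. The following is a sketch only: the log format, the LOG_LINE constant, and the extractMessages function are hypothetical, and the actual speedup depends on the engine and the input.

```javascript
// Created once, outside the loop. The level alternation is a non-capturing
// group because only the message text is needed.
const LOG_LINE = /^(?:INFO|WARN|ERROR): (.+)$/;

function extractMessages(lines) {
  const messages = [];
  for (const line of lines) {
    const match = LOG_LINE.exec(line); // reuses the same RegExp object
    if (match) messages.push(match[1]);
  }
  return messages;
}

const messages = extractMessages(['INFO: started', 'noise', 'ERROR: disk full']);
// ["started", "disk full"]
```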

Debugging regular expressions

Online tools such as regex101.com or regexr.com can be used to debug and test regular expressions.

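IV. Common Problems and Solutions

1. Greedy vs. lazy matching

Quantifiers are greedy by default (they match as much as possible). Adding ? after a quantifier makes it lazy (it matches as little as possible).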

const greedyRegex = /<.*>/;
const lazyRegex = /<.*?>/;

'<div>content</div>'.match(greedyRegex)[0]; // "<div>content</div>"
'<div>content</div>'.match(lazyRegex)[0]; // "<div>"
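
A negated character class is a common alternative to a lazy quantifier for this kind of pattern, and it also follows the earlier advice to prefer specific patterns over .*. A minimal sketch, not a full HTML parser:

```javascript
const html = '<div>content</div>';

// [^>]* can never run past the closing >, so no laziness is needed
const tagsByClass = html.match(/<[^>]*>/g); // ["<div>", "</div>"]

// The lazy version finds the same tags on this input
const tagsByLazy = html.match(/<.*?>/g); // ["<div>", "</div>"]
```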

2. Multiline matching

Use the m flag so that ^ and $ match at the start and end of every line.

const multiLineRegex = /^line/gm;
const text = `line 1
line 2
line 3`;

text.match(multiLineRegex); // ["line", "line", "line"]
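
The m flag affects $ in the same way. The sketch below, with a made-up input, picks out the lines that end in a semicolon and shows that the same pattern finds nothing without m.

```javascript
const source = 'a;\nb\nc;';

// With m, ^ and $ anchor to each line
const withM = source.match(/^.*;$/gm); // ["a;", "c;"]

// Without m, ^ and $ only anchor to the whole string, and . cannot cross
// the newline, so there is no match at all
const withoutM = source.match(/^.*;$/g); // null
```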

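3. Matching Unicode characters

Use the u flag to handle Unicode characters correctly. Without it, a character outside the Basic Multilingual Plane, such as an emoji, is treated as two separate UTF-16 code units.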

/^.$/.test('😊'); // false
/^.$/u.test('😊'); // true
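
The u flag also enables code point escapes (\u{...}) and Unicode property escapes (\p{...}, ES2018). A short sketch with illustrative variable names:

```javascript
const emoji = '😊';

// The string holds two UTF-16 code units but only one code point
const codeUnits = emoji.length;       // 2
const codePoints = [...emoji].length; // 1

// \u{...} names a code point directly (u flag required)
const byCodePoint = /^\u{1F60A}$/u.test(emoji); // true

// \p{L} matches any letter in any script (u flag required)
const lettersOnly = /^\p{L}+$/u.test('日本語'); // true
const notLetters = /^\p{L}+$/u.test('abc123'); // false
```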

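4. Special characters in replacements

In the replacement string passed to replace(), $& refers to the whole match and $1, $2, and so on refer to the capturing groups.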

'John Smith'.replace(/(\w+)\s(\w+)/, '$2, $1'); // "Smith, John"
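
A few more replacement forms, sketched with made-up inputs: $& inserts the whole match, $$ inserts a literal dollar sign, and a function can be passed when the replacement has to be computed.

```javascript
// $& is the whole matched text
const wrapped = 'hello world'.replace(/world/, '[$&]'); // "hello [world]"

// $$ produces a literal $, followed here by the match itself
const priced = 'cost: 5'.replace(/\d+/, '$$$&'); // "cost: $5"

// A replacer function receives the match and returns the replacement
const doubled = 'a1b22'.replace(/\d+/g, (digits) => String(Number(digits) * 2));
// "a2b44"
```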