
Compound regex to remove the substring between a special character and one of a set of possible following characters in Python

string python-3.x regex

KubiK888 · Tech Community · 6 years ago

I want to turn these:

(book/livre), (manitoba), the (territories/des territoires), canada

(book/livre), (ontario), the territories/des territoires, canada

book/livre 1, alberta, the territories, canada

into these:

(book), (manitoba), the (territories), canada

(book), (ontario), the territories, canada

book 1, alberta, the territories, canada

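That is, I want to delete everything between `/` and the following `)` or `,`, keeping the delimiter itself, and also keeping a number that comes just before the comma (as in `book/livre 1,` becoming `book 1,`).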

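My current Python code does this in two separate passes:

# Pass 1: strip "/..." up to a comma; pass 2: strip "/..." up to a ")".
self.df_census1901['LOC'] = self.df_census1901['LOC'].str.replace(r'/.*?,', ',', regex=True)
self.df_census1901['LOC'] = self.df_census1901['LOC'].str.replace(r'/.*?\)', ')', regex=True)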
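To see why a single combined pattern is wanted, here is a small sketch that runs the two substitutions above, in the same order, on the three sample lines. It uses plain `re.sub`, which is essentially what pandas' `str.replace(..., regex=True)` applies to each element; the helper name `two_step` is hypothetical.

```python
import re

# The three sample inputs from the question.
SAMPLES = [
    "(book/livre), (manitoba), the (territories/des territoires), canada",
    "(book/livre), (ontario), the territories/des territoires, canada",
    "book/livre 1, alberta, the territories, canada",
]


def two_step(text: str) -> str:
    """Hypothetical helper: apply the question's two substitutions in order."""
    # Pass 1: delete "/..." up to the nearest comma, keeping the comma.
    text = re.sub(r"/.*?,", ",", text)
    # Pass 2: delete "/..." up to the nearest ")", keeping the ")".
    return re.sub(r"/.*?\)", ")", text)


for s in SAMPLES:
    print(two_step(s))
# (book, (manitoba), the (territories, canada
# (book, (ontario), the territories, canada
# book, alberta, the territories, canada
```

The comma pass runs first and its lazy `.*?` still crosses the `)`, so the closing parenthesis is lost; on the third line the ` 1` before the comma disappears too. Swapping the two passes does not fix the third line, because the `)` pass finds nothing there and the comma pass still swallows ` 1`.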


Sweeper · 6 years ago

You can try this regex:

/.*?(\)|(?: \d+)?,)

and replace each match with group 1:

r"\1"

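import re
your_string = "book/livre 1, alberta, the territories, canada"  # example input
# Replace "/" through the delimiter with just the delimiter (group 1).
result = re.sub(r"/.*?(\)|(?: \d+)?,)", r"\1", your_string)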
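A slightly fuller sketch that applies the answer's pattern to all three sample lines from the question and checks each result against the desired output. The names `PATTERN`, `strip_after_slash`, and `EXPECTED` are illustrative, not from the original posts.

```python
import re

# The answer's pattern: "/" plus the shortest run of characters up to
# group 1, which is ")" or "," (the comma optionally preceded by " <digits>").
PATTERN = re.compile(r"/.*?(\)|(?: \d+)?,)")


def strip_after_slash(text: str) -> str:
    """Replace each "/...delimiter" match with just the captured delimiter."""
    return PATTERN.sub(r"\1", text)


# Input line -> desired output, taken from the question.
EXPECTED = {
    "(book/livre), (manitoba), the (territories/des territoires), canada":
        "(book), (manitoba), the (territories), canada",
    "(book/livre), (ontario), the territories/des territoires, canada":
        "(book), (ontario), the territories, canada",
    "book/livre 1, alberta, the territories, canada":
        "book 1, alberta, the territories, canada",
}

for source, wanted in EXPECTED.items():
    result = strip_after_slash(source)
    print(result)
    assert result == wanted, (source, result)
```

On the DataFrame from the question, the same pattern should work as a single call, e.g. `self.df_census1901['LOC'].str.replace(r"/.*?(\)|(?: \d+)?,)", r"\1", regex=True)`. Passing `regex=True` explicitly matters on pandas 2.0 and later, where the default became `regex=False`.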

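The start of the regex is the same as yours. The trick to combining the three cases is alternation (`|`) together with an optional group (`(...)?`).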

Breaking down this part: `(\)|(?: \d+)?,)`

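This is a capturing group containing the pattern `\)|(?: \d+)?,`. The left alternative, `\)`, matches a closing parenthesis; the right alternative, `(?: \d+)?,`, matches a comma, optionally preceded by a space and a run of digits.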
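The same pattern can be restated with `re.VERBOSE` so each alternative carries its own comment. This is only a readability rewrite (the name `VERBOSE_PATTERN` is illustrative). In verbose mode, unescaped whitespace is ignored, so the literal space is written as `[ ]` here.

```python
import re

VERBOSE_PATTERN = re.compile(
    r"""
    /               # the slash that starts the text to delete
    .*?             # as few characters as possible (lazy), up to ...
    (               # ... group 1, the delimiter to keep:
        \)          #   a closing parenthesis
      |             #   or
        (?:[ ]\d+)? #   an optional space followed by one or more digits,
        ,           #   then a comma
    )
    """,
    re.VERBOSE,
)

print(VERBOSE_PATTERN.sub(r"\1", "book/livre 1, alberta, the territories, canada"))
# book 1, alberta, the territories, canada
```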

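Replacing with group 1 basically replaces the whole match with whatever `\)|(?: \d+)?,` matched, i.e. `)`, `,` or ` 1,`, so the delimiter survives and only the `/...` part is deleted.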
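To make the group-1 replacement concrete, this sketch lists, for each sample line, the full match (the text that gets replaced) and group 1 (the text put back in its place). It uses plain `re.finditer`; the variable names are illustrative.

```python
import re

PATTERN = re.compile(r"/.*?(\)|(?: \d+)?,)")

samples = [
    "(book/livre), (manitoba), the (territories/des territoires), canada",
    "(book/livre), (ontario), the territories/des territoires, canada",
    "book/livre 1, alberta, the territories, canada",
]

for line in samples:
    # group(0) is the whole match; group(1) is the delimiter that survives.
    pairs = [(m.group(0), m.group(1)) for m in PATTERN.finditer(line)]
    print(pairs)
# [('/livre)', ')'), ('/des territoires)', ')')]
# [('/livre)', ')'), ('/des territoires,', ',')]
# [('/livre 1,', ' 1,')]
```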

