
Replace a specific substring in a match in Python [duplicate]

replace pandas regex python


owwoow14 · Tech Community · 6 years ago

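I want to use a regex to replace a substring inside the matching strings of a DataFrame column (a Series). I looked at the documentation (e.g. HERE) and found a pattern that captures the specific kind of string I want to match. However, the replacement does not replace only that substring.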

data
initthe problem
nationthe airline
radicthe groups
professionthe experience
the cat in the hat

I tried the following:

patt = re.compile(r'(?:[a-z])(the)')
df['data'].str.replace(patt, r'al')

However, it also replaces the non-whitespace character that precedes "the".

Any suggestions on how to replace only that specific substring?

1 answer | latest 6 years ago

Tim Biegeleisen · 6 years ago

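Use a positive lookbehind, which asserts that a lowercase letter comes immediately before the but does not actually consume anything: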

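import re

text = "data\ninitthe problem\nnationthe airline\nradicthe groups\nprofessionthe experience\nthe cat in the hat"

# (?<=[a-z]) only checks that a lowercase letter precedes "the"; that letter is not part of the match
output = re.sub(r'(?<=[a-z])the', 'al', text)
print(output)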

data
inital problem
national airline
radical groups
professional experience
the cat in the hat
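
Since the question works on a pandas column rather than a single string, here is a minimal sketch of the same lookbehind applied through Series.str.replace. The DataFrame below is rebuilt from the sample data in the question, and the column names fixed and alt are hypothetical, chosen only for illustration. It assumes a pandas version where str.replace accepts the regex keyword.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to live in a column named "data")
df = pd.DataFrame({
    "data": [
        "initthe problem",
        "nationthe airline",
        "radicthe groups",
        "professionthe experience",
        "the cat in the hat",
    ]
})

# Lookbehind: match "the" only when a lowercase letter sits directly before it.
# The letter itself is not consumed, so only "the" is replaced.
df["fixed"] = df["data"].str.replace(r"(?<=[a-z])the", "al", regex=True)

# Alternative: capture the preceding letter and put it back with \1
df["alt"] = df["data"].str.replace(r"([a-z])the", r"\1al", regex=True)

result = df["fixed"].tolist()
alt = df["alt"].tolist()
print(result)
```

The original attempt failed because (?:[a-z]) is a non-capturing group, not a lookaround: the letter is still part of the match, so it is replaced along with "the". Either variant above avoids that. The lookbehind keeps the replacement string simple, while the capture-group form may be easier to read for people less familiar with lookarounds.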