
How do I use a regex stored in one pandas DataFrame column to replace text in another column in Python?

dataframe pandas regex python

lmocsi · 7 years ago

#import re
import pandas as pd
df = pd.DataFrame([['I like apple pie','apple'],['Nice banana and lemon','banana|lemon']], columns=['text','words'])
df['text'] = df['text'].str.replace(r''+df['words'].str, '*'+group(0)+'*')
df

I want to mark the words that are found with *.
How can I do that?

I like *apple* pie
Nice *banana* and *lemon*

2 replies | latest 7 years ago

Paolo 7 years ago

You can build a capturing group from words and use a backreference in the replacement to wrap the match in *:

import pandas as pd
df = pd.DataFrame([['I like apple pie','apple'],['Nice banana and     lemon','banana|lemon']], columns=['text','words'])

# Wrap each words pattern in a capturing group, then refer to it as \1 in the replacement
df['text'] = df['text'].replace(r'('+df['words']+')', r'*\1*', regex=True)
print(df)

Prints:

                            text         words
0             I like *apple* pie         apple
1  Nice *banana* and     *lemon*  banana|lemon
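
A row-wise variant, in which each row's text is matched only against the pattern from that same row, can be sketched with re.sub. This is a minimal sketch rather than part of the answer above: the helper name mark_words and the new column marked are hypothetical, and \g<0> stands for the whole match, so no explicit capturing group is needed.

```python
import re
import pandas as pd

df = pd.DataFrame(
    [['I like apple pie', 'apple'], ['Nice banana and lemon', 'banana|lemon']],
    columns=['text', 'words'],
)

def mark_words(text, pattern):
    # Hypothetical helper: wrap every match of `pattern` in asterisks.
    # \g<0> is the entire match, so the pattern needs no capture group.
    return re.sub(pattern, r'*\g<0>*', text)

# Pair each text with the pattern stored in the same row.
df['marked'] = [mark_words(t, w) for t, w in zip(df['text'], df['words'])]
print(df['marked'].tolist())
```

Because the pairing is done with zip, a word listed for one row is never marked in another row's text.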

BENY 7 years ago

IIUC, use (?i), which is equivalent to re.I

df.text.replace(regex=r'(?i)'+ df.words,value="*")
Out[131]: 
0        I like * pie
1    Nice * and     *
Name: text, dtype: object
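
The equivalence between the inline (?i) flag and re.I can be checked with plain re.sub. The sample sentence below is made up for illustration, and the replacement keeps the matched text by using \g<0> instead of a bare *.

```python
import re

# Hypothetical sample text with mixed-case occurrences of the word.
text = 'I like Apple pie and APPLE juice'

# Inline flag: (?i) at the start of the pattern makes the whole pattern case-insensitive.
marked_inline = re.sub(r'(?i)apple', r'*\g<0>*', text)

# The same behaviour requested through the flags argument.
marked_flag = re.sub(r'apple', r'*\g<0>*', text, flags=re.IGNORECASE)

print(marked_inline)
print(marked_inline == marked_flag)
```

Note that the inline flag has to sit at the very start of the pattern, which is why it is prepended to the words column rather than appended.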

Since you updated the question:

df.words=df.words.str.split('|')
s=df.words.apply(pd.Series).stack()
df.text.replace(dict(zip(s,'*'+s+'*')),regex=True)
Out[139]: 
0               I like *apple* pie
1    Nice *banana* and     *lemon*
Name: text, dtype: object
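
If the entries in words are meant as literal words rather than regex fragments, one possible variation is to escape each word before building the pattern. This is a sketch under that assumption: the helper mark_literal and the second sample row (with C++ and C#) are hypothetical and not from the question.

```python
import re
import pandas as pd

df = pd.DataFrame(
    [['I like apple pie', 'apple'], ['C++ and C# (mostly)', 'C++|C#']],
    columns=['text', 'words'],
)

def mark_literal(text, words):
    # Hypothetical helper: split on '|', escape each word so characters
    # such as '+' are matched literally, then rebuild the alternation.
    pattern = '|'.join(re.escape(w) for w in words.split('|'))
    return re.sub(pattern, r'*\g<0>*', text)

marked = [mark_literal(t, w) for t, w in zip(df['text'], df['words'])]
print(marked)
```

Without re.escape, a word such as C++ would be read as a regex and fail to compile.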
