I have a DataFrame named df that looks like this:
Text                     No
c0404079=0.00            34
c1444716<=0.00           45
1.0<c00226311 <= 0.00    36
c0001208 <= 0.00         32
0.00<c0243026<=2.00      85
c0036983 <= 0.00         55
c0036974=0.00            39
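I want to create a new column in df named "Code".
The code is the part of the first column that starts with the letter c and runs up to the next non-alphanumeric character or the end of the line.
So the DataFrame would become: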
Text                     No    Code
c0404079=0.00            34    c0404079
c1444716<=0.00           45    c1444716
1.0<c00226311 <= 0.00    36    c00226311
c0001208 <= 0.00         32    c0001208
0.00<c0243026<=2.00      85    c0243026
c0036983 <= 0.00         55    c0036983
c0036974=0.00            39    c0036974
Any idea how to do this?
I tried the following, but it did not give the right result:
df['Code'] = df['Text'].str.extract(r'c^(\d[^\W_]{5,})')
Given your df, here is how to get everything from the letter c up to the first non-alphanumeric character:
df['extracted'] = df['text'].str.extract(r'(c[^\W]+)')
   text                    extracted
0  c1444716<=0.00          c1444716
1  1.0c00226311 <= 0.00    c00226311
2  0.00<c0243026<=2.00     c0243026
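For reference, here is a minimal, self-contained sketch of the same idea applied to the sample rows from the question. The DataFrame construction and the `attempt` variable are my own additions for illustration, and `expand=False` is used only so that `str.extract` returns a Series rather than a one-column DataFrame. It also shows why the pattern from the question returns nothing: `^` only matches at the start of the string, so a `c` followed by `^` can never match.

```python
import pandas as pd

# Sample data copied from the question (column names as given there).
df = pd.DataFrame({
    "Text": [
        "c0404079=0.00",
        "c1444716<=0.00",
        "1.0<c00226311 <= 0.00",
        "c0001208 <= 0.00",
        "0.00<c0243026<=2.00",
        "c0036983 <= 0.00",
        "c0036974=0.00",
    ],
    "No": [34, 45, 36, 32, 85, 55, 39],
})

# The pattern from the question: '^' after 'c' cannot match mid-string,
# so every row comes back as NaN.
attempt = df["Text"].str.extract(r"c^(\d[^\W_]{5,})", expand=False)

# The pattern from the answer: the letter 'c' followed by one or more
# word characters, stopping at the first non-word character.
df["Code"] = df["Text"].str.extract(r"(c[^\W]+)", expand=False)

print(df)
```

Note that `[^\W]` is equivalent to `\w`, which also accepts underscores. If an underscore should end the code, `(c[^\W_]+)` is a stricter variant.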