You can create a Series, remove duplicates and missing values, convert it to a dictionary, and then add the new first column with DataFrame.insert and Series.map:
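s = (df.set_index('unique_key')['test_ids']
       .str.split(',')      # comma-separated string -> list of strings
       .explode()           # one row per test id, unique_key stays in the index
       .astype(float)       # float first, so missing values survive the cast
       .dropna()
       .astype(int)
       .drop_duplicates())  # keep the first unique_key seen for each test id

# swap index and values: test id -> unique_key
d = {v: k for k, v in s.items()}
print(d)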
{1: 1, 15: 1, 2: 1, 51: 2, 75: 2, 11: 2}
tf.insert(0, 'unique_key', tf['test_ids'].map(d))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
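To try the dictionary approach end to end, here is a minimal, self-contained sketch. The contents of `df` and `tf` are assumptions made up for illustration (the original frames come from the question); they are chosen to include one missing value and one repeated test id so that the `dropna` and `drop_duplicates` steps have something to do.

```python
import numpy as np
import pandas as pd

# Hypothetical input: each unique_key owns a comma-separated list of test ids.
# Key 3 has no ids (NaN), and id 2 appears under both key 1 and key 2.
df = pd.DataFrame({
    'unique_key': [1, 2, 3],
    'test_ids': ['1,15,2', '51,75,11,2', np.nan],
})

# Hypothetical target frame that should receive the unique_key column.
tf = pd.DataFrame({
    'test_ids': [1, 2, 11, 15, 51, 75],
    'status': ['passed', 'passed', 'failed', 'failed', 'passed', 'failed'],
    'revenue': [234.54, 543.21, 21.30, 2098.21, 232.00, 123.87],
    'cnt_days': [3, 5, 4, 6, 21, 32],
})

s = (df.set_index('unique_key')['test_ids']
       .str.split(',')
       .explode()
       .astype(float)       # NaN cannot be cast to int directly
       .dropna()
       .astype(int)
       .drop_duplicates())  # the first occurrence wins, so id 2 stays with key 1

# test id -> unique_key
d = {v: k for k, v in s.items()}

# Insert the mapped keys as the first column (modifies tf in place).
tf.insert(0, 'unique_key', tf['test_ids'].map(d))
print(tf)
```

Any value in `tf['test_ids']` that is missing from `d` would come out of `Series.map` as NaN, so it is worth checking the new column for missing values on real data.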
Another idea is to work with the DataFrame directly and create a Series for the mapping:
s = (df.assign(new=df['test_ids'].str.split(','))
       .explode('new')
       .astype({'new': float})
       .dropna(subset=['new'])
       .astype({'new': int})
       .drop_duplicates(subset=['new'])
       .set_index('new')['unique_key'])
print(s)
new
1     1
15    1
2     1
51    2
75    2
11    2
Name: unique_key, dtype: int64
tf.insert(0, 'unique_key', tf['test_ids'].map(s))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
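The Series-based variant can be checked the same way. This is only a sketch on made-up data (the `df` and `tf` contents below are assumptions, not the frames from the question). It relies on the index of `s` being unique after `drop_duplicates`, which `Series.map` needs when it is given a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical input with a missing value and a repeated test id (2).
df = pd.DataFrame({
    'unique_key': [1, 2, 3],
    'test_ids': ['1,15,2', '51,75,11,2', np.nan],
})

# Hypothetical target frame, reduced to the column that matters here.
tf = pd.DataFrame({'test_ids': [1, 2, 11, 15, 51, 75]})

s = (df.assign(new=df['test_ids'].str.split(','))
       .explode('new')                    # one row per test id, other columns repeated
       .astype({'new': float})
       .dropna(subset=['new'])
       .astype({'new': int})
       .drop_duplicates(subset=['new'])   # keeps the index of s unique
       .set_index('new')['unique_key'])

# Mapping with a Series looks each value up in the Series index.
tf.insert(0, 'unique_key', tf['test_ids'].map(s))
print(tf)
```

Compared with the dictionary version, this keeps everything in pandas objects, at the cost of carrying the whole DataFrame through `explode`.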