首次使用
Series.str.split
具有
expand=True
用于数据帧和重塑
DataFrame.stack
,然后按
Series.value_counts
并通过
Series.head
:
df = pd.DataFrame({'Keywords':['aa,bb,vv,vv','aa,aa,cc,bb','zz,bb,aa,ss']})
N = 5
df1 = (df.Keywords.str.split(',', expand=True)
.stack()
.value_counts()
.head(N)
.rename_axis('val')
.reset_index(name='count'))
print (df1)
val count
0 aa 4
1 bb 3
2 vv 2
3 zz 1
4 cc 1
如果没有丢失的值,另一个解决方案是展平拆分的列表并按
Counter
:
from collections import Counter
N = 5
df1 = pd.DataFrame(Counter([y for x in df.Keywords for y in x.split(',')]).most_common(N),
columns=['val','count'])
print (df1)
val count
0 aa 4
1 bb 3
2 vv 2
3 zz 1
4 cc 1