I have a DataFrame that looks like this:
import pandas as pd

id = [1, 1, 1, 2, 2, 2]
weeks = [1, 2, 3, 1, 2, 3]
contr = [16, 16, 22, 37, 37, 16]
df = pd.DataFrame({'ID': id,
                   'Week': weeks,
                   'Contract': contr})
print(df)
   ID  Week  Contract
0   1     1        16
1   1     2        16
2   1     3        22
3   2     1        37
4   2     2        37
5   2     3        16
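What I want to do now is count the number of contract changes per week, by ID (my df is fairly small, around 1.8 million rows).
So I thought that what might work is a rolling count of values that are not equal to the value above, and I tried the following code: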
df['count'] = df['ID'].groupby((df['Contract'] != df['Contract'].shift(-1)).cumsum()).cumcount()
But this doesn't give me the result I want.
What I'm after is something like the following:
print(df)
   ID  Week  Contract  count
0   1     1        16      0  # First instance, so this is ignored
1   1     2        16      0  # No change, so 0
2   1     3        22      1  # Change here, so 1
3   2     1        37      0
4   2     2        37      0
5   2     3        16      1
6   2     4        16      0  # This should be 0 because the change was in the previous week
(If this doesn't meet the requirements for a minimal question, please let me know.)
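
For reference, here is a minimal sketch of one way the desired `count` column could be produced. It is only an illustration under some assumptions: rows are compared in week order within each ID, and the first row of each ID is treated as "no change". The extra row (ID 2, Week 4, Contract 16) is taken from the desired output above, not from the original sample data. The sketch compares each value with the previous one inside the same ID by using `groupby(...).shift()`, rather than the global `shift(-1)` used in the attempt.

```python
import pandas as pd

# Sample data from the question, plus the extra (ID 2, Week 4) row that
# appears only in the desired output.
df = pd.DataFrame({
    'ID':       [1, 1, 1, 2, 2, 2, 2],
    'Week':     [1, 2, 3, 1, 2, 3, 4],
    'Contract': [16, 16, 22, 37, 37, 16, 16],
})

# Assumption: rows should be ordered by week within each ID.
df = df.sort_values(['ID', 'Week']).reset_index(drop=True)

# Previous week's contract within the same ID (NaN on each ID's first row).
prev = df.groupby('ID')['Contract'].shift()

# 1 where a previous value exists and differs from the current one, else 0.
df['count'] = (prev.notna() & df['Contract'].ne(prev)).astype(int)

print(df)
```

If a total per ID is wanted instead of a per-row flag, something like `df.groupby('ID')['count'].sum()` could aggregate the flags.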