
Calculating the rolling change in a single column by ID and week

pandas python

Umar.H · Tech Community · 6 years ago

I have a DataFrame that looks like this:

import pandas as pd

ids = [1, 1, 1, 2, 2, 2]   # renamed from `id` to avoid shadowing the builtin
weeks = [1, 2, 3, 1, 2, 3]
contr = [16, 16, 22, 37, 37, 16]


df = pd.DataFrame({'ID': ids,
                   'Week': weeks,
                   'Contract': contr})

print(df)
   ID  Week  Contract
0   1     1        16
1   1     2        16
2   1     3        22
3   2     1        37
4   2     2        37
5   2     3        16

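What I want to do now is count, for each ID, the number of contract changes from week to week (my df is small, about 1.8 million rows).

So I think what I need is a rolling count of values that differ from the value in the row above. I tried the following code: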

df['count'] = df['ID'].groupby((df['Contract'] != df['Contract'].shift(-1)).cumsum()).cumcount()

But this does not give me the result I want.

What I want is something like the following:
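As a rough illustration of why the attempt falls short, the sketch below splits it into two steps on the sample data. The names `run_key` and `attempt` are introduced here for readability and are not part of the original code. Two things stand out: `shift(-1)` compares each row with the row below it rather than the row above, and `ID` is never used as a grouping key, so runs can span two IDs.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2],
                   'Week': [1, 2, 3, 1, 2, 3],
                   'Contract': [16, 16, 22, 37, 37, 16]})

# True where a row differs from the row BELOW it; the last row compares
# against NaN and is therefore always True.
differs_from_next = df['Contract'] != df['Contract'].shift(-1)

# The running sum of those flags is the grouping key. ID plays no part in it.
run_key = differs_from_next.cumsum()

# cumcount numbers the rows inside each key group, starting at 0.
attempt = df['ID'].groupby(run_key).cumcount()

print(run_key.tolist())   # [0, 1, 2, 2, 3, 4]
print(attempt.tolist())   # [0, 0, 0, 1, 0, 0]
```

Rows 2 and 3 share a key even though they belong to different IDs, which is why the only non-zero count lands on the first row of ID 2.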

print(df)
   ID  Week  Contract  count
0   1     1        16      0   # First instance, so this is ignored
1   1     2        16      0   # No change, so 0
2   1     3        22      1   # Change here, so 1
3   2     1        37      0
4   2     2        37      0
5   2     3        16      1
6   2     4        16      0   # This should be 0, as the change was in the previous week

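(Please let me know if this does not meet the standard for a minimal question.)

1 reply | up to 6 years ago

BENY 6 years ago

I think you can use diff to tell whether the value changed; then we need another groupby to take the cumsum by ID.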

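# True where Contract differs from the previous row of the same ID.
# The first row of each ID has a NaN diff, and NaN != 0, so it is also True.
s = df.groupby('ID').Contract.diff().ne(0)
# Running total per ID; subtract 1 to cancel the always-True first row.
s.groupby(df['ID']).cumsum() - 1
Out[33]: 
0    0.0
1    0.0
2    1.0
3    0.0
4    0.0
5    1.0
Name: Contract, dtype: float64
df['Count'] = s.groupby(df['ID']).cumsum() - 1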
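The desired output in the question includes a week-4 row for ID 2 whose count drops back to 0, which suggests a per-row change flag rather than a running total. A minimal sketch of that reading follows, assuming the week-4 row from the desired output is added to the sample data; the column names `changed` and `running` are illustrative only.

```python
import pandas as pd

# Sample data from the question plus the week-4 row shown in the desired output.
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 2],
                   'Week': [1, 2, 3, 1, 2, 3, 4],
                   'Contract': [16, 16, 22, 37, 37, 16, 16]})

# Difference from the previous row within the same ID.
# The first row of each ID has no predecessor, so its diff is NaN.
delta = df.groupby('ID')['Contract'].diff()

# Treat the first row as "no change", then flag any non-zero difference.
df['changed'] = delta.fillna(0).ne(0).astype(int)

# Running total of changes per ID, comparable to the cumsum approach.
df['running'] = df.groupby('ID')['changed'].cumsum()

print(df)
```

Under these assumptions `changed` is 1 only in the week where the contract differs from the previous week, while `running` stays at 1 for every later week of that ID.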

anky 6 years ago

# transform keeps the original index, so the result can be assigned straight back
df['Count'] = df.groupby('ID')['Contract'].transform(lambda x: (~x.duplicated()).cumsum() - 1)
# or: df.groupby('ID')['Contract'].transform(lambda x: pd.factorize(x)[0])
print(df)

   ID  Week  Contract  Count
0   1     1        16      0
1   1     2        16      0
2   1     3        22      1
3   2     1        37      0
4   2     2        37      0
5   2     3        16      1
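The two approaches in this thread agree on the sample data but can diverge when a contract value returns to one seen earlier for the same ID. The sketch below uses hypothetical data (16, then 22, then 16 again) to show the difference; the names `distinct` and `changes` are illustrative.

```python
import pandas as pd

# Hypothetical data: the contract changes and then reverts to its first value.
df = pd.DataFrame({'ID': [1, 1, 1],
                   'Week': [1, 2, 3],
                   'Contract': [16, 22, 16]})

by_id = df.groupby('ID')['Contract']

# duplicated-based count: increments only when a value is seen for the first time.
distinct = by_id.transform(lambda x: (~x.duplicated()).cumsum() - 1)

# diff-based count: increments whenever a row differs from the previous row.
changes = (by_id.diff().fillna(0).ne(0).astype(int)
           .groupby(df['ID']).cumsum())

print(distinct.tolist())  # [0, 1, 1]
print(changes.tolist())   # [0, 1, 2]
```

If a revert should count as a change, the diff-based version is likely the closer fit; if only new contract values matter, the duplicated or factorize version may be preferable.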