
pandas: group by two columns and check whether a value exists in any of three other columns

pandas-groupby pandas python-3.x python

ababuji · Tech Community · 6 years ago

I'm really struggling to understand how to manipulate groupby objects.

Here is a reproducible DataFrame:

import pandas as pd

df = pd.DataFrame([[1, 1, 'Type1', 'Type3', 'General'],
                    [1, 1, 'Type1', 'Type2', 'Type3'],
                    [1, 2, 'Type1', 'Type3', 'Type2'],
                    [1, 2, 'General', 'Type2', 'Type3'],
                    [1, 3, 'Type1', 'Type2', 'Type3'],
                    [1, 3, 'Type1', 'General', 'Type3'],
                    [1, 4, 'Type1', 'Type2', 'Type3'],
                    [1, 4, 'Type7', 'Type2', 'Type3'],
                    [1, 4, 'Type8', 'Type2', 'Type3'],
                    [1, 4, 'Type9', 'Type2', 'Type3'],
                    [1, 4, 'Type10', 'Type2', 'Type3']])

df.columns = ['eventId', 'listingId', 'SeatPart1', 'SeatPart2', 'SeatPart3']
print(df)

This gives:

    eventId  listingId SeatPart1 SeatPart2 SeatPart3
0         1          1     Type1     Type3   General
1         1          1     Type1     Type2     Type3
2         1          2     Type1     Type3     Type2
3         1          2   General     Type2     Type3
4         1          3     Type1     Type2     Type3
5         1          3     Type1   General     Type3
6         1          4     Type1     Type2     Type3
7         1          4     Type7     Type2     Type3
8         1          4     Type8     Type2     Type3
9         1          4     Type9     Type2     Type3
10        1          4    Type10     Type2     Type3

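Now I want to group by the two columns eventId and listingId. After grouping, within each group, if General appears as the seat type in any of the other three columns (SeatPart1, SeatPart2 or SeatPart3), I want a separate column SeatFlag that holds 1 for every row of that (eventId, listingId) pair, and 0 otherwise.

So the DataFrame I want to end up with is: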

    eventId  listingId SeatPart1 SeatPart2 SeatPart3  SeatFlag
0         1          1     Type1     Type3   General         1
1         1          1     Type1     Type2     Type3         1
2         1          2     Type1     Type3     Type2         1
3         1          2   General     Type2     Type3         1
4         1          3     Type1     Type2     Type3         1
5         1          3     Type1   General     Type3         1
6         1          4     Type1     Type2     Type3         0
7         1          4     Type7     Type2     Type3         0
8         1          4     Type8     Type2     Type3         0
9         1          4     Type9     Type2     Type3         0
10        1          4    Type10     Type2     Type3         0

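To explain further:

In row 0, for (eventId, listingId) = (1, 1), SeatPart3 contains General (General only needs to appear in any one of the three SeatPart columns), so the SeatFlag column is 1 for every row with (eventId, listingId) = (1, 1). For (eventId, listingId) = (1, 4), however, none of the rows has General in any of the three SeatPart columns, so the SeatFlag column is 0 for every row with (eventId, listingId) = (1, 4).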

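3 replies | latest 6 years ago

Haleemur Ali 6 years ago

Group by eventId and listingId, use transform with a function that checks for equality with 'General', and make liberal use of any to reduce the result to a single Series.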

df['isGen'] =  df.groupby(
    ['eventId', 'listingId']
).transform(lambda x: (x == 'General').any()).any(axis=1).astype(int)

    eventId  listingId SeatPart1 SeatPart2 SeatPart3  isGen
0         1          1     Type1     Type3   General      1
1         1          1     Type1     Type2     Type3      1
2         1          2     Type1     Type3     Type2      1
3         1          2   General     Type2     Type3      1
4         1          3     Type1     Type2     Type3      1
5         1          3     Type1   General     Type3      1
6         1          4     Type1     Type2     Type3      0
7         1          4     Type7     Type2     Type3      0
8         1          4     Type8     Type2     Type3      0
9         1          4     Type9     Type2     Type3      0
10        1          4    Type10     Type2     Type3      0
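To make the two uses of any in this answer easier to follow, here is a minimal sketch that splits the one-liner into its intermediate steps. It runs on a four-row subset of the question's data, and the names per_column and flag are illustrative only, not part of the original answer.

```python
import pandas as pd

# Four-row subset of the question's data: listing 1 has a 'General', listing 4 does not.
df = pd.DataFrame(
    [[1, 1, 'Type1', 'Type3', 'General'],
     [1, 1, 'Type1', 'Type2', 'Type3'],
     [1, 4, 'Type1', 'Type2', 'Type3'],
     [1, 4, 'Type7', 'Type2', 'Type3']],
    columns=['eventId', 'listingId', 'SeatPart1', 'SeatPart2', 'SeatPart3'])

# Step 1: for each group and each SeatPart column, does any value equal 'General'?
# transform broadcasts that per-group answer back onto every row of the group.
per_column = df.groupby(['eventId', 'listingId']).transform(
    lambda x: (x == 'General').any())

# Step 2: collapse the three boolean columns into one flag per row.
flag = per_column.any(axis=1).astype(int)

print(per_column)
print(flag)
```

The first any works down the rows of a group, the second works across the columns of a row, which is why both are needed.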

Zero 6 years ago

Here's one way:

In [101]: isgen = df[['SeatPart1', 'SeatPart2', 'SeatPart3']].eq('General').any(axis=1)

In [102]: df.assign(isgen=isgen).groupby(['eventId', 'listingId']
                                        )['isgen'].transform('any').astype(int)
Out[102]:
0     1
1     1
2     1
3     1
4     1
5     1
6     0
7     0
8     0
9     0
10    0
Name: isgen, dtype: int32
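The session above returns a Series rather than modifying df. A minimal sketch of storing it under the column name the question asks for could look like the following, assuming the same approach of a row-level check followed by a group-level any. The names seat_cols and row_has_general are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 'Type1', 'Type3', 'General'],
     [1, 1, 'Type1', 'Type2', 'Type3'],
     [1, 4, 'Type1', 'Type2', 'Type3'],
     [1, 4, 'Type7', 'Type2', 'Type3']],
    columns=['eventId', 'listingId', 'SeatPart1', 'SeatPart2', 'SeatPart3'])

seat_cols = ['SeatPart1', 'SeatPart2', 'SeatPart3']

# Row level: does this row contain 'General' in any seat column?
row_has_general = df[seat_cols].eq('General').any(axis=1)

# Group level: does any row of the (eventId, listingId) group contain it?
df['SeatFlag'] = (
    row_has_general
    .groupby([df['eventId'], df['listingId']])
    .transform('any')
    .astype(int)
)

print(df)
```

Grouping the boolean Series directly by the two key columns avoids the temporary isgen column created with assign.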

xyzjayne 6 years ago

Each groupby element is a Series or a DataFrame, so you want to check whether 'General' is any part of that groupby element.

df['SeatFlag'] = df.groupby(['eventId','listingId']).transform(lambda x: (x=='General').sum()).sum(axis = 1)
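One thing to keep in mind with the sum-based version: it counts occurrences, so it matches the expected 0/1 output for the question's data only because each flagged group contains exactly one 'General'. A minimal sketch with a made-up group that contains two matches, plus one possible way to turn the count into a flag (the gt(0) step is an addition here, not part of the answer):

```python
import pandas as pd

# Hypothetical data: listing 1 contains 'General' twice, listing 2 never.
df = pd.DataFrame(
    [[1, 1, 'General', 'Type2', 'Type3'],
     [1, 1, 'Type1', 'General', 'Type3'],
     [1, 2, 'Type1', 'Type2', 'Type3']],
    columns=['eventId', 'listingId', 'SeatPart1', 'SeatPart2', 'SeatPart3'])

# Number of 'General' values in each row's group (2 for both rows of listing 1).
counts = df.groupby(['eventId', 'listingId']).transform(
    lambda x: (x == 'General').sum()).sum(axis=1)

# Reduce the count to a 0/1 flag.
df['SeatFlag'] = counts.gt(0).astype(int)

print(counts)
print(df)
```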