代码之家  ›  专栏  ›  技术社区  ›  sync11

从groupby pandas中提取多索引

  •  4
  • sync11  · 技术社区  · 7 年前

    提取后第二个索引的有效方法是什么 groupby 根据每组的最大客户计数。

    假设一个数据帧 df 有不同的州和10名官员(姓名 Officer 1 Officer 10 )在每个州。专栏 Current Status 总是有价值的 Customer :

        State List  Sales Officer   Current Status
    0   UP          Officer 4       Customer
    1   MH          Officer 5       Customer
    2   AP          Officer 6       Customer
    3   AN          Officer 2       Customer
    4   GJ          Officer 3       Customer
    .... so on
    

    预期产出包括每个州客户数最高的销售员:

    State List     Sales Officer      
    AN             Officer 6          403
    AP             Officer 1          266
                   Officer 8          266
    ... and so on
    

    到目前为止,我已经执行了以下操作:

    df.groupby(['State List', 'Sales Officer'])['Current Status'].count()#.reset_index()
    

    给我以下信息:

    State List  Sales Officer
    AN          Officer 1        376
                Officer 10       401
                Officer 2        353
                Officer 3        373
                Officer 4        375
                Officer 5        382
                Officer 6        403
                Officer 7        400
                Officer 8        385
                Officer 9        378
    AP          Officer 1        266
                Officer 10       228
                Officer 2        240
                Officer 3        248
                Officer 4        235
                Officer 5        229
                Officer 6        242
                Officer 7        238
                Officer 8        266
                Officer 9        243
    

    现在,我一直在寻找 Sales Officer 反对每个 State List 最大客户数。任何想法!

    1 回复  |  直到 7 年前
        1
  •  4
  •   jezrael    7 年前

    使用 boolean indexing 具有 transform 属于 max 为了回报 Series 与原尺寸相同:

    s = df.groupby(['State List', 'Sales Officer'])['Current Status'].count()
    df = s[s == s.groupby('State List').transform('max')]
    print (df)
    State List  Sales Officer
    AN          Officer 6        403
    AP          Officer 1        266
                Officer 8        266
    Name: a, dtype: int64
    

    细节 :

    print (s.groupby('State List').transform('max'))
    State List  Sales Officer
    AN          Officer 1        403
                Officer 10       403
                Officer 2        403
                Officer 3        403
                Officer 4        403
                Officer 5        403
                Officer 6        403
                Officer 7        403
                Officer 8        403
                Officer 9        403
    AP          Officer 1        266
                Officer 10       266
                Officer 2        266
                Officer 3        266
                Officer 4        266
                Officer 5        266
                Officer 6        266
                Officer 7        266
                Officer 8        266
                Officer 9        266
    Name: a, dtype: int64