
Pandas groupby: treating two columns as one

  •  1
  • natemcintosh  ·  7 years ago

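    I can do the groupby by converting the two columns into a single column of tuples and grouping on that column. However, my actual DataFrame is very large, and adding another column might slow things down. I would like to know whether pandas has a more idiomatic way to do this.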

    In [1]: import pandas as pd                                                                                                                                                                                 
    In [2]: import numpy as np                                                                                                                                                                                  
    In [3]: key  = np.random.randint(low = 1, high = 20, size = 100) 
       ...: data = np.random.random(size = (100)) 
       ...: d1   = {'key':key, 'time':range(1,101), 'data':data} 
       ...: df1  = pd.DataFrame(d1) 
       ...: print(df1.shape) 
       ...: df1.head()                                                                                                                                                                                                 
    (100, 3)
    Out[3]: 
        key  time      data
    0     3     1  0.778231
    1    13     2  0.822494
    2     4     3  0.053416
    3     8     4  0.894341
    4     7     5  0.884310
    In [4]: key = range(1,21) 
       ...: lat = np.random.randint(low = 0, high = 90, size = 20) 
       ...: lon = np.random.randint(low = 0, high = 90, size = 20) 
       ...: d2  = {'key':key, 'lat':lat, 'lon':lon} 
       ...: df2 = pd.DataFrame(d2) 
       ...: print(df2.shape) 
       ...: df2.head()                                                                                                                                                                                                 
    (20, 3)
    Out[4]: 
        key  lat  lon
    0     1   36   81
    1     2    6   57
    2     3   84    4
    3     4   61    0
    4     5   54   69
    In [5]: result = pd.merge(df1, df2).sort_values('time') 
       ...: result.head()                                                                                                                                                                                            
    Out[5]: 
        key  time      data  lat  lon
    0     3     1  0.778231   84    4
    4    13     2  0.822494   12   19
    13    4     3  0.053416   61    0
    18    8     4  0.894341   49   34
    23    7     5  0.884310    8   13
    

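    (Be sure to scroll down in the box to see the output of In [5].)

    At this point, I would like to be able to do result.groupby(('lat','lon')) and have pandas treat the two columns as one. Is there a way to do this? Or should I bite the bullet and create a new column of tuples?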

    1 answer  |  as of 7 years ago
  •  3
  •   Szymon Maszke    7 years ago

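    "At this point, I would like to be able to do result.groupby(('lat','lon'))"

    groupby accepts a list of column names, which groups the rows on each (lat, lon) pair.

    Example data: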

    key  time     data  lat  lon
    3     1   0.231000   84    4
    4     1   0.832310   22   11
    5     1   1.210000   84    4
    6     1   3.778231   22   11
    8     1  15.450000   84    4
    

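    import pandas as pd
    
    # df is assumed to hold the example data shown above.
    # A list of column names groups on the (lat, lon) pair.
    for name, group in df.groupby(["lat", "lon"]):
        print("Group indices: {}".format(name))
        print(group)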
    

    Group indices: (22, 11)
       key  time      data  lat  lon
    1    4     1  0.832310   22   11
    3    6     1  3.778231   22   11
    Group indices: (84, 4)
       key  time    data  lat  lon
    0    3     1   0.231   84    4
    2    5     1   1.210   84    4
    4    8     1  15.450   84    4
    

    Isn't this exactly what you want? Or have I misunderstood something?
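
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example data from the table above and groups on both columns without creating a tuple column. The variable names (`grouped`, `group_keys`, `mean_data`, `group_id`) are only illustrative and do not come from the original posts. As far as I know, recent pandas versions treat a tuple passed to `groupby` as a single key rather than as a list of keys, so the list form `["lat", "lon"]` is the safer spelling.

```python
import pandas as pd

# Example data copied from the table in the answer above.
df = pd.DataFrame(
    {
        "key": [3, 4, 5, 6, 8],
        "time": [1, 1, 1, 1, 1],
        "data": [0.231, 0.83231, 1.21, 3.778231, 15.45],
        "lat": [84, 22, 84, 22, 84],
        "lon": [4, 11, 4, 11, 4],
    }
)

# A list of column names groups on the (lat, lon) pair directly;
# no helper column of tuples is needed.
grouped = df.groupby(["lat", "lon"])

# Each group key is a (lat, lon) tuple.
group_keys = [name for name, _ in grouped]

# Aggregations are computed per pair; the result is indexed by (lat, lon).
mean_data = grouped["data"].mean()

# Optional: one integer label per row identifying its (lat, lon) group.
group_id = grouped.ngroup()
```

If a single label per row is really needed (for example, to join back onto another table), `ngroup()` gives an integer id per group, which is usually cheaper to carry around than a column of Python tuples.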