
How to take the mean over a list of dataframes with pandas.Panel?

  •  0
  • S.EB  · 4 years ago

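    I have 200 users. For each user I compute a set of measures (columns) for each method (rows) and save them in a dataframe. I followed this post, which uses pandas.Panel to average each measure of each method over all users.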

    The measurements are computed per user in a for loop beforehand. For example, this is the loop for two users (0 and 1):

    dfs = {}
    for s in range(0, 2): # do the following for user0 and user1
        .
        # some commands for calculation of measurements
        .
        .
        .
        #end of the loop
        dfs[s] = pd.concat([ov_df, sd_df], axis=1)  # dataframe for user s
    panel = pd.Panel(dfs)
    *** TypeError: object() takes no parameters
    

    How can I take the mean over all users for each of the 15 measures and 11 methods individually?

    dfs
    {0:              m1        s2       ...      ee         vd
    RF              0.536819  0.698611  ...  57.144087 -55.781946
    OL              0.480758  0.649341  ...  61.991170 -57.210469
    LA              0.427991  0.599431  ...  67.091363 -57.026384
    AP              0.466703  0.636397  ...  63.612812 -57.285542
    AP2             0.467951  0.637557  ...  63.677943 -59.602584
    MA              0.428375  0.599807  ...  67.073286 -56.977762
    RC              0.536892  0.698672  ...  57.135469 -55.766803
    DP              0.536819  0.698611  ...  57.144087 -55.781946
    DC              0.537510  0.699195  ...  57.014234 -55.574017
    KU              0.537032  0.698791  ...  57.111874 -55.745237
    KE              0.493517  0.660879  ...  60.704082 -57.366922
    
    [11 rows x 15 columns], 1:                  m1        s2       ...      ee         vd
    RF              0.369103  0.539190  ...  61.541261 -48.183651
    OL              0.334069  0.500827  ...  66.807720 -43.531795
    LA              0.300838  0.462530  ...  70.741817 -39.702935
    AP              0.322879  0.488146  ...  68.371827 -38.054113
    AP2             0.322453  0.487659  ...  68.212097 -47.518693
    MA              0.301198  0.462955  ...  70.716283 -39.436550
    RC              0.369095  0.539181  ...  61.546610 -48.155079
    DP              0.369103  0.539190  ...  61.541261 -48.183651
    DC              0.369500  0.539613  ...  61.484330 -48.376968
    KU              0.369116  0.539203  ...  61.539789 -48.176711
    KE              0.341218  0.508818  ...  65.061794 -49.218448
    
    1 reply  |  as of 4 years ago
        1
  •  0
  •   David Warren    4 years ago
    import pandas as pd

    # df0 and df1 are the per-user dataframes from the question (dfs[0] and dfs[1])

    # Concatenate the dataframes, swap the index levels, and sort for readability
    df_combined = pd.concat([df0, df1], keys=[0, 1], names=['user', 'method'])
    df_combined = df_combined.swaplevel(1, 0)
    print(df_combined.sort_index())
    

    Output

                      m1        s2         ee         vd
    method user                                          
    AP     0     0.466703  0.636397  63.612812 -57.285542
           1     0.322879  0.488146  68.371827 -38.054113
    AP2    0     0.467951  0.637557  63.677943 -59.602584
           1     0.322453  0.487659  68.212097 -47.518693
    DC     0     0.537510  0.699195  57.014234 -55.574017
           1     0.369500  0.539613  61.484330 -48.376968
    DP     0     0.536819  0.698611  57.144087 -55.781946
           1     0.369103  0.539190  61.541261 -48.183651
    KE     0     0.493517  0.660879  60.704082 -57.366922
           1     0.341218  0.508818  65.061794 -49.218448
    KU     0     0.537032  0.698791  57.111874 -55.745237
           1     0.369116  0.539203  61.539789 -48.176711
    LA     0     0.427991  0.599431  67.091363 -57.026384
           1     0.300838  0.462530  70.741817 -39.702935
    MA     0     0.428375  0.599807  67.073286 -56.977762
           1     0.301198  0.462955  70.716283 -39.436550
    OL     0     0.480758  0.649341  61.991170 -57.210469
           1     0.334069  0.500827  66.807720 -43.531795
    RC     0     0.536892  0.698672  57.135469 -55.766803
           1     0.369095  0.539181  61.546610 -48.155079
    RF     0     0.536819  0.698611  57.144087 -55.781946
           1     0.369103  0.539190  61.541261 -48.183651
    
    #Average based on the method
    df_combined.groupby(level=0).mean()
    

    Output

                  m1        s2         ee         vd
    method                                          
    AP      0.394791  0.562272  65.992319 -47.669827
    AP2     0.395202  0.562608  65.945020 -53.560638
    DC      0.453505  0.619404  59.249282 -51.975493
    DP      0.452961  0.618900  59.342674 -51.982799
    KE      0.417368  0.584848  62.882938 -53.292685
    KU      0.453074  0.618997  59.325831 -51.960974
    LA      0.364414  0.530981  68.916590 -48.364660
    MA      0.364787  0.531381  68.894785 -48.207156
    OL      0.407413  0.575084  64.399445 -50.371132
    RC      0.452994  0.618926  59.341040 -51.960941
    RF      0.452961  0.618900  59.342674 -51.982799
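
For anyone who wants to try the concat-then-groupby approach without the original data, here is a minimal self-contained sketch. The two small frames and their values are made up for illustration; only the method labels AP and RF and the column names m1 and s2 are borrowed from the question.

```python
import pandas as pd

# Hypothetical stand-ins for two users' results:
# rows are methods, columns are measures (values are invented).
df0 = pd.DataFrame({"m1": [0.4, 0.6], "s2": [1.0, 3.0]}, index=["AP", "RF"])
df1 = pd.DataFrame({"m1": [0.2, 0.4], "s2": [2.0, 5.0]}, index=["AP", "RF"])

# Stack the users under a named outer index level ...
combined = pd.concat([df0, df1], keys=[0, 1], names=["user", "method"])

# ... then average over users separately for each method.
per_method = combined.groupby(level="method").mean()

print(per_method)
```

Grouping by the level name rather than its position means the swaplevel step is not needed for the averaging itself; it only affects how the combined frame is displayed.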
    

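    From here, it is simple enough to adjust this if you need the averages organized by measure instead (e.g. by transposing the result with DataFrame.transpose()).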
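
A short sketch of what that adjustment could look like. The `per_method` frame below is a made-up stand-in for the groupby result, and the names `per_measure` and `measure_means` are chosen here only for illustration.

```python
import pandas as pd

# Hypothetical per-method averages (rows: methods, columns: measures).
per_method = pd.DataFrame(
    {"m1": [0.25, 0.5], "s2": [1.5, 4.0]},
    index=pd.Index(["AP", "RF"], name="method"),
)

# Transposed view: one row per measure, one column per method.
per_measure = per_method.T

# If a single number per measure is wanted, average over the methods as well.
measure_means = per_method.mean(axis=0)

print(per_measure)
print(measure_means)
```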

    Regarding the solution in the other post, you appear to have missed a step:

    panel = pd.Panel(dfs).mean(axis=0)
    

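    Note that pandas.Panel has been deprecated since version 0.20 and was removed in version 0.25, so this only works on older pandas releases.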
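
On pandas versions without Panel, one possible stand-in for the "stack the frames and average over the first axis" idea is a 3-D NumPy array. This is only a sketch: the sample values are invented, and it assumes every user's frame has the same method and measure labels (the reindex call lines them up if the order differs).

```python
import numpy as np
import pandas as pd

# Hypothetical per-user frames; the second one lists the methods in another order.
dfs = {
    0: pd.DataFrame({"m1": [0.5, 1.0], "s2": [1.0, 3.0]}, index=["AP", "RF"]),
    1: pd.DataFrame({"m1": [0.5, 0.25], "s2": [5.0, 2.0]}, index=["RF", "AP"]),
}

# Use the first frame's labels as the reference layout.
first = next(iter(dfs.values()))

# Build a (users, methods, measures) array with all frames aligned to that layout.
stacked = np.stack([
    df.reindex(index=first.index, columns=first.columns).to_numpy()
    for df in dfs.values()
])

# Average over the user axis and put the labels back.
mean_df = pd.DataFrame(stacked.mean(axis=0), index=first.index, columns=first.columns)

print(mean_df)
```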

        2
  •  0
  •   S.EB    4 years ago

    I found the answer in this post; it takes only a one-line command:

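    # dfs is a dict keyed by user, so concat puts the user on index level 0 and the method on level 1
    df = pd.concat(dfs).groupby(level=1).mean()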
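
One thing worth checking with this one-liner is which index level is being averaged. When `dfs` is a dict, as in the question, `pd.concat` uses the dict keys (the users) as index level 0 and keeps the original row labels (the methods) as level 1. The sketch below uses made-up values and spells the grouping out with `groupby`, since `mean(level=...)` is no longer available in recent pandas releases.

```python
import pandas as pd

# Hypothetical per-user frames keyed by user id, as in the question's dfs dict.
dfs = {
    0: pd.DataFrame({"m1": [0.5, 1.0], "s2": [1.0, 3.0]}, index=["AP", "RF"]),
    1: pd.DataFrame({"m1": [0.25, 0.5], "s2": [2.0, 5.0]}, index=["AP", "RF"]),
}

combined = pd.concat(dfs)  # MultiIndex: level 0 = user (dict key), level 1 = method

# Average over users for each method (what the question asks for).
per_method = combined.groupby(level=1).mean()

# Average over methods for each user (what grouping on level 0 gives here).
per_user = combined.groupby(level=0).mean()

print(per_method)
print(per_user)
```

If `dfs` were a plain list instead, the concatenated frame would keep a single index level holding the method labels, and grouping on level 0 would give the per-method averages.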