
How do I take the mean over a list of DataFrames using pandas.Panel?

pandas-groupby dataframe pandas python

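S.EB · Tech Community · 4 years ago

I have 200 users. For each user I compute measures (columns) for each method (rows) and save them in a DataFrame. I followed this post, which uses pandas.Panel to average every measure of each method over all users.

The measurements are computed in a for loop, one user per iteration. For example, this is the loop for two users (0 and 1):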

dfs = {}
for s in range(0, 2): # do the following for user0 and user1
    .
    # some commands for calculation of measurements
    .
    .
    .
    #end of the loop
    dfs[s] = pd.concat([ov_df, sd_df], axis=1)  # dataframe for user s
panel = pd.Panel(dfs)
*** TypeError: object() takes no parameters

How can I take the mean over all users for each of the 15 measures and 11 methods individually?

dfs
{0:              m1        s2       ...      ee         vd
RF              0.536819  0.698611  ...  57.144087 -55.781946
OL              0.480758  0.649341  ...  61.991170 -57.210469
LA              0.427991  0.599431  ...  67.091363 -57.026384
AP              0.466703  0.636397  ...  63.612812 -57.285542
AP2             0.467951  0.637557  ...  63.677943 -59.602584
MA              0.428375  0.599807  ...  67.073286 -56.977762
RC              0.536892  0.698672  ...  57.135469 -55.766803
DP              0.536819  0.698611  ...  57.144087 -55.781946
DC              0.537510  0.699195  ...  57.014234 -55.574017
KU              0.537032  0.698791  ...  57.111874 -55.745237
KE              0.493517  0.660879  ...  60.704082 -57.366922

[11 rows x 15 columns], 1:                  m1        s2       ...      ee         vd
RF              0.369103  0.539190  ...  61.541261 -48.183651
OL              0.334069  0.500827  ...  66.807720 -43.531795
LA              0.300838  0.462530  ...  70.741817 -39.702935
AP              0.322879  0.488146  ...  68.371827 -38.054113
AP2             0.322453  0.487659  ...  68.212097 -47.518693
MA              0.301198  0.462955  ...  70.716283 -39.436550
RC              0.369095  0.539181  ...  61.546610 -48.155079
DP              0.369103  0.539190  ...  61.541261 -48.183651
DC              0.369500  0.539613  ...  61.484330 -48.376968
KU              0.369116  0.539203  ...  61.539789 -48.176711
KE              0.341218  0.508818  ...  65.061794 -49.218448

1 answer | as of 4 years ago

David Warren 4 years ago

#load in dataframes. Example using 2 dataframes for person 0 (df0), and person 1 (df1)

#concatenate the dataframes, switch their level, and sort to make easier
df_combined = pd.concat([df0, df1], keys=[0,1], names=['user', 'method'])
df_combined = df_combined.swaplevel(1,0)
print(df_combined.sort_index())

Output

                  m1        s2         ee         vd
method user                                          
AP     0     0.466703  0.636397  63.612812 -57.285542
       1     0.322879  0.488146  68.371827 -38.054113
AP2    0     0.467951  0.637557  63.677943 -59.602584
       1     0.322453  0.487659  68.212097 -47.518693
DC     0     0.537510  0.699195  57.014234 -55.574017
       1     0.369500  0.539613  61.484330 -48.376968
DP     0     0.536819  0.698611  57.144087 -55.781946
       1     0.369103  0.539190  61.541261 -48.183651
KE     0     0.493517  0.660879  60.704082 -57.366922
       1     0.341218  0.508818  65.061794 -49.218448
KU     0     0.537032  0.698791  57.111874 -55.745237
       1     0.369116  0.539203  61.539789 -48.176711
LA     0     0.427991  0.599431  67.091363 -57.026384
       1     0.300838  0.462530  70.741817 -39.702935
MA     0     0.428375  0.599807  67.073286 -56.977762
       1     0.301198  0.462955  70.716283 -39.436550
OL     0     0.480758  0.649341  61.991170 -57.210469
       1     0.334069  0.500827  66.807720 -43.531795
RC     0     0.536892  0.698672  57.135469 -55.766803
       1     0.369095  0.539181  61.546610 -48.155079
RF     0     0.536819  0.698611  57.144087 -55.781946
       1     0.369103  0.539190  61.541261 -48.183651

#Average based on the method
df_combined.groupby(level=0).mean()

Output

              m1        s2         ee         vd
method
AP      0.394791  0.562272  65.992319 -47.669827
AP2     0.395202  0.562608  65.945020 -53.560638
DC      0.453505  0.619404  59.249282 -51.975493
DP      0.452961  0.618900  59.342674 -51.982799
KE      0.417368  0.584848  62.882938 -53.292685
KU      0.453074  0.618997  59.325831 -51.960974
LA      0.364414  0.530981  68.916590 -48.364660
MA      0.364787  0.531381  68.894785 -48.207156
OL      0.407413  0.575084  64.399445 -50.371132
RC      0.452994  0.618926  59.341040 -51.960941
RF      0.452961  0.618900  59.342674 -51.982799

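From here, if you need to see the means organized by measure instead, it is simple enough to adjust accordingly (for example, by transposing the result with DataFrame.transpose()).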
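As a minimal sketch of that adjustment, the per-method means can be transposed so that measures become the rows. The two small frames below are made-up stand-ins for the real per-user results, and the names `per_method` and `per_measure` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-ins for two users' results (methods as rows, measures as columns).
df0 = pd.DataFrame({"m1": [0.5, 0.4], "s2": [0.7, 0.6]}, index=["RF", "OL"])
df1 = pd.DataFrame({"m1": [0.3, 0.2], "s2": [0.5, 0.4]}, index=["RF", "OL"])

# Stack the users under a (user, method) MultiIndex, then average over users per method.
df_combined = pd.concat([df0, df1], keys=[0, 1], names=["user", "method"])
per_method = df_combined.groupby(level="method").mean()

# Transpose so that each row is a measure and each column is a method.
per_measure = per_method.T
print(per_measure)
```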

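Looking at the solution given in the other post, it seems you missed a step:

panel = pd.Panel(dfs).mean(axis=0)

Note that pandas.Panel has been deprecated since version 0.20 and was removed in 0.25, which is the likely cause of the TypeError above.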
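On pandas versions without Panel, one possible substitute for averaging a Panel along axis 0 is to stack the frames into a 3-D NumPy array and take the mean over the first axis. This is only a sketch: it assumes every frame has exactly the same row and column labels in the same order, and the sample values and the helper name `mean_over_users` are made up for illustration.

```python
import numpy as np
import pandas as pd

def mean_over_users(dfs):
    """Element-wise mean of a dict of identically labelled DataFrames (hypothetical helper)."""
    frames = list(dfs.values())
    first = frames[0]
    # Guard the assumption that all frames line up label for label.
    for frame in frames[1:]:
        if not (frame.index.equals(first.index) and frame.columns.equals(first.columns)):
            raise ValueError("all frames must share the same index and columns")
    # Shape is (n_users, n_methods, n_measures); axis 0 runs over users.
    stacked = np.stack([frame.to_numpy() for frame in frames])
    return pd.DataFrame(stacked.mean(axis=0), index=first.index, columns=first.columns)

# Made-up data for two users.
dfs = {
    0: pd.DataFrame({"m1": [0.5, 0.4], "s2": [0.7, 0.6]}, index=["RF", "OL"]),
    1: pd.DataFrame({"m1": [0.3, 0.2], "s2": [0.5, 0.4]}, index=["RF", "OL"]),
}
print(mean_over_users(dfs))
```

Unlike the concat-based approach, this does not align labels for you, which is why the helper checks them explicitly.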

S.EB 4 years ago

I found the answer in this post, and it takes just a one-line command:

# concat puts the dict keys (users) at index level 0 and the methods at level 1
df = pd.concat(dfs).groupby(level=1).mean()
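A self-contained sketch of this one-line approach with made-up numbers follows. It assumes `dfs` is a dict keyed by user, so `pd.concat` places the user key at index level 0 and the method labels at level 1, and it uses `groupby(level=...)` because `mean(level=...)` was removed in pandas 2.0. The level names `user` and `method` are added here for readability and are not part of the original code.

```python
import pandas as pd

# Made-up per-user results: methods as rows, measures as columns.
dfs = {
    0: pd.DataFrame({"m1": [0.5, 0.4], "vd": [-55.0, -57.0]}, index=["RF", "OL"]),
    1: pd.DataFrame({"m1": [0.3, 0.2], "vd": [-48.0, -43.0]}, index=["RF", "OL"]),
}

# Dict keys become the outer index level; naming the levels avoids positional mix-ups.
combined = pd.concat(dfs, names=["user", "method"])

# Average over users separately for every (method, measure) cell.
mean_per_method = combined.groupby(level="method").mean()
print(mean_per_method)
```

Grouping on the user level instead would give one row per user averaged over the methods, which is a different quantity from the one asked about.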