Selecting from a DataFrame based on multiple levels of a MultiIndex

multi-index pandas python-3.x

Igor Pozdeev · 6 years ago

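How can I extend the logic of selecting from a DataFrame based on the first n-1 levels of a MultiIndex when n > 2?

For example, consider this DataFrame: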

import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [10, 20, 30], ["a", "b"]])
df = pd.DataFrame(1, columns=midx, index=np.arange(3))
In[11]: df
Out[11]: 
   0                 1               
  10    20    30    10    20    30   
   a  b  a  b  a  b  a  b  a  b  a  b
0  1  1  1  1  1  1  1  1  1  1  1  1
1  1  1  1  1  1  1  1  1  1  1  1  1
2  1  1  1  1  1  1  1  1  1  1  1  1

Here it is easy to select the columns whose first level is 0 or 1:

df[[0, 1]]

But the same logic does not extend to selecting the columns whose first level is 0 or 1 and whose second level is 10 or 20:

In[13]: df[[(0, 10), (0, 20), (1, 10), (1, 20)]]
ValueError: operands could not be broadcast together with shapes (4,2) (3,) (4,2)

The following works:

df.loc[:, pd.IndexSlice[[0, 1], [10, 20], :]]

But it is cumbersome, especially when the selector has to be taken from another DataFrame that has a 2-level MultiIndex:

idx = df.columns.droplevel(2)
In[16]: idx
Out[16]: 
MultiIndex(levels=[[0, 1], [10, 20, 30]],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, ... 1, 2, 2]])
In[17]: df[idx]
ValueError: operands could not be broadcast together with shapes (12,2) (3,) (12,2)

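Edit: Ideally, I would also like to be able to order the columns this way, not just select them, in the same spirit as df[[1, 0]] being able to order the columns by the first level.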

1 Answer

jezrael · 6 years ago

If possible, you can use boolean indexing with get_level_values and isin:

m1 = df.columns.get_level_values(0).isin([0,1])
m2 = df.columns.get_level_values(1).isin([10,20])

print (m1)
[ True  True  True  True  True  True  True  True  True  True  True  True]
print (m2)
[ True  True  True  True False False  True  True  True  True False False]
print (m1 & m2)
[ True  True  True  True False False  True  True  True  True False False]

df1 = df.loc[:, m1 & m2]
print (df1)
   0           1         
  10    20    10    20   
   a  b  a  b  a  b  a  b
0  1  1  1  1  1  1  1  1
1  1  1  1  1  1  1  1  1
2  1  1  1  1  1  1  1  1
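
The mask `m1 & m2` keeps every combination of the listed first-level and second-level values. When only specific (level 0, level 1) pairs are wanted, for example pairs taken from another frame's 2-level MultiIndex, one possible variation is to drop the last level and test membership of the remaining pairs. This is a minimal sketch rather than part of the original answer; the name `selector` and its contents are hypothetical.

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [10, 20, 30], ["a", "b"]])
df = pd.DataFrame(1, columns=midx, index=np.arange(3))

# Hypothetical selector: a 2-level MultiIndex, e.g. the columns of another frame.
selector = pd.MultiIndex.from_tuples([(0, 10), (1, 20)])

# Drop the third level so each column is described by its first two levels,
# then check which of those (level 0, level 1) pairs appear in the selector.
mask = df.columns.droplevel(2).isin(list(selector))

selected = df.loc[:, mask]
print(selected.columns.tolist())
# [(0, 10, 'a'), (0, 10, 'b'), (1, 20, 'a'), (1, 20, 'b')]
```

Unlike `m1 & m2`, this does not pick up (0, 20) or (1, 10), because those pairs are not in `selector`. A boolean mask keeps the columns in their original order, so it selects but does not reorder.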

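# The masks still apply after dropping the third level, because the number and order of columns are unchanged
df.columns = df.columns.droplevel(2)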
print (df)
   0                 1               
  10 10 20 20 30 30 10 10 20 20 30 30
0  1  1  1  1  1  1  1  1  1  1  1  1
1  1  1  1  1  1  1  1  1  1  1  1  1
2  1  1  1  1  1  1  1  1  1  1  1  1

df2 = df.loc[:, m1 & m2]
print (df2)
   0           1         
  10 10 20 20 10 10 20 20
0  1  1  1  1  1  1  1  1
1  1  1  1  1  1  1  1  1
2  1  1  1  1  1  1  1  1
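
The edit to the question also asks for the columns to be ordered by the selector, which a boolean mask cannot do. One way to sketch this, assuming the selector is a list of (level 0, level 1) pairs in the desired order, is to collect the matching column positions pair by pair and index with `iloc`. The block rebuilds the original three-level frame so that it runs on its own; `selector` and `positions` are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [10, 20, 30], ["a", "b"]])
df = pd.DataFrame(1, columns=midx, index=np.arange(3))

# Hypothetical selector, listed in the order the columns should appear.
selector = [(1, 20), (0, 10)]

# First two levels of every column, as tuples, in the frame's current order.
first_two = list(df.columns.droplevel(2))

# For each selector pair in turn, collect the positions of matching columns.
positions = [i for pair in selector
             for i, key in enumerate(first_two) if key == pair]

reordered = df.iloc[:, positions]
print(reordered.columns.tolist())
# [(1, 20, 'a'), (1, 20, 'b'), (0, 10, 'a'), (0, 10, 'b')]
```

The nested loop is quadratic in the number of columns, which is fine for a small frame like this one; for wide frames a lookup built once from `first_two` would likely be a better fit.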
