ValueError when trying to include multiple indexes in DataFrame.pivot

pivot dataframe pandas python

ytu · Tech community · 7 years ago

I found pandas: how to run a pivot with a multi-index?, but it doesn't solve my problem.

Given the following data frame:

import pandas as pd
df = pd.DataFrame({
    "date": ["20180920"] * 6,
    "id": ["A123456789"] * 6,
    "test": ["a", "b", "c", "d", "e", "f"],
    "result": [70, 90, 110, "(-)", "(+)", 0.3],
    "ref": ["< 90", "70 - 100", "100 - 120", "(-)", "(-)", "< 1"]
})

I want to spread the values of result across the values of the test column, and ignore ref. In other words, the desired output is:

       date          id      a   b    c    d    e    f
0  20180920  A123456789     70  90  110  (-)  (+)  0.3

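So I tried df.pivot(index=["date", "id"], columns="test", values="result"), but it failed with ValueError: Length of passed values is 6, index implies 2. I looked at the pivot_table documentation, but I don't understand what this error means. Could someone elaborate?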
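For reference: as far as I know, pandas 1.1.0 and later accept a list for index (and columns) in DataFrame.pivot, so the call above works unchanged on a recent pandas. A minimal sketch, assuming such a version is installed; older versions raise the error quoted above:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["20180920"] * 6,
    "id": ["A123456789"] * 6,
    "test": ["a", "b", "c", "d", "e", "f"],
    "result": [70, 90, 110, "(-)", "(+)", 0.3],
    "ref": ["< 90", "70 - 100", "100 - 120", "(-)", "(-)", "< 1"],
})

# Assumes pandas >= 1.1.0, where `index` may be a list of column names.
wide = df.pivot(index=["date", "id"], columns="test", values="result")

# Move date/id back into regular columns and drop the leftover "test" axis name.
wide = wide.reset_index().rename_axis(None, axis=1)
print(wide)
```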

By the way, I eventually got my desired output with df.drop(columns="ref").set_index(["date", "id", "test"]).unstack(level=2)
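A note on that last line: because the whole frame is unstacked rather than just the result column, the columns come back as a two-level MultiIndex, ('result', 'a'), ('result', 'b') and so on. One way to flatten them, sketched here with droplevel (an illustration, not part of the original code):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["20180920"] * 6,
    "id": ["A123456789"] * 6,
    "test": ["a", "b", "c", "d", "e", "f"],
    "result": [70, 90, 110, "(-)", "(+)", 0.3],
    "ref": ["< 90", "70 - 100", "100 - 120", "(-)", "(-)", "< 1"],
})

out = df.drop(columns="ref").set_index(["date", "id", "test"]).unstack(level=2)
print(out.columns.tolist()[:2])  # [('result', 'a'), ('result', 'b')]

# Drop the outer "result" level so the columns are just a..f.
out.columns = out.columns.droplevel(0)
print(out)
```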

Community, Mohan Dere · 5 years ago

It is possible with pivot, but the code is a bit crazy:

df = (df.set_index(["date", "id"])
        .pivot(columns="test")['result']
        .reset_index()
        .rename_axis(None, axis=1)
     )
print (df)

       date          id   a   b    c    d    e    f
0  20180920  A123456789  70  90  110  (-)  (+)  0.3
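Why the code selects ['result']: when values is omitted, pivot spreads every remaining column, here both result and ref, so the intermediate frame has two-level columns with result and ref on the outer level. A minimal sketch to inspect that intermediate step, rebuilding the question's frame first:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["20180920"] * 6,
    "id": ["A123456789"] * 6,
    "test": ["a", "b", "c", "d", "e", "f"],
    "result": [70, 90, 110, "(-)", "(+)", 0.3],
    "ref": ["< 90", "70 - 100", "100 - 120", "(-)", "(-)", "< 1"],
})

# No index= and no values=: the existing (date, id) index is kept and
# every remaining column (result, ref) is pivoted over "test".
tmp = df.set_index(["date", "id"]).pivot(columns="test")

print(tmp.columns.get_level_values(0).unique().tolist())  # ['result', 'ref']
print(tmp.shape)  # (1, 12)
```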

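For the documentation, you can check issue 16578 in pandas; in 0.24.0 there should be improved docs, or maybe new support for working with a MultiIndex? It is also a bit unclear from issue 8160.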

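In my opinion your last line of code only needs a small improvement (the same solution as @Vaishali's): create a Series with a MultiIndex by selecting ['result'] after set_index, and call unstack without level, because by default the last level of the MultiIndex is unstacked. From Series.unstack: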

level : int, str, or list of these, default last level

# all 3 return the same output
df.set_index(["date", "id", "test"])['result'].unstack()
df.set_index(["date", "id", "test"])['result'].unstack(level=2)
df.set_index(["date", "id", "test"])['result'].unstack(level=-1)

Vaishali · 7 years ago

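pivot does not accept a list of columns as index, so you need to use pivot_table. Using first as the aggregation here assumes there are no duplicates.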

pd.pivot_table(df, index=["date", "id"], columns="test", values="result", aggfunc='first')\
.reset_index().rename_axis(None, axis=1)

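As @piRsquared suggested, it is safer to use set_index and unstack together with rename_axis, because unstack raises a ValueError on duplicate entries instead of silently keeping the first value: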

df.set_index(['date', 'id', 'test']).result.unstack()\
.reset_index().rename_axis(None, axis=1)

       date          id   a   b    c    d    e    f
0  20180920  A123456789  70  90  110  (-)  (+)  0.3
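To make the "safer" point concrete: with a duplicated (date, id, test) key, aggfunc='first' silently keeps one of the values, while unstack refuses to reshape. A minimal sketch, using a hypothetical duplicated row for test a (the data below is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["20180920"] * 3,
    "id": ["A123456789"] * 3,
    "test": ["a", "a", "b"],  # hypothetical duplicate for test "a"
    "result": [70, 75, 90],
})

# pivot_table keeps the first value for the duplicated key without warning.
first = pd.pivot_table(df, index=["date", "id"], columns="test",
                       values="result", aggfunc="first")
print(first)

# unstack raises instead of choosing a value for you.
try:
    df.set_index(["date", "id", "test"])["result"].unstack()
    raised = False
except ValueError as err:
    raised = True
    print("unstack failed:", err)
```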

Paul Rougieux · 6 years ago

A workaround based on pandas/issues/23955:

import pandas as pd

def multiindex_pivot(df, columns=None, values=None):
    # https://github.com/pandas-dev/pandas/issues/23955
    # Pivot a frame whose rows are identified by a MultiIndex.
    names = list(df.index.names)
    df = df.reset_index()
    list_index = df[names].values
    tuples_index = [tuple(i) for i in list_index]  # hashable
    # Collapse the index levels into a single column of tuples for pivot.
    df = df.assign(tuples_index=tuples_index)
    df = df.pivot(index="tuples_index", columns=columns, values=values)
    tuples_index = df.index  # reduced
    # Expand the tuples back into a proper MultiIndex with the original names.
    index = pd.MultiIndex.from_tuples(tuples_index, names=names)
    df.index = index
    return df

multiindex_pivot(df.set_index(['date', 'id']), columns='test', values='result')
Out[10]:
test                  a   b    c    d    e    f
date     id
20180920 A123456789  70  90  110  (-)  (+)  0.3