
FutureWarning when using .loc with a pandas DataFrame

  • Karn Kumar  ·  6 years ago

    While working on a problem, I ran into a warning raised by loc. The details are below.

    Sample DataFrames:

    First DataFrame, df1:

    >>> import pandas as pd
    >>> data1 = {'Sample': ['Sample_A', 'Sample_D', 'Sample_E'],
    ...          'Location': ['Bangladesh', 'Myanmar', 'Thailand'],
    ...          'Year': [2012, 2014, 2015]}
    >>> df1 = pd.DataFrame(data1)
    >>> df1.set_index('Sample')
                Location  Year
    Sample
    Sample_A  Bangladesh  2012
    Sample_D     Myanmar  2014
    Sample_E    Thailand  2015
    

    Second DataFrame, df2:

    >>> data2 = {'Num': ['Value_1', 'Value_2', 'Value_3', 'Value_4', 'Value_5'],
    ...          'Sample_A': [0, 1, 0, 0, 1],
    ...          'Sample_B': [0, 0, 1, 0, 0],
    ...          'Sample_C': [1, 0, 0, 0, 1],
    ...          'Sample_D': [0, 0, 1, 1, 0]}
    >>> df2 = pd.DataFrame(data2)
    >>> df2.set_index('Num')
             Sample_A  Sample_B  Sample_C  Sample_D
    Num
    Value_1         0         0         1         0
    Value_2         1         0         0         0
    Value_3         0         1         0         1
    Value_4         0         0         0         1
    Value_5         1         0         1         0
    
    
    >>> samples
    ['Sample_A', 'Sample_D', 'Sample_E']
    

    When I use samples to select the matching columns from df2, it works as shown below, but it also produces a warning.

    >>> df3 = df2.loc[:, samples]
    >>> df3
       Sample_A  Sample_D  Sample_E
    0         0         0       NaN
    1         1         0       NaN
    2         0         1       NaN
    3         0         1       NaN
    4         1         0       NaN
    

    Warning:

    indexing.py:1472: FutureWarning:
    Passing list-likes to .loc or [] with any missing label will raise
    KeyError in the future, you can use .reindex() as an alternative.
    
    See the documentation here:
    https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
      return self._getitem_tuple(key)
    

    I would like to know a better way to handle this!

    1 Answer

  • jezrael  ·  6 years ago

    Use reindex:

    df3 = df2.reindex(columns=samples)
    print(df3)
       Sample_A  Sample_D  Sample_E
    0         0         0       NaN
    1         1         0       NaN
    2         0         1       NaN
    3         0         1       NaN
    4         1         0       NaN
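
    As a self-contained sketch of the same idea, the snippet below rebuilds df2 from the question and also shows the fill_value parameter of DataFrame.reindex, which may be useful when a placeholder other than NaN is wanted. The name df3_filled and the choice of 0 as the placeholder are illustrative assumptions, not part of the question. For context, pandas 1.0 and later raise KeyError for the .loc call in the question instead of warning, so reindex is the way to keep the missing label as a column.

```python
import pandas as pd

# Rebuild df2 and samples as given in the question.
df2 = pd.DataFrame({
    'Num': ['Value_1', 'Value_2', 'Value_3', 'Value_4', 'Value_5'],
    'Sample_A': [0, 1, 0, 0, 1],
    'Sample_B': [0, 0, 1, 0, 0],
    'Sample_C': [1, 0, 0, 0, 1],
    'Sample_D': [0, 0, 1, 1, 0],
})
samples = ['Sample_A', 'Sample_D', 'Sample_E']

# Labels missing from df2 (here Sample_E) become all-NaN columns.
df3 = df2.reindex(columns=samples)

# Assumption for illustration: use 0 instead of NaN for missing columns.
df3_filled = df2.reindex(columns=samples, fill_value=0)

print(df3)
print(df3_filled)
```

    In both results the columns follow the order of samples.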
    

    Or, if you want only the columns that exist in both, use Index.intersection:

    df3 = df2[df2.columns.intersection(samples)]
    # Alternative (requires `import numpy as np`):
    # df3 = df2[np.intersect1d(df2.columns, samples)]
    print(df3)
       Sample_A  Sample_D
    0         0         0
    1         1         0
    2         0         1
    3         0         1
    4         1         0
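
    If it also helps to know which requested labels were dropped, something along these lines may work. This is only a sketch: the names common, missing, and ordered are hypothetical, and the list comprehension is one assumed way to keep the columns in the order given by samples rather than relying on the order returned by Index.intersection.

```python
import pandas as pd

# Rebuild df2 and samples as given in the question.
df2 = pd.DataFrame({
    'Num': ['Value_1', 'Value_2', 'Value_3', 'Value_4', 'Value_5'],
    'Sample_A': [0, 1, 0, 0, 1],
    'Sample_B': [0, 0, 1, 0, 0],
    'Sample_C': [1, 0, 0, 0, 1],
    'Sample_D': [0, 0, 1, 1, 0],
})
samples = ['Sample_A', 'Sample_D', 'Sample_E']

# Labels present in both df2.columns and samples.
common = df2.columns.intersection(samples)

# Labels that were requested but do not exist in df2.
missing = list(pd.Index(samples).difference(df2.columns))

# Same selection, keeping the order in which samples lists the labels.
ordered = [label for label in samples if label in df2.columns]
df3 = df2[ordered]

print(df3)
print('Missing labels:', missing)
```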