
Pandas: how to search by a list of values and return the rows in the same order?

  • r.ook jpp · 6 years ago

    What I'm after:

    I have a simple DataFrame from which I want to extract a view by searching for a list of values, searches. Example:

    import numpy as np
    import pandas as pd
    data = {k: [v+str(i) for i in range(10)] for k, v in zip(('OrderNo','Name', 'Useless','Description'),('1000','Product ', 'Junk ','Short Desc '))}
    df = pd.DataFrame(data)
    # Mock some NaN data, as in my real DataFrame.
    # pd.np is deprecated (and removed in recent pandas), so use numpy directly.
    df.loc[2:6, 'Useless'] = np.nan
    

    The resulting df:

      OrderNo       Name Useless   Description
    0   10000  Product 0  Junk 0  Short Desc 0
    1   10001  Product 1  Junk 1  Short Desc 1
    2   10002  Product 2     NaN  Short Desc 2
    3   10003  Product 3     NaN  Short Desc 3
    4   10004  Product 4     NaN  Short Desc 4
    5   10005  Product 5     NaN  Short Desc 5
    6   10006  Product 6     NaN  Short Desc 6
    7   10007  Product 7  Junk 7  Short Desc 7
    8   10008  Product 8  Junk 8  Short Desc 8
    9   10009  Product 9  Junk 9  Short Desc 9
    

    Now I want to search for a list of OrderNos like this:

    searches = ['10005','10009','10003','10000']
    

    And I'm trying to get a view like this:

      OrderNo       Name Useless   Description
    5   10005  Product 5     NaN  Short Desc 5
    9   10009  Product 9  Junk 9  Short Desc 9
    3   10003  Product 3     NaN  Short Desc 3
    0   10000  Product 0  Junk 0  Short Desc 0
    

    So that I can eventually transpose the view into the following (note that I've dropped some useless columns):

                            0             1             2             3
    OrderNo             10005         10009         10003         10000
    Name            Product 5     Product 9     Product 3     Product 0
    Description  Short Desc 5  Short Desc 9  Short Desc 3  Short Desc 0
    

    What I've tried:

    This great question/answer covers the search, but the returned view does not follow my order:

    found = df.loc[df['OrderNo'].isin(searches)]
    
      OrderNo       Name Useless   Description
    0   10000  Product 0  Junk 0  Short Desc 0
    3   10003  Product 3     NaN  Short Desc 3
    5   10005  Product 5     NaN  Short Desc 5
    9   10009  Product 9  Junk 9  Short Desc 9
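
    The ordering comes from how isin() works: it builds a boolean mask with one entry per row of df, in df's own row order, and the position of each value inside searches plays no part. A minimal check on a cut-down frame (only the OrderNo column, kept short for illustration):

```python
import pandas as pd

# Cut-down frame mirroring the question's OrderNo column (string values).
df = pd.DataFrame({'OrderNo': [str(10000 + i) for i in range(10)]})
searches = ['10005', '10009', '10003', '10000']

# isin() answers "is this row's value in the list?" with one bool per row,
# in the frame's row order; the order of `searches` is never consulted.
mask = df['OrderNo'].isin(searches)
found = df.loc[mask]

print(mask.tolist())
print(found['OrderNo'].tolist())  # rows come back in df order, not search order
```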
    

    So I added a ['my_sort'] column to found so that I can reorder it based on the list:

    found['my_sort'] = found['OrderNo'].apply(lambda x: searches.index(x))
    found.sort_values(by='my_sort', inplace=True)
    # For now assume index will always be matched and ValueError will be handled.
    # This detail is not critical
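
    Separately from the ordering question, searches.index(x) rescans the list for every row. A sketch, assuming the values in searches are unique, that builds a value-to-position dict once and sorts by a mapped Series instead of storing a column (position, order and ordered are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'OrderNo': [str(10000 + i) for i in range(10)],
    'Name': [f'Product {i}' for i in range(10)],
})
searches = ['10005', '10009', '10003', '10000']

# Build the value -> position lookup once (assumes unique search values).
position = {value: pos for pos, value in enumerate(searches)}

found = df.loc[df['OrderNo'].isin(searches)]
# map() returns a new Series; nothing is written back into `found`,
# so no helper column is created and no assignment warning can fire.
order = found['OrderNo'].map(position)
ordered = found.loc[order.sort_values().index]

print(ordered['OrderNo'].tolist())
```

    Reading rows with .loc on the filtered frame is only a selection, so it does not need the extra column.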
    

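    And this kind of works, but pandas throws SettingWithCopyWarning all over the place, telling me to use .loc[row_indexer,col_indexer] = ... instead. I tried that too and it still throws the same warning, so I suspect the problem comes from the search itself. In the end I wrapped the result in a new DataFrame so that the warning no longer appears: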

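    found = pd.DataFrame(df.loc[df['OrderNo'].isin(searches)])
    found['my_sort'] = found['OrderNo'].apply(lambda x: searches.index(x))
    found = found.sort_values(by='my_sort')
    # Keep only the wanted columns; this also drops 'Useless' and the helper 'my_sort'.
    columns = ['OrderNo', 'Name', 'Description']
    found = found[columns].T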
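
    A commonly used alternative to the pd.DataFrame(...) wrapper is to take an explicit .copy() of the filtered rows before adding columns, which makes the new frame independent of df. A sketch under that approach (exact warning behaviour differs between pandas versions, and copy-on-write in newer releases changes it again):

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    'OrderNo': [str(10000 + i) for i in range(10)],
    'Name': [f'Product {i}' for i in range(10)],
})
searches = ['10005', '10009', '10003', '10000']

# .copy() gives `found` its own data, so adding a column to it
# is unambiguous and does not touch `df`.
found = df.loc[df['OrderNo'].isin(searches)].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    found['my_sort'] = found['OrderNo'].map({v: i for i, v in enumerate(searches)})
    found = found.sort_values(by='my_sort').drop(columns='my_sort')

print(found['OrderNo'].tolist())
print([w.category.__name__ for w in caught])
```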
    

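    While this works, I can't help feeling that it's convoluted and inefficient, since I have to introduce a new column just for sorting and then drop it again. I looked into related functions such as reindex, where and dropna (to drop the nan objects), but none of them seem to achieve my goal.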
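
    To avoid the helper column entirely, sort_values accepts a key callable (pandas 1.1 and later) that is applied to the sort column before comparing. A sketch, assuming the values in searches are unique, that turns OrderNo into an ordered categorical whose categories follow searches (search_order is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'OrderNo': [str(10000 + i) for i in range(10)],
    'Name': [f'Product {i}' for i in range(10)],
    'Description': [f'Short Desc {i}' for i in range(10)],
})
searches = ['10005', '10009', '10003', '10000']


def search_order(col: pd.Series) -> pd.Series:
    # Ordered categorical with categories in search order, so sorting
    # compares positions in `searches` rather than the strings themselves.
    cat = pd.Categorical(col, categories=searches, ordered=True)
    return pd.Series(cat, index=col.index)


found = (
    df.loc[df['OrderNo'].isin(searches)]
      .sort_values(by='OrderNo', key=search_order)  # needs pandas >= 1.1
)
print(found['OrderNo'].tolist())
```

    The key only affects how rows are compared; the returned frame keeps its original columns and string values.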

  • jpp · 6 years ago

    set_index + loc + T

    You can take advantage of indexing:

    df = df.set_index('OrderNo')
    
    searches = ['10005','10009','10003','10000']
    
    df_search = df.loc[searches]
    
    print(df_search)
    
              Description       Name Useless
    OrderNo                                 
    10005    Short Desc 5  Product 5     NaN
    10009    Short Desc 9  Product 9  Junk 9
    10003    Short Desc 3  Product 3     NaN
    10000    Short Desc 0  Product 0  Junk 0
    
    res = df_search.T
    
    print(res)
    
    OrderNo             10005         10009         10003         10000
    Description  Short Desc 5  Short Desc 9  Short Desc 3  Short Desc 0
    Name            Product 5     Product 9     Product 3     Product 0
    Useless               NaN        Junk 9           NaN        Junk 0
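
    Note that df.loc[searches] raises a KeyError in recent pandas versions if any value in searches is missing from the index. reindex, which the question mentions, also works once OrderNo is the index, and fills unmatched labels with NaN instead of raising. A sketch with a hypothetical missing OrderNo, '99999':

```python
import pandas as pd

df = pd.DataFrame({
    'OrderNo': [str(10000 + i) for i in range(10)],
    'Name': [f'Product {i}' for i in range(10)],
}).set_index('OrderNo')

# '99999' is a hypothetical OrderNo that does not exist in the frame.
searches = ['10005', '99999', '10000']

loc_failed = False
try:
    df.loc[searches]
except KeyError as exc:
    loc_failed = True
    print('loc raised KeyError:', exc)

# reindex keeps the requested order and fills unknown labels with NaN.
found = df.reindex(searches)
print(found)

# Drop rows with no match (all data columns NaN), still in search order.
print(found.dropna(how='all'))
```

    The dropna(how='all') step assumes a genuine match never has every data column empty.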
    

    If you need numbered column labels:

    print(df_search.reset_index().T)
    
                            0             1             2             3
    OrderNo             10005         10009         10003         10000
    Description  Short Desc 5  Short Desc 9  Short Desc 3  Short Desc 0
    Name            Product 5     Product 9     Product 3     Product 0
    Useless               NaN        Junk 9           NaN        Junk 0
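
    Putting the pieces together, a sketch that rebuilds the question's sample frame and produces the exact target layout with numbered column labels. It assumes Name and Description are the columns to keep, and selects them inside .loc so Useless never reaches the transpose:

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample frame.
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'),
                        ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data)
df.loc[2:6, 'Useless'] = np.nan

searches = ['10005', '10009', '10003', '10000']

res = (
    df.set_index('OrderNo')
      .loc[searches, ['Name', 'Description']]  # rows in search order, 'Useless' dropped
      .reset_index()                           # OrderNo back to a column, rows numbered 0..n-1
      .T
)
print(res)
```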