What I'm after:
I have a simple DataFrame from which I want to extract a view by searching for a list of searches. Example:
import numpy as np
import pandas as pd
data = {k: [v + str(i) for i in range(10)] for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'), ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data)
# Mock some nan data like in my real one (pd.np was removed in pandas 2.0, so numpy is used directly).
df.loc[2:6, 'Useless'] = np.nan
The resulting df:
   OrderNo       Name  Useless   Description
0    10000  Product 0   Junk 0  Short Desc 0
1    10001  Product 1   Junk 1  Short Desc 1
2    10002  Product 2      NaN  Short Desc 2
3    10003  Product 3      NaN  Short Desc 3
4    10004  Product 4      NaN  Short Desc 4
5    10005  Product 5      NaN  Short Desc 5
6    10006  Product 6      NaN  Short Desc 6
7    10007  Product 7   Junk 7  Short Desc 7
8    10008  Product 8   Junk 8  Short Desc 8
9    10009  Product 9   Junk 9  Short Desc 9
Now I want to search by a list of OrderNos like so:
searches = ['10005','10009','10003','10000']
I'm trying to get a view like this:
   OrderNo       Name  Useless   Description
5    10005  Product 5      NaN  Short Desc 5
9    10009  Product 9   Junk 9  Short Desc 9
3    10003  Product 3      NaN  Short Desc 3
0    10000  Product 0   Junk 0  Short Desc 0
So that I can finally transpose the view into the following (note that I dropped a useless column):
                        0             1             2             3
OrderNo             10005         10009         10003         10000
Name            Product 5     Product 9     Product 3     Product 0
Description  Short Desc 5  Short Desc 9  Short Desc 3  Short Desc 0
What I've tried:
This great question/answer helped me do the search, but the returned view is not in my order:
found = df.loc[df['OrderNo'].isin(searches)]
   OrderNo       Name  Useless   Description
0    10000  Product 0   Junk 0  Short Desc 0
3    10003  Product 3      NaN  Short Desc 3
5    10005  Product 5      NaN  Short Desc 5
9    10009  Product 9   Junk 9  Short Desc 9
I then added a column ['my_sort'] to found so that I can reorder it according to the list:
found['my_sort'] = found['OrderNo'].apply(lambda x: searches.index(x))
found.sort_values(by='my_sort', inplace=True)
# For now assume index will always be matched and ValueError will be handled.
# This detail is not critical
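While this kind of works, pandas throws SettingWithCopyWarning all over the place, telling me to use .loc[row_indexer,col_indexer] = ... instead. I tried that and it still threw the same warning, so I suspect the problem comes from the search. In the end I wrapped the search result in a new DataFrame to stop seeing the warning: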
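found = pd.DataFrame(df.loc[df['OrderNo'].isin(searches)])
found['my_sort'] = found['OrderNo'].apply(lambda x: searches.index(x))
found = found.sort_values(by='my_sort')
# columns is assumed to be the columns kept in the desired output above.
columns = ['OrderNo', 'Name', 'Description']
found = found[columns].T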
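For comparison, here is a minimal sketch of the same helper-column approach with an explicit .copy() on the search result, which is the usual way to tell pandas that found is an independent frame rather than a slice of df. The sample data is the same as above; dropping my_sort with drop(columns=...) is my own choice for the illustration.

```python
import pandas as pd

# Same sample data as in the example above.
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'),
                        ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data)
searches = ['10005', '10009', '10003', '10000']

# .copy() makes `found` an independent DataFrame, so adding a column
# to it is not treated as writing to a slice of `df`.
found = df.loc[df['OrderNo'].isin(searches)].copy()

# Helper column: position of each OrderNo within `searches`.
found['my_sort'] = found['OrderNo'].map(searches.index)

# Sort by the helper column, then drop it again.
found = found.sort_values('my_sort').drop(columns='my_sort')
print(found)
```

This still needs the temporary column, so it only addresses the warning, not the extra sorting step.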
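While this works, I can't help feeling that it is convoluted and not very efficient, since I have to introduce a new column just to do the sorting and then drop it again. I looked into a few related functions such as reindex, or a combination of where and dropna (which doesn't work, because there are other nan objects in my data), but none of them seems to achieve my goal.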
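To make the goal concrete, here is a minimal sketch of the kind of direct lookup I have in mind, without a helper column. It assumes that OrderNo values are unique and that every entry in searches exists in df (otherwise .loc raises a KeyError). The names ordered and result are just for illustration.

```python
import pandas as pd

# Same sample data as in the example above (without the mocked nan values).
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'),
                        ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data)
searches = ['10005', '10009', '10003', '10000']

# Index by OrderNo so that .loc with a list returns the rows
# in the order of the list; reset_index restores OrderNo as a column.
ordered = df.set_index('OrderNo').loc[searches].reset_index()

# Keep only the wanted columns and transpose.
result = ordered[['OrderNo', 'Name', 'Description']].T
print(result)
```

If some searches might be missing, reindex(searches) on the indexed frame would return all-NaN rows for them instead of raising, which then collides with the real nan values in my data.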