Pandas .loc inconsistently returns a Series or a float

indexing dataframe pandas python

Dasi · Tech Community · 10 months ago

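I'm writing some code and have run into unexpected behavior when using .loc on a pandas DataFrame, depending on the length of the DataFrame itself. I'd appreciate some guidance on what is happening and how to avoid the inconsistent output.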

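I'm using Python 3.11 and pandas 2.2.2. The problem is that, depending on the DataFrame's length, .loc returns either a single-item Series or a float64 object. Below is a reproduction using some dummy examples I put together.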

First, with a longer DataFrame that has a MultiIndex, .loc returns a Series:

import pandas as pd

df = pd.DataFrame.from_dict(
    {'ix1': ['asd', 'asd', 'asd', 'qwe', 'qwe', 'qwe', 'qwe', 'asd', 'qwe', 'asd', 'asd', 'qwe', 'asd', 'asd', 'asd', 'asd', 'qwe', 'qwe', 'qwe', 'qwe', 'asd', 'qwe', 'qwe', 'asd', 'qwe', 'qwe', 'qwe', 'asd', 'asd', 'asd', 'bar', 'qwe', 'qwe', 'asd', 'qwe', 'asd'],
     'ix2': ['sdf', 'bar', 'rty', 'fgh', 'cvb', 'cvb', 'vbn', 'bnm', 'jkl', 'ewq', 'uio', 'uio', 'wer', 'dsa', 'vbn', 'cxz', 'sdf', 'iuo', 'bar', 'bvc', 'fgh', 'rty', 'gfd', 'cvb', 'wer', 'bnm', 'ewq', 'tre', 'uyt', 'jhg', 'foo', 'dsa', 'mnb', 'jkl', 'iuy', 'lkj'],
     'value': [float(i) for i in range(1, 37)]})

>>> df[['ix1', 'ix2', 'value']].set_index(['ix1', 'ix2']).loc[('bar', 'foo'), 'value']
# <input>:1: PerformanceWarning: indexing past lexsort depth may impact performance.
# ix1  ix2
# bar  foo   31.00
# Name: value, dtype: float64

This emits a PerformanceWarning because the index is not sorted, and it returns a Series holding a float64 value (not what I want).

A shorter DataFrame with the same structure and the same index columns, running the same line of code, returns a float64 instead:

df = pd.DataFrame.from_dict(
    {'ix1': ['foo', 'foo', 'foo', 'foo', 'foo', 'foo', 'foo', 'bar'],
     'ix2': ['tyu', 'fgh', 'vbn', 'jkl', 'foo', 'asd', 'qwe', 'foo'],
     'value': [float(i) for i in range(1, 9)]})

>>> df[['ix1', 'ix2', 'value']].set_index(['ix1', 'ix2']).loc[('bar', 'foo'), 'value']
# np.float64(8.0)

This simply returns a float64 (which is what I want).

This causes trouble later in my code, because I need a float to perform some calculations, and I can't figure out how to produce consistent output.

1 reply | last activity 10 months ago

mozway · 10 months ago

Your second example has no duplicated ix1 / ix2 combinations, which is what prevents the problem there.
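To see this in the question's data: in the first example the pair ('qwe', 'cvb') occurs twice, so the MultiIndex as a whole is not unique, even though ('bar', 'foo') itself occurs only once. A minimal sketch of how one might check for this before calling .loc, using a small made-up frame (the data and the variable names `indexed`, `index_is_unique`, and `duplicated_rows` are illustrative, not from the original post):

```python
import pandas as pd

# Hypothetical data: the pair ('qwe', 'cvb') appears twice
df = pd.DataFrame({
    'ix1': ['qwe', 'qwe', 'bar'],
    'ix2': ['cvb', 'cvb', 'foo'],
    'value': [1.0, 2.0, 3.0],
})
indexed = df.set_index(['ix1', 'ix2'])

# False as soon as any (ix1, ix2) pair is repeated anywhere in the index
index_is_unique = indexed.index.is_unique

# keep=False marks every row that belongs to a repeated pair
duplicated_rows = indexed[indexed.index.duplicated(keep=False)]

print(index_is_unique)
print(duplicated_rows)
```

If `index_is_unique` is false, it may be worth deciding explicitly what to do with the repeated rows before relying on the return type of .loc.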

If you always want a float, I would use:

cols = ['ix1', 'ix2']
df.drop_duplicates(cols).set_index(cols)['value'].loc[('bar', 'foo')]

Output: 31.0
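Note that drop_duplicates keeps the first row of each repeated pair by default and discards the rest. If the repeated rows carry meaningful values, one possible alternative is to aggregate them explicitly so the index becomes unique. The sketch below assumes averaging is acceptable, which the original post does not state; the data and the names `unique_values`, `result`, and `averaged` are made up for illustration:

```python
import pandas as pd

# Hypothetical data: ('qwe', 'cvb') is repeated, ('bar', 'foo') is not
df = pd.DataFrame({
    'ix1': ['qwe', 'qwe', 'bar'],
    'ix2': ['cvb', 'cvb', 'foo'],
    'value': [1.0, 2.0, 31.0],
})
cols = ['ix1', 'ix2']

# Assumption: repeated pairs should be averaged; .first() or .sum()
# could be swapped in depending on what the duplicates mean
unique_values = df.groupby(cols)['value'].mean()

# The grouped Series has one row per pair, so .loc yields a scalar;
# float() converts the NumPy scalar to a plain Python float
result = float(unique_values.loc[('bar', 'foo')])
averaged = float(unique_values.loc[('qwe', 'cvb')])

print(result)
print(averaged)
```

Compared with drop_duplicates, this makes the handling of repeated pairs an explicit choice rather than an implicit "keep the first row".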
