Pandas DataFrame: cannot iterate over a grouped Series

pandas-groupby dataframe loops pandas python

Sterling Butters · 7 years ago

So I have the following pandas Series, grouped:

                               Amount
Ticker Unit   Date       Time        
FLWS   SHARES 2019-01-03 -       20.0
              2019-01-13 -       20.0
PIH    SHARES 2019-01-13 -      -10.0
       VALUE  2019-01-03 -      -25.0

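*I would like to reset the index so that "Amount" is no longer part of the multi-index header and sits level with the other column names, but then the grouping is undone, and this is only possible after the Series has been converted to a DataFrame.

I am trying to iterate over the groups: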

    for ticker, action, date, time in grouped:
        print(ticker)
        print(action)
        print(date)
        print(time)

But I get the following: TypeError: 'float' object is not iterable

Additional info: I obtain the Series shown above from the following:

from collections import OrderedDict

import pandas as pd

# tickers, actions, units, amounts, dates and times are equal-length lists
orders = pd.DataFrame(OrderedDict([
    ('Ticker', tickers),
    ('Action', actions),
    ('Unit', units),
    ('Amount', amounts),
    ('Date', dates),
    ('Time', times),
]))

df_orders = pd.DataFrame(orders)
if not df_orders.empty:
    # SELL orders count as negative amounts
    df_orders.loc[df_orders['Action'] == 'SELL', 'Amount'] *= -1
    grouped = df_orders.groupby(['Ticker', 'Unit', 'Date', 'Time'])['Amount'].sum()

    print(grouped)

where tickers, actions, units, etc. are all lists.

Edit: I thought it would be best to show the logic I want to use to process the collected data.

total = 0
for ticker in tickers: 
    for date in dates:    
        if unit=='SHARES':
            total += some_function(ticker, date)
        else:
            total += some_function(ticker, date)

Note that in this case every ticker in tickers is unique. So how would you iterate over the grouped Series in this manner?

1 answer

Henry Woody · 7 years ago

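The problem is that by iterating over grouped itself, you iterate over the values of the Series, which are just the numbers in the Amount column. Note also that the names you are unpacking into (ticker, action, date, time) correspond to the index levels of the Series (Ticker, Unit, Date, Time), not to its values. So on each iteration you are trying to unpack a single float into four names, hence the error TypeError: 'float' object is not iterable. In recent Python 3 versions the message is more helpful: TypeError: cannot unpack non-iterable float object.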
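
The behaviour described above can be reproduced with a small stand-in for grouped. The sketch below rebuilds the Series by hand from the printout in the question (building it with MultiIndex.from_tuples is an assumption made for illustration; the question creates it with groupby), then shows what plain iteration yields and where the TypeError comes from.

```python
import pandas as pd

# Stand-in for `grouped`, rebuilt by hand from the printout in the question.
index = pd.MultiIndex.from_tuples(
    [
        ("FLWS", "SHARES", "2019-01-03", "-"),
        ("FLWS", "SHARES", "2019-01-13", "-"),
        ("PIH", "SHARES", "2019-01-13", "-"),
        ("PIH", "VALUE", "2019-01-03", "-"),
    ],
    names=["Ticker", "Unit", "Date", "Time"],
)
grouped = pd.Series([20.0, 20.0, -10.0, -25.0], index=index, name="Amount")

# Iterating a Series directly yields only its values, never the index.
values = list(grouped)
print(values)

# Unpacking one of those floats into four names raises the TypeError.
error_message = None
try:
    for ticker, action, date, time in grouped:
        pass
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The exact wording of the message depends on the Python version, so the sketch prints it rather than assuming one form.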

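To fix this, use the items method of the pandas Series class. (The older name iteritems did the same thing, but it was deprecated in pandas 1.5 and removed in pandas 2.0.) It iterates over every item in the Series and, on each iteration, returns the index and the value as a tuple. Because you have a compound index, the index is itself a tuple, which you can unpack into separate names like this:

for (ticker, unit, date, time), amount in grouped.items():
    print(ticker)
    print(unit)  # second index level is Unit, not Action
    print(date)
    print(time)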
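
Regarding the footnote in the question about resetting the index: one possible alternative, not part of the original answer, is to call reset_index on the grouped Series and loop over the resulting rows with itertuples. The sketch below uses made-up sample orders shaped like df_orders; the names flat and rows are introduced here purely for illustration.

```python
import pandas as pd

# Hypothetical sample orders, shaped like df_orders in the question.
df_orders = pd.DataFrame({
    "Ticker": ["FLWS", "FLWS", "PIH", "PIH"],
    "Unit": ["SHARES", "SHARES", "SHARES", "VALUE"],
    "Amount": [20.0, 20.0, -10.0, -25.0],
    "Date": ["2019-01-03", "2019-01-13", "2019-01-13", "2019-01-03"],
    "Time": ["-", "-", "-", "-"],
})

grouped = df_orders.groupby(["Ticker", "Unit", "Date", "Time"])["Amount"].sum()

# reset_index() turns the four index levels back into ordinary columns,
# giving a flat DataFrame with one row per group.
flat = grouped.reset_index()

rows = []
for row in flat.itertuples(index=False):
    # Each row is a namedtuple; fields are accessed by column name.
    rows.append((row.Ticker, row.Unit, row.Date, row.Time, row.Amount))
    print(row.Ticker, row.Unit, row.Date, row.Time, row.Amount)
```

The aggregation has already happened by the time reset_index is called, so flattening the index does not undo the grouping; it only changes how the group keys are stored.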

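Edit: [addressing the edit to the question.]

In the code sample you have provided, it does not really matter whether the tickers are unique: some_function may be called on the same ticker several times, so the tickers do not actually need to be unique. Perhaps you could do something like this: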

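# Column names must match df_orders exactly (they are capitalised there)
grouped = df_orders.groupby(['Ticker', 'Date', 'Unit'])['Amount'].sum()

total = 0
for (ticker, date, unit), amount in grouped.items():
    # share_function and other_function stand in for your own logic
    if unit == 'SHARES':
        total += share_function(ticker, date)
    else:
        total += other_function(ticker, date)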
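
For completeness, here is a self-contained sketch of the loop above that can be run as is. The sample orders are made up, and share_function and other_function are hypothetical placeholders that return constants, since the question does not say what the real functions compute.

```python
import pandas as pd


def share_function(ticker, date):
    # Hypothetical placeholder: the real per-share logic is not given.
    return 1.0


def other_function(ticker, date):
    # Hypothetical placeholder for every unit other than SHARES.
    return 2.0


# Made-up sample orders with the same columns as df_orders in the question.
df_orders = pd.DataFrame({
    "Ticker": ["FLWS", "FLWS", "PIH", "PIH"],
    "Action": ["BUY", "BUY", "SELL", "SELL"],
    "Unit": ["SHARES", "SHARES", "SHARES", "VALUE"],
    "Amount": [20.0, 20.0, 10.0, 25.0],
    "Date": ["2019-01-03", "2019-01-13", "2019-01-13", "2019-01-03"],
    "Time": ["-", "-", "-", "-"],
})

# SELL orders count as negative amounts, as in the question.
df_orders.loc[df_orders["Action"] == "SELL", "Amount"] *= -1

grouped = df_orders.groupby(["Ticker", "Date", "Unit"])["Amount"].sum()

total = 0.0
for (ticker, date, unit), amount in grouped.items():
    # The index tuple unpacks in the same order as the groupby keys.
    if unit == "SHARES":
        total += share_function(ticker, date)
    else:
        total += other_function(ticker, date)

print(total)
```

With these placeholders the total simply counts groups (three SHARES groups and one VALUE group), which makes it easy to check that every (ticker, date, unit) combination is visited exactly once.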