
Pandas: take the first timestamp and the average per hour with resample or groupby

  • VISHAL LIMGIRE · Tech Community · 4 years ago

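    I want to get the expected output below. How can I use groupby or resample to get the hourly average of celsius while still keeping the minute value (that of the first reading in each hour) in the measured_at column?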

    My input:

     measured_at                  celsius
    0 2020-05-19 01:13:40+00:00    15.00
    1 2020-05-19 01:14:40+00:00    16.50
    1 2020-05-20 02:13:26+00:00    30.00
    2 2020-05-20 02:14:57+00:00    15.35
    3 2020-05-20 02:15:19+00:00    14.00
    4 2020-05-20 12:06:39+00:00    20.00
    5 2020-05-21 03:13:07+00:00    15.50
    6 2020-05-22 12:09:37+00:00    15.00
    
    
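    import pandas as pd
    
    df['measured_at'] = pd.to_datetime(df.measured_at)
    # '60min' is equivalent to '60T'; the 'T' alias is deprecated in newer pandas
    df1 = df.resample('60min', on='measured_at')['celsius'].mean().dropna().reset_index()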
    

    My output:

         measured_at    celsius
    0 2020-05-19 01:00:00+00:00  15.750000
    1 2020-05-20 02:00:00+00:00  19.783333
    2 2020-05-20 12:00:00+00:00  20.000000
    3 2020-05-21 03:00:00+00:00  15.500000
    4 2020-05-22 12:00:00+00:00  15.000000
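A self-contained reproduction of the question's setup, as a sketch: the DataFrame is built from the input table above (with the index renumbered 0 to 7, since the table repeats index 1), and '60min' is used as the hourly frequency.

```python
import pandas as pd

# Input rows copied from the question's table (index renumbered 0..7)
df = pd.DataFrame({
    "measured_at": [
        "2020-05-19 01:13:40+00:00",
        "2020-05-19 01:14:40+00:00",
        "2020-05-20 02:13:26+00:00",
        "2020-05-20 02:14:57+00:00",
        "2020-05-20 02:15:19+00:00",
        "2020-05-20 12:06:39+00:00",
        "2020-05-21 03:13:07+00:00",
        "2020-05-22 12:09:37+00:00",
    ],
    "celsius": [15.00, 16.50, 30.00, 15.35, 14.00, 20.00, 15.50, 15.00],
})

# Parse the strings into timezone-aware (UTC) timestamps
df["measured_at"] = pd.to_datetime(df["measured_at"])

# One-hour bins; hours without readings yield NaN and are dropped
df1 = df.resample("60min", on="measured_at")["celsius"].mean().dropna().reset_index()
print(df1)
```

This gives the hourly means shown above; the minutes are lost because resample labels each bin with the start of the hour.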
    
    

    Expected output:

         measured_at    celsius
    0 2020-05-19 01:13:00+00:00  15.750000
    1 2020-05-20 02:13:00+00:00  19.783333
    2 2020-05-20 12:06:00+00:00  20.000000
    3 2020-05-21 03:13:00+00:00  15.500000
    4 2020-05-22 12:09:00+00:00  15.000000
    
  •  Isaac Ng · 4 years ago

    Here is code for your use case.

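    I split off the minute and second part of each timestamp (as seconds past the hour) so it can be averaged along with celsius during resampling, and then added back to each hourly timestamp.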

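    The +00:00 suffix is not extra precision: it is the UTC offset, so the timestamps are timezone-aware (in UTC), and adding a timedelta keeps that timezone in the result. If you need sub-second precision, compute the offset in microseconds or nanoseconds instead of seconds.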
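A small check of the two points above, as a sketch assuming pandas 1.x or later and using only the first hour of the question's data: parsing a "+00:00" string yields timezone-aware timestamps, and adding a Timedelta (as the answer's code does) keeps the timezone. It also shows where averaging the minute/second offsets puts that first hour.

```python
import pandas as pd

# The two readings from the first hour of the question's data
first_hour = pd.to_datetime(pd.Series([
    "2020-05-19 01:13:40+00:00",
    "2020-05-19 01:14:40+00:00",
]))

# "+00:00" is a UTC offset, so the parsed timestamps are timezone-aware
print(first_hour.dt.tz)

# Offsets past the hour in seconds: 820 and 880
offsets = first_hour.dt.minute * 60 + first_hour.dt.second
mean_offset = int(offsets.mean())  # 850 s = 14 min 10 s

# Adding a Timedelta to a tz-aware Timestamp keeps the timezone
shifted = first_hour.iloc[0].floor("60min") + pd.to_timedelta(mean_offset, unit="s")
print(shifted)
```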

    import pandas as pd
    
    # Convert to datetime (tz-aware, because of the +00:00 suffix)
    df['measured_at'] = pd.to_datetime(df['measured_at'])
    
    # Extract minutes and seconds past the hour as total seconds
    df['seconds'] = df['measured_at'].dt.minute * 60 + df['measured_at'].dt.second
    
    # Resample to one-hour periods; this averages both celsius and seconds
    df = df.resample('60min', on='measured_at').mean().dropna().reset_index()
    
    # Add the average offset back to each period, truncated to whole seconds
    df['measured_at'] = df['measured_at'] + pd.to_timedelta(df['seconds'].astype(int), unit='s')
    
    # Remove the helper seconds column
    df = df.drop(columns='seconds')
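Averaging the offsets places the first hour at 01:14:10 (the mean of 820 s and 880 s is 850 s), whereas the expected output shows 01:13:00, i.e. the first reading of each hour truncated to the minute. A sketch of that alternative, using groupby on the floored hour with named aggregation ("first" for the timestamp, "mean" for celsius); the helper name hour is illustrative only, and the data is the question's table:

```python
import pandas as pd

df = pd.DataFrame({
    "measured_at": [
        "2020-05-19 01:13:40+00:00", "2020-05-19 01:14:40+00:00",
        "2020-05-20 02:13:26+00:00", "2020-05-20 02:14:57+00:00",
        "2020-05-20 02:15:19+00:00", "2020-05-20 12:06:39+00:00",
        "2020-05-21 03:13:07+00:00", "2020-05-22 12:09:37+00:00",
    ],
    "celsius": [15.00, 16.50, 30.00, 15.35, 14.00, 20.00, 15.50, 15.00],
})
df["measured_at"] = pd.to_datetime(df["measured_at"])

# Sort so that "first" means the earliest reading in each hour
df = df.sort_values("measured_at")

# Hypothetical grouping key: the start of the hour each reading falls in
hour = df["measured_at"].dt.floor("60min").rename("hour")

out = (
    df.groupby(hour)
      .agg(measured_at=("measured_at", "first"), celsius=("celsius", "mean"))
      .reset_index(drop=True)
)

# Truncate each hour's first timestamp to whole minutes, e.g. 01:13:40 -> 01:13:00
out["measured_at"] = out["measured_at"].dt.floor("min")
print(out)
```

Grouping by the floored hour instead of resampling only creates groups for hours that have readings, so no dropna is needed; "first" depends on the rows being in time order, hence the sort_values.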