
Pandas: hourly average with resample or groupby, keeping the first timestamp's minutes

pandas python-3.x python

VISHAL LIMGIRE · Tech Community · 4 years ago

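I want to get the expected output shown below. How can I use groupby or resample to get the hourly average of celsius while still keeping the minute values in the measured_at column?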

My input:

 measured_at                  celsius
0 2020-05-19 01:13:40+00:00    15.00
1 2020-05-19 01:14:40+00:00    16.50
2 2020-05-20 02:13:26+00:00    30.00
3 2020-05-20 02:14:57+00:00    15.35
4 2020-05-20 02:15:19+00:00    14.00
5 2020-05-20 12:06:39+00:00    20.00
6 2020-05-21 03:13:07+00:00    15.50
7 2020-05-22 12:09:37+00:00    15.00


df['measured_at'] = pd.to_datetime(df.measured_at)
df1 = df.resample('60T', on='measured_at')['celsius'].mean().dropna().reset_index()

My output:

     measured_at    celsius
0 2020-05-19 01:00:00+00:00  15.750000
1 2020-05-20 02:00:00+00:00  19.783333
2 2020-05-20 12:00:00+00:00  20.000000
3 2020-05-21 03:00:00+00:00  15.500000
4 2020-05-22 12:00:00+00:00  15.000000

Expected output:

     measured_at    celsius
0 2020-05-19 01:13:00+00:00  15.750000
1 2020-05-20 02:13:00+00:00  19.783333
2 2020-05-20 12:06:00+00:00  20.000000
3 2020-05-21 03:13:00+00:00  15.500000
4 2020-05-22 12:09:00+00:00  15.000000


Isaac Ng · 4 years ago

Here is code for your use case.

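I extract the minutes and seconds of each timestamp as a number of seconds past the hour, so that resampling averages them along with celsius; the average is then added back onto each hourly timestamp.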

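The +00:00 suffix is the UTC offset of the timestamps, not extra precision: pd.to_datetime turns these strings into timezone-aware (UTC) values, and resampling and adding timedeltas work on them unchanged.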
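A quick check, as a minimal sketch, of what pandas does with the `+00:00` suffix: it is parsed as a zero UTC offset, so the values become timezone-aware, and operations such as flooring to the hour keep that timezone.

```python
import pandas as pd

# Parse a timestamp string that carries a +00:00 suffix
ts = pd.to_datetime(pd.Series(['2020-05-19 01:13:40+00:00']))

# The suffix becomes the Series' timezone: a zero offset from UTC
print(ts.dt.tz)
print(ts.iloc[0].utcoffset())

# Flooring to the hour keeps the timezone, so nothing needs stripping first
print(ts.dt.floor('h').iloc[0])
```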

import pandas as pd

# Convert to datetime (timezone-aware, because the strings carry a +00:00 offset)
df['measured_at'] = pd.to_datetime(df['measured_at'])

# Extract minutes and seconds as total seconds past the hour
df['seconds'] = df['measured_at'].dt.minute * 60 + df['measured_at'].dt.second

# Resample to periods of one hour; mean() averages both celsius and seconds
df = df.resample('60min', on='measured_at').mean().dropna().reset_index()

# Add back the average offset (truncated to whole seconds) for each period
df['measured_at'] = df['measured_at'] + pd.to_timedelta(df['seconds'].astype(int), unit='s')

# Remove seconds column
df = df.drop(columns='seconds')
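Averaging the offsets does not reproduce the expected output in the question: for the 2020-05-20 02:00 hour, the readings at 02:13:26, 02:14:57 and 02:15:19 average to 874 seconds past the hour, so the code above yields 02:14:34 rather than 02:13:00. The expected output looks like the first timestamp of each hour truncated to the minute. The sketch below assumes that is the intended rule: it groups by the hour using `Series.dt.floor` and takes the first timestamp and the mean temperature with named aggregation. The group-key name `hour` is a hypothetical helper, not part of the question.

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame({
    'measured_at': ['2020-05-19 01:13:40+00:00', '2020-05-19 01:14:40+00:00',
                    '2020-05-20 02:13:26+00:00', '2020-05-20 02:14:57+00:00',
                    '2020-05-20 02:15:19+00:00', '2020-05-20 12:06:39+00:00',
                    '2020-05-21 03:13:07+00:00', '2020-05-22 12:09:37+00:00'],
    'celsius': [15.00, 16.50, 30.00, 15.35, 14.00, 20.00, 15.50, 15.00],
})
df['measured_at'] = pd.to_datetime(df['measured_at'])

# Sort so that 'first' really is the earliest reading in each hour
df = df.sort_values('measured_at')

# Hourly bucket used only as the group key (hypothetical name 'hour')
hour = df['measured_at'].dt.floor('h').rename('hour')

# First timestamp and mean temperature per hour; hours with no data never appear
result = (
    df.groupby(hour)
      .agg(measured_at=('measured_at', 'first'), celsius=('celsius', 'mean'))
      .reset_index(drop=True)
)

# Truncate the first timestamp to whole minutes, e.g. 01:13:40 -> 01:13:00
result['measured_at'] = result['measured_at'].dt.floor('min')
print(result)
```

Unlike resample, groupby only creates groups for hours that actually contain readings, so no dropna() step is needed.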