Find duplicate DateTime index values and add a time delta to make them unique

python-3.8 pandas

steff · Tech Community · 4 years ago

Data:

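import datetime as dt
import numpy as np
import pandas as pd

n = 8
np.random.seed(42)
df = pd.DataFrame(index=[dt.datetime(2020,3,31,9,25) + dt.timedelta(seconds=x)
                         for x in np.random.randint(0,10000,size=n).tolist()],
                  data=np.random.randint(0,100,size=(n, 2)),
                  columns=['price', 'volume']).sort_index()
df.index.name = 'timestamp'
# Duplicate a few rows to create repeated timestamps.
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
df = pd.concat([df, df.iloc[[3,6]]+1])
df = pd.concat([df, df.iloc[[3]]+1])
df = pd.concat([df, df.iloc[[3]]]).sort_index()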

                  price volume
timestamp       
2020-03-31 09:32:46 413 805
2020-03-31 09:39:20 372 99
2020-03-31 10:38:46 385 191
2020-03-31 10:51:31 130 661
2020-03-31 10:51:31 131 662
2020-03-31 10:51:31 131 662
2020-03-31 10:51:31 130 661
2020-03-31 10:54:50 871 663
2020-03-31 11:00:34 308 769
2020-03-31 11:09:25 343 491
2020-03-31 11:09:25 344 492
2020-03-31 11:26:10 458 87

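Using df.loc[df.index.duplicated(keep=False)] I can find the rows with a non-unique index. For those rows, I would like to add an increment of 1 second / (number of rows sharing that timestamp) to the index, so that the index becomes unique and monotonically increasing.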
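To make the starting point concrete, here is a minimal sketch of what that mask selects. The three-row frame `toy` and its values are hypothetical and only serve as an illustration; they are not the data above.

```python
import pandas as pd

# Hypothetical toy frame: the first two rows share a timestamp.
idx = pd.to_datetime([
    "2020-03-31 10:51:31",
    "2020-03-31 10:51:31",
    "2020-03-31 10:54:50",
])
toy = pd.DataFrame({"price": [130, 131, 871]}, index=idx)

# keep=False flags every member of a repeated timestamp, not only the later ones.
dup_mask = toy.index.duplicated(keep=False)
dup_rows = toy.loc[dup_mask]
print(dup_rows)
```

With the default keep='first', only the second row would be flagged, which is why keep=False is needed to see the whole group.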

The desired output is:

                          price volume
timestamp       
2020-03-31 09:32:46.000000  413 805
2020-03-31 09:39:20.000000  372 99
2020-03-31 10:38:46.000000  385 191
2020-03-31 10:51:31.000000  130 661
2020-03-31 10:51:31.250000  131 662
2020-03-31 10:51:31.500000  131 662
2020-03-31 10:51:31.750000  130 661
2020-03-31 10:54:50.000000  871 663
2020-03-31 11:00:34.000000  308 769
2020-03-31 11:09:25.000000  343 491
2020-03-31 11:09:25.500000  344 492
2020-03-31 11:26:10.000000  458 87

Thanks for your help!


cs95 abhishek58g 4 years ago

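We can group by the index and build a column of time deltas, expressed as fractions of a second, that increases within each group of duplicates.

This solution updates the index in place, but you can use set_index to create a copy with the desired result instead.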

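g = df.groupby(level=0)
# Position of each row within its timestamp group, divided by the group size,
# gives the fraction of a second to add (0 for non-duplicated rows).
deltas = g.cumcount().div(g['price'].transform('size')).to_numpy()

# The deltas are fractions of a second, so the unit must be 's'.
df.index += pd.to_timedelta(deltas, unit='s')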
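To see what each intermediate step produces, here is a minimal self-contained sketch on a hypothetical toy frame (the name `toy` and its values are assumptions for illustration). Group sizes of 4 and 2 are chosen so that the fractions are exact.

```python
import pandas as pd

# Hypothetical toy frame: one unique timestamp, one repeated 4 times, one repeated twice.
idx = pd.to_datetime([
    "2020-03-31 09:00:00",
    "2020-03-31 09:00:01", "2020-03-31 09:00:01",
    "2020-03-31 09:00:01", "2020-03-31 09:00:01",
    "2020-03-31 09:00:05", "2020-03-31 09:00:05",
])
toy = pd.DataFrame({"price": range(7)}, index=idx)

g = toy.groupby(level=0)
position = g.cumcount()                     # 0-based position of each row in its group
group_size = g["price"].transform("size")   # number of rows sharing the timestamp
deltas = position.div(group_size).to_numpy()  # fraction of a second per row

# Shift each timestamp by its fraction of a second.
toy.index = toy.index + pd.to_timedelta(deltas, unit="s")
print(toy)
```

Because every fraction is strictly below 1, a shifted timestamp stays within its own second; it can only collide with a neighbour if the original data already contained sub-second timestamps.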

Or, as an outrageous one-liner that returns a copy (it reuses g from above):

df = (df.groupby(level=0)
        .cumcount()
        .div(g['price'].transform('size'))
        .apply(pd.to_timedelta, unit='s')
        .add(df.index)
        .pipe(df.set_index))

df

                         price  volume
2020-03-31 09:32:46.000     63      59
2020-03-31 09:39:20.000     99      23
2020-03-31 10:38:46.000     20      32
2020-03-31 10:51:31.000     52       1
2020-03-31 10:51:31.250     53       2
2020-03-31 10:51:31.500     53       2
2020-03-31 10:51:31.750     52       1
2020-03-31 10:54:50.000      2      21
2020-03-31 11:00:34.000     87      29
2020-03-31 11:09:25.000     37       1
2020-03-31 11:09:25.500     38       2
2020-03-31 11:26:10.000     74      87
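If the one-liner is hard to read, the same idea can be wrapped in a small helper that leaves the input untouched. This is only a sketch: the function name `spread_duplicates` is hypothetical, and it assumes the frame has a DatetimeIndex that is already sorted, since otherwise the result would be unique but not monotonic.

```python
import pandas as pd

def spread_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with repeated timestamps spaced evenly within one second.

    Hypothetical helper; assumes df has a sorted DatetimeIndex.
    """
    g = df.groupby(level=0)
    # Group size for every row, looked up by that row's timestamp.
    sizes = g.size().reindex(df.index).to_numpy()
    # Row position inside its group divided by the group size.
    fraction = (g.cumcount() / sizes).to_numpy()
    out = df.copy()
    shifted = df.index + pd.to_timedelta(fraction, unit="s")
    out.index = shifted.rename(df.index.name)  # keep the original index name
    return out

# Usage on a hypothetical three-row frame.
idx = pd.to_datetime([
    "2020-03-31 11:09:25",
    "2020-03-31 11:09:25",
    "2020-03-31 11:26:10",
])
toy = pd.DataFrame({"price": [343, 344, 458]}, index=idx)
toy.index.name = "timestamp"
result = spread_duplicates(toy)
print(result)
```

Unlike the grouped version that reads g['price'], this sketch takes the group sizes from g.size(), so it does not depend on any particular column name.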