
How to convert a time column in Python and find time deltas that meet a condition

  •  1
  • jovicbg  · Tech Community  · 8 years ago

    I have a column Time that is reported as a non-null object, and I can't convert it to timedelta or datetime.

         Time             msg
    12:29:36.306000      Setup
    12:29:36.507000      Alerting
    12:29:38.207000      Service
    12:29:39.194000      Setup
    12:30:05.773000      Alerting
    12:30:06.205000      Service
    12:32:07.315000      Setup
    12:32:17.194000      Service
    12:32:26.889000      Setup
    12:36:06.274000      Alerting
    12:36:08.523000      Service
    12:37:59.200000      Setup
    12:47:10.652000      Alerting
    12:47:43.921000      Setup
    

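    When I call df.info(), the 'Time' column is reported as a non-null object, and I can't convert it to timedelta or datetime (it's obvious why I can't). So how can I find the difference (time delta) between consecutive msg rows, skipping any row where the time delta is < 5 seconds?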

         Time             msg         diff
    12:29:36.306000      Setup         
    12:29:36.507000      Alerting      
    12:29:38.207000      Service
    12:29:39.194000      Setup
    12:30:05.773000      Alerting
    12:30:06.205000      Service
    12:32:07.315000      Setup
    12:32:17.194000      Service
    12:32:26.889000      Setup
    12:36:06.274000      Alerting    6.30***
    12:36:08.523000      Service     
    12:37:59.200000      Setup
    12:47:10.652000      Alerting    11.02***    
    12:47:43.921000      Setup      
    

    I tried something like this:

    df['diff'] = (df['Time'] - df['Time'].shift()).fillna(0)
    

    But I don't know how to write the condition for the 5-second interval.

    1 answer  |  as of 8 years ago
  •  5
  •   jezrael    8 years ago

    I think you first need to convert the column to str and then call to_timedelta.

    Then take the diff and compare it with a Timedelta of 5s to get a boolean mask:

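    import pandas as pd
    # cast to str first so that to_timedelta can parse the values
    df['Time'] = pd.to_timedelta(df['Time'].astype(str))

    # gap to the previous row; the first row has no predecessor, so it is NaT
    df['diff'] = df['Time'].diff()
    # True where the gap to the previous row is longer than 5 seconds
    df['mask'] = df['diff'] > pd.Timedelta(5, unit='s')
    print(df)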
                  Time       msg            diff   mask
    0  12:29:36.306000     Setup             NaT  False
    1  12:29:36.507000  Alerting 00:00:00.201000  False
    2  12:29:38.207000   Service 00:00:01.700000  False
    3  12:29:39.194000     Setup 00:00:00.987000  False
    4  12:30:05.773000  Alerting 00:00:26.579000   True
    5  12:30:06.205000   Service 00:00:00.432000  False
    6  12:32:07.315000     Setup 00:02:01.110000   True
    7  12:32:17.194000   Service 00:00:09.879000   True
    8  12:32:26.889000     Setup 00:00:09.695000   True
    9  12:36:06.274000  Alerting 00:03:39.385000   True
    10 12:36:08.523000   Service 00:00:02.249000  False
    11 12:37:59.200000     Setup 00:01:50.677000   True
    12 12:47:10.652000  Alerting 00:09:11.452000   True
    13 12:47:43.921000     Setup 00:00:33.269000   True
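
For a version that can be run without the original data, here is a minimal sketch. It assumes the object column holds `datetime.time` values (the question does not say what the objects are, so this is a guess), and the small `df` below is rebuilt by hand from the first five rows of the question.

```python
import datetime as dt

import pandas as pd

# Hypothetical reconstruction of the first five rows of the question's data;
# the use of datetime.time objects is an assumption about the original column.
df = pd.DataFrame({
    "Time": [
        dt.time(12, 29, 36, 306000),
        dt.time(12, 29, 36, 507000),
        dt.time(12, 29, 38, 207000),
        dt.time(12, 29, 39, 194000),
        dt.time(12, 30, 5, 773000),
    ],
    "msg": ["Setup", "Alerting", "Service", "Setup", "Alerting"],
})

# str() of a datetime.time gives 'HH:MM:SS.ffffff', which to_timedelta can parse
df["Time"] = pd.to_timedelta(df["Time"].astype(str))

# gap to the previous row (NaT for the first row)
diff = df["Time"].diff()

# comparisons against NaT are False, so the first row is never flagged
mask = diff > pd.Timedelta(seconds=5)

print(df.assign(diff=diff, mask=mask))
```

Only the last row (the 26.579 s gap before the second Alerting) is flagged here, which matches row 4 of the output above.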
    

    # keep the diff only where it exceeds 5 seconds; all other rows become NaT
    df['Time'] = pd.to_timedelta(df['Time'].astype(str))
    diff = df['Time'].diff()
    mask = diff > pd.Timedelta(5, unit='s')
    df['new'] = diff.where(mask)
    print(df)
                  Time       msg             new
    0  12:29:36.306000     Setup             NaT
    1  12:29:36.507000  Alerting             NaT
    2  12:29:38.207000   Service             NaT
    3  12:29:39.194000     Setup             NaT
    4  12:30:05.773000  Alerting 00:00:26.579000
    5  12:30:06.205000   Service             NaT
    6  12:32:07.315000     Setup 00:02:01.110000
    7  12:32:17.194000   Service 00:00:09.879000
    8  12:32:26.889000     Setup 00:00:09.695000
    9  12:36:06.274000  Alerting 00:03:39.385000
    10 12:36:08.523000   Service             NaT
    11 12:37:59.200000     Setup 00:01:50.677000
    12 12:47:10.652000  Alerting 00:09:11.452000
    13 12:47:43.921000     Setup 00:00:33.269000
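
If the filtered gaps are needed as plain numbers rather than Timedelta values, one option is `Series.dt.total_seconds()`. The sketch below uses four timestamps copied from rows 7 to 10 of the question; the variable names `times`, `long_gaps` and `gap_seconds` are made up for illustration.

```python
import pandas as pd

# Four consecutive timestamps taken from the question (rows 7-10)
times = pd.Series(pd.to_timedelta([
    "12:32:17.194000",
    "12:32:26.889000",
    "12:36:06.274000",
    "12:36:08.523000",
]))

diff = times.diff()

# keep only gaps longer than 5 seconds; shorter gaps and the first row become NaT
long_gaps = diff.where(diff > pd.Timedelta(seconds=5))

# convert the surviving gaps to float seconds (NaT becomes NaN)
gap_seconds = long_gaps.dt.total_seconds()

print(pd.DataFrame({"Time": times, "new": long_gaps, "seconds": gap_seconds}))
```

Float seconds are easier to aggregate or plot, while the Timedelta column stays more readable when printed, so which form to keep depends on what happens to the result next.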