代码之家  ›  专栏  ›  技术社区  ›  f.lechleitner

基于另一列计算列的时差

  •  0
  • f.lechleitner  · 技术社区  · 6 年前

    > df
                      time region place action
    1  2019-01-14 16:00:08      A     G  START
    2  2019-01-14 16:00:08      A     I   STOP
    3  2019-01-14 16:00:16      A     H  START
    4  2019-01-14 16:00:16      A     G   STOP
    5  2019-01-14 16:01:40      A     H   STOP
    6  2019-01-14 16:01:40      A     G  START
    7  2019-01-14 16:01:54      A     G   STOP
    8  2019-01-14 16:02:21      A     D  START
    9  2019-01-14 16:02:31      A     C  START
    10 2019-01-14 16:02:54      A     D   STOP
    11 2019-01-14 16:03:12      A     C   STOP
    12 2019-01-14 16:03:13      A     E  START
    13 2019-01-14 16:03:34      A     E   STOP
    14 2019-01-14 16:03:34      A     A  START
    15 2019-01-14 16:04:12      A     A   STOP
    16 2019-01-14 16:04:12      A     E  START
    17 2019-01-14 16:04:17      A     E   STOP
    18 2019-01-14 16:04:55      A     F  START
    19 2019-01-14 16:05:08      A     B  START
    20 2019-01-14 16:05:08      A     F   STOP
    

    我要每个地区的起点和终点的区别。(例如,第1行的时间和第4行的时间之间的差异,因为这是这个地方最近的一站)。到目前为止,我得到的是:我试图在ifelse构造中找到下一行,在action列中加上一个停止符。

    df %>% group_by(region, place) %>%
      mutate(difference = ifelse(action == "STOP", NA, time[which(action == "STOP")[which.max(which(action == "STOP") > row_number())]] - time))
    
    # A tibble: 20 x 5
    # Groups:   region, place [9]
       time                region place action difference
       <dttm>              <fct>  <fct> <chr>       <dbl>
     1 2019-01-14 16:00:08 A      g     START        8.52
     2 2019-01-14 16:00:08 A      i     STOP        NA   
     3 2019-01-14 16:00:16 A      h     START       84.2 
     4 2019-01-14 16:00:16 A      g     STOP        NA   
     5 2019-01-14 16:01:40 A      h     STOP        NA   
     6 2019-01-14 16:01:40 A      g     START      -84.2 
     7 2019-01-14 16:01:54 A      g     STOP        NA   
     8 2019-01-14 16:02:21 A      d     START       32.9 
     9 2019-01-14 16:02:31 A      c     START       40.8 
    10 2019-01-14 16:02:54 A      d     STOP        NA   
    11 2019-01-14 16:03:12 A      c     STOP        NA   
    12 2019-01-14 16:03:13 A      e     START       21.3 
    13 2019-01-14 16:03:34 A      e     STOP        NA   
    14 2019-01-14 16:03:34 A      a     START       38.0 
    15 2019-01-14 16:04:12 A      a     STOP        NA   
    16 2019-01-14 16:04:12 A      e     START      -38.5 
    17 2019-01-14 16:04:17 A      e     STOP        NA   
    18 2019-01-14 16:04:55 A      f     START       13.4 
    19 2019-01-14 16:05:08 A      b     START       NA   
    20 2019-01-14 16:05:08 A      f     STOP        NA  
    

    数据:

    df <- structure(
      list(
        time = structure(
          c(
            1547478008.024,
            1547478008.225,
            1547478016.168,
            1547478016.542,
            1547478100.374,
            1547478100.758,
            1547478114.589,
            1547478141.86,
            1547478151.972,
            1547478174.757,
            1547478192.723,
            1547478193.077,
            1547478214.37,
            1547478214.562,
            1547478252.523,
            1547478252.907,
            1547478257.458,
            1547478295.109,
            1547478308.358,
            1547478308.547
          ),
          class = c("POSIXct", "POSIXt"),
          tzone = ""
        ),
        region = structure(
          c(
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L,
            1L
          ),
          .Label = "A",
          class = "factor"
        ),
        place = structure(
          c(
            7L,
            9L,
            8L,
            7L,
            8L,
            7L,
            7L,
            4L,
            3L,
            4L,
            3L,
            5L,
            5L,
            1L,
            1L,
            5L,
            5L,
            6L,
            2L,
            6L
          ),
          .Label = c("a",
                     "b", "c", "d", "e", "f", "g", "h", "i"),
          class = "factor"
        ),
        action = c(
          "START",
          "STOP",
          "START",
          "STOP",
          "STOP",
          "START",
          "STOP",
          "START",
          "START",
          "STOP",
          "STOP",
          "START",
          "STOP",
          "START",
          "STOP",
          "START",
          "STOP",
          "START",
          "START",
          "STOP"
        )
      ),
      row.names = c(NA, 20L),
      class = "data.frame"
    )
    
    0 回复  |  直到 6 年前
        1
  •  2
  •   iod    6 年前

    假设每次启动后和下一次启动前总是恰好有一个停止,则这将起作用:

    df %>% group_by(region, place) %>% 
      arrange(time) %>% 
      mutate(difference=ifelse(action=="STOP", NA, difftime(lead(time),time,units="secs")))
    
    # A tibble: 20 x 5
    # Groups:   region, place [9]
       time                region place action difference
       <dttm>              <fct>  <fct> <chr>       <dbl>
     1 2019-01-14 10:00:08 A      g     START        8.52
     2 2019-01-14 10:00:08 A      i     STOP        NA   
     3 2019-01-14 10:00:16 A      h     START       84.2 
     4 2019-01-14 10:00:16 A      g     STOP        NA   
     5 2019-01-14 10:01:40 A      h     STOP        NA   
     6 2019-01-14 10:01:40 A      g     START       13.8 
     7 2019-01-14 10:01:54 A      g     STOP        NA   
     8 2019-01-14 10:02:21 A      d     START       32.9 
     9 2019-01-14 10:02:31 A      c     START       40.8 
    10 2019-01-14 10:02:54 A      d     STOP        NA   
    11 2019-01-14 10:03:12 A      c     STOP        NA   
    12 2019-01-14 10:03:13 A      e     START       21.3 
    13 2019-01-14 10:03:34 A      e     STOP        NA   
    14 2019-01-14 10:03:34 A      a     START       38.0 
    15 2019-01-14 10:04:12 A      a     STOP        NA   
    16 2019-01-14 10:04:12 A      e     START        4.55
    17 2019-01-14 10:04:17 A      e     STOP        NA   
    18 2019-01-14 10:04:55 A      f     START       13.4 
    19 2019-01-14 10:05:08 A      b     START       NA   
    20 2019-01-14 10:05:08 A      f     STOP        NA 
    

    ifelse 要确保开始后的下一行是停止,否则给出NA:

    df %>% group_by(region, place) %>% 
      arrange(time) %>% 
      mutate(difference=ifelse(action=="STOP", NA,
        ifelse(lead(action)=="STOP",difftime(lead(time),time,units="secs"),NA)))