
Creating start and end time columns based on multiple conditions in R (dplyr, lubridate)

  • Lynn  · 5 years ago

     Read      Box       ID      Time
     T         out               10/1/2019 9:00:01 AM
     T         out               10/1/2019 9:00:02 AM
     T         out               10/1/2019 9:00:03 AM
     T         out               10/1/2019 9:02:59 AM
     T         out               10/1/2019 9:03:00 AM
     F                           10/1/2019 9:05:00 AM
     T         out               10/1/2019 9:06:00 AM
     T         out               10/1/2019 9:06:02 AM
     T         in                10/1/2019 9:07:00 AM
     T         in                10/1/2019 9:07:02 AM
     T         out               10/1/2019 9:07:04 AM
     T         out               10/1/2019 9:07:05 AM
     T         out               10/1/2019 9:07:06 AM
               hello             10/1/2019 9:07:08 AM
    

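    Based on certain conditions in this dataset, I would like to create a starttime column and an endtime column. I want a "starttime" to be generated at the first row where all of the following hold: Read == "T", Box == "out", and ID == "". For this dataset, the first starttime is 10/1/2019 9:00:01 AM, because that is where the desired conditions first occur (Read = T, Box = out, ID blank). As soon as any one of these conditions is no longer true, an endtime is created. So the first endtime falls right before row 6, at 10/1/2019 9:03:00 AM. My ultimate goal is to create a duration column from these.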

    This is my desired output:

      starttime                    endtime                     duration
    
      10/01/2019 9:00:01 AM        10/01/2019 9:03:00 AM       179 secs
      10/1/2019 9:06:00 AM         10/1/2019 9:06:02 AM        2 secs
      10/1/2019 9:07:04 AM         10/1/2019 9:07:06 AM        2 secs
    

      df <- structure(list(Read = structure(c(3L, 3L, 3L, 3L, 3L, 2L, 3L, 
      3L, 3L, 3L, 4L, 4L, 3L, 1L), .Label = c("", "F", "T", "T "), class = "factor"), 
      Box = structure(c(3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 2L, 2L, 
      3L, 3L, 3L, 1L), .Label = c("", "in", "out"), class = "factor"), 
      ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 2L), .Label = c("", "hello"), class = "factor"), 
      Time = structure(1:14, .Label = c("10/1/2019 9:00:01 AM", 
     "10/1/2019 9:00:02 AM", "10/1/2019 9:00:03 AM", "10/1/2019 9:02:59 AM", 
     "10/1/2019 9:03:00 AM", "10/1/2019 9:05:00 AM", "10/1/2019 9:06:00 AM", 
     "10/1/2019 9:06:02 AM", "10/1/2019 9:07:00 AM", "10/1/2019 9:07:02 AM", 
     "10/1/2019 9:07:04 AM", "10/1/2019 9:07:05 AM", "10/1/2019 9:07:06 AM", 
     "10/1/2019 9:07:08 AM"), class = "factor")), class = "data.frame", row.names = c(NA, 
     -14L))
    

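    I think that overall I will have to write a loop. I believe my thought process is correct; I am just not sure how to formulate the code. This is what I would try: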

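    library(dplyr)

    # Parse the Time column (month/day/year hour:minute:second) into POSIXct.
    # The column is named Time, not Date; as.character() guards against factors.
    df2 <- mutate(df, Time = lubridate::mdy_hms(as.character(Time)))

    for (i in 2:nrow(df2)) {
      if (df2$Read[[i]] == "T") {
        # remaining conditions and start/end bookkeeping not written yet
      }
    }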
    

    I think this may be a start (just putting my conditions inside the loop; I do not know how to finish it).
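
    For reference, one way the loop idea could be completed is sketched here. It assumes character columns with a blank ID (not the factor columns of the dput above), and the names `starts`, `ends` and `in_run` are illustrative only. It is a sketch of the approach described in the question, not a recommended solution.

```r
library(lubridate)

# Stand-in for the question's data, with plain character columns.
df <- data.frame(
  Read = c(rep("T", 5), "F", rep("T", 7), ""),
  Box  = c(rep("out", 5), "", "out", "out", "in", "in",
           "out", "out", "out", "hello"),
  ID   = "",
  Time = c("10/1/2019 9:00:01 AM", "10/1/2019 9:00:02 AM",
           "10/1/2019 9:00:03 AM", "10/1/2019 9:02:59 AM",
           "10/1/2019 9:03:00 AM", "10/1/2019 9:05:00 AM",
           "10/1/2019 9:06:00 AM", "10/1/2019 9:06:02 AM",
           "10/1/2019 9:07:00 AM", "10/1/2019 9:07:02 AM",
           "10/1/2019 9:07:04 AM", "10/1/2019 9:07:05 AM",
           "10/1/2019 9:07:06 AM", "10/1/2019 9:07:08 AM"),
  stringsAsFactors = FALSE
)

time <- mdy_hms(df$Time)
cond <- df$Read == "T" & df$Box == "out" & df$ID == ""

starts <- integer(0)  # row index where each run of TRUE rows begins
ends   <- integer(0)  # row index where each run of TRUE rows ends
in_run <- FALSE

for (i in seq_len(nrow(df))) {
  # A run starts on the first row that satisfies the condition.
  if (cond[i] && !in_run) {
    starts <- c(starts, i)
    in_run <- TRUE
  }
  # A run ends on its last TRUE row: the next row fails, or there is none.
  if (in_run && (i == nrow(df) || !cond[i + 1])) {
    ends <- c(ends, i)
    in_run <- FALSE
  }
}

result <- data.frame(starttime = time[starts], endtime = time[ends])
result$duration <- difftime(result$endtime, result$starttime, units = "secs")
result
```

    On this data the runs cover rows 1 to 5, 7 to 8 and 11 to 13, which matches the desired output. Growing vectors inside a loop is acceptable for small inputs but tends to be slow on large ones.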

    1 answer
  •   Ronak Shah    5 years ago

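    You can do this without a loop. dplyr is convenient here because pipes make it easy to chain several operations.

    We convert the Time column to POSIXct, create a cond column that holds a logical value for the conditions we want to check, build a grouping column grp with cumsum(!cond), keep only the rows where cond is TRUE, and then take the first and last Time and their difference for each group.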

    library(dplyr)
    
    df %>%
      mutate(Time = lubridate::mdy_hms(Time), 
             cond = Read == "T" & Box == "out" & ID == "", 
             grp = cumsum(!cond)) %>%
      filter(cond) %>%
      group_by(grp) %>%
      summarise(starttime = first(Time), 
                endtime = last(Time), 
                duration = difftime(endtime, starttime, units = "secs")) %>%
      select(-grp)
    
    # A tibble: 3 x 3
    #  starttime           endtime             duration
    #  <dttm>              <dttm>              <drtn>  
    #1 2019-10-01 09:00:01 2019-10-01 09:03:00 179 secs
    #2 2019-10-01 09:06:00 2019-10-01 09:06:02   2 secs
    #3 2019-10-01 09:07:04 2019-10-01 09:07:06   2 secs
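
    The grouping step can be examined in isolation. The sketch below hard-codes `cond` as it would evaluate on the 14 rows of the cleaned data (an assumption made for illustration) and shows why `cumsum(!cond)` gives each unbroken run of TRUE rows its own group number.

```r
# cond as it evaluates row by row on the cleaned 14-row data (assumed here).
cond <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE,
          FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)

# Every FALSE row bumps the counter, so consecutive TRUE rows share a value.
grp <- cumsum(!cond)
data.frame(row = seq_along(cond), cond, grp)

# After dropping the FALSE rows, each remaining grp value marks one run.
runs <- split(which(cond), grp[cond])
runs
```

    Here `grp` is 0 for rows 1 to 5, 1 for rows 7 and 8, and 3 for rows 11 to 13. The group numbers are not consecutive, which does not matter because grp is only used as a key and is dropped at the end.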
    

    Data

    I cleaned up your data df a bit:

    df <- structure(list(Read = c("T", "T", "T", "T", "T", "F", "T", "T", 
    "T", "T", "T", "T", "T", ""), Box = c("out", "out", "out", "out", 
    "out", "", "out", "out", "in", "in", "out", "out", "out", "hello"
    ), ID = c("", "", "", "", "", "", "", "", "", "", "", "", "", 
    ""), Time = c("10/1/2019 9:00:01 AM", "10/1/2019 9:00:02 AM", 
    "10/1/2019 9:00:03 AM", "10/1/2019 9:02:59 AM", "10/1/2019 9:03:00 AM", 
    "10/1/2019 9:05:00 AM", "10/1/2019 9:06:00 AM", "10/1/2019 9:06:02 AM", 
    "10/1/2019 9:07:00 AM", "10/1/2019 9:07:02 AM", "10/1/2019 9:07:04 AM", 
    "10/1/2019 9:07:05 AM", "10/1/2019 9:07:06 AM", "10/1/2019 9:07:08 AM"
    )), row.names = c(NA, -14L), class = "data.frame")
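
    One reason the cleanup matters: the dput in the question stores the columns as factors, and Read has a "T " level with a trailing space, so Read == "T" is FALSE on those rows. A minimal sketch of normalizing such columns before comparing follows; the `raw` and `clean` data frames are small hypothetical stand-ins, not the full data.

```r
# Reduced stand-in for the question's factor columns (note the "T " level).
raw <- data.frame(
  Read = factor(c("T", "T ", "F", "")),
  Box  = factor(c("out", "out", "", ""))
)

# Comparing a factor with a string compares the level labels exactly,
# so the row holding "T " does not match.
raw$Read == "T"

# Convert every column to character and strip surrounding whitespace.
clean <- raw
clean[] <- lapply(raw, function(col) trimws(as.character(col)))
clean$Read == "T"
```

    After the conversion both "T" and "T " rows match, so the condition used in the pipeline behaves as intended.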
    