
Group by, and if the datetime gap exceeds a threshold, create a "break" and a new value in the original grouping column (R, dplyr)

  • Lynn  · 5 years ago

      Subject      Folder     Message    Date
      A            Out                   9/9/2019 5:46:38 PM
      A            Out                   9/9/2019 5:46:40 PM
      A            Out                   9/9/2019 5:46:42 PM
      A            Out                   9/9/2019 5:46:43 PM
      A            Out                   9/9/2019 9:30:00 PM
      A            Out                   9/9/2019 9:30:01 PM
      B            Out                   9/9/2019 9:35:00 PM
      B            Out                   9/9/2019 9:35:01 PM
    

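    I am trying to group this data by Subject, compute the duration of each group, and store it in a new Duration column. I would also like to apply a threshold when the gap between consecutive datetimes gets too large. My problem is in group A, where the time jumps from 5:46 in the fourth row to 9:30 in the fifth row, which gives an inaccurate duration for group A. When the gap exceeds 10 minutes, I would like to "break" the group at that point, create a new value in Subject (A1), and compute a separate duration for it. I am not sure whether I should use a loop for this?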

     Subject   Duration   Group
     A         5 sec      outdata1
     A1        1 sec      outdata2
     B         1 sec      outdata3
    

    Here is my dput:

    structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
    2L, 2L), .Label = c("A", "B"), class = "factor"), Folder = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Out", class = "factor"), 
    Message = c("", "", "", "", "", "", "", ""), Date = structure(1:8, .Label = c("9/9/2019 5:46:38 PM", 
    "9/9/2019 5:46:40 PM", "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM", 
    "9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM", "9/9/2019 9:35:00 PM", 
    "9/9/2019 9:35:01 PM"), class = "factor")), row.names = c(NA, 
    -8L), class = "data.frame")
    

    This is what I have tried so far:

    thresh <- duration(10, units = "minutes")
    
    df %>%  
    mutate(Date = mdy_hms(Date)) %>% 
    transmute(Subject, Duration = diff = difftime(as.POSIXct(Date, format = 
    "%m/%d/%Y %I:%M:%S %p"),as.POSIXct(Date, 
    format = "%m/%d/%Y %I:%M:%S %p" ), units = "secs")) %>% 
    ungroup %>% 
    distinct %>% 
    mutate(grp = str_c("Outdata", row_number()))
    
     mutate(delta = if_else(grp < thresh1, grp, NA_real_))
    

    Any help would be appreciated.

    1 answer  |  last activity 5 years ago
  •   Ronak Shah    5 years ago

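    We can compute the difference between consecutive Date values and start a new group whenever that difference exceeds the threshold, then take the difference between the max and min Date within each group. This relies on the rows already being sorted by Subject and Date, as they are in the example.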

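    library(dplyr)
    thresh <- 10  # break threshold, in minutes
    
    df %>%  
      # parse the 12-hour timestamps into POSIXct
      mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p")) %>%
      # start a new Group whenever the gap to the previous row exceeds thresh
      group_by(Subject, Group = cumsum(difftime(Date, 
                lag(Date, default = first(Date)), units = "mins") > thresh)) %>%
      # duration of each run = last timestamp minus first timestamp
      summarise(Duration = difftime(max(Date), min(Date), units = "secs")) %>%
      ungroup %>%
      mutate(Group = paste0('outdata', row_number()))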
    
    # A tibble: 3 x 3
    #  Subject Group    Duration
    #  <fct>   <chr>    <drtn>  
    #1 A       outdata1 5 secs  
    #2 A       outdata2 1 secs  
    #3 B       outdata3 1 secs
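
    The output above keeps Subject as A for both runs, whereas the question asked for the second run to be relabelled A1. A minimal sketch of one way to get that label, assuming Subject is first converted to character and that a suffixed label such as "A1" does not collide with a real subject name. The helper columns `gap` and `run` and the result name `res` are introduced here for illustration only, and `tz = "UTC"` is an assumption made to keep the parsing independent of the local time zone. The gap is computed within each Subject, so the sketch sorts the rows itself.

```r
library(dplyr)

# Same rows as in the question, rebuilt with only the columns that matter here
df <- data.frame(
  Subject = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Date = c("9/9/2019 5:46:38 PM", "9/9/2019 5:46:40 PM",
           "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM",
           "9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM",
           "9/9/2019 9:35:00 PM", "9/9/2019 9:35:01 PM"),
  stringsAsFactors = FALSE
)

thresh <- 10  # break threshold, in minutes

res <- df %>%
  mutate(Subject = as.character(Subject),  # also handles the factor from dput
         Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")) %>%
  arrange(Subject, Date) %>%
  group_by(Subject) %>%
  # gap to the previous row of the same Subject; the first row gets a gap of 0
  mutate(gap = as.numeric(difftime(Date, lag(Date, default = first(Date)),
                                   units = "mins")),
         run = cumsum(gap > thresh)) %>%
  ungroup() %>%
  # run 0 keeps the original label; later runs get a numeric suffix (A1, A2, ...)
  mutate(Subject = if_else(run == 0, Subject, paste0(Subject, run))) %>%
  group_by(Subject) %>%
  summarise(Duration = difftime(max(Date), min(Date), units = "secs")) %>%
  mutate(Group = paste0("outdata", row_number()))

print(res)
```

    On the example data this should give three rows, A (5 secs), A1 (1 sec) and B (1 sec), labelled outdata1 to outdata3, which matches the table requested in the question.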