
Group by, and create a new column with the duration

  •  1
  • Lynn  · Tech community  · 5 years ago

    I have a dataset, df:

      Subject      Folder      Message    Date
      A            Out                   9/9/2019 5:46:38 PM
      A            Out                   9/9/2019 5:46:40 PM
      A            Out                   9/9/2019 5:46:42 PM
      B            Out                   9/9/2019 5:48:00 PM
      B            Out                   9/9/2019 5:48:01 PM
      C            Out                   9/10/2019 5:49:01 PM
    

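    How can I group this by Subject and compute the duration for each group (the time between its first and last Date) in a new Duration column? This is my desired output: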

      Subject   Duration    Group
      A         4 sec      outdata1
      B         1 sec      outdata2
      C         0 sec      outdata3
    

    df <- structure(list(Subject = structure(c(1L, 1L, 1L, 2L, 2L, 3L), .Label = c("A", 
    "B", "C"), class = "factor"), Folder = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L), .Label = "Out", class = "factor"), Message = c("", 
    "", "", "", "", ""), Date = structure(c(2L, 3L, 4L, 5L, 6L, 1L
    ), .Label = c("9/10/2019 5:49:01 PM", "9/9/2019 5:46:38 PM", 
    "9/9/2019 5:46:40 PM", "9/9/2019 5:46:42 PM", "9/9/2019 5:48:00 PM", 
    "9/9/2019 5:48:01 PM"), class = "factor")), row.names = c(NA, 
    -6L), class = "data.frame")
    

    df %>%  
    mutate(Date = mdy_hms(Date)) %>% 
    transmute(Subject, Duration = diff = difftime(as.POSIXct(Date, format = 
    "%m/%d/%Y %I:%M:%S %p"),as.POSIXct(Date, 
    format = "%m/%d/%Y %I:%M:%S %p" ), units = "secs")) %>% 
    ungroup %>% 
    distinct %>% 
    mutate(grp = str_c("Outdata", row_number()))
    

    Any help is appreciated.

    2 replies  |  5 years ago
        1
  •  3
  •   akrun    5 years ago

    Here, after grouping by Subject, we can take the diff of the range of Date inside summarise:

    library(dplyr)
    library(lubridate)
    library(stringr)
    df %>%
       mutate(Date = mdy_hms(Date)) %>% 
       group_by(Subject) %>%
       summarise(Duration = diff(range(Date))) %>% 
       mutate(grp = str_c("Outdata", row_number()))
    # A tibble: 3 x 3
    #  Subject Duration grp     
    #  <fct>   <drtn>   <chr>   
    #1 A       4 secs   Outdata1
    #2 B       1 secs   Outdata2
    #3 C       0 secs   Outdata3
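
    As a minimal sketch of what diff(range(Date)) does for a single group, the base R snippet below applies it to the three timestamps of Subject A. The variable names (x, r, d, secs) and the tz = "UTC" setting are assumptions made for illustration, and parsing the AM/PM marker with %p assumes an English or C locale.

```r
# Timestamps of Subject A from the question, parsed as POSIXct.
# tz = "UTC" is an assumption so the example does not depend on the local zone.
x <- as.POSIXct(c("9/9/2019 5:46:38 PM",
                  "9/9/2019 5:46:40 PM",
                  "9/9/2019 5:46:42 PM"),
                format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

r <- range(x)   # earliest and latest timestamp
d <- diff(r)    # a difftime object; its unit is picked automatically

# Convert to a plain number of seconds, whatever unit was picked.
secs <- as.numeric(d, units = "secs")
print(d)
print(secs)
```

    Because the unit is picked automatically, a group spanning more than a minute may be displayed in minutes rather than seconds; converting with as.numeric(d, units = "secs") is one way to avoid relying on the displayed unit.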
    

    To set the unit explicitly, use difftime:

    df %>%
        mutate(Date = mdy_hms(Date)) %>%
        group_by(Subject) %>%
        summarise(Duration = difftime(max(Date), min(Date), units = "secs")) %>%
        mutate(grp = str_c("Outdata", row_number()))
    
        2
  •  2
  •   Ronak Shah    5 years ago

    We can do this in base R:

    df$Date <- as.POSIXct(df$Date, format = "%m/%d/%Y %I:%M:%S %p")
    
    transform(aggregate(Date~Subject, df, function(x) 
               difftime(max(x), min(x), units = "secs")), 
              Group = paste0('outdata', seq_along(Subject)))
    
    #  Subject Date    Group
    #1       A   4  outdata1
    #2       B   1  outdata2
    #3       C   0  outdata3
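
    The output above keeps the column name Date and plain numbers. If the exact layout requested in the question is wanted (a Duration column with values like "4 sec"), one possible base R sketch is shown below. It rebuilds a reduced copy of the data with only Subject and Date; the names secs and out, the tz = "UTC" setting, and the use of tapply instead of aggregate are assumptions for illustration, not part of the answer above.

```r
# Reduced copy of the question's data (Subject and Date only).
df <- data.frame(
  Subject = c("A", "A", "A", "B", "B", "C"),
  Date = c("9/9/2019 5:46:38 PM", "9/9/2019 5:46:40 PM",
           "9/9/2019 5:46:42 PM", "9/9/2019 5:48:00 PM",
           "9/9/2019 5:48:01 PM", "9/10/2019 5:49:01 PM"),
  stringsAsFactors = FALSE
)
# tz = "UTC" is an assumption; %p parsing assumes an English or C locale.
df$Date <- as.POSIXct(df$Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Seconds between the first and last timestamp of each subject.
secs <- tapply(df$Date, df$Subject, function(x)
  as.numeric(difftime(max(x), min(x), units = "secs")))

# Assemble the requested columns: Subject, Duration, Group.
out <- data.frame(
  Subject  = names(secs),
  Duration = paste(as.vector(secs), "sec"),
  Group    = paste0("outdata", seq_along(secs)),
  stringsAsFactors = FALSE
)
print(out)
```

    Storing Duration as text matches the requested display but makes further arithmetic awkward, so keeping the numeric secs vector alongside it may be preferable.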
    