
Find the first and last dates of streaks in R

  • luciano  ·  7 years ago

    Here is my data frame:

    date <- as.Date(c("1993-09-21", "1994-02-12", "1994-02-23", "1994-05-14", "1994-08-18", "1994-08-25", "1994-08-29", "1994-09-17", "1994-10-16", "1994-10-16", "1994-10-22", "1994-10-26", "1994-12-26", "1995-04-12", "1995-05-04", "1995-06-20", "1995-07-11", "1995-07-27", "1995-08-14", "1995-08-15", "1995-08-22", "1995-08-27", "1995-08-27", "1995-08-28", "1995-08-30", "1995-08-30", "1995-09-03", "1995-09-03", "1995-09-03", "1995-09-15"))
    
    value <- c(2, 1, 1, 1, 2, 1, 2, 4, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1)
    
    df <- data.frame(date, value)
    
    df$value.equals.1 <- df$value == 1
    

    I need two things: (1) the first and last date of each run of consecutive values equal to 1, and (2) the length of each such run.

    I have annotated the data frame with what I need. How can I do this in R?


    1 answer
  •   akrun    7 years ago

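    We can use rleid from data.table. Create a grouping variable by applying rleid to 'value.equals.1', subset 'date' by 'value.equals.1', and extract the first and last 'date' within each 'grp'. The output has two rows per run (first date, then last date), with n giving the run length.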

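    library(data.table)
    # Step 1: per run id (rleid), keep only the dates where value == 1;
    #         runs of non-1 values contribute no rows. The result column is V1.
    # Step 2: per run, take the first and last date and the run length (.N).
    setDT(df)[, date[value.equals.1], .(grp  = rleid(value.equals.1))
          ][, .(date  = c(V1[1], V1[.N]), n = .N), by = grp][, grp := NULL][]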
    #          date n
    # 1: 1994-02-12 3
    # 2: 1994-05-14 3
    # 3: 1994-08-25 1
    # 4: 1994-08-25 1
    # 5: 1994-10-22 1
    # 6: 1994-10-22 1
    # 7: 1994-12-26 8
    # 8: 1995-08-15 8
    # 9: 1995-08-27 3
    #10: 1995-08-30 3
    #11: 1995-09-03 2
    #12: 1995-09-15 2
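
    To make the grouping step concrete, here is a minimal base R sketch of the run id that rleid produces. The vector x is a made-up example (not the question's data) and run_id is an illustrative name; the idea is simply to increment a counter each time the value changes.

```r
# Toy logical vector standing in for df$value.equals.1 (hypothetical data)
x <- c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)

# TRUE wherever a new run starts: the first element, and every change of value
new_run <- c(TRUE, x[-1] != x[-length(x)])

# The cumulative sum turns the run starts into a run id per element
run_id <- cumsum(new_run)
run_id
# 1 2 2 2 3 4 5 5 6 6

# Run ids of the elements that are TRUE, i.e. the runs of interest
unique(run_id[x])
# 2 4 6
```

    Grouping by this id and keeping only the groups where x is TRUE is what lets each streak be summarised on its own.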
    

    Or this can be done with the tidyverse (rleid still comes from data.table):

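    library(dplyr)
    # rleid() is a data.table function, so it is called with its namespace here
    df %>%
       group_by(grp = data.table::rleid(value.equals.1)) %>%
       filter(all(value.equals.1)) %>%   # keep only the runs of 1s
       mutate(n = n()) %>%               # run length
       slice(c(1, n())) %>%              # first and last row of each run
       ungroup() %>%
       select(date, n)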
    # A tibble: 12 x 2
    #   date           n
    #   <date>     <int>
    # 1 1994-02-12     3
    # 2 1994-05-14     3
    # 3 1994-08-25     1
    # 4 1994-08-25     1
    # 5 1994-10-22     1
    # 6 1994-10-22     1
    # 7 1994-12-26     8
    # 8 1995-08-15     8
    # 9 1995-08-27     3
    #10 1995-08-30     3
    #11 1995-09-03     2
    #12 1995-09-15     2
    

    Or use rle from base R to create the groups:

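    # Replace each run's value with its sequence number, then expand back to one run id per row
    grp <- inverse.rle(within.list(rle(df$value.equals.1), values <- seq_along(values)))
    # Split the dates of the value == 1 rows by run id, then take the first and last date of each run
    do.call(c, lapply(with(df, split(date[value.equals.1], 
            grp[value.equals.1])), function(x) c(x[1], x[length(x)])))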
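
    If one row per streak is more convenient than two, the same rle idea can be sketched in base R as follows. This is an alternative layout, not the answer's original code; it uses only the first eight rows of the question's data, and the names is_one, starts, ends and streaks are illustrative.

```r
# First eight rows of the question's data, to keep the example short
date  <- as.Date(c("1993-09-21", "1994-02-12", "1994-02-23", "1994-05-14",
                   "1994-08-18", "1994-08-25", "1994-08-29", "1994-09-17"))
value <- c(2, 1, 1, 1, 2, 1, 2, 4)
is_one <- value == 1

# Run-length encode the logical vector
r <- rle(is_one)

# Row index of the last and first element of every run
ends   <- cumsum(r$lengths)
starts <- ends - r$lengths + 1

# Keep only the runs whose value is TRUE (i.e. runs of 1s)
streaks <- data.frame(
  first  = date[starts[r$values]],
  last   = date[ends[r$values]],
  length = r$lengths[r$values]
)
streaks
#        first       last length
# 1 1994-02-12 1994-05-14      3
# 2 1994-08-25 1994-08-25      1
```

    Because starts and ends are plain row indices, the same pattern can pull any other column at the boundaries of each streak.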