代码之家  ›  专栏  ›  技术社区  ›  costebk08

R中与日期相关的复杂值和累积值的总和

  •  0
  • costebk08  · 技术社区  · 1 周前

    我有如下数据: df<-tibble( date= seq.Date(as.Date("2021-01-01"), as.Date("2022-02-01"), by = "month"), val1 = c(105, 105, 105, 125, 125, 125, 125, 132, 132, 132, 135, 150, 150, 150), val2 = c(100, 100, 100, 125, 125, 125, 125, 125, 125, 125, 125, 150, 150, 150), diff = val1-val2 )

    我正在努力制作以下内容: output<-tibble( date= seq.Date(as.Date("2021-01-01"), as.Date("2022-02-01"), by = "month"), val1 = c(105, 105, 105, 125, 125, 125, 125, 132, 132, 132, 135, 150, 150, 150), val2 = c(100, 100, 100, 125, 125, 125, 125, 125, 125, 125, 125, 150, 150, 150), diff = val1-val2, diff_calc = c(0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 5, 22, 22, 22) )

    哪里 diff_calc 是中以前唯一值的累积总和 diff ,其中求和从哪里开始 diff =0表示上一个 diff 值,并复制到 diff 为0,并且上一个是唯一的 diff 值再次累积求和。

    我尝试过不同的滞后和衔接组合,但在这里我真的很吃力。谢谢

    3 回复  |  直到 1 周前
        1
  •  2
  •   Melissa Key    1 周前

    这与@jblood94的答案非常相似,但可能会简单一点

    df |> 
      mutate(
        diff_calc = cummax(cumsum(diff * !duplicated(diff)) * (diff == 0)),
      )
    
    
    
        2
  •  2
  •   jblood94    1 周前
    df$diff_calc <- with(df, cummax(cumsum(c(TRUE, diff(diff) != 0)*diff)*(diff == 0 & c(0, diff[-nrow(df)]) != 0)))
    df
    #> # A tibble: 14 × 5
    #>    date        val1  val2  diff diff_calc
    #>    <date>     <dbl> <dbl> <dbl>     <dbl>
    #>  1 2021-01-01   105   100     5         0
    #>  2 2021-02-01   105   100     5         0
    #>  3 2021-03-01   105   100     5         0
    #>  4 2021-04-01   125   125     0         5
    #>  5 2021-05-01   125   125     0         5
    #>  6 2021-06-01   125   125     0         5
    #>  7 2021-07-01   125   125     0         5
    #>  8 2021-08-01   132   125     7         5
    #>  9 2021-09-01   132   125     7         5
    #> 10 2021-10-01   132   125     7         5
    #> 11 2021-11-01   135   125    10         5
    #> 12 2021-12-01   150   150     0        22
    #> 13 2022-01-01   150   150     0        22
    #> 14 2022-02-01   150   150     0        22
    
        3
  •  0
  •   Friede    1 周前

    看起来这是有效的,但肯定会有更好的解决方案。

    library(timetk)
    library(tidyverse)
    
    df1 <- df %>% 
      filter(diff != 0) %>% 
      group_by(diff) %>%
      top_n(1, date) %>%
      ungroup %>%
      
      mutate(
        diff_sum=cumsum(diff),
        date1=lead(date),
        date_diff=interval(date, date1) %/% months(1),
        date2 = date %+time% "1 month")%>% 
      
      filter(date_diff !=1 | is.na(date_diff)) %>%
      select(date2, diff_sum) %>%
      
      pad_by_time(
        .date_var = date2,
        .by = "month", 
        .start_date = "2021-01-01", 
        .end_date = "2022-02-01") %>% 
      
      fill(diff_sum, .direction = "down")
    
    df2 <- df %>%
      left_join(df1, by = c("date" = "date2"))