
Rename multiple columns by extracting the text inside parentheses, across a list of data frames

  • ah bon · 3 years ago

    Given the data and code below, I can produce a list of data frames, dfs:

    library(data.table)
    library(purrr)
    library(glue)
    library(gt)
    library(tidyverse)
    
    df <- structure(list(
      id = c("M0000607", "M0000609", "M0000612"),
      `2021-08(actual)` = c(12.6, 19.2, 8.3),
      `2021-09(actual)` = c(10.3, 17.3, 6.4),
      `2021-10(actual)` = c(8.9, 15.7, 5.3),
      `2021-11(actual)` = c(7.3, 14.8, 3.1),
      `2021-12(actual)` = c(6.1, 14.2, 3.5),
      `2021-08(pred)` = c(11.65443222, 14.31674997, 7.084180415),
      `2021-09(pred)` = c(12.29810914, 17.7143733, 6.057927385),
      `2021-10(pred)` = c(9.619846116, 15.54553601, 6.525992602),
      `2021-11(pred)` = c(8.352097939, 13.97318204, 3.164682627),
      `2021-12(pred)` = c(6.113631596, 14.16243166, 3.288372517),
      `2021-08(error)` = c(2.082307066, 1.146759554, 0.687406723),
      `2021-09(error)` = c(1.631350383, 2.753457736, 2.952737781),
      `2021-10(error)` = c(0.945567783, 4.883250027, 1.215819585),
      `2021-11(error)` = c(1.998109138, 0.414373304, 0.342072615),
      `2021-12(error)` = c(0.719846116, 0.154463985, 1.225992602)
    ), class = "data.frame", row.names = c(NA, -3L))
    
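    year_months <- c('2021-12', '2021-11', '2021-10')
    curr <- lubridate::ym(year_months)  # first day of each target month
    prev <- curr - months(2L)           # two months before each target
    # For each target month keep id, the three monthly actuals ending at
    # that month, then that month's pred and error columns
    dfs <- mapply(function(x, y) {
      df[c(
        "id",
        format(seq.Date(y, x, by = "month"), "%Y-%m(actual)"),
        format(x, "%Y-%m(pred)"),
        format(x, "%Y-%m(error)")
      )]
    }, curr, prev, SIMPLIFY = FALSE)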
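
The selection inside mapply() always places the target month's pred column second to last and its error column last, which is why "the last two columns" is well defined for every element of dfs. A minimal base-R sketch of one iteration, assuming the first element (x and y are hard-coded first-of-month Dates here instead of coming from lubridate::ym()):

```r
# One iteration of the question's mapply(): x = target month,
# y = two months earlier (hard-coded for illustration)
x <- as.Date("2021-12-01")
y <- as.Date("2021-10-01")

cols <- c(
  "id",
  format(seq.Date(y, x, by = "month"), "%Y-%m(actual)"),  # three actuals
  format(x, "%Y-%m(pred)"),                               # then pred
  format(x, "%Y-%m(error)")                               # then error
)
print(cols)
```

format() passes the literal "(actual)", "(pred)" and "(error)" through unchanged, so the names match the columns of df exactly.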
    

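    I'd like to rename the last two columns, using the text inside their parentheses as the new column names, so that the result looks like this: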

    [[1]]
            id 2021-10(actual) 2021-11(actual) 2021-12(actual)          pred          error
    1 M0000607             8.9             7.3             6.1      6.113632      0.7198461
    2 M0000609            15.7            14.8            14.2     14.162432      0.1544640
    3 M0000612             5.3             3.1             3.5      3.288373      1.2259926
    
    [[2]]
            id 2021-09(actual) 2021-10(actual) 2021-11(actual)          pred          error
    1 M0000607            10.3             8.9             7.3      8.352098      1.9981091
    2 M0000609            17.3            15.7            14.8     13.973182      0.4143733
    3 M0000612             6.4             5.3             3.1      3.164683      0.3420726
    
    [[3]]
            id 2021-08(actual) 2021-09(actual) 2021-10(actual)          pred          error
    1 M0000607            12.6            10.3             8.9      9.619846      0.9455678
    2 M0000609            19.2            17.3            15.7     15.545536      4.8832500
    3 M0000612             8.3             6.4             5.3      6.525993      1.2158196
    

    How can I do this in R? Thanks.

    Reference link:

    Extract info inside all parenthesis in R

    2 answers

  • Answer 1 · Ronak Shah · 3 years ago

    Here's a tidyverse alternative:

    library(dplyr)
    library(purrr)
    
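    # Rename the last two columns of each data frame to the text inside
    # their trailing parentheses, e.g. "2021-12(pred)" -> "pred"
    map(dfs, ~ .x %>%
          rename_with(~ sub('.*\\((.*)\\)$', '\\1', .x),
                      c(last_col(1), last_col())))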
    
    #[[1]]
    #        id 2021-10(actual) 2021-11(actual) 2021-12(actual)      pred     error
    #1 M0000607             8.9             7.3             6.1  6.113632 0.7198461
    #2 M0000609            15.7            14.8            14.2 14.162432 0.1544640
    #3 M0000612             5.3             3.1             3.5  3.288373 1.2259926
    
    #[[2]]
    #        id 2021-09(actual) 2021-10(actual) 2021-11(actual)      pred     error
    #1 M0000607            10.3             8.9             7.3  8.352098 1.9981091
    #2 M0000609            17.3            15.7            14.8 13.973182 0.4143733
    #3 M0000612             6.4             5.3             3.1  3.164683 0.3420726
    
    #[[3]]
    #        id 2021-08(actual) 2021-09(actual) 2021-10(actual)      pred     error
    #1 M0000607            12.6            10.3             8.9  9.619846 0.9455678
    #2 M0000609            19.2            17.3            15.7 15.545536 4.8832500
    #3 M0000612             8.3             6.4             5.3  6.525993 1.2158196
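
To see what the sub() pattern does on its own, here is a small base-R check on a few column names taken from the question. In the answer above, rename_with() only applies it to the two selected columns; here it runs on every name, which also shows what happens to names it doesn't match:

```r
# Column names taken from the question's first data frame
nms <- c("id", "2021-12(actual)", "2021-12(pred)", "2021-12(error)")

# ".*" is greedy, so "\\(" lands on the last "(" in the name;
# "(.*)" captures what follows, up to the ")" that ends the string.
# Names without a trailing "(...)", such as "id", do not match and
# are returned unchanged by sub().
extracted <- sub(".*\\((.*)\\)$", "\\1", nms)
print(extracted)
# extracted is c("id", "actual", "pred", "error")
```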
    
  • Answer 2 · Dion Groothof · 3 years ago

    Using lapply to work on each element of the list dfs, a bit of regex does the job.

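    dfs <- lapply(dfs, function(x) {
      # Positions of the columns whose names contain "pred" or "error"
      col_num <- grep('pred|error', colnames(x))
      # Drop everything up to and including "(", and drop ")"
      colnames(x)[col_num] <- gsub('.*\\(|\\)', '', colnames(x)[col_num])
      x
    })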
    

    Output

    > dfs
    [[1]]
            id 2021-10(actual) 2021-11(actual) 2021-12(actual)      pred     error
    1 M0000607             8.9             7.3             6.1  6.113632 0.7198461
    2 M0000609            15.7            14.8            14.2 14.162432 0.1544640
    3 M0000612             5.3             3.1             3.5  3.288373 1.2259926
    
    [[2]]
            id 2021-09(actual) 2021-10(actual) 2021-11(actual)      pred     error
    1 M0000607            10.3             8.9             7.3  8.352098 1.9981091
    2 M0000609            17.3            15.7            14.8 13.973182 0.4143733
    3 M0000612             6.4             5.3             3.1  3.164683 0.3420726
    
    [[3]]
            id 2021-08(actual) 2021-09(actual) 2021-10(actual)      pred     error
    1 M0000607            12.6            10.3             8.9  9.619846 0.9455678
    2 M0000609            19.2            17.3            15.7 15.545536 4.8832500
    3 M0000612             8.3             6.4             5.3  6.525993 1.2158196
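
The grep('pred|error', ...) step above selects columns by name. If you would rather follow the question's wording literally and rename the last two columns by position, a minimal base-R sketch could look like this. rename_last_paren() is a hypothetical helper, not part of either answer, and the toy list uses two rows of the question's first element with the values as printed (rounded):

```r
# Hypothetical helper: rename the last `n` columns by position, keeping
# only the text inside their trailing "(...)"
rename_last_paren <- function(x, n = 2L) {
  idx <- seq.int(ncol(x) - n + 1L, ncol(x))
  names(x)[idx] <- sub(".*\\((.*)\\)$", "\\1", names(x)[idx])
  x
}

# Small stand-in for `dfs`
toy <- list(
  data.frame(
    id = c("M0000607", "M0000609"),
    `2021-12(actual)` = c(6.1, 14.2),
    `2021-12(pred)` = c(6.113632, 14.162432),
    `2021-12(error)` = c(0.7198461, 0.1544640),
    check.names = FALSE  # keep the "(...)" names exactly as written
  )
)

renamed <- lapply(toy, rename_last_paren)
print(names(renamed[[1]]))
```

Selecting by position means an "actual" column is never touched even if its name happened to contain "pred" or "error"; the trade-off is that it relies on pred and error always being the last two columns, as they are in the question's mapply() output.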