
`Syntax for multiple consecutive operations across columns in dplyr`

  •  0
  • Chris Ruehlemann  · Tech Community  · 4 years ago

    I am struggling to get the syntax right for multiple consecutive operations with across() in dplyr. Given this data:

    df <- structure(list(A1 = c(838.611, 824.048, 668.901, 225.075, 0, 
                          0, 341.291, 0, 101.652, 127.341, 0, 297.092, 0, 0, 0, 0, 0, 764.737, 
                          759.51, 772.21), A2 = c(499.041, 492.997, 486.132, 469.503, 476.782, 
                                                  464.18, 469.833, 462.317, 455.507, 441.47, 490.147, 430.844, 
                                                  0, 0, 0, 0, 0, 0, 0, 124.068)), row.names = c(NA, 20L), class = "data.frame")
    

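    I would like to make the following changes across the columns A1 and A2:

      1. Replace 0 with NA
      2. Set outliers to NA
      3. Interpolate

    The following syntax performs only change 1, but not changes 2 and 3: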

    library(dplyr)
    library(zoo)
    df %>%
      mutate(across(starts_with("A"),
                    ~na_if(.,0),
                    ~ifelse(. %in% boxplot(.)$out, NA, .),
                    ~na.approx(., na.rm = FALSE, rule = 2)))
            A1      A2
    1  838.611 499.041
    2  824.048 492.997
    3  668.901 486.132
    4  225.075 469.503
    5       NA 476.782
    6       NA 464.180
    7  341.291 469.833
    8       NA 462.317
    9  101.652 455.507
    10 127.341 441.470
    11      NA 490.147
    12 297.092 430.844
    13      NA      NA
    14      NA      NA
    15      NA      NA
    16      NA      NA
    17      NA      NA
    18 764.737      NA
    19 759.510      NA
    20 772.210 124.068
    

    EDIT: The correct output is obtained with this kind of (repetitive) code, which I would like to avoid:

    df %>%
      mutate(across(starts_with("A"),
                    ~na_if(.,0))) %>%
      mutate(across(starts_with("A"),        
                    ~ifelse(. %in% boxplot(.)$out, NA, .))) %>%
      mutate(across(starts_with("A"),
                    ~na.approx(., na.rm = FALSE, rule = 2)))
    
             A1      A2
    1  838.6110 499.041
    2  824.0480 492.997
    3  668.9010 486.132
    4  225.0750 469.503
    5  263.8137 476.782
    6  302.5523 464.180
    7  341.2910 469.833
    8  221.4715 462.317
    9  101.6520 455.507
    10 127.3410 441.470
    11 212.2165 490.147
    12 297.0920 430.844
    13 375.0328 430.844
    14 452.9737 430.844
    15 530.9145 430.844
    16 608.8553 430.844
    17 686.7962 430.844
    18 764.7370 430.844
    19 759.5100 430.844
    20 772.2100 430.844
    
        1
  •  3
  •   Limey    4 years ago

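    Answering OP's question from the comments. Passing the functions as a list applies each function to the original columns independently, not one after the other, so you get one output column per combination of input column and function: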

    df %>%
      mutate(
        across(
          starts_with("A"),
          list(
            ~na_if(.,0),
            ~ifelse(. %in% boxplot(.)$out, NA, .),
            ~na.approx(., na.rm = FALSE, rule = 2)
           )
         )
       )
            A1      A2    A1_1    A1_2    A1_3    A2_1    A2_2    A2_3
    1  838.611 499.041 838.611 838.611 838.611 499.041 499.041 499.041
    2  824.048 492.997 824.048 824.048 824.048 492.997 492.997 492.997
    3  668.901 486.132 668.901 668.901 668.901 486.132 486.132 486.132
    4  225.075 469.503 225.075 225.075 225.075 469.503 469.503 469.503
    5    0.000 476.782      NA   0.000   0.000 476.782 476.782 476.782
    6    0.000 464.180      NA   0.000   0.000 464.180 464.180 464.180
    7  341.291 469.833 341.291 341.291 341.291 469.833 469.833 469.833
    8    0.000 462.317      NA   0.000   0.000 462.317 462.317 462.317
    9  101.652 455.507 101.652 101.652 101.652 455.507 455.507 455.507
    10 127.341 441.470 127.341 127.341 127.341 441.470 441.470 441.470
    11   0.000 490.147      NA   0.000   0.000 490.147 490.147 490.147
    12 297.092 430.844 297.092 297.092 297.092 430.844 430.844 430.844
    13   0.000   0.000      NA   0.000   0.000      NA   0.000   0.000
    14   0.000   0.000      NA   0.000   0.000      NA   0.000   0.000
    15   0.000   0.000      NA   0.000   0.000      NA   0.000   0.000
    16   0.000   0.000      NA   0.000   0.000      NA   0.000   0.000
    17   0.000   0.000      NA   0.000   0.000      NA   0.000   0.000
    18 764.737   0.000 764.737 764.737 764.737      NA   0.000   0.000
    19 759.510   0.000 759.510 759.510 759.510      NA   0.000   0.000
    20 772.210 124.068 772.210 772.210 772.210 124.068 124.068 124.068
    

    You can (among other options) give the output columns more meaningful names by naming the list elements:

    df %>%
      mutate(
        across(
          starts_with("A"),
          list(
            "Zero"=~na_if(.,0),
            "BoxPlot"=~ifelse(. %in% boxplot(.)$out, NA, .),
            "Approx"=~na.approx(., na.rm = FALSE, rule = 2)
           )
         )
       )
            A1      A2 A1_Zero A1_BoxPlot A1_Approx A2_Zero A2_BoxPlot A2_Approx
    1  838.611 499.041 838.611    838.611   838.611 499.041    499.041   499.041
    2  824.048 492.997 824.048    824.048   824.048 492.997    492.997   492.997
    ...
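
The point that each list element sees the original column, rather than the result of the previous element, can be checked on a tiny example. This is only a sketch: the data frame `toy` and the names `plus1` and `times2` are made up for illustration and are not part of the question.

```r
library(dplyr)

# Hypothetical toy data, not taken from the question
toy <- data.frame(a = c(1, 2, 3))

res <- toy %>%
  mutate(across(a, list(plus1 = ~ .x + 1, times2 = ~ .x * 2)))

# Both functions were applied to the original `a`:
# a_times2 holds a * 2, not (a + 1) * 2.
print(res)
```

If the functions were chained, `a_times2` would contain `(a + 1) * 2`; instead each new column is computed from `a` alone.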
    

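    Update, in response to OP's latest comment:

    across() has a .names argument that controls how the output columns are named, but that does not help here, because across() outputs one column for every combination of input column and function. What we want is to apply several functions to each input column in sequence and get a single output column per input column. To do that, wrap the per-column steps in one function. This has the same effect as the multiple mutate calls in OP's edit to the original question.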

    df %>%
      mutate(
        across(
          starts_with("A"),
          function(.x) {
            .x <- na_if(.x, 0)
            .x <- ifelse(.x %in% boxplot(.x)$out, NA, .x)
            .x <- na.approx(.x, na.rm = FALSE, rule = 2)
            .x
          }
        )
      )
             A1      A2
    1  838.6110 499.041
    2  824.0480 492.997
    3  668.9010 486.132
    4  225.0750 469.503
    5  263.8137 476.782
    6  302.5523 464.180
    7  341.2910 469.833
    8  221.4715 462.317
    9  101.6520 455.507
    10 127.3410 441.470
    11 212.2165 490.147
    12 297.0920 430.844
    13 375.0328 430.844
    14 452.9737 430.844
    15 530.9145 430.844
    16 608.8553 430.844
    17 686.7962 430.844
    18 764.7370 430.844
    19 759.5100 430.844
    20 772.2100 430.844
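
Once the steps are wrapped in a single function, `.names` becomes useful again if you want to keep the original columns next to the cleaned ones. The following is a minimal sketch under assumptions: the data frame `toy`, the helper `clean`, and the `_clean` suffix are hypothetical, and the outlier step is left out to keep the example short.

```r
library(dplyr)
library(zoo)

# Hypothetical column in which 0 stands for a missing value
toy <- data.frame(A1 = c(10, 0, 30, 0, 50))

# Illustrative helper: only steps 1 and 3 from the question
clean <- function(x) {
  x <- na_if(x, 0)                       # step 1: 0 -> NA
  na.approx(x, na.rm = FALSE, rule = 2)  # step 3: linear interpolation
}

# One function per column, so "{.col}_clean" yields exactly one new
# column per input column and leaves the original untouched.
res <- toy %>%
  mutate(across(starts_with("A"), clean, .names = "{.col}_clean"))

print(res)
```

Dropping the `.names` argument overwrites the columns in place, as in the answer above.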
    
        2
  •  2
  •   Ronak Shah    4 years ago

    For clarity, I would write a custom function that can then be applied in across().

    library(dplyr)
    library(zoo)
    
    apply_fun <- function(x) {
      na_if(x, 0) %>%
        ifelse(. %in% boxplot(.)$out, NA, .) %>%
        na.approx(., na.rm = FALSE, rule = 2)
    }
    
    
    df %>% mutate(across(starts_with("A"),apply_fun))
    
    #         A1      A2
    #1  838.6110 499.041
    #2  824.0480 492.997
    #3  668.9010 486.132
    #4  225.0750 469.503
    #5  263.8137 476.782
    #6  302.5523 464.180
    #7  341.2910 469.833
    #8  221.4715 462.317
    #9  101.6520 455.507
    #10 127.3410 441.470
    #11 212.2165 490.147
    #12 297.0920 430.844
    #13 375.0328 430.844
    #14 452.9737 430.844
    #15 530.9145 430.844
    #16 608.8553 430.844
    #17 686.7962 430.844
    #18 764.7370 430.844
    #19 759.5100 430.844
    #20 772.2100 430.844
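
One side effect of the code in this thread is that `boxplot()` draws a plot every time it is called. A possible variant, sketched here as an assumption rather than as part of either answer, uses `grDevices::boxplot.stats()`, which is the function `boxplot()` relies on for its `$out` element and which does not plot anything. The names `clean_column` and `toy` are hypothetical.

```r
library(dplyr)
library(zoo)

# Hypothetical helper: same three steps, without drawing a plot
clean_column <- function(x) {
  x <- na_if(x, 0)                        # 1. replace 0 with NA
  out <- grDevices::boxplot.stats(x)$out  # 2. outliers by the boxplot rule
  x <- replace(x, x %in% out, NA)         #    ... set them to NA
  na.approx(x, na.rm = FALSE, rule = 2)   # 3. interpolate the NAs
}

# Hypothetical toy data: 100 is an outlier, 0 marks a missing value
toy <- data.frame(A1 = c(1, 2, 3, 4, 100, 0, 5))

res <- toy %>% mutate(across(starts_with("A"), clean_column))
print(res)
```

`replace()` is used instead of `ifelse()` so that the column keeps its numeric type even in the edge case where every value is flagged.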
    