
Reorder values across rows and convert duplicates to NA in R

  • apple · 1 year ago

    I am trying to reorder the values within each row of a data frame in R.

    data <- data.frame(sample = c("A", "B", "C", "D", "E", "F"),
                       year_1 = c(2015, 2014, NA, 1985, 2011, 2010),
                       year_2 = c(NA, 1986, 1999, 1986, 2009, 2009),
                       year_3 = c(2015, 2014, 2014, 1956, NA, 2010),
                       year_4 = c(NA, 2014, 2014, 1985, 2010, 2010))
    
    data
    
    >    sample year_1 year_2 year_3 year_4
    >  1      A   2015     NA   2015     NA
    >  2      B   2014   1986   2014   2014
    >  3      C     NA   1999   2014   2014
    >  4      D   1985   1986   1956   1985
    >  5      E   2011   2009     NA   2010
    >  6      F   2010   2009   2010   2010
    
    

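    I want to reorder each row so that year_1 holds the highest year, year_2 the next highest, and so on. I also want to keep all four columns, so any NAs should be pushed to the end of the row, starting from year_4.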

    I have managed to reorder the values within each row like this:

    cols <- c("year_1", "year_2", "year_3", "year_4")
    data[cols] <- t(apply(data[cols], 
                          MARGIN = 1,
                          FUN = function(x) sort(x, decreasing = TRUE, na.last = TRUE)))
    
    data
    
    >    sample year_1 year_2 year_3 year_4
    >  1      A   2015   2015     NA     NA
    >  2      B   2014   2014   2014   1986
    >  3      C   2014   2014   1999     NA
    >  4      D   1986   1985   1985   1956
    >  5      E   2011   2010   2009     NA
    >  6      F   2010   2010   2010   2009
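    Because apply() with MARGIN = 1 returns each row's result as a column of the output matrix, the code above needs t() to turn the result back into one row per sample. A minimal base-R check on two rows copied from the question (samples A and D):

```r
# Two rows from the question (samples A and D) as a plain numeric matrix
m <- matrix(c(2015, NA,   2015, NA,
              1985, 1986, 1956, 1985), nrow = 2, byrow = TRUE)

# apply() over rows returns each row's sorted values as a *column*
res <- apply(m, MARGIN = 1,
             FUN = function(x) sort(x, decreasing = TRUE, na.last = TRUE))
dim(res)   # 4 2: four values per row, one column per input row

# t() restores the layout of one row per input row
sorted_rows <- t(res)
sorted_rows
```

    The same transpose is needed in any apply()-based answer below.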
    

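    However, I would also like to remove any duplicates within a row and replace them with NA (placed at the end of the row). For example, the row for sample D should be 1986, 1985, 1956, NA.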

    Desired result:

    >    sample year_1 year_2 year_3 year_4
    >  1      A   2015     NA     NA     NA
    >  2      B   2014   1986     NA     NA
    >  3      C   2014   1999     NA     NA
    >  4      D   1986   1985   1956     NA
    >  5      E   2011   2010   2009     NA
    >  6      F   2010   2009     NA     NA
    

    Note: my data has several hundred thousand rows, so efficient code would be greatly appreciated.
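    To compare how the approaches below scale, a synthetic data set of roughly that size can be generated. This is a minimal sketch: make_years() is a hypothetical helper, and the year range (1950 to 2020), the 20% NA rate and n = 300,000 are assumptions, not properties of the asker's real data.

```r
# Hypothetical helper: builds a data frame shaped like `data` for timing tests.
# Assumptions: years uniform on 1950:2020, about p_na of each year column NA.
make_years <- function(n, p_na = 0.2, seed = 1) {
  set.seed(seed)
  out <- data.frame(sample = sprintf("S%06d", seq_len(n)))
  for (col in paste0("year_", 1:4)) {
    v <- sample(1950:2020, n, replace = TRUE)
    v[runif(n) < p_na] <- NA        # knock out roughly p_na of the values
    out[[col]] <- as.numeric(v)     # doubles, like the columns in `data`
  }
  out
}

big <- make_years(3e5)
str(big)
```

    Each answer's code can then be run on `big` inside system.time() to compare timings on the same input.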

  • jpsmith · 1 year ago

    You're close! One option is to replace the duplicated values with NA before sorting:

    cols <- c("year_1", "year_2", "year_3", "year_4")
    
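    # Per row: set repeats to NA, then sort descending with NAs last.
    # (The \(x) lambda shorthand requires R >= 4.1.0.)
    data[cols] <- t(apply(data[cols], 1, \(x){
      x[duplicated(x)] <- NA
      sort(x, decreasing = TRUE, na.last = TRUE)
    }))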
    

    Output

      sample year_1 year_2 year_3 year_4
    1      A   2015     NA     NA     NA
    2      B   2014   1986     NA     NA
    3      C   2014   1999     NA     NA
    4      D   1986   1985   1956     NA
    5      E   2011   2010   2009     NA
    6      F   2010   2009     NA     NA
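    A single-row walk-through of the function passed to apply(), using sample D from the question, shows why the duplicates have to become NA before sorting: sort() on its own would keep both copies of 1985.

```r
# Sample D from the question as a plain numeric vector
x <- c(1985, 1986, 1956, 1985)

# duplicated() flags every repeat after the first occurrence
duplicated(x)            # FALSE FALSE FALSE  TRUE
x[duplicated(x)] <- NA   # x is now 1985 1986 1956 NA

# Sorting then moves the new NA to the end, like any original NA
res <- sort(x, decreasing = TRUE, na.last = TRUE)
res                      # 1986 1985 1956 NA
```

    Existing NAs need no special handling: a second NA is flagged by duplicated() and simply stays NA.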
    
  • Friede · 1 year ago

    This certainly has potential for refactoring:

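    # Set within-row duplicates to NA (ave() groups the stacked cells by row index)
    data[-1L] = ave(unlist(data[-1L]), row(data[-1L]), FUN = \(x) `is.na<-`(x, duplicated(x)))
    # Order cells by row, then by value descending (NAs last), and refill row-wise
    M = as.matrix(data[-1L])
    M = matrix(M[order(-row(M), M, decreasing = TRUE)], ncol = ncol(M), byrow = TRUE)
    setNames(cbind.data.frame(data[1L], M), names(data))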
    
    
      sample year_1 year_2 year_3 year_4
    1      A   2015     NA     NA     NA
    2      B   2014   1986     NA     NA
    3      C   2014   1999     NA     NA
    4      D   1986   1985   1956     NA
    5      E   2011   2010   2009     NA
    6      F   2010   2009     NA     NA
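    The sorting step of this answer hinges on one order() call. With decreasing = TRUE, the key -row(M) puts rows in ascending order, the second key sorts the values within each row in descending order, and order() keeps NAs last by default. A small base-R sketch on two rows (samples E and D, with duplicates already set to NA as the ave() step would do):

```r
# Samples E and D from the question, duplicates already replaced by NA
M <- rbind(c(2011, 2009, NA, 2010),
           c(1985, 1986, 1956, NA))

# Key 1: -row(M) under decreasing = TRUE -> row 1 first, then row 2
# Key 2: M itself -> descending within each row; NAs are placed last
idx <- order(-row(M), M, decreasing = TRUE)

# The reordered cells come out row by row, so refill with byrow = TRUE
out <- matrix(M[idx], ncol = ncol(M), byrow = TRUE)
out
```

    Because order() handles all cells in one call, the sorting step avoids calling an R function per row; only the duplicate-removal step still runs once per row through ave().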
    
  • Andre Wildberg · 1 year ago

    A tidyverse approach using unnest_wider:

    library(dplyr)
    library(tidyr)
    
    data %>% 
      rowwise() %>% 
      mutate(year = list(`length<-`(sort(unique(c_across(starts_with("year"))),
                                         decreasing = TRUE),
                                    length(c_across(starts_with("year")))))) %>% 
      select(-starts_with("year_")) %>% 
      unnest_wider(year, names_sep = "_")
    # A tibble: 6 × 5
      sample year_1 year_2 year_3 year_4
      <chr>   <dbl>  <dbl>  <dbl>  <dbl>
    1 A        2015     NA     NA     NA
    2 B        2014   1986     NA     NA
    3 C        2014   1999     NA     NA
    4 D        1986   1985   1956     NA
    5 E        2011   2010   2009     NA
    6 F        2010   2009     NA     NA
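    Both versions in this answer rely on the replacement function `length<-`: unique() plus sort() shortens the row (sort() drops NAs by default), and lengthening a vector with `length<-` pads it back out with NA. A base-R sketch on sample E from the question:

```r
# Sample E from the question, which contains one NA
x <- c(2011, 2009, NA, 2010)

# unique() keeps one NA, then sort() drops it (its default is na.last = NA)
u <- sort(unique(x), decreasing = TRUE)   # 2011 2010 2009

# Lengthening a vector pads it with NA, restoring all four positions
padded <- `length<-`(u, length(x))        # 2011 2010 2009 NA
padded
```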
    

    A data.table approach, just for fun:

    library(data.table)
    
    setDT(data)
    
    data[, as.list(`length<-`(sort(unique(unlist(.SD)), decreasing = TRUE),
                              length(.SD))) |> setNames(names(.SD)), by = sample]
       sample year_1 year_2 year_3 year_4
       <char>  <num>  <num>  <num>  <num>
    1:      A   2015     NA     NA     NA
    2:      B   2014   1986     NA     NA
    3:      C   2014   1999     NA     NA
    4:      D   1986   1985   1956     NA
    5:      E   2011   2010   2009     NA
    6:      F   2010   2009     NA     NA
    
  • LMc · 1 year ago

    library(dplyr)
    library(tidyr)
    
    data |> 
      pivot_longer(starts_with("year"), names_sep = "_", names_to = c("year", "n")) |>
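      # Per sample: NA-out duplicates, then sort descending with NAs last (.by needs dplyr >= 1.1.0)
      mutate(value = sort(replace(value, duplicated(value), NA),
                          decreasing = TRUE, na.last = TRUE), .by = sample) |>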
      pivot_wider(names_from = c(year, n))
    

    Output

      sample year_1 year_2 year_3 year_4
      <chr>   <dbl>  <dbl>  <dbl>  <dbl>
    1 A        2015     NA     NA     NA
    2 B        2014   1986     NA     NA
    3 C        2014   1999     NA     NA
    4 D        1986   1985   1956     NA
    5 E        2011   2010   2009     NA
    6 F        2010   2009     NA     NA
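    This pipeline works because pivot_longer() keeps each sample's cells in year_1 to year_4 order, so after the grouped sort the n-th cell holds the n-th largest distinct year and pivot_wider() writes it back to year_n. The same long-format idea can be sketched in base R, with ave() and reshape() swapped in for dplyr and tidyr; the small long table below is hand-built from samples D and E of the question.

```r
# Hand-built long-format excerpt: one row per (sample, position n)
long <- data.frame(sample = rep(c("D", "E"), each = 4),
                   n      = rep(1:4, times = 2),
                   value  = c(1985, 1986, 1956, 1985,
                              2011, 2009, NA,   2010))

# Within each sample: NA-out duplicates, then sort descending with NAs last.
# ave() writes results back in the original row order, so position n now
# holds the n-th largest distinct year for that sample.
long$value <- ave(long$value, long$sample, FUN = function(v)
  sort(replace(v, duplicated(v), NA), decreasing = TRUE, na.last = TRUE))

# Back to one row per sample; reshape() names the columns value.1 ... value.4
wide <- reshape(long, idvar = "sample", timevar = "n", direction = "wide")
wide
```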