
Subsetting a data frame in R based on the elements in its columns

  •  0
  • accibio  · Tech Community  · 4 years ago

    I have a data frame with 5 rows and 6 columns:

    df <- data.frame(
      Hits = c("Hit1", "Hit2", "Hit3", "Hit4", "Hit5"),
      category1 = c("a1", "", "b1", "a1", "c1"),
      category2 = c("", "", "", "", "a2"),
      category3 = c("a3", "", "b3", "", "a3"),
      category4 = c("", "", "", "", ""),
      category5 = c("", "", "a5", "b5", ""),
      stringsAsFactors = FALSE)
    


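    From each category column, I need to keep only the element that appears in the topmost position, that is, the first non-empty value in that column. Every other cell in the column should become empty.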

    Finally, drop the rows that have no element left in any of these five columns.

        1
  •  2
  •   Ronak Shah    4 years ago

    You can use:

    library(dplyr)
    
    df %>%
      # Retain only the value that appears in the topmost position of each column
      mutate(across(starts_with('category'), ~replace(., -match(TRUE, . != ''), ''))) %>%
      # Drop the rows that have no element left
      filter(if_any(starts_with('category'), ~. != ''))
    
    #  Hits category1 category2 category3 category4 category5
    #1 Hit1        a1                  a3                    
    #2 Hit3                                                a5
    #3 Hit5                  a2                              
    

    The same approach, selecting the columns by position instead of by name:

    df %>%
      mutate(across(2:6, ~replace(., -match(TRUE, . != ''), ''))) %>%
      filter(if_any(2:6, ~. != ''))
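
    To see why the `replace(., -match(TRUE, . != ''), '')` expression in the code above keeps only the topmost value, it may help to run it on a single column outside of `mutate()`. The sketch below uses only base R; the variable names (`x`, `first_pos`, `kept`, `empty`, `unchanged`) are made up for illustration.

```r
# One column from the question's data frame (category1)
x <- c("a1", "", "b1", "a1", "c1")

# Position of the first non-empty value
first_pos <- match(TRUE, x != "")   # 1

# A negative index selects every position except first_pos,
# so replace() blanks all the other cells
kept <- replace(x, -first_pos, "")
kept
# [1] "a1" ""   ""   ""   ""

# An all-empty column (like category4): match() returns NA, and an NA index
# in a replacement selects nothing, so the vector comes back unchanged
empty <- c("", "", "", "", "")
unchanged <- replace(empty, -match(TRUE, empty != ""), "")
unchanged
# [1] "" "" "" "" ""
```

    The all-empty case matters here because `category4` has no values at all, and the expression still has to return a column of the same length.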
    
        2
  •  1
  •   iago    4 years ago
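    library(dplyr)
    df %>%
      # Keep a value only in the row of the column's first non-empty entry
      mutate(across(.cols = -Hits, .fns = ~ifelse(row_number() == first(which(. != "")) | all(. == ""), ., ""))) %>%
      # Keep rows with at least one non-empty category value
      filter(if_any(-Hits, ~. != ""))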
    
      Hits category1 category2 category3 category4 category5
    1 Hit1        a1                  a3                    
    2 Hit3                                                a5
    3 Hit5                  a2
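
    For comparison, the same two steps can probably be done without dplyr. This is only a base R sketch, not taken from either answer: `keep_first` is a hypothetical helper name, and it assumes empty cells are stored as `""` (not `NA`), as in the question's data.

```r
df <- data.frame(
  Hits = c("Hit1", "Hit2", "Hit3", "Hit4", "Hit5"),
  category1 = c("a1", "", "b1", "a1", "c1"),
  category2 = c("", "", "", "", "a2"),
  category3 = c("a3", "", "b3", "", "a3"),
  category4 = c("", "", "", "", ""),
  category5 = c("", "", "a5", "b5", ""),
  stringsAsFactors = FALSE)

# Hypothetical helper: keep only the first non-empty value of a
# character vector and blank everything else
keep_first <- function(x) {
  pos <- which(x != "")[1]            # NA when the column is all empty
  out <- rep("", length(x))
  if (!is.na(pos)) out[pos] <- x[pos]
  out
}

# Step 1: apply the helper to every column whose name starts with "category"
cat_cols <- grep("^category", names(df))
df[cat_cols] <- lapply(df[cat_cols], keep_first)

# Step 2: drop rows where every category column is empty
result <- df[rowSums(df[cat_cols] != "") > 0, ]
rownames(result) <- NULL
result
```

    On the question's data this leaves the rows for Hit1, Hit3 and Hit5, the same rows as the dplyr output above.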