
Create a new column based on string values in another column in R

r
  •  1
  • SNT  · 6 years ago

    I have a data frame in R with a column that holds a long string. I want to use that string to create a new column with specific values.

    dom <- data.frame(
      Site = c("alpha", "beta", "charlie", "delta"),
      Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
    )
    

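    Now, if the Banner column contains Watermelon or Vanilla, the new label column should hold the value Watermelon or Vanilla respectively, and Default for everything else. Below is what the expected data frame should look like.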

    Can this be done with grep, or with something else that handles multiple conditions?

    dom_output <- data.frame(
      Site = c("alpha", "beta", "charlie", "delta"),
      Banner = c("testing_Watermelon -bbb_300x250 v2"   , "notest_Orange aaa_300x250 v2"    , "bottle :15s","aaaa vvvv cccc 320x480"),
      label  = c("Watermelon","Vanilla","Default","Default")
    )
    
    3 replies  |  up to 5 years ago
        1
  •  5
  •   Gregor Thomas    6 years ago
    library(stringr)
    dom$label = str_extract(dom$Banner, "Watermelon|Vanilla")
    dom$label[is.na(dom$label)] <- "Default"
    dom
    #      Site                              Banner      label
    # 1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
    # 2    beta notest_Vanilla Latte-DPI_300x250 v2    Vanilla
    # 3 charlie                         bottle :15s    Default
    # 4   delta aaaa vvvv cccc Build_Mobile_320x480    Default
    
        2
  •  0
  •   Brigadeiro    6 years ago

    Here is a simple solution using base R:

    #Sample data:
    dom <- data.frame(
      Site = c("alpha", "beta", "charlie", "delta"),
      Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
    )
    
    
    # Check the keywords in priority order; anything unmatched falls through to "Default"
    dom$label <- ifelse(grepl("watermelon", dom$Banner, ignore.case = TRUE), "Watermelon",
                        ifelse(grepl("vanilla", dom$Banner, ignore.case = TRUE), "Vanilla", "Default"))
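
The nested ifelse() above gains one level per keyword. As a minimal sketch of the same idea for a longer keyword list, the loop below assumes the keywords live in a vector and that the first keyword in the vector wins when several match. The names `keywords` and `label_banner` are hypothetical and are not part of the original answer.

```r
dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2",
             "notest_Vanilla Latte-DPI_300x250 v2",
             "bottle :15s",
             "aaaa vvvv cccc Build_Mobile_320x480"),
  stringsAsFactors = FALSE
)

# Hypothetical keyword vector; earlier entries take priority
keywords <- c("Watermelon", "Vanilla")

# Hypothetical helper: label each banner with the first keyword it contains
label_banner <- function(banner, keywords, default = "Default") {
  label <- rep(default, length(banner))
  # Walk the keywords in reverse so that earlier keywords overwrite later ones
  for (kw in rev(keywords)) {
    hit <- grepl(kw, banner, ignore.case = TRUE)
    label[hit] <- kw
  }
  label
}

dom$label <- label_banner(dom$Banner, keywords)
```

Each keyword is treated as a regular expression here, so a keyword containing characters such as `.` or `(` would need escaping.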
    
        3
  •  0
  •   tmfmnk    6 years ago

    One base R possibility is:

    labels <- paste(c("Watermelon", "Orange"), collapse = "|")
    
    dom$label <- sapply(regmatches(dom$Banner, regexec(labels, dom$Banner)), "[", 1)
    dom$label[is.na(dom$label)] <- "Default"
    
         Site                              Banner      label
    1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
    2    beta  notest_Orange Latte-DPI_300x250 v2     Orange
    3 charlie                         bottle :15s    Default
    4   delta aaaa vvvv cccc Build_Mobile_320x480    Default
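
A variation on the base R approach above, offered only as a sketch: regexpr() reports the position of the first match in each string (or -1 when there is none), so the label column can start as "Default" and be overwritten only where a match exists. This avoids the sapply() step and the NA replacement. The variable names `pattern` and `m` are illustrative.

```r
dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2",
             "notest_Orange Latte-DPI_300x250 v2",
             "bottle :15s",
             "aaaa vvvv cccc Build_Mobile_320x480"),
  stringsAsFactors = FALSE
)

pattern <- paste(c("Watermelon", "Orange"), collapse = "|")

# Position of the first match per element, or -1 when nothing matches
m <- regexpr(pattern, dom$Banner)

# Start from the default, then fill in only the rows that matched;
# regmatches() with regexpr() data returns just the matched substrings
dom$label <- "Default"
dom$label[m > 0] <- regmatches(dom$Banner, m)
```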
    

    The same approach also works with dplyr and tidyr:

    library(dplyr)
    library(tidyr)
    dom %>%
     mutate(label = sapply(regmatches(Banner, regexec(labels, Banner)), "[", 1),
            label = replace_na(label, "Default"))
    

    Sample data:

    dom <- data.frame(
     Site = c("alpha", "beta", "charlie", "delta"),
     Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Orange Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
    )
    
        4
  •  0
  •   M--    6 years ago
    library(dplyr)
    library(stringi)
    
    dom %>% mutate(label = case_when(stri_detect_fixed(Banner, "Watermelon") ~ "Watermelon",
                                     stri_detect_fixed(Banner, "Vanilla")    ~ "Vanilla",
                                                                       TRUE  ~ "Default"))
    #>      Site                              Banner          label
    #> 1   alpha  testing_Watermelon -DPI_300x250 v2     Watermelon
    #> 2    beta notest_Vanilla Latte-DPI_300x250 v2        Vanilla
    #> 3 charlie                         bottle :15s        Default
    #> 4   delta aaaa vvvv cccc Build_Mobile_320x480        Default
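
Note that stri_detect_fixed() matches the keyword literally, so it behaves differently from the case-insensitive grepl() calls in an earlier answer. The base R sketch below illustrates the difference with grepl(); the lowercase banner is a made-up example and does not appear in the question's data.

```r
banner <- c("testing_Watermelon -DPI_300x250 v2",
            "notest_vanilla latte-DPI_300x250 v2",  # hypothetical lowercase variant
            "bottle :15s")

# Literal, case-sensitive matching: the lowercase "vanilla" is not found
fixed_hit <- grepl("Vanilla", banner, fixed = TRUE)

# Regex matching that ignores case: the lowercase "vanilla" is found
nocase_hit <- grepl("vanilla", banner, ignore.case = TRUE)
```

Which behavior is appropriate depends on how consistently the keywords are capitalized in the real Banner values.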
    

    dom <- data.frame(Site = c("alpha", "beta", "charlie", "delta"),
                      Banner = c("testing_Watermelon -DPI_300x250 v2",
                                 "notest_Vanilla Latte-DPI_300x250 v2",
                                 "bottle :15s",
                                 "aaaa vvvv cccc Build_Mobile_320x480"))