
How do I find whether a text string in one column appears in another column?

  •  -1
  • Tim Wilcox  ·  2 years ago

    Below is the sample data

     df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
     df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")
    

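    The task at hand is twofold. First, search df2 for the text strings found in df1. If a string from df1 is not found in df2, keep it as-is, giving a final result like the one below. This relates to a question I asked yesterday, but on closer examination my first job is to find whether the names in df1 are found in df2.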

    df3: "State Board of Accountancy", "The State Board of Economists", "State Board of Law", "Board of Medicine"
    
    1 reply  |  as of 2 years ago
  •  1
  •   r2evans    2 years ago
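    # Append to df2 every df1 entry that matches nothing in df2.
    # sapply() yields one column per df1 pattern, so matches are counted per column.
    c(df2, df1[colSums(sapply(df1, grepl, df2)) < 1])
    # [1] "State Board of Accountancy"    "The State Board of Economists" "State Board of Law"            "Board of Medicine"            
    df3
    # [1] "State Board of Accountancy"    "The State Board of Economists" "State Board of Law"            "Board of Medicine"            
    

    Walkthrough:

    • grepl by itself accepts only a single pattern, so we need to iterate over each pattern; we use sapply
    • since that ( sapply ) returns a matrix (one row per element of df2 , one column per pattern in df1 ), we need to check whether each column (each df1 ) contains a match; we use colSums(.) < 1 (aka == 0 ), meaning nothing matched; by subsetting df1[..] on this, we get the df1 entries for which no match was found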
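
    The two steps above can be sketched with the intermediate matrix made explicit. This is only an illustration on the sample data; the names m, unmatched and result are introduced here, and fixed = TRUE is an assumption (it treats each df1 entry as a literal string instead of a regular expression) that is not part of the one-liner above.

```r
df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")

# Each df1 entry is used as a pattern against all of df2.
# sapply() binds the per-pattern results as columns:
# one row per df2 element, one column per df1 pattern.
# fixed = TRUE (an assumption) matches literally, not as a regex.
m <- sapply(df1, grepl, df2, fixed = TRUE)

# A df1 entry is unmatched when its column contains no TRUE.
unmatched <- df1[colSums(m) < 1]

# Keep df2 as-is and append the unmatched df1 entries.
result <- c(df2, unmatched)
```

    Printing m before taking the column sums is a quick way to check which axis corresponds to df1, which matters when df1 and df2 differ in length.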

    Corrected data:

    df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
    df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")
    df3 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law", "Board of Medicine")