Which files contain certain content, in R

  • AwaitedOne  · 7 years ago

    I have a list containing the lines of several files; a sample is shown below.

    lines <- list(c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"", "\"NM_012429\",\"7.19739265676517\",\"0.738130599770152\"", 
    "\"NM_003980\",\"12.4036181424743\",\"13.753593768862\"", "\"AY044449\",\"8.74973537284918\",\"1.77200602833912\"", 
    "\"NM_005015\",\"11.3735054810744\",\"6.76079815107347\""), c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"", 
    "\"NM_012429\",\"7.07699512126353\",\"0.987579612646805\"", "\"NM_003980\",\"11.3172936656653\",\"8.38227473088534\"", 
    "\"AY044449\",\"9.2865464417786\",\"2.61149606120517\"", "\"NM_005015\",\"10.1228142794354\",\"3.98707517627092\""
    ), c("ID,SIGNALINTENSITY,SNR", "1,NM_012429,6.44764696592035,0.84120306786724", 
    "2,NM_003980,9.52604513443066,3.02404186191898", "3,AY044449,9.11930818670925,2.24361163736047", 
    "4,NM_005015,10.5672879852575,5.29334273442728"))
    

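    I want to check for a match while reading the lines. I am trying to find which files have content starting with NM or GE, using the following code: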

    which(lapply(lines, function(x) any(grepl(paste(c("^NM_","^GE"),collapse = "|"), x, ignore.case = TRUE))) == T)
    

    It should give the indices of all three files, but it returns integer(0). I don't know what I'm missing.

    2 Answers

  •   PKumar    7 years ago

    Try this:

    lyst <- list(c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"", "\"NM_012429\",\"7.19739265676517\",\"0.738130599770152\"", 
    "\"NM_003980\",\"12.4036181424743\",\"13.753593768862\"", "\"AY044449\",\"8.74973537284918\",\"1.77200602833912\"", 
    "\"NM_005015\",\"11.3735054810744\",\"6.76079815107347\""), c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"", 
    "\"NM_012429\",\"7.07699512126353\",\"0.987579612646805\"", "\"NM_003980\",\"11.3172936656653\",\"8.38227473088534\"", 
    "\"AY044449\",\"9.2865464417786\",\"2.61149606120517\"", "\"NM_005015\",\"10.1228142794354\",\"3.98707517627092\""
    ), c("ID,SIGNALINTENSITY,SNR", "1,NM_012429,6.44764696592035,0.84120306786724", 
    "2,NM_003980,9.52604513443066,3.02404186191898", "3,AY044449,9.11930818670925,2.24361163736047", 
    "4,NM_005015,10.5672879852575,5.29334273442728"))
    

    Assuming lyst holds the strings given in your question, you can then do:

    lapply(1:length(lyst), function(x)grepl("^NM|^GE",gsub('"',"", lyst[[x]])))
    

    Logic:

    First replace every double quote (") with nothing using gsub, then use "^" to check, with grepl, whether the string starts with NM or GE.
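To see why stripping the quotes matters, here is a minimal sketch on two lines copied from the question. The variable names `quoted`, `numbered`, `before`, `after` and `third` are illustrative only and do not appear in the original post.

```r
# One data line from a quoted file and one from the file with row numbers.
quoted   <- "\"NM_012429\",\"7.19739265676517\",\"0.738130599770152\""
numbered <- "1,NM_012429,6.44764696592035,0.84120306786724"

# The line begins with a literal double quote, so ^NM cannot match.
before <- grepl("^NM|^GE", quoted)

# Once the quotes are removed, the line begins with NM.
after <- grepl("^NM|^GE", gsub('"', "", quoted))

# The row-numbered line begins with "1,", so it still does not match.
third <- grepl("^NM|^GE", gsub('"', "", numbered))

print(c(before = before, after = after, third = third))
```

The last case is the one that lines starting with a row number run into: removing quotes alone is not enough for them.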

    However, if you also want to allow an optional leading number and comma before the identifier, you can use this regex instead:

    lapply(1:3, function(x)grepl("^(NM|GE)|^\\d+,(NM|GE)",gsub('"',"", lyst[[x]])))
    

    Output:

        > lapply(1:3, function(x)grepl("^(NM|GE)|^\\d+,(NM|GE)",gsub('"',"", lyst[[x]])))
    [[1]]
    [1] FALSE  TRUE  TRUE FALSE  TRUE
    
    [[2]]
    [1] FALSE  TRUE  TRUE FALSE  TRUE
    
    [[3]]
    [1] FALSE  TRUE  TRUE FALSE  TRUE
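The output above is one logical vector per file, while the question asked for the indices of the files that contain a match. One possible way to collapse it is sketched below; `files` is a hypothetical, shortened stand-in for `lyst`, and `pattern` and `has_match` are names introduced here for illustration.

```r
# Hypothetical stand-in for `lyst`: a quoted file, a file with row numbers,
# and a file with no NM/GE identifiers at all.
files <- list(
  c("\"ID\",\"SNR\"", "\"NM_012429\",\"0.73\""),
  c("ID,SNR", "1,NM_003980,3.02"),
  c("ID,SNR", "1,AY044449,2.24")
)

# Same regex as in the answer above.
pattern <- "^(NM|GE)|^\\d+,(NM|GE)"

# TRUE for a file if any of its lines matches after the quotes are stripped.
has_match <- vapply(
  files,
  function(x) any(grepl(pattern, gsub('"', "", x))),
  logical(1)
)

# Indices of the files that contain at least one matching line.
which(has_match)
```

Using `vapply` with `logical(1)` returns a plain logical vector, so `which()` can be applied directly without comparing a list against `T`.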
    
  •   Aurèle    7 years ago
    dat <- lapply(
      lines,
      function(x) read.csv(text = x)
    )
    
    # [[1]]
    #          ID SIGNALINTENSITY        SNR
    # 1 NM_012429        7.197393  0.7381306
    # 2 NM_003980       12.403618 13.7535938
    # 3  AY044449        8.749735  1.7720060
    # 4 NM_005015       11.373505  6.7607982
    # 
    # [[2]]
    #          ID SIGNALINTENSITY       SNR
    # 1 NM_012429        7.076995 0.9875796
    # 2 NM_003980       11.317294 8.3822747
    # 3  AY044449        9.286546 2.6114961
    # 4 NM_005015       10.122814 3.9870752
    # 
    # [[3]]
    #          ID SIGNALINTENSITY       SNR
    # 1 NM_012429        6.447647 0.8412031
    # 2 NM_003980        9.526045 3.0240419
    # 3  AY044449        9.119308 2.2436116
    # 4 NM_005015       10.567288 5.2933427
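The third file parses cleanly even though its data rows carry a leading row number. As far as I understand `read.table` (which `read.csv` wraps), when the header has one fewer field than the data rows, the first field is taken as row names. A small sketch with made-up, shortened lines (`x` and `df` are illustrative names):

```r
# Header has 2 fields, data rows have 3: the first field becomes the row names.
x <- c("ID,SNR", "1,NM_012429,0.84", "2,AY044449,2.24")
df <- read.csv(text = x)

# Column names come from the header; the leading numbers are the row names.
names(df)
rownames(df)

# The ID column holds the identifiers, so a plain ^NM_ anchor works on it.
grepl("^NM_|^GE", df$ID, ignore.case = TRUE)
```

This is why no special handling of the leading number is needed once the lines are parsed into data frames.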
    

    To filter the rows:

    lapply(
      dat,
      function(df) df[grepl("^NM_|^GE", df$ID, ignore.case = TRUE), ]
    )
    
    # [[1]]
    #          ID SIGNALINTENSITY        SNR
    # 1 NM_012429        7.197393  0.7381306
    # 2 NM_003980       12.403618 13.7535938
    # 4 NM_005015       11.373505  6.7607982
    # 
    # [[2]]
    #          ID SIGNALINTENSITY       SNR
    # 1 NM_012429        7.076995 0.9875796
    # 2 NM_003980       11.317294 8.3822747
    # 4 NM_005015       10.122814 3.9870752
    # 
    # [[3]]
    #          ID SIGNALINTENSITY       SNR
    # 1 NM_012429        6.447647 0.8412031
    # 2 NM_003980        9.526045 3.0240419
    # 4 NM_005015       10.567288 5.2933427
    

    Or, if you only need to know which rows match:

    lapply(
      dat,
      function(df) grepl("^NM_|^GE", df$ID, ignore.case = TRUE)
    )
    
    # [[1]]
    # [1]  TRUE  TRUE FALSE  TRUE
    # 
    # [[2]]
    # [1]  TRUE  TRUE FALSE  TRUE
    # 
    # [[3]]
    # [1]  TRUE  TRUE FALSE  TRUE
    

    Or, to get the row indices, use grep instead of grepl:

    lapply(
      dat,
      function(df) grep("^NM_|^GE", df$ID, ignore.case = TRUE)
    )
    
    # [[1]]
    # [1] 1 2 4
    # 
    # [[2]]
    # [1] 1 2 4
    # 
    # [[3]]
    # [1] 1 2 4