
How to expand taxon annotations into separate columns in a data frame

  • Geomicro · 4 years ago

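    I have a data frame of taxonomic identifiers, and I am trying to "expand" the taxon IDs into separate columns. I found and followed an earlier question, but I am again stuck on interpreting the last line of code.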

    Here are the first few rows of my taxonomy data frame:

    > dput(toytax)
    structure(list(taxa = structure(1:3, .Label = c("D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon", 
    "D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote;D_5__uncultured euryarchaeote;D_6__uncultured euryarchaeote", 
    "D_0__Archaea;D_1__Asgardaeota;D_2__Odinarchaeia;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote;D_5__uncultured euryarchaeote;D_6__uncultured euryarchaeote"
    ), class = "factor")), class = "data.frame", row.names = c("otu1", 
    "otu2", "otu3"))
    

    I am trying to split them so that the result looks like this:

    otu     Domain     Phylum         Class         Order               family              ...
    otu1    Archaea    Altiarchaeota  Altiarchaeia  uncultured archaeon uncultured archaeon ...
    .
    .
    .
    

    toytax$taxa <- gsub("D_0__[A-Za-z]+\\.D_1__[A-Za-z]+\\D_2__", "", toytax$taxa)
    

    My second step was to extract these names into separate columns, but it did not work.

    tidyr::extract(tax, taxa,
                   c('domain','phylum','class','order','family','genus','species'),
                   '(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*({.*?})\\s*(\\[.*?\\])\\s*(.*)')
    

    Clearly, I don't understand how to refer to characters in a regular expression this way. Any help is appreciated!

    1 answer  |  active 4 years ago
  • Alexlok · 4 years ago

    For the cleanup (with stringr::str_remove_all()), you only need to specify the pattern "D_\\d__", where \\d matches a single digit. Then you can use tidyr::separate() to split on ";".

    library(tidyverse)
    
    toytax %>%
      mutate(taxa = str_remove_all(taxa, "D_\\d__")) %>%
      separate(taxa,
               into = c("Domain", "Phylum", "Class", "Order", "family", "level5", "level6"),
               sep = ";")
    #>    Domain        Phylum        Class                    Order
    #> 1 Archaea Altiarchaeota Altiarchaeia      uncultured archaeon
    #> 2 Archaea Altiarchaeota Altiarchaeia uncultured euryarchaeote
    #> 3 Archaea   Asgardaeota Odinarchaeia uncultured euryarchaeote
    #>                     family                   level5                   level6
    #> 1      uncultured archaeon      uncultured archaeon      uncultured archaeon
    #> 2 uncultured euryarchaeote uncultured euryarchaeote uncultured euryarchaeote
    #> 3 uncultured euryarchaeote uncultured euryarchaeote uncultured euryarchaeote
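
    The desired layout in the question has an otu column, while the output above shows only row numbers. A minimal sketch of one way to keep the OTU IDs, assuming they live in the row names as in the dput() output: move them into a real column with tibble::rownames_to_column() before splitting. The rank names "Genus" and "Species" for the last two levels are an assumption taken from the column names the question passes to extract(); the name tax_wide is only illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(stringr)

# Toy data rebuilt from the dput() output in the question
# (plain character strings instead of a factor).
toytax <- data.frame(
  taxa = c(
    "D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon",
    "D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote;D_5__uncultured euryarchaeote;D_6__uncultured euryarchaeote",
    "D_0__Archaea;D_1__Asgardaeota;D_2__Odinarchaeia;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote;D_5__uncultured euryarchaeote;D_6__uncultured euryarchaeote"
  ),
  row.names = c("otu1", "otu2", "otu3")
)

# Assumed rank names for levels D_0 .. D_6 (not confirmed by the source).
ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")

tax_wide <- toytax %>%
  rownames_to_column("otu") %>%                        # row names -> "otu" column
  mutate(taxa = str_remove_all(taxa, "D_\\d__")) %>%   # strip the D_<digit>__ prefixes
  separate(taxa, into = ranks, sep = ";")              # one column per rank

print(tax_wide)
```

    Moving the IDs into a column first means they cannot be lost if a later step drops row names.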
    

    Created on 2020-12-18 by the reprex package
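
    For completeness, the tidyr::extract() attempt in the question can also be made to work. The pattern there has six capture groups for seven column names and describes timestamps and bracketed fields that do not occur in these strings, so it appears to have been carried over from a different problem; the call also refers to tax rather than toytax. The sketch below is one possible fix, not the answerer's method: it builds a pattern with one capture group per rank, each matching "D_<digit>__" followed by everything up to the next ";". The names pattern and tax_extracted are illustrative.

```r
library(tidyr)

# Toy data rebuilt from the dput() output in the question.
toytax <- data.frame(
  taxa = c(
    "D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon",
    "D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote;D_5__uncultured euryarchaeote;D_6__uncultured euryarchaeote",
    "D_0__Archaea;D_1__Asgardaeota;D_2__Odinarchaeia;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote;D_5__uncultured euryarchaeote;D_6__uncultured euryarchaeote"
  ),
  row.names = c("otu1", "otu2", "otu3")
)

# Column names as used in the question's extract() call.
ranks <- c("domain", "phylum", "class", "order", "family", "genus", "species")

# One capture group per rank: the prefix D_<digit>__ is matched but not
# captured; ([^;]+) captures everything up to the next semicolon.
pattern <- paste0(
  "^",
  paste(rep("D_\\d__([^;]+)", length(ranks)), collapse = ";"),
  "$"
)

# The number of capture groups must equal length(into).
tax_extracted <- tidyr::extract(toytax, taxa, into = ranks, regex = pattern)

print(tax_extracted)
```

    Compared with str_remove_all() plus separate(), this does the cleanup and the split in a single step, at the cost of a less readable pattern. Rows that do not match the pattern come back as NA, which can help spot malformed annotations.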