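I have a data frame of taxonomic identifiers, and I'm trying to "expand" the taxon IDs into separate columns. I found and followed a previous question, but I'm stuck again on interpreting the last line of code.
Here are the first few rows of my taxa data frame: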
> dput(toytax)
structure(list(taxa = structure(1:3, .Label = c("D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon",
"D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote;D_5__uncultured euryarchaeote;D_6__uncultured euryarchaeote",
"D_0__Archaea;D_1__Asgardaeota;D_2__Odinarchaeia;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote;D_5__uncultured euryarchaeote;D_6__uncultured euryarchaeote"
), class = "factor")), class = "data.frame", row.names = c("otu1",
"otu2", "otu3"))
I'm trying to split them so they look like this:
otu  | Domain  | Phylum        | Class        | Order               | Family              | ...
otu1 | Archaea | Altiarchaeota | Altiarchaeia | uncultured archaeon | uncultured archaeon | ...
...
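A minimal base-R sketch of that target layout, assuming every string has exactly seven ;-separated ranks (D_0 to D_6), as in the three example rows. split_taxa() is a hypothetical helper name, and the example data frame is rebuilt directly rather than from the dput() output:

```r
# Hypothetical helper (not from any package): one column per taxonomic rank.
# Assumes each string has exactly seven ";"-separated ranks, D_0__ ... D_6__.
split_taxa <- function(taxa, ranks = c("Domain", "Phylum", "Class", "Order",
                                       "Family", "Genus", "Species")) {
  x <- as.character(taxa)                 # factor -> character
  x <- gsub("D_[0-9]+__", "", x)          # drop the D_n__ rank prefixes
  parts <- strsplit(x, ";", fixed = TRUE) # one character vector per row
  m <- do.call(rbind, parts)              # rows = OTUs, columns = ranks
  colnames(m) <- ranks
  as.data.frame(m, stringsAsFactors = FALSE)
}

# The three example rows from the question
toytax <- data.frame(
  taxa = c(
    "D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon",
    "D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote;D_5__uncultured euryarchaeote;D_6__uncultured euryarchaeote",
    "D_0__Archaea;D_1__Asgardaeota;D_2__Odinarchaeia;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote;D_5__uncultured euryarchaeote;D_6__uncultured euryarchaeote"
  ),
  row.names = c("otu1", "otu2", "otu3")
)

res <- split_taxa(toytax$taxa)
rownames(res) <- rownames(toytax)  # keep the otu1, otu2, otu3 labels
res
```

Splitting on ; after stripping the prefixes avoids writing one regex group per rank. If some strings had fewer ranks, rbind() would recycle the shorter vectors instead of filling with NA, so that case would need padding first.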
My first step was this gsub() call on the taxa column:
toytax$taxa <- gsub("D_0__[A-Za-z]+\\.D_1__[A-Za-z]+\\D_2__", "", toytax$taxa)
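For comparison, here is a pattern that removes every rank prefix in one pass, sketched on the first example row only. In the pattern above, \\. matches a literal dot and \\D means "any non-digit" (not a literal D), while the ranks in the data are separated by ;, so that pattern does not match these strings:

```r
# First row of toytax$taxa from the question (a factor there, character here)
x <- "D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon"

# "D_[0-9]+__" matches each prefix D_0__ ... D_6__; the ";" separators stay in place
stripped <- gsub("D_[0-9]+__", "", as.character(x))
stripped
```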
My second step was to extract these names into separate columns, but it didn't work:
tidyr::extract(toytax, taxa,
c('domain','phylum','class','order','family','genus','species'),
'(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*({.*?})\\s*(\\[.*?\\])\\s*(.*)')
Clearly I don't understand how to refer to characters this way in a regular expression. Any help is appreciated!
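One way to write a pattern for this data, sketched with base R's regexec() and regmatches() so it runs without tidyr: each rank gets its own group D_<k>__([^;]*), where [^;]* means "any run of characters other than ;". That gives seven groups, one for each name in c('domain', ..., 'species'):

```r
# First row of toytax$taxa from the question
x <- "D_0__Archaea;D_1__Altiarchaeota;D_2__Altiarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon"

# Builds "^D_0__([^;]*);D_1__([^;]*);...;D_6__([^;]*)$": one capture group per rank
pattern <- paste0("^", paste0("D_", 0:6, "__([^;]*)", collapse = ";"), "$")

# regmatches() returns the whole match first, then each captured group; drop the first
groups <- regmatches(x, regexec(pattern, x))[[1]][-1]
names(groups) <- c("domain", "phylum", "class", "order", "family", "genus", "species")
groups
```

The same pattern string should also work as the regex argument of tidyr::extract(), with the seven names passed as into. The pattern in the question describes a different, date-time-like string (digits separated by - and :) and defines only six capture groups for seven column names.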