I have the following dummy data frame:
structure(list(ref = structure(1:7, .Label = c("a", "b", "c",
"d", "e", "f", "g"), class = "factor"), gene = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L), .Label = c("gyrA", "parC"), class = "factor"),
result = structure(c(2L, 4L, 6L, 2L, 3L, 5L, 1L), .Label = c("S479T",
"S83L", "S83L, D678E, D741E", "S83L, D87G", "T765E", "V196A, M248V, E678D"
), class = "factor")), class = "data.frame", row.names = c(NA,
-7L))
ref gene result
a gyrA S83L
b gyrA S83L, D87G
c gyrA V196A, M248V, E678D
d gyrA S83L
e gyrA S83L, D678E, D741E
f parC T765E
g parC S479T
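What I want to do is check whether the numeric values in the "result" column (the number between the two letters in each entry) fall within a specific range, namely 67-106, but only when the "gene" column == gyrA. Every number in each cell of the "result" column needs to be checked.
If any number in a cell is within the specified range, result_pos should return 1.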
df %>%
mutate(gyrA_pos = ifelse(gene == "gyrA", gsub("[[:alpha:]]", "", result), NA),
result_pos = ifelse(gene == "gyrA" & gyrA_pos %in% as.character(seq(from = 67, to = 106)) == TRUE, 1, 0))
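This works, but only for entries that contain a single value. I also find it tedious to have to create a column with the letters stripped out before matching. The output I want in the end is: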
ref gene result result_pos
a gyrA S83L 1
b gyrA S83L, D87G 1
c gyrA V196A, M248V, E678D 0
d gyrA S83L 1
e gyrA S83L, D678E, D741E 1
f parC T765E NA
g parC S479T NA
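For reference, the rule described above (1 if any number in a gyrA entry lies in 67-106, NA for other genes) could be sketched roughly as follows. This is only a minimal base R sketch, not a definitive solution: the helper name any_pos_in_range and its lo/hi arguments are made up for illustration, and the data frame is rebuilt with plain character columns rather than factors.

```r
# Hypothetical helper (name and arguments are illustrative only):
# returns 1 if any number found in an entry lies in [lo, hi], else 0.
any_pos_in_range <- function(x, lo = 67, hi = 106) {
  x <- as.character(x)  # tolerate factor input
  # Extract every run of digits per entry, e.g. "S83L, D87G" -> c("83", "87")
  digits <- regmatches(x, gregexpr("[0-9]+", x))
  vapply(digits, function(d) {
    n <- as.numeric(d)
    as.numeric(any(n >= lo & n <= hi))
  }, numeric(1))
}

# Same dummy data as above, rebuilt with character columns
df <- data.frame(
  ref = letters[1:7],
  gene = c(rep("gyrA", 5), rep("parC", 2)),
  result = c("S83L", "S83L, D87G", "V196A, M248V, E678D", "S83L",
             "S83L, D678E, D741E", "T765E", "S479T"),
  stringsAsFactors = FALSE
)

# Only gyrA rows are checked; every other gene gets NA
df$result_pos <- ifelse(df$gene == "gyrA", any_pos_in_range(df$result), NA)
print(df)
```

Inside the dplyr pipeline, the same helper could presumably be used as mutate(result_pos = ifelse(gene == "gyrA", any_pos_in_range(result), NA)), which avoids the intermediate column with the letters removed.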