
Filter data.frame rows by a logical condition

dataframe subset r


lhahne · Tech Community · 15 years ago

I want to filter rows from this data.frame based on a logical condition:

   expr_value     cell_type
1    5.345618 bj fibroblast
2    5.195871 bj fibroblast
3    5.247274 bj fibroblast
4    5.929771          hesc
5    5.873096          hesc
6    5.665857          hesc
7    6.791656          hips
8    7.133673          hips
9    7.574058          hips
10   7.208041          hips
11   7.402100          hips
12   7.167792          hips
13   7.156971          hips
14   7.197543          hips
15   7.035404          hips
16   7.269474          hips
17   6.715059          hips
18   7.434339          hips
19   6.997586          hips
20   7.619770          hips
21   7.490749          hips

What I want is a new data frame that looks the same but contains only the data for one cell type. For example, subset/select the rows with cell type "hesc":

   expr_value     cell_type
1    5.929771          hesc
2    5.873096          hesc
3    5.665857          hesc

Or the rows with cell type "bj fibroblast" or "hesc":

   expr_value     cell_type
1    5.345618 bj fibroblast
2    5.195871 bj fibroblast
3    5.247274 bj fibroblast
4    5.929771          hesc
5    5.873096          hesc
6    5.665857          hesc

Is there an easy way to do this?

I tried the following, but it returns a vector instead of a data frame:

expr[expr[2] == 'hesc']
# [1] "5.929771" "5.873096" "5.665857" "hesc"     "hesc"     "hesc"
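To make the answers easy to try, the data frame printed above can be rebuilt as follows. This is a sketch: the values are copied from the printout, and the column types (numeric `expr_value`, character `cell_type`) are an assumption, since the original column types are not shown.

```r
# Rebuild the question's data frame from the printed values
expr <- data.frame(
  expr_value = c(5.345618, 5.195871, 5.247274,
                 5.929771, 5.873096, 5.665857,
                 6.791656, 7.133673, 7.574058, 7.208041, 7.402100,
                 7.167792, 7.156971, 7.197543, 7.035404, 7.269474,
                 6.715059, 7.434339, 6.997586, 7.619770, 7.490749),
  # 3 "bj fibroblast" rows, 3 "hesc" rows, 15 "hips" rows, in printed order
  cell_type = rep(c("bj fibroblast", "hesc", "hips"), times = c(3, 3, 15)),
  stringsAsFactors = FALSE
)
str(expr)
```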


Henrik, plannapus · 6 years ago

To select rows according to one cell_type (e.g. "hesc"), use ==:

expr[expr$cell_type == "hesc", ]

To select rows according to two or more different cell_type values (e.g. either "hesc" or "bj fibroblast"), use %in%:

expr[expr$cell_type %in% c("hesc", "bj fibroblast"), ]
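A common mistake when selecting several values is to use == with a vector: it compares element by element and recycles the shorter vector, so it silently returns the wrong rows. A small sketch on a made-up four-row data frame (`toy` is hypothetical):

```r
# Hypothetical four-row data frame; expr_value is just a row marker
toy <- data.frame(expr_value = 1:4,
                  cell_type  = c("bj fibroblast", "bj fibroblast", "hesc", "hesc"),
                  stringsAsFactors = FALSE)

# == compares element-wise and recycles c("hesc", "bj fibroblast") to length 4,
# so row 1 is compared with "hesc", row 2 with "bj fibroblast", and so on
wrong <- toy[toy$cell_type == c("hesc", "bj fibroblast"), ]

# %in% asks, for each row, whether its value is anywhere in the set
right <- toy[toy$cell_type %in% c("hesc", "bj fibroblast"), ]

nrow(wrong)  # 2: only rows 2 and 3 happen to line up
nrow(right)  # 4
```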

rcs · 9 years ago

Use subset():

subset(expr, cell_type == "hesc")
subset(expr, cell_type %in% c("bj fibroblast", "hesc"))

Or, better, dplyr::filter():

library(dplyr)
filter(expr, cell_type %in% c("bj fibroblast", "hesc"))
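Both `[` and subset() keep the original row names (4, 5, 6 for the hesc rows), while the desired output in the question is numbered from 1. If that matters, the row names can be reset afterwards; a minimal sketch on a four-row excerpt of the question's data:

```r
# Four-row excerpt of the question's data
expr <- data.frame(expr_value = c(5.345618, 5.929771, 5.873096, 6.791656),
                   cell_type  = c("bj fibroblast", "hesc", "hesc", "hips"),
                   stringsAsFactors = FALSE)

hesc <- subset(expr, cell_type == "hesc")
rownames(hesc)          # "2" "3": the original row names are kept

# Drop the old row names so the result is numbered 1, 2, ... as in the question
rownames(hesc) <- NULL
rownames(hesc)          # "1" "2"
```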

Ken Williams, Dirk is no longer here · 15 years ago

The reason expr[expr[2] == 'hesc'] doesn't work is that, for a data frame, x[y] selects columns, not rows. If you want to select rows, use the syntax x[y,] instead:

> expr[expr[2] == 'hesc',]
  expr_value cell_type
4   5.929771      hesc
5   5.873096      hesc
6   5.665857      hesc
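A short sketch of the difference, on a three-row excerpt of the question's data: with a single index a data frame behaves like a list of columns, and with two indices the part before the comma selects rows.

```r
# Three-row excerpt of the question's data
expr <- data.frame(expr_value = c(5.345618, 5.929771, 5.873096),
                   cell_type  = c("bj fibroblast", "hesc", "hesc"),
                   stringsAsFactors = FALSE)

# One index: list-style selection of columns
expr["cell_type"]               # a data frame with the single column cell_type
expr[c(FALSE, TRUE)]            # a logical index also picks columns: again cell_type

# Two indices: the logical vector before the comma picks rows
keep <- expr$cell_type == "hesc"
expr[keep, ]                    # rows 2 and 3, all columns
```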

nathaneastwood · 10 years ago

You can use the dplyr package:

library(dplyr)
filter(expr, cell_type == "hesc")
filter(expr, cell_type == "hesc" | cell_type == "bj fibroblast")

eigenfoo · 6 years ago

No one seems to have mentioned the which() function. It can also be useful for filtering:

expr[which(expr$cell_type == 'hesc'), ]

This also handles NAs by dropping them from the resulting data frame.

Running this 50000 times on a 9840x24 data frame, the which() approach seemed to run about 60% faster than the %in% approach.
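The NA point is easy to see on a tiny example. In this sketch `toy` is hypothetical, with a missing cell type in its second row: plain logical indexing turns the NA into an all-NA row, while which() keeps only the positions that are TRUE.

```r
# Hypothetical data frame with a missing cell type in row 2
toy <- data.frame(expr_value = c(5.93, 5.87, 6.79),
                  cell_type  = c("hesc", NA, "hips"),
                  stringsAsFactors = FALSE)

cond <- toy$cell_type == "hesc"   # TRUE NA FALSE

toy[cond, ]         # 2 rows: the hesc row plus an all-NA row (row name "NA")
toy[which(cond), ]  # 1 row: which() drops the NA position
```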

Justin Harbour · 7 years ago

I was working on a data frame and had no luck with the answers provided: they always returned 0 rows, so I found and used grepl:

df = df[grepl("downlink",df$Transmit.direction),]

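This trimmed my data frame down to only the rows that contain "downlink" in the Transmit.direction column. If anyone can guess why I wasn't seeing the expected behavior with the other answers, please leave a comment.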

Applied to the original question:

expr[grepl("hesc",expr$cell_type),]

expr[grepl("bj fibroblast|hesc",expr$cell_type),]
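Note that grepl() matches a regular expression anywhere in the string, so "hesc" would also match a longer label that contains it. Anchoring the pattern with ^ and $ restricts it to whole values. A sketch on a character vector; the label "hesc-derived" is hypothetical and only there to show the pitfall:

```r
# "hesc-derived" is a made-up label used only to show substring matching
cell_type <- c("bj fibroblast", "hesc", "hesc-derived", "hips")

grepl("hesc", cell_type)                     # FALSE TRUE TRUE FALSE: substring match
grepl("^hesc$", cell_type)                   # FALSE TRUE FALSE FALSE: whole value only
grepl("^(bj fibroblast|hesc)$", cell_type)   # TRUE TRUE FALSE FALSE
```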

Daniel Bonetti · 7 years ago

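Sometimes the column you want to filter on is not at column index 2, or its name is stored in a variable.

In that case, you can refer to the column you want to filter on by its name: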

columnNameToFilter = "cell_type"
expr[expr[[columnNameToFilter]] == "hesc", ]
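The same idea can be wrapped in a small reusable function. `filter_by_column` below is a hypothetical helper, not part of base R; it combines `[[` with %in% so that one or several values can be given.

```r
# Hypothetical helper: keep the rows of `df` whose column `col` (given by name)
# takes one of the values in `values`; rows where that column is NA are dropped
# unless NA itself is in `values`
filter_by_column <- function(df, col, values) {
  stopifnot(is.character(col), length(col) == 1, col %in% names(df))
  df[df[[col]] %in% values, , drop = FALSE]
}

expr <- data.frame(expr_value = c(5.345618, 5.929771, 6.791656),
                   cell_type  = c("bj fibroblast", "hesc", "hips"),
                   stringsAsFactors = FALSE)

columnNameToFilter <- "cell_type"
filter_by_column(expr, columnNameToFilter, c("hesc", "hips"))
```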

DKMDebugin · 5 years ago

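# logical vector: TRUE where cell_type is "hesc"
celltype_hesc_bool = expr[['cell_type']] == 'hesc'

# the comma makes the logical vector select rows, not columns
expr_celltype_hesc = expr[celltype_hesc_bool, ]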

Check this blog post

Varn K · 6 years ago

library(data.table)
expr <- data.table(expr)
expr[cell_type == "hesc"]
expr[cell_type %in% c("hesc", "bj fibroblast")]

Or use the %like% pattern-matching operator:

expr[cell_type %like% "hesc" | cell_type %like% "fibroblast"]