
Spread or dcast and fill with counts

dcast spread reshape2 tidyr r

dan · Tech Community · 3 years ago

Probably a basic question.

I have a key-value data.frame (df, see below):

features <- paste0("f",1:5)
set.seed(1)
ids <- paste0("id",1:10)

# For each id, draw 3 distinct features and stack the results into one long data.frame
df <- do.call(rbind,lapply(ids,function(i){
  data.frame(id = i, feature = sample(features,3,replace = FALSE))
}))

I would like to tidyr::spread or reshape2::dcast it so that the rows are id, the columns are feature, and the values are the sum of features for each id.

reshape2::dcast(df, id ~ feature)

But that does not achieve it. It just fills the cells with the feature name or NA.

Adding fun.aggregate = sum to the command above results in an error:

> reshape2::dcast(df, id ~ feature, fun.aggregate = sum)
Using feature as value column: use value.var to override.
Error in .fun(.value[0], ...) : invalid 'type' (character) of argument

tidyr::spread(df, key = id, value = feature)

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 30 rows:

Any ideas?

1 answer | as of 3 years ago

Ronak Shah · 3 years ago

I think you want to count the features rather than sum them. Try using the function length.

tidyr::pivot_wider(df, names_from = feature, 
            values_from = feature, values_fn = length, values_fill = 0)

Or with dcast.

library(data.table)
dcast(setDT(df), id~feature, value.var = 'feature', fun.aggregate = length)

table(df) would give the same output.

table(df)

#     feature
#id     f1 f2 f3 f4 f5
#  id1   1  0  1  1  0
#  id10  1  0  1  1  0
#  id2   1  1  0  0  1
#  id3   0  1  1  1  0
#  id4   1  0  1  0  1
#  id5   1  1  0  0  1
#  id6   1  1  1  0  0
#  id7   1  0  0  1  1
#  id8   1  1  0  0  1
#  id9   0  1  0  1  1
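
If a plain data.frame is needed rather than a table object, the counting idea above can be sketched in base R alone. This is only an illustration under stated assumptions: the names sum_fails and counts are made up for this example, and the data is rebuilt exactly as in the question. It also shows why fun.aggregate = sum failed: the feature column is character, and sum() is not defined for character input.

```r
# Rebuild the example data from the question.
features <- paste0("f", 1:5)
set.seed(1)
ids <- paste0("id", 1:10)
df <- do.call(rbind, lapply(ids, function(i) {
  data.frame(id = i, feature = sample(features, 3, replace = FALSE))
}))

# sum() on the character column raises an error, which is what dcast reported.
# (sum_fails is a hypothetical name used only for this illustration.)
sum_fails <- inherits(try(sum(df$feature), silent = TRUE), "try-error")

# table() counts each id/feature pair; as.data.frame.matrix() keeps the wide
# layout instead of melting the table back into long format.
counts <- as.data.frame.matrix(table(df$id, df$feature))

# Move the ids from the row names into an ordinary column.
counts <- cbind(id = rownames(counts), counts)
rownames(counts) <- NULL
```

Each row of counts should then sum to 3, since every id was given three distinct features.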
