
Count the number of duplicates and put the counts in a column of a data frame

count r

phaser · Tech Community · 8 years ago

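I want to count the duplicates in one column and put a running count of them into another column of the data frame.

For example, with this data: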

a <- c(1,1,2,3,4,4)
b <- c("A","A","C","C","D","D")

df <- data.frame(a,b)

The desired result is:

  a b count
1 1 A     1
2 1 A     2
3 2 C     1
4 3 C     1
5 4 D     1
6 4 D     2

4 answers | last reply 8 years ago

user7298145 · 8 years ago

# ave() splits a vector of 1s by the (b, a) groups and takes the cumulative sum within each group
df$count <- with(df, ave(rep(1, nrow(df)), b, a, FUN = cumsum))
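If the duplicates are defined by a single column, the same ave() idea can be written without the vector of 1s: ave() applies seq_along() within each group, which gives each row its position inside that group. A minimal sketch on the example data from the question, assuming that grouping by column a alone is enough (it produces the desired output here):

```r
a <- c(1, 1, 2, 3, 4, 4)
b <- c("A", "A", "C", "C", "D", "D")
df <- data.frame(a, b)

# seq_along(df$a) is just 1..nrow; ave() splits it by a and replaces each
# group's values with 1, 2, ..., k (the row's position within its group)
df$count <- ave(seq_along(df$a), df$a, FUN = seq_along)
df$count
```

Unlike the rle() approach, ave() groups by value rather than by adjacent runs, so the rows do not need to be sorted first.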

AK88 · 8 years ago

Try this:

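# rle() finds runs of adjacent equal values and sequence() numbers each run 1..k,
# so this assumes df is already sorted by a
df$count = sequence(rle(df$a)$lengths)
df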
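Because rle() only sees runs of adjacent equal values, this works on the example only because a is already sorted. A minimal sketch for unsorted input (the vector a below is made up for illustration): sort, number the runs, then write the counts back to the original positions.

```r
a <- c(4, 1, 4, 1, 2)  # hypothetical unsorted values

# On unsorted input every run has length 1, so every count comes out as 1
naive <- sequence(rle(a)$lengths)

# Sort so equal values become adjacent, number each run 1..k,
# then assign the counts back to the original row positions
o <- order(a)
count <- integer(length(a))
count[o] <- sequence(rle(a[o])$lengths)
count
```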

akrun · 8 years ago

We can use data.table:

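library(data.table)
# setDT() converts df to a data.table by reference; .N is the number of rows
# in each (a, b) group, so seq_len(.N) numbers the rows within the group
setDT(df)[, count := seq_len(.N), .(a, b)]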
df
#    a b count
#1: 1 A     1
#2: 1 A     2
#3: 2 C     1
#4: 3 C     1
#5: 4 D     1
#6: 4 D     2
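data.table also provides rowid(), which returns the row number within each group directly. A minimal sketch, assuming the installed data.table version exports rowid():

```r
library(data.table)

a <- c(1, 1, 2, 3, 4, 4)
b <- c("A", "A", "C", "C", "D", "D")
df <- data.frame(a, b)

# rowid(a, b) numbers the rows 1, 2, ... within each (a, b) combination
# in the order they appear, so the input does not need to be sorted
df$count <- rowid(df$a, df$b)
df
```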

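Josh · 7 years ago

I had a similar problem, but I only needed to count duplicates based on the values in one column. user7298145's answer worked well on small data frames, but my data has 20k rows and it failed with this error: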

Error: memory exhausted (limit reached?)
Error during wrapup: memory exhausted (limit reached?)

So I wrote a for loop instead:

##  sort rows by md5 so that duplicated values are adjacent
##  (drop = FALSE keeps a data frame even if there is only one column)
primary_duplicated <- primary_duplicated1[order(primary_duplicated1$md5), , drop = FALSE]
##  create an empty (NA) count column
primary_duplicated$count <- NA_integer_
##  the first row is always the first occurrence of its value
primary_duplicated$count[1] <- 1L
##  set each duplicate's count to 1 greater than the count
##  of the preceding row with the same value;
##  seq_len(...)[-1] is empty for a one-row data frame, unlike 2:nrow(...)
for (i in seq_len(nrow(primary_duplicated))[-1]) {
  if (primary_duplicated$md5[i] == primary_duplicated$md5[i - 1]) {
    primary_duplicated$count[i] <- primary_duplicated$count[i - 1] + 1L
  } else {
    ##  the first occurrence of a new value starts again at 1
    primary_duplicated$count[i] <- 1L
  }
}
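Because the loop already sorts by md5, the same counts can be computed without a loop by numbering the runs of equal values with sequence(rle(...)$lengths), as in AK88's answer. A minimal sketch; primary_duplicated1 and its md5 values here are hypothetical stand-ins for Josh's data:

```r
# Hypothetical stand-in for Josh's data: md5 holds the hash strings
primary_duplicated1 <- data.frame(
  md5 = c("b2", "a1", "b2", "a1", "a1"),
  stringsAsFactors = FALSE  # rle() needs an atomic vector, not a factor
)

# Sort so equal hashes are adjacent (drop = FALSE keeps a one-column data frame)
primary_duplicated <- primary_duplicated1[order(primary_duplicated1$md5), , drop = FALSE]

# Each run of equal hashes is numbered 1..k, matching the loop's result
primary_duplicated$count <- sequence(rle(primary_duplicated$md5)$lengths)
primary_duplicated
```

This is vectorised, so it avoids assigning into the data frame one element at a time, which tends to be slow on large data.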
