
Using dplyr to calculate percentages from a data frame with several species, treatments, and variables

  •  2
  • Dr. Flow  · Tech Community  · 9 years ago

    Problem

    Create a new column containing percentages

    Data

    df <- data.frame(
        species   = c("A","A","A","A","B","B","B","B","A","A","A","A","B","B","B","B"),
        number    = c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2),
        treatment = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1),
        variable  = c("x","y","x","y","x","y","x","y","x","y","x","y","x","y","x","y"),
        value     = sample(1:16)
    )
    

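    Question

    I want to calculate percentages within each combination of species, number, and treatment. That is, the values of variables x and y for one combination (for example, the first two rows) should sum to 100%.

    I tried using dplyr: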

    library(dplyr)

    result <- df %>%
        group_by(variable) %>%
        mutate(percent = value * 100 / sum(value))

    test <- subset(result, variable == "x")
    sum(test$percent)  # sums to 100, but across all species and treatments
    

    `test` is wrong, because the percentages are taken over all x values across both species and both treatments.

    Expected output

     species number treatment variable value    percent
        A      1         0        x     40         40
        A      1         0        y     60         60
        A      2         0        x      1         10
        A      2         0        y      9         90
    
    2 answers  |  9 years ago
        1
  •  3
  •   Curt F.    9 years ago

    Here is a solution using tidyr:

    library(tidyr)
    library(dplyr)

    # One row per (species, number, treatment), with x and y as columns
    df %>% spread(variable, value) %>%
        mutate(percent.x = 100 * x / (x + y),
               percent.y = 100 * y / (x + y))
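
In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. A rough sketch of the same reshaping with the newer function follows, with the results expressed as percentages. It assumes the question's `df`, rebuilt here with a fixed seed (my addition, only so the block runs on its own); `wide` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed, not part of the original question
df <- data.frame(
  species   = rep(c("A", "B", "A", "B"), each = 4),
  number    = rep(c(1, 1, 2, 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# Spread x and y into their own columns, then compute each share of the row total
wide <- df %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  mutate(percent.x = 100 * x / (x + y),
         percent.y = 100 * y / (x + y))

wide
```

The wide layout gives one row per (species, number, treatment), which is convenient for reading but differs from the long layout shown in the expected output.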
    

    Here is also a dplyr-only solution:

    df %>% group_by(number, treatment, species) %>% 
            mutate(percent = 100 * value / sum(value)) 
    

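    Your problem is that you are calling group_by() on entirely the wrong variable. You want percentages within each (number, treatment, species) combination, taken across the values of variable, so you should group by the former, not the latter.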
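
One way to confirm that the grouping is right is to check that the percentages sum to 100 within each (species, number, treatment) combination. A minimal sketch, assuming the question's `df` (rebuilt here with a fixed seed, which is my addition); `check` is a hypothetical name.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, not part of the original question
df <- data.frame(
  species   = rep(c("A", "B", "A", "B"), each = 4),
  number    = rep(c(1, 1, 2, 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# Percent of each value within its (species, number, treatment) group
result <- df %>%
  group_by(species, number, treatment) %>%
  mutate(percent = 100 * value / sum(value)) %>%
  ungroup()

# Each group holds one x row and one y row, so every total should be 100
check <- result %>%
  group_by(species, number, treatment) %>%
  summarise(total = sum(percent), .groups = "drop")

check
```

Grouping by `variable` instead would leave only two groups (x and y), each spanning every species and treatment, which is the behaviour the question describes.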

        2
  •  1
  •   BogdanC    9 years ago

    Is this what you are looking for? I am using the data.table package:

    library(data.table)
    DT <- as.data.table(df)

    # Totals per (species, number, treatment, variable) and per (species, number, treatment)
    DT_output <- DT[, list(value = sum(value)), by = c('species', 'number', 'treatment', 'variable')]
    DT_temp <- DT[, list(sum = sum(value)), by = c('species', 'number', 'treatment')]

    # Attach the group total to each row
    DT_output <- merge(DT_output, DT_temp, by = c('species', 'number', 'treatment'))

    DT_output[, percent := 100 * value / sum]

    setorder(DT_output, species, treatment, number, variable)
    DT_output
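
The same percentages can probably be reached without the intermediate tables, by assigning the column by reference within each group. This is a sketch under two assumptions: the question's `df` (rebuilt here with a fixed seed, my addition), and each (species, number, treatment, variable) combination occurring only once, so that no pre-aggregation is needed.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, not part of the original question
df <- data.frame(
  species   = rep(c("A", "B", "A", "B"), each = 4),
  number    = rep(c(1, 1, 2, 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

DT <- as.data.table(df)

# Add percent in place; sum(value) is evaluated separately for each group
DT[, percent := 100 * value / sum(value), by = .(species, number, treatment)]

setorder(DT, species, treatment, number, variable)
DT
```

If a combination could appear more than once, the aggregation step from the answer above would still be needed first.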