
Finding the ratio of values between two levels that share a common level

r
  •  0
  • duhaime  · Tech Community  · 6 years ago

    I have a data frame like this:

    group <- c('a', 'b', 'a', 'b')
    year <- c(1990, 1990, 2000, 2000)
    freq <- c(100, 120, 130, 170)
    df <- data.frame(group, year, freq)
    

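    For each distinct year, I want to find the ratio of the freq values in the rows where group is a and b, and add those ratios to the data frame as rows of a new group c. The resulting data frame should look like this: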

    group <- c('a', 'b', 'c', 'a', 'b', 'c')
    year <- c(1990, 1990, 1990, 2000, 2000, 2000)
    freq <- c(100, 120, 100/120, 130, 170, 130/170)
    df <- data.frame(group, year, freq)
    

    I tried to get there with the ugly loop below, but it went off the rails. If anyone can show me how to accomplish this basic task in R, I'd be very grateful!

    for (year in unique(df$year)) {
      a = df[ which(df$group == 'a' & df$year == year), ]
      b = df[ which(df$group == 'b' & df$year == year), ]
      proportion = a$freq / b$freq
      row = c('c', year, proportion)
      rbind(df, row)
    }
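
    For reference, the loop above goes wrong in two places: the result of rbind is never assigned back, and c('c', year, proportion) coerces every element to character. A minimal corrected sketch, assuming exactly one a row and one b row per year (the names result, yr and new_row are illustrative only):

```r
df <- data.frame(
  group = c("a", "b", "a", "b"),
  year  = c(1990, 1990, 2000, 2000),
  freq  = c(100, 120, 130, 170),
  stringsAsFactors = FALSE
)

result <- df
for (yr in unique(df$year)) {
  a <- df[df$group == "a" & df$year == yr, ]
  b <- df[df$group == "b" & df$year == yr, ]
  # Build the new row as a one-row data frame so year and freq stay numeric
  new_row <- data.frame(group = "c", year = yr, freq = a$freq / b$freq,
                        stringsAsFactors = FALSE)
  # rbind returns a new object, so the result must be assigned
  result <- rbind(result, new_row)
}

# Sort so that each year's c row follows its a and b rows
result <- result[order(result$year, result$group), ]
rownames(result) <- NULL
result
```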
    
    3 answers  |  as of 6 years ago
        1
  •  3
  •   Maurits Evers    6 years ago

    A tidyverse option:

    library(tidyverse)
    df %>%
        spread(group, freq) %>%
        mutate(c = a / b) %>%
        gather(group, freq, -year) %>%
        arrange(year, group)
    #  year group        freq
    #1 1990     a 100.0000000
    #2 1990     b 120.0000000
    #3 1990     c   0.8333333
    #4 2000     a 130.0000000
    #5 2000     b 170.0000000
    #6 2000     c   0.7647059
    

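    Explanation: spread reshapes the data from long to wide, mutate adds a column c = a / b, and gather reshapes the data back from wide to long.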
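
    In tidyr 1.0.0 and later, spread and gather are superseded by pivot_wider and pivot_longer. The same long-to-wide-to-long round trip might be sketched as follows, assuming dplyr and a recent tidyr are installed (the name res is illustrative):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  group = c("a", "b", "a", "b"),
  year  = c(1990, 1990, 2000, 2000),
  freq  = c(100, 120, 130, 170),
  stringsAsFactors = FALSE
)

res <- df %>%
  # long -> wide: one row per year, one column per group
  pivot_wider(names_from = group, values_from = freq) %>%
  # the ratio becomes a new column named c
  mutate(c = a / b) %>%
  # wide -> long: every column except year goes back into group/freq pairs
  pivot_longer(cols = -year, names_to = "group", values_to = "freq") %>%
  arrange(year, group)

res
```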

        2
  •  0
  •   pogibas    6 years ago

    Split the original data frame by year using the split function (the result is a list).

    foo <- split(df, df$year)
    

    For each entry x in the list foo, rbind a new one-row data frame that holds the computed freq.

    bar <- lapply(foo, function(x)
                  rbind(x, data.frame(group = "c", 
                                      year = x$year[1], 
                                      freq = x$freq[1] / x$freq[2])))
    
    # Bind back final result as it's a list (lapply result)
    do.call(rbind, bar)
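
    Note that x$freq[1] / x$freq[2] depends on the a row preceding the b row within each year. A variant that selects the values by group name instead might look like this; it is a sketch assuming one a row and one b row per year, and add_ratio is a hypothetical helper name:

```r
# Same data as in the question, but with rows deliberately out of order
df <- data.frame(
  group = c("b", "a", "a", "b"),
  year  = c(1990, 1990, 2000, 2000),
  freq  = c(120, 100, 130, 170),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append the a/b ratio as a group "c" row
add_ratio <- function(x) {
  ratio <- x$freq[x$group == "a"] / x$freq[x$group == "b"]
  rbind(x, data.frame(group = "c", year = x$year[1], freq = ratio,
                      stringsAsFactors = FALSE))
}

res <- do.call(rbind, lapply(split(df, df$year), add_ratio))
rownames(res) <- NULL
res
```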
    
        3
  •  0
  •   akrun    6 years ago

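    Here is an option using data.table. Convert the data.frame to a data.table ( setDT(df) ), group by 'year', concatenate 'group' with 'c', and concatenate 'freq' with the ratio of the 'freq' elements: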

    library(data.table)
    setDT(df)[, .(group = c(group, 'c'), freq = c(freq, freq[1]/freq[2])), .(year)]
    #   year group        freq
    #1: 1990     a 100.0000000
    #2: 1990     b 120.0000000
    #3: 1990     c   0.8333333
    #4: 2000     a 130.0000000
    #5: 2000     b 170.0000000
    #6: 2000     c   0.7647059
    

    Or with rbind:

    rbind(setDT(df), df[, .(freq = Reduce(`/`, freq), group = 'c'), .(year)])
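
    Here Reduce(`/`, freq) folds division over the vector from left to right, so with two values per year it gives the same result as freq[1]/freq[2]. A small base R illustration with made-up vectors:

```r
# With two elements the fold is simply the first divided by the second
two <- Reduce(`/`, c(100, 120))
two

# With three elements the fold is left-associative: (8 / 4) / 2
three <- Reduce(`/`, c(8, 4, 2))
three
```

    If a year ever had more than two rows, the fold would keep dividing by each further value, which is probably not the intended ratio.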
    

    Or using tidyverse:

    library(tidyverse)
    df %>% 
       group_by(year) %>% 
       summarise(group = list(c(group, 'c')), 
                freq = list(c(freq, freq[1]/freq[2]))) %>% 
       unnest(cols = c(group, freq))  # tidyr >= 1.0.0 expects the columns to be named explicitly
    # A tibble: 6 x 3
    #   year group    freq
    #  <dbl> <chr>   <dbl>
    #1  1990 a     100    
    #2  1990 b     120    
    #3  1990 c       0.833
    #4  2000 a     130    
    #5  2000 b     170    
    #6  2000 c       0.765
    

    Data

    df <- structure(list(group = c("a", "b", "a", "b"), year = c(1990, 
    1990, 2000, 2000), freq = c(100, 120, 130, 170)), row.names = c(NA, 
    -4L), class = "data.frame")