
Unstacking a data frame by columns in R

r
  • Max  · 5 years ago

    I want to spread a data frame in R by two columns, i.e. go from

    id   segment  count  freq
    1    a        x1a    f1a
    1    b        x1b    f1b
    1    c        x1c    f1c
    2    a        x2a    f2a
    2    b        x2b    f2b
    2    c        x2c    f2c
    

    to:

    id   count_a  count_b count_c freq_a freq_b freq_c
    1    x1a      x1b     x1c     f1a    f1b    f1c
    2    x2a      x2b     x2c     f2a    f2b    f2c
    

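    Essentially, this amounts to unstacking the data frame by its first two columns, id and segment. However, I can't figure out how to do this with R's unstack() function. I could do it the naive way (nested for loops, pasting column names together, then binding the pieces), but there must be a more direct and efficient approach.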

    1 answer
  •   akrun    5 years ago

    We can use pivot_wider from tidyr:

    library(dplyr)
    library(tidyr)
    df1 %>%       
       pivot_wider(names_from = c(segment), values_from = c(count, freq))
    # A tibble: 2 x 7
    #     id count_a count_b count_c freq_a freq_b freq_c
    #  <int> <chr>   <chr>   <chr>   <chr>  <chr>  <chr> 
    #1     1 x1a     x1b     x1c     f1a    f1b    f1c   
    #2     2 x2a     x2b     x2c     f2a    f2b    f2c   
    

    Or with dcast from data.table:

    library(data.table)
    dcast(setDT(df1), id ~ segment, value.var = c('count', 'freq'))
    #   id count_a count_b count_c freq_a freq_b freq_c
    #1:  1     x1a     x1b     x1c    f1a    f1b    f1c
    #2:  2     x2a     x2b     x2c    f2a    f2b    f2c
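
For completeness, here is a minimal base R sketch of the same reshaping, assuming you would rather not load any packages. It uses `stats::reshape()` instead of the two approaches above, so treat it as an alternative rather than part of the original answer. `reshape()` interleaves the columns by segment (count_a, freq_a, count_b, ...), so the last step reorders them; the name `wide` is only illustrative.

```r
# Same example data as in the question, built with data.frame()
df1 <- data.frame(
  id      = c(1L, 1L, 1L, 2L, 2L, 2L),
  segment = c("a", "b", "c", "a", "b", "c"),
  count   = c("x1a", "x1b", "x1c", "x2a", "x2b", "x2c"),
  freq    = c("f1a", "f1b", "f1c", "f2a", "f2b", "f2c"),
  stringsAsFactors = FALSE
)

# One row per id; every other column is spread across the values of segment
wide <- reshape(df1, idvar = "id", timevar = "segment",
                direction = "wide", sep = "_")

# reshape() orders columns segment by segment; sort them so that all
# count_* columns come before all freq_* columns
wide <- wide[c("id", sort(setdiff(names(wide), "id")))]
rownames(wide) <- NULL
wide
```

The default separator in `reshape()` is ".", so `sep = "_"` is what gives the `count_a`-style names asked for in the question.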
    

    Update

    If there are duplicates, create a sequence column first:

    library(data.table)  # rowid() comes from data.table
    df1 %>%
       mutate(rn = rowid(segment)) %>%
       pivot_wider(names_from = segment, values_from = c(count, freq)) %>%
       select(-rn)
    

    Or with data.table:

    dcast(setDT(df1), id + rowid(segment) ~ segment, 
           value.var = c('count', 'freq'))[, segment := NULL][]
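
To make the duplicates case concrete, here is a small sketch on made-up data in which the pair (id = 1, segment = "a") occurs twice. The data frame `df_dup` and its values are hypothetical. Note that this variant numbers the rows within each id/segment pair using dplyr's `row_number()` rather than `rowid(segment)`, so it is a variation on the update above, not the same code.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: (id = 1, segment = "a") appears twice
df_dup <- data.frame(
  id      = c(1L, 1L, 1L, 1L, 2L),
  segment = c("a", "a", "b", "c", "a"),
  count   = c("x1a", "x1a_dup", "x1b", "x1c", "x2a"),
  freq    = c("f1a", "f1a_dup", "f1b", "f1c", "f2a"),
  stringsAsFactors = FALSE
)

wide_dup <- df_dup %>%
  group_by(id, segment) %>%
  mutate(rn = row_number()) %>%  # 1, 2, ... within each id/segment pair
  ungroup() %>%
  # id and rn together identify an output row, so every cell holds one value
  pivot_wider(names_from = segment, values_from = c(count, freq)) %>%
  select(-rn)

wide_dup
```

Without the helper column, `pivot_wider()` would have two values competing for the same cell and would fall back to list-columns with a warning. With it, the duplicate gets its own row for id 1, and the segments that have no second value are filled with NA.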
    

    Data

    df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L), segment = c("a", 
    "b", "c", "a", "b", "c"), count = c("x1a", "x1b", "x1c", "x2a", 
    "x2b", "x2c"), freq = c("f1a", "f1b", "f1c", "f2a", "f2b", "f2c"
    )), class = "data.frame", row.names = c(NA, -6L))