
R: Convert a list to a data.frame

  •  2
  • Adrian  · Tech Community  · 7 years ago
    mylist <- list(NULL, structure(list(Gender = structure(1L, .Label = "Female", class = "factor"), 
      ID = structure(1L, .Label = "1", class = "factor"), Class = structure(1L, .Label = "A", class = "factor"), 
      Score1 = 21.6, Score2 = 39.61, Score3 = 8.85, 
      Score4 = 13.66, Score5 = 2.64999999999999, Score6 = 6.94736842105265), .Names = c("Gender", 
      "ID", "Class", "Score1", "Score2", "Score3", 
      "Score4", "Score5", "Score6"), row.names = c(NA, -1L
      ), class = "data.frame"), list(structure(list(Gender = structure(1:2, .Label = c("Female", 
      "Male"), class = "factor"), ID = structure(c(1L, 1L), .Label = "2", class = "factor"), 
      Class = structure(c(1L, 1L), .Label = "A", class = "factor"), 
      Score1 = c(25.58, 18.31), Score2 = c(55.01, 
      36.28), Score3 = c(1.66, 2.13), Score4 = c(3.6, 
      4.24), Score5 = c(15.2727272727273, 8.57142857142858), Score6 = c(15.5833333333334, 
      8.4545454545455)), .Names = c("Gender", "ID", "Class", 
      "Score1", "Score2", "Score3", "Score4", 
      "Score5", "Score6"), row.names = c(NA, -2L), class = "data.frame"), 
      structure(list(Gender = structure(1:2, .Label = c("Female", 
      "Male"), class = "factor"), ID = structure(c(1L, 1L), .Label = "3", class = "factor"), 
      Class = structure(c(1L, 1L), .Label = "A", class = "factor"), 
      Score1 = c(27.16, 14.67), Score2 = c(58.39, 
      29.07), Score3 = c(1.66, 2.13), Score4 = c(3.6, 
      4.24), Score5 = c(16.2727272727273, 6.85714285714286), 
      Score6 = c(16.5833333333333, 6.81818181818185)), .Names = c("Gender", 
      "ID", "Class", "Score1", "Score2", 
      "Score3", "Score4", "Score5", "Score6"), row.names = c(NA,-2L), class = "data.frame")))
    

    I have a list that looks like this:

    > mylist
    [[1]]
    NULL
    
    [[2]]
      Gender ID Class Score1 Score2 Score3 Score4 Score5   Score6
    1 Female  1     A   21.6  39.61   8.85  13.66   2.65 6.947368
    
    [[3]]
    [[3]][[1]]
      Gender ID Class Score1 Score2 Score3 Score4    Score5    Score6
    1 Female  2     A  25.58  55.01   1.66   3.60 15.272727 15.583333
    2   Male  2     A  18.31  36.28   2.13   4.24  8.571429  8.454545
    
    [[3]][[2]]
      Gender ID Class Score1 Score2 Score3 Score4    Score5    Score6
    1 Female  3     A  27.16  58.39   1.66   3.60 16.272727 16.583333
    2   Male  3     A  14.67  29.07   2.13   4.24  6.857143  6.818182
    

    Some elements are NULL, and others can contain several sub-elements; for example, the third element has the sub-elements [[1]] and [[2]].

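    I would like to combine these list elements into a single data.frame like the one below (for brevity, I have left out the contents of the Score2 to Score6 columns):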

      Gender ID Class Score1 Score2 ... Score6
    1 Female  1     A  21.60     
    2 Female  2     A  25.58    
    3   Male  2     A  18.31    
    4 Female  3     A  27.16    
    5   Male  3     A  14.67    
    

    I tried the following, but it produces warnings and NA values:

    > tab <- unlist(mylist, recursive = FALSE)
    > df <- do.call("rbind", tab)
    Warning in `[<-.factor`(`*tmp*`, ri, value = 1L) :
      invalid factor level, NA generated
    Warning in `[<-.factor`(`*tmp*`, ri, value = 1L) :
      invalid factor level, NA generated
    ...
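
    A likely reason for these warnings: a data.frame is itself a list of columns, so unlist(..., recursive = FALSE) also splits a data frame that sits at the top level into its individual columns. The sketch below uses made-up toy data (the names nested, a, b and cc are hypothetical, not from the question) to show the effect.

```r
# Toy stand-in for mylist: a NULL, a top-level data frame, and a list of data frames
a  <- data.frame(ID = "1", Score1 = 21.6)
b  <- data.frame(ID = "2", Score1 = 25.58)
cc <- data.frame(ID = "3", Score1 = 27.16)
nested <- list(NULL, a, list(b, cc))

# Remove one level of nesting
flat <- unlist(nested, recursive = FALSE)

# `a` has been unpacked into its two columns; `b` and `cc` survive as data frames
print(length(flat))  # 4
print(sapply(flat, is.data.frame))
```

    Binding such a mixture of bare columns and data frames with rbind is what goes wrong, so the top-level data frames need to be protected before flattening.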
    

    Using ldply (from plyr) does not handle the last element correctly:

    > ldply(mylist, data.frame)
      Gender ID Class Score1 Score2 Score3 Score4    Score5    Score6 Gender.1 ID.1 Class.1 Score1.1 Score2.1 Score3.1 Score4.1  Score5.1  Score6.1
    1 Female  1     A  21.60  39.61   8.85  13.66  2.650000  6.947368     <NA> <NA>    <NA>       NA       NA       NA       NA        NA        NA
    2 Female  2     A  25.58  55.01   1.66   3.60 15.272727 15.583333   Female    3       A    27.16    58.39     1.66     3.60 16.272727 16.583333
    3   Male  2     A  18.31  36.28   2.13   4.24  8.571429  8.454545     Male    3       A    14.67    29.07     2.13     4.24  6.857143  6.818182
    
    3 answers  |  up to 7 years ago
        1
  •  2
  •   Juan Antonio Roldán Díaz    7 years ago

    Try this:

    ll <- unlist(lapply(mylist, function(x) if(is.data.frame(x)) list(x) else x), recursive = FALSE)
    do.call(rbind, ll)
      Gender ID Class Score1 Score2 Score3 Score4    Score5    Score6
    1 Female  1     A  21.60  39.61   8.85  13.66  2.650000  6.947368
    2 Female  2     A  25.58  55.01   1.66   3.60 15.272727 15.583333
    3   Male  2     A  18.31  36.28   2.13   4.24  8.571429  8.454545
    4 Female  3     A  27.16  58.39   1.66   3.60 16.272727 16.583333
    5   Male  3     A  14.67  29.07   2.13   4.24  6.857143  6.818182
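
    One way to read the one-liner above: wrapping each top-level data frame in list() puts it at the same depth as the nested ones, so a single non-recursive unlist removes exactly one level everywhere. A minimal sketch with made-up toy data (the names nested, wrapped and flat are hypothetical):

```r
# Toy stand-in for mylist: a NULL, a top-level data frame, and a list of data frames
a  <- data.frame(ID = "1", Score1 = 21.6)
b  <- data.frame(ID = "2", Score1 = 25.58)
cc <- data.frame(ID = "3", Score1 = 27.16)
nested <- list(NULL, a, list(b, cc))

# Step 1: give every data frame the same nesting depth
wrapped <- lapply(nested, function(x) if (is.data.frame(x)) list(x) else x)

# Step 2: strip one level; NULL contributes nothing, each data frame stays whole
flat <- unlist(wrapped, recursive = FALSE)

# Step 3: stack the data frames row-wise
result <- do.call(rbind, flat)
print(result)
```

    This sketch assumes at most one level of nesting, as in the question; deeper nesting would need a recursive approach.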
    
        2
  •  5
  •   camille    7 years ago

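    You only need a couple of tidyverse functions. purrr::reduce applies a function successively across the elements of a list or vector, and dplyr::bind_rows is like an extended, smarter rbind.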

    Note that, as I said in my comment, you will get warnings about binding character vectors with factor vectors, but that is only a warning, not an error.

    purrr::reduce(mylist, dplyr::bind_rows)
    
    #> Warning in bind_rows_(x, .id): binding character and factor vector,
    #> coercing into character vector
    
    ...
    
    #>   Gender ID Class Score1 Score2 Score3 Score4    Score5    Score6
    #> 1 Female  1     A  21.60  39.61   8.85  13.66  2.650000  6.947368
    #> 2 Female  2     A  25.58  55.01   1.66   3.60 15.272727 15.583333
    #> 3   Male  2     A  18.31  36.28   2.13   4.24  8.571429  8.454545
    #> 4 Female  3     A  27.16  58.39   1.66   3.60 16.272727 16.583333
    #> 5   Male  3     A  14.67  29.07   2.13   4.24  6.857143  6.818182
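
    If the factor/character warnings are unwanted, one option is to convert factor columns to character before binding. The sketch below uses base R only and made-up data; factors_to_character is a hypothetical helper, not part of dplyr or purrr, and whether the warning appears at all may depend on the dplyr version in use.

```r
# Hypothetical helper: turn every factor column of a data frame into character
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)
  df
}

# Two toy data frames whose Gender factors have different levels
d1 <- data.frame(Gender = factor("Female"), Score1 = 21.6)
d2 <- data.frame(Gender = factor(c("Female", "Male")), Score1 = c(25.58, 18.31))

cleaned  <- lapply(list(d1, d2), factors_to_character)
combined <- do.call(rbind, cleaned)
print(str(combined))
```

    The cleaned list could equally be passed to dplyr::bind_rows instead of do.call(rbind, ...).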
    
        3
  •  1
  •   moodymudskipper    7 years ago

    I propose a variant of unlist, a function called unlist_unless. It works like unlist, with the following differences:

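    • It has a predicate argument, used to leave certain sub-elements untouched (formula notation is supported if purrr is installed)
    • ... passes additional arguments on to predicate
    • keep_null keeps (the default) or drops NULL elements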

    Like unlist, it has the arguments recursive and use.names, with the same defaults (both TRUE). It also has a keep_null argument, which I set to TRUE by default.

    unlist_unless <- function(x, predicate = function(x) FALSE, ..., recursive = TRUE,  use.names = TRUE, keep_null = TRUE){
      # Allow formula notation for the predicate when purrr is available
      if(inherits(predicate, "formula")) {
        if (requireNamespace("purrr", quietly = TRUE)) predicate <- purrr::as_mapper(predicate) else
          stop("Package `purrr` needs to be installed to use formula notation")
      }
    
      # Protected elements (and NULLs, if kept) are wrapped in list() so that the
      # outer non-recursive unlist() leaves them intact; other lists are descended into
      unlist(lapply(x, function(y){
        if(predicate(y, ...) || (keep_null && is.null(y)))
          list(y)
        else if (is.list(y) && recursive)
          unlist_unless(y, predicate = predicate, ..., keep_null=keep_null, use.names = use.names)
        else y}),
        recursive = FALSE,
        use.names = use.names)
    }
    

    Example 1: the simplest case

    df <- head(iris)[1:3]
    dfs<- list(df[1,],
                 NULL,
                 list(df[2,],
                      df[3,],
                      list(df[4,]),
                      NULL))
    
    unlist_unless(dfs, is.data.frame)
    # [[1]]
    # Sepal.Length Sepal.Width Petal.Length
    # 1          5.1         3.5          1.4
    # 
    # [[2]]
    # NULL
    # 
    # [[3]]
    # Sepal.Length Sepal.Width Petal.Length
    # 2          4.9           3          1.4
    # 
    # [[4]]
    # Sepal.Length Sepal.Width Petal.Length
    # 3          4.7         3.2          1.3
    # 
    # [[5]]
    # Sepal.Length Sepal.Width Petal.Length
    # 4          4.6         3.1          1.5
    # 
    # [[6]]
    # NULL
    

    Example 2: keep_null = FALSE

    unlist_unless(dfs, is.data.frame, keep_null = FALSE)
    # [[1]]
    # Sepal.Length Sepal.Width Petal.Length
    # 1          5.1         3.5          1.4
    # 
    # [[2]]
    # Sepal.Length Sepal.Width Petal.Length
    # 2          4.9           3          1.4
    # 
    # [[3]]
    # Sepal.Length Sepal.Width Petal.Length
    # 3          4.7         3.2          1.3
    # 
    # [[4]]
    # Sepal.Length Sepal.Width Petal.Length
    # 4          4.6         3.1          1.5

    Example 3: recursive = FALSE
    
    unlist_unless(dfs, is.data.frame, recursive = FALSE)
    # [[1]]
    # Sepal.Length Sepal.Width Petal.Length
    # 1          5.1         3.5          1.4
    # 
    # [[2]]
    # NULL
    # 
    # [[3]]
    # Sepal.Length Sepal.Width Petal.Length
    # 2          4.9           3          1.4
    # 
    # [[4]]
    # Sepal.Length Sepal.Width Petal.Length
    # 3          4.7         3.2          1.3
    # 
    # [[5]]
    # [[5]][[1]]
    # Sepal.Length Sepal.Width Petal.Length
    # 4          4.6         3.1          1.5
    # 
    # 
    # [[6]]
    # NULL
    

    Then simply call bind_rows(dfs_new) or do.call(rbind, dfs_new) on the result.

    do.call(rbind,unlist_unless(dfs, is.data.frame))
    
    # Sepal.Length Sepal.Width Petal.Length
    # 1          5.1         3.5          1.4
    # 2          4.9         3.0          1.4
    # 3          4.7         3.2          1.3
    # 4          4.6         3.1          1.5
    
    
    # or
    library(dplyr)
    unlist_unless(dfs, is.data.frame) %>% bind_rows
    #   Sepal.Length Sepal.Width Petal.Length
    # 1          5.1         3.5          1.4
    # 2          4.9         3.0          1.4
    # 3          4.7         3.2          1.3
    # 4          4.6         3.1          1.5
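
    The do.call(rbind, ...) call works even though the flattened list still contains NULL entries, because rbind on data frames skips NULL arguments. A minimal check, using a hypothetical list named pieces:

```r
df <- head(iris)[1:3]

# A flat list of one-row data frames interleaved with NULLs
pieces <- list(df[1, ], NULL, df[2, ], NULL)

# The NULL entries are ignored; only the two data frames are stacked
combined <- do.call(rbind, pieces)
print(nrow(combined))  # 2
```

    So keep_null = FALSE is mainly useful when the flattened list itself is inspected or processed further, rather than for the final binding step.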