
Extracting a data frame from a multi-level list generated from JSON, with occasionally missing elements

purrr r

DeduciveR · 6 years ago

I am pulling football data through an API. The resulting JSON comes back as a list; the dput of an example is below:

list(list(id = 10332894L, league_id = 8L, season_id = 12962L, 
aggregate_id = NULL, venue_id = 201L, localteam_id = 51L, 
visitorteam_id = 27L, weather_report = list(code = "drizzle", 
    temperature = list(temp = 53.92, unit = "fahrenheit"), 
    clouds = "90%", humidity = "87%", wind = list(speed = "12.75 m/s", 
        degree = 200L)), attendance = 25098L, leg = "1/1", 
deleted = FALSE, referee = list(data = list(id = 15267L, 
    common_name = "L. Probert", fullname = "Lee Probert", 
    firstname = "Lee", lastname = "Probert"))), list(id = 10332895L, 
league_id = 8L, season_id = 12962L, aggregate_id = NULL, 
venue_id = 340L, localteam_id = 251L, visitorteam_id = 78L, 
weather_report = list(code = "drizzle", temperature = list(
    temp = 50.07, unit = "fahrenheit"), clouds = "90%", humidity = "93%", 
    wind = list(speed = "6.93 m/s", degree = 160L)), attendance = 22973L, 
leg = "1/1", deleted = FALSE, referee = list(data = list(
    id = 15273L, common_name = "M. Oliver", fullname = "Michael Oliver", 
    firstname = "Michael", lastname = "Oliver"))))

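I am currently extracting the data with a for loop. The reprex shows two top-level list items, while the full data has hundreds. The main drawback of the loop is that a value is sometimes missing, which causes the loop to stop. I would like to move this to purrr, but I am still struggling with at_depth or modify_depth. There are also nests within nests, which adds to the complexity.

The end state should be a tidy data frame. From this data the df would have only 2 rows but many columns, one per item, no matter where that item is nested in the list. If something is missing, it should be an NA value.

Ideally, even if it is less elegant, a solution would produce a data frame for each level or nested item, which could then be bound together.

Thanks.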

1 answer | 6 years ago

A. Suliman · 6 years ago

Step 1: replace NULL with NA, using the community wiki function from here

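# Recursively apply fn to every non-list element of a nested list
simple_rapply <- function(x, fn)
{
  if(is.list(x))
  {
    lapply(x, simple_rapply, fn)  # descend into sub-lists
  } else
  {
    fn(x)  # leaf value, including NULL
  }
}
# `l` is the list from the question, assigned from the dput() output
non.null.l <- simple_rapply(l, function(x) if(is.null(x)) NA else x)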
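
To see what step 1 does, here is a minimal sketch on a made-up list. The `toy` object and its field values are hypothetical and are not part of the API response; only the helper is taken from the answer above.

```r
# Same helper as above, repeated so the snippet runs on its own
simple_rapply <- function(x, fn) {
  if (is.list(x)) lapply(x, simple_rapply, fn) else fn(x)
}

# Hypothetical record with NULL at the top level and inside a nested list
toy <- list(
  id = 1L,
  aggregate_id = NULL,
  referee = list(data = list(id = 7L, common_name = NULL))
)

cleaned <- simple_rapply(toy, function(x) if (is.null(x)) NA else x)

# Both NULLs are now NA; other values are untouched
str(cleaned)
```

This matters for the next step because unlist() silently drops NULL elements, so without the replacement those fields would vanish instead of appearing as NA.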

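Step 2: flatten each match into a named vector and bind the rows

library(purrr)
library(dplyr)  # bind_rows() comes from dplyr
# unlist() names nested fields by path, e.g. weather_report.wind.speed
map_df(map(non.null.l, unlist), bind_rows)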
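
The following is a sketch of both steps run end to end on two made-up matches, the second of which lacks the nested wind element. The `matches` data is hypothetical. The final `type.convert()` call is an addition that is not in the answer: unlist() coerces mixed values to character, so every column comes out as text unless converted back.

```r
library(purrr)
library(dplyr)

simple_rapply <- function(x, fn) {
  if (is.list(x)) lapply(x, simple_rapply, fn) else fn(x)
}

# Hypothetical data: the second match has no `wind` entry at all
matches <- list(
  list(id = 1L, aggregate_id = NULL,
       weather_report = list(code = "drizzle",
                             wind = list(speed = "12.75 m/s", degree = 200L))),
  list(id = 2L, aggregate_id = NULL,
       weather_report = list(code = "clear"))
)

# Step 1: NULL -> NA, so the field survives unlist()
non_null <- simple_rapply(matches, function(x) if (is.null(x)) NA else x)

# Step 2: one named character vector per match, bound row-wise;
# columns absent from a match are filled with NA
df <- map_df(map(non_null, unlist), bind_rows)

# Optional (not in the answer): guess column types again
typed <- type.convert(df, as.is = TRUE)
```

One row per match comes out, with nested fields spread into columns named by their path. Values such as "12.75 m/s" stay character after conversion because they contain units.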
