
Aggregating values on a data tree with R

  •  8
  • Bridgbro  · Tech Community  · 7 years ago

    This is what I get:

               levelName hours totalhours
    1 Ned                   NA          1
    2  °--John               1          3
    3      °--Kate           1          3
    4          ¦--Dan        1          1
    5          ¦--Ron        1          1
    6          °--Sienna     1          1

    This is what I want:

               levelName hours totalHours
    1 Ned                   NA          5
    2  °--John               1          5
    3      °--Kate           1          4
    4          ¦--Dan        1          1
    5          ¦--Ron        1          1
    6          °--Sienna     1          1

    This is my code:

    # Install package
    install.packages('data.tree')
    library(data.tree)
    
    # Create data frame
    to <- c("Ned", "John", "Kate", "Kate", "Kate")
    from <- c("John", "Kate", "Dan", "Ron", "Sienna")
    hours <- c(1,1,1,1,1)
    df <- data.frame(from,to,hours)
    
    # Create data tree
    tree <- FromDataFrameNetwork(df)
    print(tree, "hours")
    
    # Get running total of hours that includes all nodes and children values.
    tree$Do(function(x) x$total <- Aggregate(x, "hours", sum), traversal = "post-order")
    print(tree, "hours", runningtotal = tree$Get(Aggregate, "total", sum))
    
    3 answers  |  latest 7 years ago
        1
  •  9
  •   F. Privé    7 years ago

    You can simply use a recursive function:

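    myApply <- function(node) {
      # Recurse into the children first, then add this node's own hours.
      # na.rm = TRUE ignores missing hours (the root has none).
      # The assigned total is also the function's (invisible) return value.
      node$totalHours <- 
        sum(c(node$hours, purrr::map_dbl(node$children, myApply)), na.rm = TRUE)
    }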
    myApply(tree)
    print(tree, "hours", "totalHours")
    

               levelName hours totalHours
    1 Ned                   NA          5
    2  °--John               1          5
    3      °--Kate           1          4
    4          ¦--Dan        1          1
    5          ¦--Ron        1          1
    6          °--Sienna     1          1
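
    To make the recursion itself easier to see, here is a minimal base-R sketch of the same idea on a plain nested list instead of a data.tree object. The helpers node() and subtree_totals() and the list layout are hypothetical names made up for this illustration; they are not part of data.tree or of the answer above.

```r
# Hypothetical constructor for a plain nested-list node (not a data.tree Node).
node <- function(name, hours = NA_real_, children = list()) {
  list(name = name, hours = hours, children = children)
}

# Same shape as the tree in the question; the root has no hours.
tree_list <- node("Ned", children = list(
  node("John", 1, list(
    node("Kate", 1, list(
      node("Dan", 1), node("Ron", 1), node("Sienna", 1)
    ))
  ))
))

# Returns a named numeric vector with one subtree total per node,
# ordered parent first, then its descendants.
subtree_totals <- function(n) {
  # Post-order step: compute every child's result before this node's total.
  child_totals <- lapply(n$children, subtree_totals)
  # The first element of each child's result is that child's own subtree total.
  from_children <- vapply(child_totals, function(v) v[[1]], numeric(1))
  own <- sum(c(n$hours, from_children), na.rm = TRUE)
  c(setNames(own, n$name), unlist(child_totals))
}

totals <- subtree_totals(tree_list)
print(totals)
# Ned 5, John 5, Kate 4, Dan 1, Ron 1, Sienna 1
```

    Unlike myApply, this version does not modify anything in place; it only returns the totals, which can be handy for checking the numbers independently.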
    

    Edit: to fill two fields at once:

    # Create data frame
    to <- c("Ned", "John", "Kate", "Kate", "Kate")
    from <- c("John", "Kate", "Dan", "Ron", "Sienna")
    hours <- c(1,1,1,1,1)
    hours2 <- 5:1
    df <- data.frame(from,to,hours, hours2)
    
    # Create data tree
    tree <- FromDataFrameNetwork(df)
    print(tree, "hours", "hours2")
    
    myApply <- function(node) {
      res.ch <- purrr::map(node$children, myApply)
      a <- node$totalHours <- 
        sum(c(node$hours,  purrr::map_dbl(res.ch, 1)), na.rm = TRUE)
      b <- node$totalHours2 <- 
        sum(c(node$hours2, purrr::map_dbl(res.ch, 2)), na.rm = TRUE)
      list(a, b)
    }
    myApply(tree)
    print(tree, "hours", "totalHours", "hours2", "totalHours2")
    

               levelName hours totalHours hours2 totalHours2
    1 Ned                   NA          5     NA          15
    2  °--John               1          5      5          15
    3      °--Kate           1          4      4          10
    4          ¦--Dan        1          1      3           3
    5          ¦--Ron        1          1      2           2
    6          °--Sienna     1          1      1           1
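
    The two-field version hard-codes one line per field. If more fields are needed, one possible generalization is sketched below. It assumes that node fields can be read and written with [[ ]] (data.tree nodes are R6 objects, so this should hold, but treat it as an assumption). The function name add_totals and the "total_" prefix are invented for this example.

```r
library(data.tree)

# Same data as in the answer above.
to     <- c("Ned", "John", "Kate", "Kate", "Kate")
from   <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours  <- c(1, 1, 1, 1, 1)
hours2 <- 5:1
df <- data.frame(from, to, hours, hours2)
tree <- FromDataFrameNetwork(df)

# Hypothetical helper: for every name in `fields`, store the subtree sum
# on each node as "total_<field>" and return the totals to the caller.
add_totals <- function(node, fields) {
  # Recurse first so that each child's totals are known.
  child_res <- lapply(node$children, add_totals, fields = fields)
  totals <- vapply(fields, function(f) {
    from_children <- vapply(child_res, function(r) r[[f]], numeric(1))
    # node[[f]] is NULL when the node has no such field (e.g. the root).
    sum(c(node[[f]], from_children), na.rm = TRUE)
  }, numeric(1))
  for (f in fields) node[[paste0("total_", f)]] <- totals[[f]]
  invisible(totals)
}

add_totals(tree, c("hours", "hours2"))
print(tree, "hours", "total_hours", "hours2", "total_hours2")
```

    With the data above, the totals should match the totalHours and totalHours2 columns shown in the previous output.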
    
        2
  •  5
  •   eddi    7 years ago

    Combining Aggregate with Do only seems to work when both operate on the same field:

    tree$Do(function(node) node$totalHours = node$hours)
    
    tree$Do(function(node) node$totalHours = sum(if(!node$isLeaf) node$totalHours else 0,
                                                 Aggregate(node, "totalHours", sum)),
            traversal = "post-order")
    print(tree, "hours", "totalHours")
    #           levelName hours totalHours
    #1 Ned                   NA          5
    #2  °--John               1          5
    #3      °--Kate           1          4
    #4          ¦--Dan        1          1
    #5          ¦--Ron        1          1
    #6          °--Sienna     1          1
    
        3
  •  3
  •   Christoph Glur    7 years ago

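    The Aggregate function of the data.tree package is particularly useful if you want to sum the values of children recursively. In your case, you need to do two things:

    1. Store the sum in a separate variable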

    library(data.tree)
    
    # Create data frame
    to <- c("Ned", "John", "Kate", "Kate", "Kate")
    from <- c("John", "Kate", "Dan", "Ron", "Sienna")
    hours <- c(1,1,1,1,1)
    df <- data.frame(from,to,hours)
    
    # Create data tree
    tree <- FromDataFrameNetwork(df)
    print(tree, "hours")
    
    # Get running total of hours that includes all nodes and children values.
    tree$Do(function(x) x$total <- ifelse(is.null(x$hours), 0, x$hours) + sum(Get(x$children, "total")), traversal = "post-order")
    print(tree, "hours", "total")