
Create new calculated columns for a data frame, with multiple values per row of the original data frame

  •  0
  • Alex Holcombe  · 10 months ago

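    Sometimes I want to calculate multiple values from one column of each row of a data frame. In the tidyverse, dplyr now provides reframe as a counterpart to mutate for this purpose. It returns a longer data frame, but it does not carry over the values of the original data frame's other columns, so the correspondence with the original data frame is lost.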

    How should the original data frame be combined with the new one?

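    For example, suppose we want to produce a report of whether each Star Wars character is light enough to box at the Olympics in each of several weight classes.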

        boxingPopularWeightClasses <- c(57, 80, 92, 9999) # kilograms
        boxingWeightClassName <- c("Featherweight", "Middleweight", "Heavyweight", "Super Heavyweight")
    
        checkAllowedToCompeteInEachWeightClass <- function(mass) {
          mass < boxingPopularWeightClasses
        }
    
        library(dplyr)
    
        allowed <- starwars %>% rowwise() %>%
          reframe( class = boxingWeightClassName,
                   criterion = boxingPopularWeightClasses, 
                   allowed = checkAllowedToCompeteInEachWeightClass(mass)
          ) 
    
        head(allowed)
    

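    The above calculates eligibility (allowed) for each character and returns the following, in which the other columns (such as the character's name) have been lost.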

    # A tibble: 6 × 3
      class             criterion allowed
      <chr>                 <dbl> <lgl>  
    1 Featherweight            57 FALSE  
    2 Middleweight             80 TRUE   
    3 Heavyweight              92 TRUE   
    4 Super Heavyweight      9999 TRUE   
    5 Featherweight            57 FALSE  
    6 Middleweight             80 TRUE  
    

    How can we recombine this with the original data frame?

    2 Answers  |  as of 10 months ago
        1
  •  1
  •   thelatemail    10 months ago

    I think you just want a cross_join, which pairs every row of dataset x with every row of y. After that you can do your allowed calculation:

    out <- cross_join(starwars, data.frame(boxingPopularWeightClasses, boxingWeightClassName)) %>%
        mutate(allowed = mass < boxingPopularWeightClasses)
    head(out[c(1:3,15:17)])
    ## A tibble: 6 × 6
    #  name         height  mass boxingPopularWeightC…¹ boxingWeightClassName allowed
    #  <chr>         <int> <dbl>                  <dbl> <chr>                 <lgl>  
    #1 Luke Skywal…    172    77                     57 Featherweight         FALSE  
    #2 Luke Skywal…    172    77                     80 Middleweight          TRUE   
    #3 Luke Skywal…    172    77                     92 Heavyweight           TRUE   
    #4 Luke Skywal…    172    77                   9999 Super Heavyweight     TRUE   
    #5 C-3PO           167    75                     57 Featherweight         FALSE  
    #6 C-3PO           167    75                     80 Middleweight          TRUE   
    ## ℹ abbreviated name: ¹​boxingPopularWeightClasses
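
    To make the pairing behaviour easier to see, here is a minimal sketch on toy data. The `fighters` and `classes` data frames and their values are hypothetical, invented only for illustration, and the sketch assumes dplyr 1.1.0 or later, where cross_join is available.

```r
library(dplyr)

# Hypothetical toy data (not taken from starwars)
fighters <- tibble(name = c("A", "B"), mass = c(50, 85))
classes  <- tibble(class = c("Featherweight", "Middleweight"),
                   limit = c(57, 80))

# Pair every fighter with every weight class, then test eligibility.
# All columns of both inputs are kept, so the name stays attached.
eligibility <- cross_join(fighters, classes) %>%
  mutate(allowed = mass < limit)

print(eligibility)
```

    The result has nrow(fighters) * nrow(classes) rows, with the rows for each fighter kept together in the order of `classes`.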
    
        2
  •  1
  •   Alex Holcombe    10 months ago

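    You can use slice with rep to duplicate each row of the original data frame so that it has the same number of rows as the new data frame (number of weight classes * number of characters):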

    # Duplicate the original data frame, repeating each row once per weight class
    longer_starwars <- starwars %>% slice(rep(1:n(), each = length(boxingWeightClassName)))
    
    # Combine the new columns with the old (only the first 6 columns, for readability)
    starwarsBoxingEligibility <- cbind(longer_starwars[, 1:6], allowed)
    
    head(starwarsBoxingEligibility)
    

    The above gives the result I wanted (see below), although perhaps there is a better way:

                name height mass hair_color skin_color eye_color             class criterion allowed
    1 Luke Skywalker    172   77      blond       fair      blue     Featherweight        57   FALSE
    2 Luke Skywalker    172   77      blond       fair      blue      Middleweight        80    TRUE
    3 Luke Skywalker    172   77      blond       fair      blue       Heavyweight        92    TRUE
    4 Luke Skywalker    172   77      blond       fair      blue Super Heavyweight      9999    TRUE
    5          C-3PO    167   75       <NA>       gold    yellow     Featherweight        57   FALSE
    6          C-3PO    167   75       <NA>       gold    yellow      Middleweight        80    TRUE
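
    Because cbind matches rows purely by position, it may be worth checking that the repeated rows really line up with the reframe output before trusting the combined table. The following is only a sketch of such a check. It rebuilds both objects from the code in this thread, and the names `same_rows` and `same_values` are introduced here for illustration.

```r
library(dplyr)

boxingPopularWeightClasses <- c(57, 80, 92, 9999) # kilograms
boxingWeightClassName <- c("Featherweight", "Middleweight",
                           "Heavyweight", "Super Heavyweight")

# One block of rows per character, as in the question
allowed <- starwars %>%
  rowwise() %>%
  reframe(class = boxingWeightClassName,
          criterion = boxingPopularWeightClasses,
          allowed = mass < boxingPopularWeightClasses)

# Each original row repeated once per weight class, as in this answer
longer_starwars <- starwars %>%
  slice(rep(1:n(), each = length(boxingWeightClassName)))

# Check 1: both data frames have the same number of rows
same_rows <- nrow(longer_starwars) == nrow(allowed)

# Check 2: recomputing eligibility from the repeated mass column
# reproduces the allowed column (identical() treats matching NAs as equal)
same_values <- identical(longer_starwars$mass < allowed$criterion,
                         allowed$allowed)

stopifnot(same_rows, same_values)
```

    If either check fails, the positional cbind would pair characters with the wrong eligibility rows, and a join-based approach would be the safer choice.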