Create a new computed column for a data frame, with multiple values per row of the original data frame

tidyverse dplyr r

Alex Holcombe · Tech community · 10 months ago

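Sometimes I want to compute multiple values from one column of each row of a data frame. In the tidyverse, dplyr now provides `reframe` as the counterpart to `mutate` for this purpose. It returns a longer data frame, but it does not carry over the values of the other columns of the original data frame, so the correspondence with the original data frame is lost.

How should the original data frame be combined with the new one?

For example, suppose we want to produce a report on whether each Star Wars character is light enough to box at the Olympics in each of several weight classes.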

    boxingPopularWeightClasses <- c(57,80,92,9999) #kilograms
    boxingWeightClassName<-c("Featherweight","Middleweight","Heavyweight","Super Heavyweight")

    checkAllowedToCompeteInEachWeightClass <- function(mass) {
      mass < boxingPopularWeightClasses
    }

    library(dplyr)

    allowed <- starwars %>%
      rowwise() %>%
      reframe(
        class     = boxingWeightClassName,
        criterion = boxingPopularWeightClasses,
        allowed   = checkAllowedToCompeteInEachWeightClass(mass)
      )

    head(allowed)

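The code above computes eligibility (`allowed`) for each character and prints the following, in which the other columns (such as the character's name) have been lost.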

# A tibble: 6 × 3
  class             criterion allowed
  <chr>                 <dbl> <lgl>  
1 Featherweight            57 FALSE  
2 Middleweight             80 TRUE   
3 Heavyweight              92 TRUE   
4 Super Heavyweight      9999 TRUE   
5 Featherweight            57 FALSE  
6 Middleweight             80 TRUE

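How can we recombine this with the original data frame?

2 answers | as of 10 months ago

thelatemail 10 months ago

I think you just want a `cross_join`: it joins every row of dataset `x` with every row of `y`, after which you can do your `allowed` calculation: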

out <- cross_join(starwars, data.frame(boxingPopularWeightClasses, boxingWeightClassName)) %>%
    mutate(allowed = mass < boxingPopularWeightClasses)
head(out[c(1:3,15:17)])
## A tibble: 6 × 6
#  name         height  mass boxingPopularWeightC…¹ boxingWeightClassName allowed
#  <chr>         <int> <dbl>                  <dbl> <chr>                 <lgl>
#1 Luke Skywal…    172    77                     57 Featherweight         FALSE
#2 Luke Skywal…    172    77                     80 Middleweight          TRUE
#3 Luke Skywal…    172    77                     92 Heavyweight           TRUE
#4 Luke Skywal…    172    77                   9999 Super Heavyweight     TRUE
#5 C-3PO           167    75                     57 Featherweight         FALSE
#6 C-3PO           167    75                     80 Middleweight          TRUE
## ℹ abbreviated name: ¹boxingPopularWeightClasses
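To make the row pairing of `cross_join` easier to see, here is a minimal sketch on toy data. The data frames `fighters` and `classes` and their values are hypothetical stand-ins invented for illustration, not taken from `starwars`; `cross_join` itself requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data: two fighters and three weight classes
fighters <- tibble(name = c("A", "B"), mass = c(55, 85))
classes  <- tibble(
  class = c("Featherweight", "Middleweight", "Heavyweight"),
  limit = c(57, 80, 92)  # kilograms
)

# Every row of fighters is paired with every row of classes,
# so the result has 2 * 3 = 6 rows and keeps all columns of both inputs
toy <- cross_join(fighters, classes) %>%
  mutate(allowed = mass < limit)

print(toy)
```

Because the join keeps every column of both inputs, no separate step is needed to reattach the new values to the original rows.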

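Alex Holcombe 10 months ago

You can use `slice` and `rep` to duplicate each row of the original data frame so that it has the same number of rows as the new data frame (number of weight classes × number of characters):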

# Duplicate the original data frame, repeating each row once per weight class
longer_starwars <- starwars %>% slice(rep(1:n(), each = length(boxingWeightClassName)))

# Combine the new columns with the old (only the first 6 columns, for readability)
starwarsBoxingEligibility <- cbind(longer_starwars[, 1:6], allowed)

head(starwarsBoxingEligibility)

The above gives the result I wanted (see below), though perhaps there is a better way:

            name height mass hair_color skin_color eye_color             class criterion allowed
1 Luke Skywalker    172   77      blond       fair      blue     Featherweight        57   FALSE
2 Luke Skywalker    172   77      blond       fair      blue      Middleweight        80    TRUE
3 Luke Skywalker    172   77      blond       fair      blue       Heavyweight        92    TRUE
4 Luke Skywalker    172   77      blond       fair      blue Super Heavyweight      9999    TRUE
5          C-3PO    167   75       <NA>       gold    yellow     Featherweight        57   FALSE
6          C-3PO    167   75       <NA>       gold    yellow      Middleweight        80    TRUE
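One possible alternative that avoids relying on row order when binding columns: `rowwise()` accepts column names that are treated as row identifiers, and these should be carried through to the `reframe()` output. The sketch below is not from either answer; it assumes dplyr 1.1.0 or later, and the name `allowed_with_ids` is made up for illustration.

```r
library(dplyr)

boxingPopularWeightClasses <- c(57, 80, 92, 9999)  # kilograms
boxingWeightClassName <- c("Featherweight", "Middleweight",
                           "Heavyweight", "Super Heavyweight")

# Columns listed in rowwise() act as identifying keys for each row;
# reframe() returns them alongside the newly computed columns.
allowed_with_ids <- starwars %>%
  rowwise(name, mass) %>%
  reframe(
    class     = boxingWeightClassName,
    criterion = boxingPopularWeightClasses,
    allowed   = mass < boxingPopularWeightClasses
  )

head(allowed_with_ids)
```

Any other columns that should survive can be added to the `rowwise()` call in the same way.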