Multiply, select and optionally group by arbitrary variables programmatically with dplyr

tidyeval dplyr group-by select r

DeltaIV · 6 years ago

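In my code, which uses dplyr, I often perform some operation on one data frame variable (here simply multiplying it by 2, to keep the MRE simple), optionally group by another variable, and then select only some of the resulting variables. To avoid code duplication, I would like to write a function.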

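library(dplyr)
library(ggplot2)  # provides the msleep dataset
msleep_mini <- msleep[1:20, ]  # 20 rows, matching the tibble headers in the outputs below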

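The function must reproduce the following behavior. If it is called with a single argument, say sleep_total, it simply multiplies sleep_total by 2 and returns a data frame containing the columns name, vore, order and sleep_total: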

# test_1
msleep_mini %>%
  group_double_select(sleep_total)
#> # A tibble: 20 x 4
#>    name                       vore  order           sleep_total
#>    <chr>                      <chr> <chr>                 <dbl>
#>  1 Cheetah                    carni Carnivora              24.2
#>  2 Owl monkey                 omni  Primates               34  
#>  3 Mountain beaver            herbi Rodentia               28.8
#>  4 Greater short-tailed shrew omni  Soricomorpha           29.8
#>  5 Cow                        herbi Artiodactyla            8  
#>  6 Three-toed sloth           herbi Pilosa                 28.8
#>  7 Northern fur seal          carni Carnivora              17.4
#>  8 Vesper mouse               <NA>  Rodentia               14  
#>  9 Dog                        carni Carnivora              20.2
#> 10 Roe deer                   herbi Artiodactyla            6

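If it is called with a second argument, say vore, the first variable is still doubled, but the data frame is also grouped by the second variable, and an id column (containing the progressive row number within each group) is added to the data frame. In other words, the output would be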

# test_2
msleep_mini %>%
  group_double_select(sleep_total, vore)
#> # A tibble: 20 x 5
#> # Groups:   vore [4]
#>    vore  name                       order           sleep_total    id
#>    <chr> <chr>                      <chr>                 <dbl> <int>
#>  1 carni Cheetah                    Carnivora              24.2     1
#>  2 carni Northern fur seal          Carnivora              17.4     2
#>  3 carni Dog                        Carnivora              20.2     3
#>  4 carni Long-nosed armadillo       Cingulata              34.8     4
#>  5 herbi Mountain beaver            Rodentia               28.8     1
#>  6 herbi Cow                        Artiodactyla            8       2
#>  7 herbi Three-toed sloth           Pilosa                 28.8     3
#>  8 herbi Roe deer                   Artiodactyla            6       4
#>  9 herbi Goat                       Artiodactyla           10.6     5
#> 10 herbi Guinea pig                 Rodentia               18.8     6

Of course, the function must handle arbitrary variables (as long as they can be found in the data frame):

# test_3
msleep_mini %>%
  group_double_select(sleep_rem, order)
#> # A tibble: 20 x 5
#> # Groups:   order [9]
#>    order           name                       vore  sleep_rem    id
#>    <chr>           <chr>                      <chr>     <dbl> <int>
#>  1 Artiodactyla    Cow                        herbi       1.4     1
#>  2 Artiodactyla    Roe deer                   herbi      NA       2
#>  3 Artiodactyla    Goat                       herbi       1.2     3
#>  4 Carnivora       Cheetah                    carni      NA       1
#>  5 Carnivora       Northern fur seal          carni       2.8     2
#>  6 Carnivora       Dog                        carni       5.8     3
#>  7 Cingulata       Long-nosed armadillo       carni       6.2     1
#>  8 Didelphimorphia North American Opossum     omni        9.8     1
#>  9 Hyracoidea      Tree hyrax                 herbi       1       1
#> 10 Pilosa          Three-toed sloth           herbi       4.4     1

I suspect that the robust and maintainable way to write group_double_select is to use tidy evaluation, but I may be wrong. Can you help me?

akrun · 6 years ago

We can use missing() to check whether an argument was omitted from the function call:

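group_double_select <- function(data, colVar, groupVar) {
  # Capture the column to double as a quosure
  colVar <- enquo(colVar)

  if (missing(groupVar)) {
    # No grouping variable: keep the fixed columns and double the target column
    data %>%
      select(name, vore, order, !!colVar) %>%
      mutate(!!quo_name(colVar) := !!colVar * 2)
  } else {
    # Grouping variable supplied: also add a within-group row number and sort by group.
    # Only name, vore, order and colVar are kept, so groupVar must be one of them.
    groupVar <- enquo(groupVar)
    data %>%
      select(name, vore, order, !!colVar) %>%
      mutate(!!quo_name(colVar) := !!colVar * 2) %>%
      group_by(!!groupVar) %>%
      mutate(id = row_number()) %>%
      arrange(!!groupVar)
  }
}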
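
As a side note on why branching on missing() is safe here: missing() only asks whether the argument was supplied and does not evaluate it, so a bare column name that does not exist as an R object causes no error. A minimal base R sketch (describe_args is a hypothetical helper used only for illustration, not part of the answer):

```r
# Hypothetical helper: reports whether the second argument was supplied.
# Neither argument is ever evaluated inside the function.
describe_args <- function(x, g) {
  if (missing(g)) "ungrouped" else "grouped"
}

describe_args(sleep_total)        # second argument omitted
describe_args(sleep_total, vore)  # 'vore' is not evaluated, so no lookup error
```

This is the same mechanism that lets group_double_select decide which branch to run before enquo() captures the grouping variable.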

msleep_mini %>%
       group_double_select(sleep_total, vore) %>%
       head
# A tibble: 6 x 5
# Groups:   vore [2]
#  name                 vore  order        sleep_total    id
#  <chr>                <chr> <chr>              <dbl> <int>
#1 Cheetah              carni Carnivora           24.2     1
#2 Northern fur seal    carni Carnivora           17.4     2
#3 Dog                  carni Carnivora           20.2     3
#4 Long-nosed armadillo carni Cingulata           34.8     4
#5 Mountain beaver      herbi Rodentia            28.8     1
#6 Cow                  herbi Artiodactyla         8       2



msleep_mini %>% 
       group_double_select(sleep_total) %>%
       head
# A tibble: 6 x 4
#  name                       vore  order        sleep_total
#  <chr>                      <chr> <chr>              <dbl>
#1 Cheetah                    carni Carnivora           24.2
#2 Owl monkey                 omni  Primates            34  
#3 Mountain beaver            herbi Rodentia            28.8
#4 Greater short-tailed shrew omni  Soricomorpha        29.8
#5 Cow                        herbi Artiodactyla         8  
#6 Three-toed sloth           herbi Pilosa              28.8




msleep_mini %>%
       group_double_select(sleep_rem, order) %>%
       head
# A tibble: 6 x 5
# Groups:   order [2]
#  name              vore  order        sleep_rem    id
#  <chr>             <chr> <chr>            <dbl> <int>
#1 Cow               herbi Artiodactyla       1.4     1
#2 Roe deer          herbi Artiodactyla      NA       2
#3 Goat              herbi Artiodactyla       1.2     3
#4 Cheetah           carni Carnivora         NA       1
#5 Northern fur seal carni Carnivora          2.8     2
#6 Dog               carni Carnivora          5.8     3
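
In the outputs requested in the question the grouping column comes first, whereas the function above leaves name as the first column. The following is a sketch of one possible variant, not the answer's own code: it assumes rlang 0.4.3 or later (for the embrace operator `{{ }}` and glue-style names on the left of `:=`), and the name group_double_select2 is hypothetical. It also selects the grouping column explicitly, so grouping by a column other than name, vore or order should work as well.

```r
library(dplyr)
library(ggplot2)  # only needed for the msleep dataset

msleep_mini <- msleep[1:20, ]

# Hypothetical variant using the embrace operator instead of enquo() and !!
group_double_select2 <- function(data, colVar, groupVar) {
  # Double the target column in place, keeping its original name
  out <- data %>%
    mutate("{{ colVar }}" := {{ colVar }} * 2)

  if (missing(groupVar)) {
    # No grouping variable: return only the fixed columns and the target column
    return(select(out, name, vore, order, {{ colVar }}))
  }

  out %>%
    # Listing the grouping column first moves it to the front;
    # select() ignores the later repeat of the same column
    select({{ groupVar }}, name, vore, order, {{ colVar }}) %>%
    group_by({{ groupVar }}) %>%
    mutate(id = row_number()) %>%  # progressive row number within each group
    arrange({{ groupVar }})
}

res_plain   <- group_double_select2(msleep_mini, sleep_total)
res_grouped <- group_double_select2(msleep_mini, sleep_total, vore)
names(res_grouped)
```

The design choice here is to double the column before selecting, so the same first step serves both branches and only the column selection differs.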