
Pass function arguments by column position to mutate_at

purrr dplyr r

11

camille · Tech Community · 7 years ago

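I'm trying to tighten up a %>% piped workflow in which I need to apply the same function to several columns, but with one argument changed each time. I feel like purrr's map or invoke functions should help, but I can't wrap my head around it.

My data frame has columns for life expectancy, poverty rate, and median household income. I can pass all of these column names to vars in mutate_at, use round as the function applied to each, and optionally supply a digits argument. But I can't figure out a way, if one exists, to pass a different value of digits for each column. I'd like life expectancy rounded to 1 digit, poverty rounded to 2 digits, and income rounded to 0 digits.

I could call mutate on each column, but since I might have more columns that all receive the same function with only one extra argument changed, I'd like something more concise.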

library(tidyverse)

df <- tibble::tribble(
        ~name, ~life_expectancy,          ~poverty, ~household_income,
  "New Haven", 78.0580437642378, 0.264221051111753,  42588.7592521085
  )

In my imagination, I could do something like this:

df %>%
  mutate_at(vars(life_expectancy, poverty, household_income), 
            round, digits = c(1, 2, 0))

But I get this error:

Error in mutate_impl(.data, dots): Column `life_expectancy` must be length 1 (the number of rows), not 3
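A minimal base R sketch of where the length 3 in that message comes from. round() recycles its digits argument, so a single value rounded with three digits values yields three results, which no longer fit a one-row column. The variable names here are illustrative only.

```r
# One value taken from the life_expectancy column of the example data.
x <- 78.0580437642378

# round() is vectorised over digits: one input, three digit settings,
# three outputs (rounded to 1, 2 and 0 decimal places respectively).
rounded <- round(x, digits = c(1, 2, 0))

print(length(rounded))
print(rounded)
```

This is the same recycling that happens inside mutate_at, where every selected column receives the whole digits vector rather than the one element at its position.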

Using mutate_at instead of mutate just to keep the same syntax as in my ideal case:

df %>%
  mutate_at(vars(life_expectancy), round, digits = 1) %>%
  mutate_at(vars(poverty), round, digits = 2) %>%
  mutate_at(vars(household_income), round, digits = 0)
#> # A tibble: 1 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.26            42589

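Mapping over the digits applies every digits value to every column rather than matching them by position, giving me 3 rows, each rounded to a different number of digits.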

df %>%
  mutate_at(vars(life_expectancy, poverty, household_income), 
            function(x) map(x, round, digits = c(1, 2, 0))) %>%
  unnest()
#> # A tibble: 3 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.3            42589.
#> 2 New Haven            78.1    0.26           42589.
#> 3 New Haven            78      0              42589

Created on 2018-11-13 by the reprex package (v0.2.1)

3 answers | latest 7 years ago

1

11

moodymudskipper 7 years ago

2 solutions

mutate with !!!

invoke was a good idea, but you need it less now that most tidyverse functions support the !!! operator. Here is what you can do:

digits <- c(life_expectancy = 1, poverty = 2, household_income = 0)  
df %>% mutate(!!!imap(digits, ~round(..3[[.y]], .x),.))
# # A tibble: 1 x 4
#          name life_expectancy poverty household_income
#         <chr>           <dbl>   <dbl>            <dbl>
#   1 New Haven            78.1    0.26            42589

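..3 is the initial data frame, passed to the function as its third argument through the dot at the end of the call. Written more explicitly: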

df %>% mutate(!!!imap(
  digits, 
  function(digit, name, data) round(data[[name]], digit),
  data = .))
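To make the argument order concrete, here is a small sketch of how purrr::imap calls its function: the element value comes first and the element name second. The label-building function and the variable name labels are illustrative, not part of the answer above.

```r
library(purrr)

digits <- c(life_expectancy = 1, poverty = 2)

# imap() calls the function once per element: value first, name second.
# The result is a list that keeps the names of the input.
labels <- imap(digits, function(digit, name) paste(name, "->", digit))

print(labels)
```

Because the result keeps the input names, splicing it into mutate with !!! gives each rounded vector the name of the column it replaces.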

If you need to start from your old interface (though the one I propose is more flexible), first do:

digits <- setNames(c(1, 2, 0), c("life_expectancy", "poverty", "household_income"))

mutate_at and <<-

Here we bend the good practice of avoiding <<- whenever possible a little, but readability matters and this one is really easy to read.

digits <- c(1, 2, 0)
i <- 0
df %>%
  mutate_at(vars(life_expectancy, poverty, household_income), ~round(., digits[i<<- i+1]))
# A tibble: 1 x 4
#     name      life_expectancy poverty household_income
#     <chr>               <dbl>   <dbl>            <dbl>
#   1 New Haven            78.1    0.26            42589

(or just df %>% mutate_at(names(digits), ~round(., digits[i<<- i+1])), which requires digits to be a named vector as in the first solution)
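For readers on a newer dplyr than this 2018 thread used, a hedged sketch of the same lookup without a counter. It assumes dplyr 1.0.0 or later, where across() and cur_column() are available, and it looks up digits by column name instead of by position.

```r
library(dplyr)

df <- tibble::tribble(
        ~name, ~life_expectancy,          ~poverty, ~household_income,
  "New Haven", 78.0580437642378, 0.264221051111753,  42588.7592521085
)

# Named vector: the names select the columns, the values are the digits.
digits <- c(life_expectancy = 1, poverty = 2, household_income = 0)

# cur_column() returns the name of the column across() is currently
# processing, so each column picks up its own digits value.
out <- df %>%
  mutate(across(all_of(names(digits)), ~ round(.x, digits[[cur_column()]])))

print(out)
```

Looking up by name avoids the dependence on evaluation order that the <<- counter has, at the cost of requiring a named vector.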

2

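Calum You 7 years ago

Here is a map2 solution along the same lines as Henrik's comment. You can then wrap it in a custom function. I have provided a rough first attempt, but I have done minimal testing, so it probably breaks in all sorts of situations if evaluation is weird. It also doesn't use tidyselect for .at, but neither does modify_at...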

library(tidyverse)

df <- tibble::tribble(
  ~name, ~life_expectancy,          ~poverty, ~household_income,
  "New Haven", 78.0580437642378, 0.264221051111753,  42588.7592521085,
  "New York", 12.349685329, 0.324067934, 32156.230974623
)

rounded <- df %>%
  select(life_expectancy, poverty, household_income) %>%
  map2_dfc(
    .y = c(1, 2, 0),
    .f = ~ round(.x, digits = .y)
  )
df %>%
  select(-life_expectancy, -poverty, -household_income) %>%
  bind_cols(rounded)
#> # A tibble: 2 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.26            42589
#> 2 New York             12.3    0.32            32156


modify2_at <- function(.x, .y, .at, .f) {
  modified <- .x[.at] %>%
    map2(.y, .f)
  .x[.at] <- modified
  return(.x)
}

df %>%
  modify2_at(
    .y = c(1, 2, 0),
    .at = c("life_expectancy", "poverty", "household_income"),
    .f = ~ round(.x, digits = .y)
  )
#> # A tibble: 2 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.26            42589
#> 2 New York             12.3    0.32            32156
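The same pairwise idea can be sketched in base R with Map, which walks the columns and the digits in parallel. This is an alternative technique, not the modify2_at function above; it uses a plain data.frame and an explicit length check as assumptions of the sketch.

```r
df <- data.frame(
  name = c("New Haven", "New York"),
  life_expectancy = c(78.0580437642378, 12.349685329),
  poverty = c(0.264221051111753, 0.324067934),
  household_income = c(42588.7592521085, 32156.230974623)
)

cols <- c("life_expectancy", "poverty", "household_income")
digits <- c(1, 2, 0)

# Guard against silently recycling a digits vector of the wrong length.
stopifnot(length(cols) == length(digits))

# Map() pairs the i-th column with the i-th digits value.
df[cols] <- Map(function(col, d) round(col, digits = d), df[cols], digits)

print(df)
```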

Created on 2018-11-13 by the reprex package (v0.2.1)

3

2

Aurèle 7 years ago

Fun with tidyeval:

prepared_pairs <- 
  map2(
    set_names(syms(list("life_expectancy", "poverty", "household_income"))),
    c(1, 2, 0), 
    ~expr(round(!!.x, digits = !!.y))
  )

mutate(df, !!! prepared_pairs)

# # A tibble: 1 x 4
#   name      life_expectancy poverty household_income
#   <chr>               <dbl>   <dbl>            <dbl>
# 1 New Haven            78.1    0.26            42589
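To see what one element of prepared_pairs looks like, here is a small sketch that builds a single expression the same way and evaluates it against a stand-in for the data. It uses rlang directly; the names rounding_call and value are illustrative.

```r
library(rlang)

# Build one unevaluated call: the column symbol and the digits value are
# injected with !!, exactly as in the map2() lambda above.
rounding_call <- expr(round(!!sym("poverty"), digits = !!2))

print(rounding_call)

# Evaluate the call with a named list standing in for the data frame.
value <- eval_tidy(rounding_call, data = list(poverty = 0.264221051111753))

print(value)
```

mutate receives a named list of such calls via !!! and evaluates each one against df, so each column is rounded with its own digits value.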