
How to do the Stata equivalent of a foreach loop in R with mutate() [duplicate]

  •  0
  • a_todd12  · Tech Community  · 6 months ago

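    My raw data has 4 different variables (all beginning with the prefix raw_), followed by a combination of time/money and school/social. In my real data each variable has many more values to recode, so repeating the block 4 times makes for quite a long piece of code.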

    There must be a simple way in R to do the equivalent of a foreach loop in Stata (which I use more often for data cleaning), condensing the 4 repetitions into 1.


    In Stata, the loop would look like:

    foreach suffix in hours_school hours_social money_school money_social {
       gen olderch_`suffix' = .
       replace olderch_`suffix' = 90 if raw_`suffix' == "90% or more 10% or less"
       replace olderch_`suffix' = 10 if raw_`suffix' == "10% or less 90% or more"
    }
    

    This is my current R version, which I would like to condense in a similar way:

    d <- d %>% 
      mutate(olderch_hours_school = case_when (
        raw_hours_school == "90% or more 10% or less" ~ 90,
        raw_hours_school == "10% or less 90% or more" ~ 10
      ))
    
    d <- d %>% 
      mutate(olderch_hours_social = case_when (
        raw_hours_social == "90% or more 10% or less" ~ 90,
        raw_hours_social == "10% or less 90% or more" ~ 10
      ))
    
    d <- d %>% 
      mutate(olderch_money_school = case_when (
        raw_money_school == "90% or more 10% or less" ~ 90,
        raw_money_school == "10% or less 90% or more" ~ 10
      ))
    
    d <- d %>% 
      mutate(olderch_money_social = case_when (
        raw_money_social == "90% or more 10% or less" ~ 90,
        raw_money_social == "10% or less 90% or more" ~ 10
      ))
    
    
    1 answer  |  as of 6 months ago
  •  2
  •   Seth    6 months ago

    Edit: as requested in the comments, the example now creates new columns from the original ones.

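    dplyr::across() ( docs ) lets you apply a function to multiple columns of a data frame. You can adapt your case_when statement by prefixing it with ~ to create a lambda function and using . to stand for each column.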
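
As a small illustration of the lambda notation described above, the sketch below applies the same test to two columns, once with the `~` shorthand and once with an explicit anonymous function. The data frame `toy` and its columns are made up for this example and are not part of the question.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- tibble(a = c("yes", "no"), b = c("no", "yes"))

# Formula shorthand: `~` starts the lambda, `.x` (or `.`) is the current column
res_formula <- toy |>
  mutate(across(c(a, b), ~ .x == "yes"))

# The same operation written as an explicit anonymous function (R >= 4.1)
res_fn <- toy |>
  mutate(across(c(a, b), \(x) x == "yes"))

res_formula
res_fn
```

Both calls overwrite `a` and `b` in place because no `.names` argument is given.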

    You can supply a vector of all the columns to transform, or use a selection helper such as everything() or starts_with().
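
A minimal sketch of how a selection helper can be combined with the `.names` argument to get the `olderch_` column names from the Stata loop in one step. The two-row data frame `d` is a made-up stand-in for the real data, and building the name with `sub()` inside the glue string is one possible approach, not necessarily the one the answer intends.

```r
library(dplyr)

# Hypothetical two-row stand-in for the asker's data
d <- tibble(
  raw_hours_school = c("90% or more 10% or less", "10% or less 90% or more"),
  raw_money_social = c("10% or less 90% or more", "something else")
)

out <- d |>
  mutate(across(
    starts_with("raw_"),                          # every column named raw_*
    ~ case_when(
      .x == "90% or more 10% or less" ~ 90,
      .x == "10% or less 90% or more" ~ 10
    ),                                            # unmatched values become NA
    .names = "{sub('^raw_', 'olderch_', .col)}"   # raw_x -> olderch_x
  ))

names(out)
```

Values that match neither condition come out as `NA`, which plays the role of the `.` missing value set by `gen` in the Stata version.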

    The test data follows the example.

    library(dplyr)
    
    glimpse(df)
    #> Rows: 10
    #> Columns: 4
    #> $ raw_hours_school <chr> "90% or more 10% or less", "90% or more 10% or less",…
    #> $ raw_hours_social <chr> "10% or less 90% or more", "10% or less 90% or more",…
    #> $ raw_money_school <chr> "90% or more 10% or less", "10% or less 90% or more",…
    #> $ raw_money_social <chr> "90% or more 10% or less", "10% or less 90% or more",…
    
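    # Recode each listed raw_* column into a new column named new_raw_*,
    # then strip "_raw" from the names to get new_hours_school, etc.
    df |>
      mutate(across(
        c(raw_hours_school, raw_hours_social,
          raw_money_school, raw_money_social),
        ~ case_when(. == "90% or more 10% or less" ~ 90,
                    . == "10% or less 90% or more" ~ 10),
        .names = "new_{.col}")) %>%
      setNames(gsub('_raw', '', names(.)))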
    #>           raw_hours_school        raw_hours_social        raw_money_school
    #> 1  90% or more 10% or less 10% or less 90% or more 90% or more 10% or less
    #> 2  90% or more 10% or less 10% or less 90% or more 10% or less 90% or more
    #> 3  90% or more 10% or less 10% or less 90% or more 90% or more 10% or less
    #> 4  10% or less 90% or more 90% or more 10% or less 90% or more 10% or less
    #> 5  90% or more 10% or less 10% or less 90% or more 90% or more 10% or less
    #> 6  10% or less 90% or more 90% or more 10% or less 90% or more 10% or less
    #> 7  10% or less 90% or more 10% or less 90% or more 10% or less 90% or more
    #> 8  10% or less 90% or more 90% or more 10% or less 10% or less 90% or more
    #> 9  90% or more 10% or less 90% or more 10% or less 90% or more 10% or less
    #> 10 90% or more 10% or less 90% or more 10% or less 10% or less 90% or more
    #>           raw_money_social new_hours_school new_hours_social new_money_school
    #> 1  90% or more 10% or less               90               10               90
    #> 2  10% or less 90% or more               90               10               10
    #> 3  90% or more 10% or less               90               10               90
    #> 4  10% or less 90% or more               10               90               90
    #> 5  10% or less 90% or more               90               10               90
    #> 6  90% or more 10% or less               10               90               90
    #> 7  90% or more 10% or less               10               10               10
    #> 8  90% or more 10% or less               10               90               10
    #> 9  90% or more 10% or less               90               90               90
    #> 10 10% or less 90% or more               90               90               10
    #>    new_money_social
    #> 1                90
    #> 2                10
    #> 3                90
    #> 4                10
    #> 5                10
    #> 6                90
    #> 7                90
    #> 8                90
    #> 9                90
    #> 10               10
    
    # Same recode, selecting the columns with starts_with() instead of
    # listing them one by one
    df |>
      mutate(across(
        starts_with("raw"),
        ~ case_when(. == "90% or more 10% or less" ~ 90,
                    . == "10% or less 90% or more" ~ 10),
        .names = "new_{.col}")) %>%
      setNames(gsub('_raw', '', names(.)))
    #> (output identical to the first call above)
    
    # Test data used in the examples above
    set.seed(123)
    
    df <- data.frame(
      raw_hours_school = sample(c("90% or more 10% or less",
                              "10% or less 90% or more"),
                            10, replace = TRUE),
      raw_hours_social = sample(c("90% or more 10% or less",
                              "10% or less 90% or more"),
                            10, replace = TRUE),
      raw_money_school = sample(c("90% or more 10% or less",
                              "10% or less 90% or more"),
                            10, replace = TRUE),
      raw_money_social = sample(c("90% or more 10% or less",
                              "10% or less 90% or more"),
                            10, replace = TRUE)
    )
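
For readers who want something that reads line for line like the Stata loop in the question, a plain base R `for` loop over the suffixes is another option. This is a sketch under the assumption that the columns follow the `raw_<suffix>` naming from the question. The two-row data frame `d` is made up for illustration, and this approach is not part of the answer above.

```r
# Hypothetical two-row stand-in for the asker's data frame `d`
d <- data.frame(
  raw_hours_school = c("90% or more 10% or less", "10% or less 90% or more"),
  raw_hours_social = c("10% or less 90% or more", "90% or more 10% or less")
)

for (suffix in c("hours_school", "hours_social")) {
  raw <- d[[paste0("raw_", suffix)]]

  # gen olderch_`suffix' = .
  new <- rep(NA_real_, length(raw))

  # replace ... if raw_`suffix' == "..."; which() skips NA comparisons
  new[which(raw == "90% or more 10% or less")] <- 90
  new[which(raw == "10% or less 90% or more")] <- 10

  d[[paste0("olderch_", suffix)]] <- new
}

d
```

The `across()` version is usually shorter, but the loop makes the correspondence with `foreach suffix in ...` explicit.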