
Using which() inside filter() with dplyr

  •  -1
  • Yehuda  · Tech Community  · 6 years ago

    I am trying to filter a dataset and then set the outliers to the mean. Example data frame:

    structure(list(INDEX = c(1, 2, 3, 4, 5, 6), TARGET_WINS = c(39, 
    70, 86, 70, 82, 75), TEAM_BATTING_H = c(1445, 1339, 1377, 1387, 
    1297, 1279), TEAM_BATTING_2B = c(194, 219, 232, 209, 186, 200
    ), TEAM_BATTING_3B = c(39, 22, 35, 38, 27, 36), TEAM_BATTING_HR = c(13, 
    190, 137, 96, 102, 92), TEAM_BATTING_BB = c(143, 685, 602, 451, 
    472, 443), TEAM_BATTING_SO = c(842, 1075, 917, 922, 920, 973), 
        TEAM_BASERUN_SB = c(NA, 37, 46, 43, 49, 107), TEAM_BASERUN_CS = c(NA, 
        28, 27, 30, 39, 59), TEAM_BATTING_HBP = c(NA_real_, NA_real_, 
        NA_real_, NA_real_, NA_real_, NA_real_), TEAM_PITCHING_H = c(9364, 
        1347, 1377, 1396, 1297, 1279), TEAM_PITCHING_HR = c(84, 191, 
        137, 97, 102, 92), TEAM_PITCHING_BB = c(927, 689, 602, 454, 
        472, 443), TEAM_PITCHING_SO = c(5456, 1082, 917, 928, 920, 
        973), TEAM_FIELDING_E = c(1011, 193, 175, 164, 138, 123), 
        TEAM_FIELDING_DP = c(NA, 155, 153, 156, 168, 149)), row.names = c(NA, 
    -6L), class = c("tbl_df", "tbl", "data.frame"))
    

    Using dplyr, I filter for the outliers and then try to mutate the team's fielding column based on the corrected (non-outlier) mean:

    train %>% 
      filter(which(boxplot.stats(train$TEAM_FIELDING_E)$out %in% train$TEAM_FIELDING_E, arr.ind = TRUE) == TRUE) %>% 
      mutate(
        TEAM_FIELDING_E = NA,
        TEAM_FIELDING_E = mean(train$TEAM_FIELDING_E)
      )
    

    This returns the error Error in filter_impl(.data, quo) : Result must have length 2276, not 303 (the original dataset has 2276 rows, 303 of which are TEAM_FIELDING_E outliers). How can I use filter() so that my mutate() only affects the filtered rows?

    1 answer  |  until 6 years ago
  •  1
  •   Jake Kaupp    6 years ago

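    dplyr verbs use bare variable names, not [[ or $. Also, if you are trying to filter on a value, you can filter on that value directly instead of using which to find the positions that match.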
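
As a minimal sketch of that point, assuming a small made-up data frame (the name toy, its columns, and its values are hypothetical and not from the question), filter() can test the column directly with %in%, using the bare column name:

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration only
toy <- tibble(id = 1:8, errors = c(120, 125, 130, 135, 140, 145, 150, 1000))

# Values flagged as outliers by the boxplot rule
out <- boxplot.stats(toy$errors)$out

# Bare column name inside filter(); no which() and no toy$errors needed
outlier_rows <- toy %>% filter(errors %in% out)
```

Here errors %in% out yields one TRUE/FALSE per row, which is the shape filter() expects.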

    For this case, you can use if_else inside mutate.

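    library(dplyr)
    # Outlier values flagged by the boxplot rule
    out <- boxplot.stats(train$TEAM_FIELDING_E)$out

    # Replace outliers with the mean of the non-outlier values; other rows are unchanged
    train %>%
      mutate(TEAM_FIELDING_E = if_else(TEAM_FIELDING_E %in% out, mean(TEAM_FIELDING_E[!(TEAM_FIELDING_E %in% out)]), TEAM_FIELDING_E))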
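
To illustrate why the original filter() call failed and what the if_else approach does, here is a small sketch that uses only the TEAM_FIELDING_E values from the six-row sample in the question. The names sample_df, bad_condition, good_condition, and fixed are assumptions introduced for illustration. Both filter() and if_else() need a condition with one element per row (or length 1); the %in% test on the column provides that, while the which() expression has the length of the outlier vector instead:

```r
library(dplyr)

# TEAM_FIELDING_E values copied from the question's six-row sample
sample_df <- tibble(TEAM_FIELDING_E = c(1011, 193, 175, 164, 138, 123))

out <- boxplot.stats(sample_df$TEAM_FIELDING_E)$out

# which() returns positions within `out`, so the length follows `out`, not the data
bad_condition <- which(out %in% sample_df$TEAM_FIELDING_E) == TRUE

# %in% applied to the column returns one TRUE/FALSE per row
good_condition <- sample_df$TEAM_FIELDING_E %in% out

# Replace flagged values with the mean of the unflagged ones
fixed <- sample_df %>%
  mutate(TEAM_FIELDING_E = if_else(TEAM_FIELDING_E %in% out,
                                   mean(TEAM_FIELDING_E[!(TEAM_FIELDING_E %in% out)]),
                                   TEAM_FIELDING_E))
```

On this sample only 1011 is flagged, and it is replaced by 158.6, the mean of the remaining five values. In the full dataset the same mismatch shows up as the lengths 303 and 2276 in the error message.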