
Select rows by comparing values in one column in R [duplicate]

  •  1
  • Ali Hadjihoseini  · Tech Community  · 7 years ago

    I have a data frame that looks like this:

    n4= 
        sector turb    dist
        1  sector1  T02  828.66
        2  sector1  T04 1114.58
        3  sector1  T05 1012.22
        4  sector2  T03  992.64
        5  sector2  T05 1012.22
        6  sector2  T06 1158.38
        7  sector3  T03  992.64
        8 sector12  T02  828.66
        9 sector12  T04 1114.58
    

    I want to keep a single row for each unique sector name, namely the one with the smallest value in the dist column:

     sector turb   dist
    1  sector1  T02 828.66
    4  sector2  T03 992.64
    7  sector3  T03 992.64
    8 sector12  T02 828.66
    

    I know I have to group the rows by sector:

    result = n4 %>%
    dplyr::group_by(sector)
    

    But neither select nor filter worked the way I tried to use them:

    result = n4 %>%
        dplyr::group_by(sector)%>%
        dplyr::select(which.min(dist))
    

    Do you know how I can do this?

    2 replies  |  up to 7 years ago
        1
  •  3
  •   Jilber Urbina    7 years ago

    You can use filter instead of select, as an alternative to slice:

    > n4 %>%
        dplyr::group_by(sector)%>%
        dplyr::filter(dist==min(dist))
    # A tibble: 4 x 3
    # Groups:   sector [4]
      sector   turb   dist
      <fct>    <fct> <dbl>
    1 sector1  T02    829.
    2 sector2  T03    993.
    3 sector3  T03    993.
    4 sector12 T02    829.
    

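    If you prefer base R, try aggregate. Note that min is applied to each column separately, so the turb column below holds factor codes rather than the turbine of the minimum-dist row: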

    > aggregate(.~sector, data=n4, min)
        sector turb   dist
    1  sector1    1 828.66
    2 sector12    1 828.66
    3  sector2    2 992.64
    4  sector3    2 992.64
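
Because the output above reports turb as codes rather than turbine names, one possible base R workaround is to aggregate only dist and then join the result back to the original rows. This is a minimal sketch, not part of the original answer; the names min_dist and result are illustrative, and the data are rebuilt here so the block runs on its own.

```r
# Data as given in the thread, rebuilt so the example is self-contained.
n4 <- data.frame(
  sector = c("sector1", "sector1", "sector1", "sector2", "sector2",
             "sector2", "sector3", "sector12", "sector12"),
  turb   = c("T02", "T04", "T05", "T03", "T05", "T06", "T03", "T02", "T04"),
  dist   = c(828.66, 1114.58, 1012.22, 992.64, 1012.22, 1158.38,
             992.64, 828.66, 1114.58),
  stringsAsFactors = FALSE
)

# Step 1: minimum dist per sector (only the numeric column is aggregated).
min_dist <- aggregate(dist ~ sector, data = n4, FUN = min)

# Step 2: join back on both keys to recover the matching turb value(s).
result <- merge(n4, min_dist, by = c("sector", "dist"))

# Restore the original column order.
result <- result[, c("sector", "turb", "dist")]
print(result)
```

The join returns rows sorted by the key columns, so the sector order may differ from the input order, and any rows tied for a sector's minimum are all kept.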
    

    check this answer

        2
  •  2
  •   akrun    7 years ago

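    We need slice rather than select to subset rows. The select function is for selecting the columns of a dataset. If the order of 'sector' should match the order in which the sectors first appear in the input data, convert the column to a factor with levels specified in input order: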

    n4 %>%       
       dplyr::group_by(sector = factor(sector, levels = unique(sector)))%>%
       dplyr::slice(which.min(dist))
    # A tibble: 4 x 3
    # Groups:   sector [4]
    #  sector   turb   dist
    #  <fct>    <chr> <dbl>
    #1 sector1  T02    829.
    #2 sector2  T03    993.
    #3 sector3  T03    993.
    #4 sector12 T02    829.
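
The same selection can also be written with slice_min, which makes the handling of ties explicit. This is a sketch that assumes dplyr 1.0.0 or later is installed (slice_min is not used in the original answer); the data are rebuilt so the block runs on its own.

```r
library(dplyr)

# Data as given in the thread, rebuilt so the example is self-contained.
n4 <- data.frame(
  sector = c("sector1", "sector1", "sector1", "sector2", "sector2",
             "sector2", "sector3", "sector12", "sector12"),
  turb   = c("T02", "T04", "T05", "T03", "T05", "T06", "T03", "T02", "T04"),
  dist   = c(828.66, 1114.58, 1012.22, 992.64, 1012.22, 1158.38,
             992.64, 828.66, 1114.58),
  stringsAsFactors = FALSE
)

result <- n4 %>%
  # Factor levels in order of first appearance keep the input sector order.
  group_by(sector = factor(sector, levels = unique(sector))) %>%
  # with_ties = FALSE returns exactly one row per group, like which.min().
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```

Leaving with_ties at its default of TRUE would instead keep every row tied for the minimum, which matches the behaviour of filter(dist == min(dist)).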
    

    Or using base R

    n4[with(n4, ave(dist, sector, FUN = min) == dist),]
    #     sector turb   dist
    #1  sector1  T02 828.66
    #4  sector2  T03 992.64
    #7  sector3  T03 992.64
    #8 sector12  T02 828.66
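
The ave comparison above keeps every row whose dist equals its sector minimum, so tied rows would all be returned. If exactly one row per sector is wanted, a possible base R variant picks the first minimum in each group by index. This is a sketch only: the extra row with turb "T09" is hypothetical and added purely to create a tie, and the names n4_tie, all_min, idx and first_min are illustrative.

```r
# Data as given in the thread, rebuilt so the example is self-contained.
n4 <- data.frame(
  sector = c("sector1", "sector1", "sector1", "sector2", "sector2",
             "sector2", "sector3", "sector12", "sector12"),
  turb   = c("T02", "T04", "T05", "T03", "T05", "T06", "T03", "T02", "T04"),
  dist   = c(828.66, 1114.58, 1012.22, 992.64, 1012.22, 1158.38,
             992.64, 828.66, 1114.58),
  stringsAsFactors = FALSE
)

# Hypothetical extra row that ties with row 7 on dist (not in the original data).
n4_tie <- rbind(n4, data.frame(sector = "sector3", turb = "T09", dist = 992.64))

# The ave() comparison keeps every row equal to its group minimum, ties included.
all_min <- n4_tie[with(n4_tie, ave(dist, sector, FUN = min) == dist), ]

# One row index per sector: the first position of that sector's minimum.
idx <- vapply(
  split(seq_len(nrow(n4_tie)), n4_tie$sector),
  function(i) i[which.min(n4_tie$dist[i])],
  integer(1)
)

# Sorting the indices preserves the original row order.
first_min <- n4_tie[sort(idx), ]

print(all_min)
print(first_min)
```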
    

    Data

    n4 <- structure(list(sector = c("sector1", "sector1", "sector1", "sector2", 
    "sector2", "sector2", "sector3", "sector12", "sector12"), turb = c("T02", 
    "T04", "T05", "T03", "T05", "T06", "T03", "T02", "T04"), dist = c(828.66, 
     1114.58, 1012.22, 992.64, 1012.22, 1158.38, 992.64, 828.66, 1114.58
    )), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
    "6", "7", "8", "9"))