
Can't subset within mutate() after summarize() with a tibble

  • Anonymous coward · Tech Community · 7 years ago

    I don't know whether this behavior is unique to working with tibbles.

    library(dplyr)
    library(gapminder)
    df <- gapminder %>%
      group_by(year, continent) %>% 
      summarize(avg_life = mean(lifeExp))
    

    This produces a tibble, `df`:

    # A tibble: 60 x 3
    # Groups:   year [?]
        year continent avg_life
       <int> <fct>        <dbl>
     1  1952 Africa        39.1
     2  1952 Americas      53.3
     3  1952 Asia          46.3
     4  1952 Europe        64.4
     5  1952 Oceania       69.3
     6  1957 Africa        41.3
     7  1957 Americas      56.0
     8  1957 Asia          49.3
     9  1957 Europe        66.7
    10  1957 Oceania       70.3
    # ... with 50 more rows
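
The `# Groups: year [?]` line in the printout is the key detail: `summarize()` drops only the last grouping variable (`continent`), so `df` is still grouped by `year`. A minimal sketch to check this, assuming a small hypothetical toy tibble (`toy`, `df_toy`) in place of gapminder:

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder (two years, two continents)
toy <- tibble(
  year      = c(1952L, 1952L, 1957L, 1957L),
  continent = c("Africa", "Asia", "Africa", "Asia"),
  lifeExp   = c(39, 46, 41, 49)
)

df_toy <- toy %>%
  group_by(year, continent) %>%
  summarize(avg_life = mean(lifeExp))

# summarize() peels off only the last grouping variable (continent),
# so the result is still grouped by year
group_vars(df_toy)
#> [1] "year"
```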
    

    I expected the next step to work, and this post suggests it should.

    If I subset it the standard way, it produces the expected output:

    df$avg_life[df$year == 1952]
    [1] 39.13550 53.27984 46.31439 64.40850 69.25500
    

    If I try to do the same thing inside `mutate()`, the subset comes back empty:

    df <- gapminder %>%
      group_by(year, continent) %>% 
      summarize(avg_life = mean(lifeExp)) %>% 
      mutate(life_chg = avg_life - avg_life[year == 1952])
    

    Error in mutate_impl(.data, dots) : Column `life_chg` must be length 5 (the group size) or one, not 0
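
Because `df` is grouped by `year`, `mutate()` evaluates `avg_life[year == 1952]` separately inside each year group. Only the 1952 group contains matching rows; in every other group the subset is empty, and subtracting a zero-length vector gives a zero-length result, which triggers the length error above. A sketch on hypothetical toy data (`toy`, `df_toy`, `matches`, and `failed` are illustrative names):

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder
toy <- tibble(
  year      = c(1952L, 1952L, 1957L, 1957L),
  continent = c("Africa", "Asia", "Africa", "Asia"),
  lifeExp   = c(39, 46, 41, 49)
)

# Still grouped by year after summarize(), as in the question
df_toy <- toy %>%
  group_by(year, continent) %>%
  summarize(avg_life = mean(lifeExp))

# Per year group, how many elements does avg_life[year == 1952] keep?
matches <- df_toy %>%
  summarize(n_matched = length(avg_life[year == 1952]))
matches
# 1952 keeps 2 values; 1957 keeps 0, and x - numeric(0) is numeric(0)

# So the grouped mutate() fails on the 1957 group with a length/size error
failed <- tryCatch({
  mutate(df_toy, life_chg = avg_life - avg_life[year == 1952])
  FALSE
}, error = function(e) TRUE)
failed
```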

    Changing `==` to `>` gives 0, and hard-coding the logical index also gives 0:

    df <- gapminder %>%
      group_by(year, continent) %>% 
      summarize(avg_life = mean(lifeExp)) %>% 
      mutate(life_chg = avg_life - avg_life[c(T, T, T, T, T, rep(F, 55))])
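
The hard-coded index returns zeros for a related reason: inside each year group, `avg_life` has only 5 elements, so the first five `TRUE` values select the whole group and the trailing `FALSE` values select nothing. A base R sketch with one group's values (rounded from the 1957 rows in the printout; `group_vals`, `idx`, and `selected` are illustrative names):

```r
# One year group's avg_life as mutate() sees it (the 1957 rows, rounded)
group_vals <- c(41.3, 56.0, 49.3, 66.7, 70.3)

# The 60-element logical index from the question
idx <- c(TRUE, TRUE, TRUE, TRUE, TRUE, rep(FALSE, 55))

# Positions past the end that are FALSE select nothing extra,
# so this returns all 5 values of the group
selected <- group_vals[idx]
identical(selected, group_vals)
#> [1] TRUE

# Hence every row's difference is zero
group_vals - selected
#> [1] 0 0 0 0 0
```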
    

    Why doesn't this work within `mutate()`?

    Structure of `df`:

    str(df)
    Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 60 obs. of  4 variables:
     $ year     : int  1952 1952 1952 1952 1952 1957 1957 1957 1957 1957 ...
     $ continent: Factor w/ 5 levels "Africa","Americas",..: 1 2 3 4 5 1 2 3 4 5 ...
     $ avg_life : num  39.1 53.3 46.3 64.4 69.3 ...
     $ life_chg : num  0 0 0 0 0 0 0 0 0 0 ...
     - attr(*, "vars")= chr "year"
     - attr(*, "labels")='data.frame':  12 obs. of  1 variable:
      ..$ year: int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
      ..- attr(*, "vars")= chr "year"
      ..- attr(*, "drop")= logi TRUE
     - attr(*, "indices")=List of 12
      ..$ : int  0 1 2 3 4
      ..$ : int  5 6 7 8 9
      ..$ : int  10 11 12 13 14
      ..$ : int  15 16 17 18 19
      ..$ : int  20 21 22 23 24
      ..$ : int  25 26 27 28 29
      ..$ : int  30 31 32 33 34
      ..$ : int  35 36 37 38 39
      ..$ : int  40 41 42 43 44
      ..$ : int  45 46 47 48 49
      ..$ : int  50 51 52 53 54
      ..$ : int  55 56 57 58 59
     - attr(*, "drop")= logi TRUE
     - attr(*, "group_sizes")= int  5 5 5 5 5 5 5 5 5 5 ...
     - attr(*, "biggest_group_size")= int 5
    
  • Anonymous coward · 7 years ago

    As joran pointed out, you have to `ungroup()` first:

    library(dplyr)
    library(gapminder)
    
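    gapminder %>%
      group_by(year, continent) %>%
      summarize(avg_life = mean(lifeExp)) %>%
      ungroup() %>%  # drop the leftover grouping by year
      # avg_life[year == 1952] has 5 values, recycled over all 60 rows (sorted by year, then continent)
      mutate(life_chg = avg_life - avg_life[year == 1952])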
    
    # A tibble: 60 x 4
        year continent avg_life life_chg
       <int> <fct>        <dbl>    <dbl>
     1  1952 Africa        39.1     0   
     2  1952 Americas      53.3     0   
     3  1952 Asia          46.3     0   
     4  1952 Europe        64.4     0   
     5  1952 Oceania       69.3     0   
     6  1957 Africa        41.3     2.13
     7  1957 Americas      56.0     2.68
     8  1957 Asia          49.3     3.00
     9  1957 Europe        66.7     2.29
    10  1957 Oceania       70.3     1.04
    # ... with 50 more rows
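
The `ungroup()` version works because, after `summarize()`, the rows are sorted by year and then continent, so the five 1952 values recycle cleanly over all 60 rows. A swapped-in alternative (not the answer's code) that does not depend on that row order is to regroup by `continent`, so each continent looks up its own 1952 value. A minimal sketch on hypothetical toy data (`toy` and `out` are illustrative names):

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder
toy <- tibble(
  year      = c(1952L, 1952L, 1957L, 1957L),
  continent = c("Africa", "Asia", "Africa", "Asia"),
  lifeExp   = c(39, 46, 41, 49)
)

out <- toy %>%
  group_by(year, continent) %>%
  summarize(avg_life = mean(lifeExp)) %>%
  # group_by() replaces the leftover year grouping with continent
  group_by(continent) %>%
  # within each continent, year == 1952 matches exactly one row
  mutate(life_chg = avg_life - avg_life[year == 1952]) %>%
  ungroup()

out$life_chg
#> [1] 0 0 2 3
```

This assumes every continent has a 1952 row; a continent without one would hit the same zero-length error as in the question.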