
Get information from the adjacent row in R, grouped by a factor

row r
  •  0
  • Floni  · 7 years ago

    I have this sample:

    > a
       Ship duration.minutes event   Location
    1     a               NA enter     Skagen
    2     a             1616  trip       <NA>
    3     a             4308  stop Copenhagen
    4     b             1646  trip       <NA>
    5     b             5751  stop     Gdynia
    6     b               75  trip       <NA>
    7     b            45666  stop     Gdansk
    8     c             2531  trip       <NA>
    9     c             5360  stop   Szczecin
    10    d              287  trip       <NA>
    

    I want to add a new column called "Destination" and fill its cells with the name of the destination.

    The result would be:

    > output
       Ship duration.minutes event   Location  Destination
    1     a               NA enter     Skagen  <NA>
    2     a             1616  trip       <NA>  Copenhagen
    3     a             4308  stop Copenhagen  <NA>
    4     b             1646  trip       <NA>  Gdynia
    5     b             5751  stop     Gdynia  <NA> 
    6     b               75  trip       <NA>  Gdansk
    7     b            45666  stop     Gdansk  <NA>
    8     c             2531  trip       <NA>  Szczecin
    9     c             5360  stop   Szczecin  <NA>
    10    d              287  trip       <NA>  <NA>
    

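    This means it works per ship: each trip row should only get a destination belonging to the same ship. After a ship completes a trip, it is at the location given in the next row.

    I tried moves <- setDT(a)[, .(from = Location[-.N], to = Location[-1L]) , Ship] but it does not keep the column duration.minutes: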

    > dput(moves)
    structure(list(Ship = c("a", "a", "b", "b", "b", "c"), from = structure(c(4L, 
    NA, NA, 3L, NA, NA), .Label = c("Copenhagen", "Gdansk", "Gdynia", 
    "Skagen", "Szczecin"), class = "factor"), to = structure(c(NA, 
    1L, 3L, NA, 2L, 5L), .Label = c("Copenhagen", "Gdansk", "Gdynia", 
    "Skagen", "Szczecin"), class = "factor")), row.names = c(NA, 
    -6L), class = c("data.table", "data.frame"), .Names = c("Ship", 
    "from", "to"), .internal.selfref = <pointer: 0x00000000003e0788>)
    

    It looks like this:

    > moves
       Ship   from         to
    1:    a Skagen       <NA>
    2:    a   <NA> Copenhagen
    3:    b   <NA>     Gdynia
    4:    b Gdynia       <NA>
    5:    b   <NA>     Gdansk
    6:    c   <NA>   Szczecin
    

    The sample data, called a in the code above, is:

    > dput(data)
    structure(list(Ship = c("a", "a", "a", "b", "b", "b", "b", "c", 
    "c", "d"), duration.minutes = c(NA, 1616L, 4308L, 1646L, 5751L, 
    75L, 45666L, 2531L, 5360L, 287L), event = structure(c(1L, 3L, 
    2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L), .Label = c("enter", "stop", 
    "trip"), class = "factor"), Location = structure(c(4L, NA, 1L, 
    NA, 3L, NA, 2L, NA, 5L, NA), .Label = c("Copenhagen", "Gdansk", 
    "Gdynia", "Skagen", "Szczecin"), class = "factor")), .Names = c("Ship", 
    "duration.minutes", "event", "Location"), row.names = c(NA, -10L
    ), class = c("data.table", "data.frame"))
    

    I'm afraid this is hard to do with setDT. Is there a way to keep the column duration.minutes?

    1 answer
  •   eipi10    7 years ago

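    I'm not sure whether this covers all of your use cases, but you can use the lead function to grab the next value of Location within each Ship. It also seems to make more sense to put all the values in a single Location column rather than keeping separate Location and Destination columns.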

    library(tidyverse)

    a %>% 
      group_by(Ship) %>%                                      # work within each ship
      mutate(Destination = lead(Location),                    # next row's Location
             Location = coalesce(Location, Destination)) %>%  # fill the NA trip rows
      select(-Destination)                                    # drop the helper column
    
       Ship  duration.minutes event Location  
       <chr>            <int> <fct> <fct>     
     1 a                   NA enter Skagen    
     2 a                 1616 trip  Copenhagen
     3 a                 4308 stop  Copenhagen
     4 b                 1646 trip  Gdynia    
     5 b                 5751 stop  Gdynia    
     6 b                   75 trip  Gdansk    
     7 b                45666 stop  Gdansk    
     8 c                 2531 trip  Szczecin  
     9 c                 5360 stop  Szczecin  
    10 d                  287 trip  <NA>
    

    If you want to keep separate columns, the code can be shortened to:

    a %>% 
      group_by(Ship) %>% 
      mutate(Destination = lead(Location))
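
Since the question started from data.table, the same idea can probably be carried over there as well. The following is only a sketch: it rebuilds the sample with Location and event as plain character columns (an assumption made for brevity; the original data uses factors) and uses shift with type = "lead" grouped by Ship. Because := adds the column by reference instead of building a new summary table, duration.minutes and the other columns stay in place.

```r
library(data.table)

# Sample data rebuilt from the question; character columns instead of
# factors are an assumption made to keep the example short
a <- data.table(
  Ship = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "d"),
  duration.minutes = c(NA, 1616L, 4308L, 1646L, 5751L,
                       75L, 45666L, 2531L, 5360L, 287L),
  event = c("enter", "trip", "stop", "trip", "stop",
            "trip", "stop", "trip", "stop", "trip"),
  Location = c("Skagen", NA, "Copenhagen", NA, "Gdynia",
               NA, "Gdansk", NA, "Szczecin", NA)
)

# Within each Ship, take the Location of the following row;
# the last row of each ship gets NA
a[, Destination := shift(Location, type = "lead"), by = Ship]

print(a)
```

The earlier attempt dropped duration.minutes because .( ... ) in j returns only the columns listed there, whereas := modifies the existing table.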
    

    For the data sample you provided, fill can also create the single column in one step:

    a %>% 
      group_by(Ship) %>% 
      fill(Location, .direction="up")
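
The two approaches agree on the sample because every missing Location is followed directly by a stop of the same ship. They can differ on other data: lead only looks one row ahead, while fill keeps going upward through consecutive missing values. A small sketch with a hypothetical ship "e" and a made-up location (neither is in the original data) illustrates this:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two consecutive trip rows before a stop
x <- tibble(
  Ship = c("e", "e", "e"),
  event = c("trip", "trip", "stop"),
  Location = c(NA, NA, "Kiel")
)

# lead() + coalesce(): each NA is replaced by the value one row below
by_lead <- x %>%
  group_by(Ship) %>%
  mutate(Location = coalesce(Location, lead(Location))) %>%
  ungroup()

# fill(): NAs are filled upward from the next non-missing value
by_fill <- x %>%
  group_by(Ship) %>%
  fill(Location, .direction = "up") %>%
  ungroup()

by_lead$Location  # the first row stays NA
by_fill$Location  # all three rows are filled
```

Which behaviour is wanted depends on whether several trip rows in a row should all share the next stop as their destination.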