代码之家  ›  专栏  ›  技术社区  ›  M.Teich

将数据帧重新格式化为长格式,保留分组列

  •  0
  • M.Teich  · 技术社区  · 7 年前

    考虑以下数据帧:

    set.seed(42)
    ID <- c(1:6) 
    OB <- c(rep("A",4),rep("B",2))
    lat_start <- rnorm(6,42,2)
    lon_start <- rnorm(6,12,2)
    lat_stopp <- rnorm(6,42,2)
    lon_stopp <- rnorm(6,12,2)
    df <- data.frame(ID,OB,lat_start,lon_start,lat_stopp,lon_stopp)
    

    我想重新格式化 df 以长格式,其中每个 ID gather() 解决方案,例如。 df_wrong <- gather(df,coords,val,lat_start:lon_stopp) 显然不起作用,因为我需要lat/lon列保持分组。我希望长数据帧看起来像这样:

       ID OB    SS      lat       lon
    1   1  A start 44.74192 15.023040
    2   1  A  stop 39.22228  7.119066
    3   2  A start 40.87060 11.810680
    4   2  A  stop 41.44242 14.640227
    5   3  A start 42.72626 16.036850
    6   3  A  stop 41.73336 11.386723
    7   4  A start 43.26573 11.874570
    8   4  A  stop 43.27190  8.437383
    9   5  B start 42.80854 14.609740
    10  5  B  stop 41.43149 11.656165
    11  6  B start 41.78775 16.573290
    12  6  B  stop 36.68709 14.429349
    

    SS 当然可以稍后添加列。任何建议都将不胜感激!

    1 回复  |  直到 7 年前
        1
  •  1
  •   tmfmnk    7 年前

    一个 tidyverse

    df %>%
     gather(var, val, -c(ID, OB)) %>%
     separate(var, c("var1", "SS")) %>%
     spread(var1, val)
    
       ID OB    SS      lat       lon
    1   1  A start 44.74192 15.023044
    2   1  A stopp 39.22228  7.119066
    3   2  A start 40.87060 11.810682
    4   2  A stopp 41.44242 14.640227
    5   3  A start 42.72626 16.036847
    6   3  A stopp 41.73336 11.386723
    7   4  A start 43.26573 11.874572
    8   4  A stopp 43.27190  8.437383
    9   5  B start 42.80854 14.609739
    10  5  B stopp 41.43149 11.656165
    11  6  B start 41.78775 16.573291
    12  6  B stopp 36.68709 14.429349
    
        2
  •  1
  •   jay.sf    7 年前

    你可以 unlist 创建一个重新排列的矩阵。

    d <- matrix(unlist(t(df[3:6])), ncol=2, byrow=TRUE)
    setNames(data.frame(do.call(cbind, lapply(list(df$ID, as.character(df$OB)), 
                                     function(x) rep(x, each=2))), 
                        c("start", "stop"),
                        d), c(names(df)[1:3], "lat", "lon"))
    #    ID OB lat_start      lat       lon
    # 1   1  A     start 44.74192 15.023044
    # 2   1  A      stop 39.22228  7.119066
    # 3   2  A     start 40.87060 11.810682
    # 4   2  A      stop 41.44242 14.640227
    # 5   3  A     start 42.72626 16.036847
    # 6   3  A      stop 41.73336 11.386723
    # 7   4  A     start 43.26573 11.874572
    # 8   4  A      stop 43.27190  8.437383
    # 9   5  B     start 42.80854 14.609739
    # 10  5  B      stop 41.43149 11.656165
    # 11  6  B     start 41.78775 16.573291
    # 12  6  B      stop 36.68709 14.429349