代码之家 › 专栏 › 技术社区 › M.Teich

将数据帧重新格式化为长格式,保留分组列

reformat tidyr dplyr r

M.Teich · 技术社区 · 7 年前

考虑以下数据帧:

set.seed(42)
ID <- c(1:6) 
OB <- c(rep("A",4),rep("B",2))
lat_start <- rnorm(6,42,2)
lon_start <- rnorm(6,12,2)
lat_stopp <- rnorm(6,42,2)
lon_stopp <- rnorm(6,12,2)
df <- data.frame(ID,OB,lat_start,lon_start,lat_stopp,lon_stopp)

我想重新格式化 df 以长格式,其中每个 ID gather() 解决方案,例如。 df_wrong <- gather(df,coords,val,lat_start:lon_stopp) 显然不起作用,因为我需要lat/lon列保持分组。我希望长数据帧看起来像这样:

   ID OB    SS      lat       lon
1   1  A start 44.74192 15.023040
2   1  A  stop 39.22228  7.119066
3   2  A start 40.87060 11.810680
4   2  A  stop 41.44242 14.640227
5   3  A start 42.72626 16.036850
6   3  A  stop 41.73336 11.386723
7   4  A start 43.26573 11.874570
8   4  A  stop 43.27190  8.437383
9   5  B start 42.80854 14.609740
10  5  B  stop 41.43149 11.656165
11  6  B start 41.78775 16.573290
12  6  B  stop 36.68709 14.429349

SS 当然可以稍后添加列。任何建议都将不胜感激!

1 回复 | 直到 7 年前

tmfmnk 7 年前

一个 tidyverse

df %>%
 gather(var, val, -c(ID, OB)) %>%
 separate(var, c("var1", "SS")) %>%
 spread(var1, val)

   ID OB    SS      lat       lon
1   1  A start 44.74192 15.023044
2   1  A stopp 39.22228  7.119066
3   2  A start 40.87060 11.810682
4   2  A stopp 41.44242 14.640227
5   3  A start 42.72626 16.036847
6   3  A stopp 41.73336 11.386723
7   4  A start 43.26573 11.874572
8   4  A stopp 43.27190  8.437383
9   5  B start 42.80854 14.609739
10  5  B stopp 41.43149 11.656165
11  6  B start 41.78775 16.573291
12  6  B stopp 36.68709 14.429349

jay.sf 7 年前

你可以 unlist 创建一个重新排列的矩阵。

d <- matrix(unlist(t(df[3:6])), ncol=2, byrow=TRUE)
setNames(data.frame(do.call(cbind, lapply(list(df$ID, as.character(df$OB)), 
                                 function(x) rep(x, each=2))), 
                    c("start", "stop"),
                    d), c(names(df)[1:3], "lat", "lon"))
#    ID OB lat_start      lat       lon
# 1   1  A     start 44.74192 15.023044
# 2   1  A      stop 39.22228  7.119066
# 3   2  A     start 40.87060 11.810682
# 4   2  A      stop 41.44242 14.640227
# 5   3  A     start 42.72626 16.036847
# 6   3  A      stop 41.73336 11.386723
# 7   4  A     start 43.26573 11.874572
# 8   4  A      stop 43.27190  8.437383
# 9   5  B     start 42.80854 14.609739
# 10  5  B      stop 41.43149 11.656165
# 11  6  B     start 41.78775 16.573291
# 12  6  B      stop 36.68709 14.429349

推荐文章

Amp · 使用R ggplot2删除geom_radial中axis.line和panel.border之间的空格

1 年前

Hard_Course · 用另一列中的值替换行的最后一个非NA条目

1 年前

Mark R · 使用geom_sf()删除地球仪上不需要的网格线

1 年前

Joe · 根据对工作日和本周早些时候的日期的了解,找到一个日期

1 年前

Ben · 统计向量中的单词在字符串中出现的频率

1 年前

TheCodeNovice · R中符号格式的尾随零和其他问题[重复]

1 年前

katefull06 · 在R中使用terra修改范围时,会为单独的SpatRaster重写范围

1 年前

dez93_2000 · 在R管道子功能中引用管道对象的当前状态

1 年前

accibio · 在ggplot2中为同一变量创建两个连续的颜色渐变比例

1 年前

Mankka · 如何在Ggplot2中绘制均匀的径向图

1 年前