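I have a randomly generated data frame, df, containing Likert-scale answers. Every column is a question, named q1, q2, q3, ..., q6.
I also have a second data frame that assigns each question to a group. For example, q1, q2 and q3 are in group A, q4 is in group B, and q5 and q6 are in group C.
I first want to compute the mean for each group. For example, the resulting data frame should have a column A holding the row-wise mean of q1, q2 and q3.
This is what I did: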
library(dplyr)

likert_levels <- c(1, 2, 3, 4, 5)
set.seed(42)

# Simulated responses: 150 respondents, six Likert items stored as factors
df <-
  tibble(
    "q1" = sample(likert_levels, 150, replace = TRUE),
    "q2" = sample(likert_levels, 150, replace = TRUE, prob = 5:1),
    "q3" = sample(likert_levels, 150, replace = TRUE, prob = 1:5),
    "q4" = sample(likert_levels, 150, replace = TRUE, prob = 1:5),
    "q5" = sample(c(likert_levels, NA), 150, replace = TRUE),
    "q6" = sample(likert_levels, 150, replace = TRUE, prob = c(1, 0, 1, 1, 0))
  ) %>%
  mutate(across(everything(), ~ factor(.x, levels = likert_levels)))
df

# Lookup table mapping each question to its group
df2 <- tibble(
  categories = c("A", "A", "A", "B", "C", "C"),
  questions = c("q1", "q2", "q3", "q4", "q5", "q6")
)
df2

# Long format with the group attached to every response
df %>%
  mutate(id = row_number()) %>%
  tidyr::pivot_longer(!id, names_to = "questions", values_to = "responses") %>%
  left_join(df2, by = "questions")
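# Manual version: one hand-written row-wise mean per group
df_cor <- df %>%
  # Convert via character so the level labels, not the internal factor codes, are used
  mutate(across(where(is.factor), ~ as.numeric(as.character(.x)))) %>%
  rowwise() %>%
  mutate(
    QA = mean(c(q1, q2, q3), na.rm = TRUE),
    QB = mean(c(q4), na.rm = TRUE),
    QC = mean(c(q5, q6), na.rm = TRUE)
  ) %>%
  ungroup() %>%
  select(QA, QB, QC)
df_cor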
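My question: my real dataset contains 100 questions and more than 20 groups. How can I avoid typing out every row-wise mean by hand in a dplyr mutate() call, and instead have the means computed automatically from the grouping table?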
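
One possible way to do this, sketched here on a small made-up data set with the same shape as df and df2 (the values are illustrative only and are not taken from the question), is to let the lookup table drive the computation. The first variant reshapes to long format, joins the groups, and averages per respondent and group. The second splits the question names by group and calls rowMeans() on each subset. The "Q" column prefix and the helper names df_num, groups, df_cor and df_cor2 are choices made for this example. Both variants assume the responses have first been converted to numeric.

```r
library(dplyr)
library(tidyr)

# Toy data with the same structure as the question's df and df2 (values are made up)
df <- tibble(
  q1 = c(1, 2, 3),
  q2 = c(3, 4, 5),
  q3 = c(5, 3, 1),
  q4 = c(2, 2, 4),
  q5 = c(NA, 1, 5),
  q6 = c(4, 3, 1)
)
df2 <- tibble(
  categories = c("A", "A", "A", "B", "C", "C"),
  questions  = c("q1", "q2", "q3", "q4", "q5", "q6")
)

# Make sure every column is numeric; going through character also handles
# factor columns whose labels are the Likert values
df_num <- df %>%
  mutate(across(everything(), ~ as.numeric(as.character(.x))))

# Variant 1: long format, join the group, average per respondent and group
df_cor <- df_num %>%
  mutate(id = row_number()) %>%
  pivot_longer(!id, names_to = "questions", values_to = "responses") %>%
  left_join(df2, by = "questions") %>%
  group_by(id, categories) %>%
  summarise(avg = mean(responses, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = categories, values_from = avg, names_prefix = "Q") %>%
  select(-id)

# Variant 2: split the question names by group, then take row means per subset
groups <- split(df2$questions, df2$categories)
df_cor2 <- as_tibble(
  lapply(groups, function(qs) unname(rowMeans(df_num[qs], na.rm = TRUE)))
)
names(df_cor2) <- paste0("Q", names(df_cor2))

df_cor
df_cor2
```

Neither variant needs to change when questions or groups are added, since only df2 has to be extended. Note that a respondent with no answers at all in a group gets NaN for that group when na.rm = TRUE is used.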