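Suppose I have a dataset with the financial history of 7 stocks that fall into 3 categories. I want to compute the pairwise correlations between the stocks within each category in R, using the dplyr package.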
library(tidyverse)
library(tidyquant)
# Category of each ticker, in the same order as `symbol`
Category = c("Social","Social","Internet","Technology",
             "Technology","Internet","Internet")
symbol = c("TWTR","FB","GOOG","TSLA","NOK","AMZN","AAPL")
A = tibble(Category,symbol)
# Download prices for each ticker over 2021
B = tq_get(symbol,
           from = "2021-01-01",
           to = "2022-01-01")
# Attach the category to every price row
BA = left_join(B,A,by="symbol")
BA %>% select(symbol,Category,close)
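A few days ago I posted this similar question, but there the grouping variable was numeric, which does not fit my real-world dataset. The ideal output would look like this: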
| Category | Stock 1 | Stock 2 | cor |
|---|---|---|---|
| Social | TWTR | FB | cor(TWTR,FB) |
| Internet | GOOG | AMZN | cor(GOOG,AMZN) |
| Internet | GOOG | AAPL | cor(GOOG,AAPL) |
| Internet | AMZN | AAPL | cor(AMZN,AAPL) |
| Technology | TSLA | NOK | cor(TSLA,NOK) |
How can I do this in R with dplyr?
Optional data:
var2 = c(rep("A",3),rep("B",3),rep("C",3),rep("D",3),rep("E",3),rep("F",3),
rep("H",3),rep("I",3))
y2 = c(-1.23, -0.983, 1.28, -0.268, -0.46, -1.23,
1.87, 0.416, -1.99, 0.289, 1.7, -0.455,
-0.648, 0.376, -0.887,0.534,-0.679,-0.923,
0.987,0.324,-0.783,-0.679,0.326,0.998);length(y2)
group2 = as.character(c(rep("xx",6),rep("xy",6),rep("xz",6),rep("xx",6)))
data2 = tibble(var2,group2,y2);data2
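To make the target concrete, here is a minimal sketch of one possible way to get that shape on the optional data. It rebuilds data2 so it runs on its own, then enumerates the variable pairs inside each group with combn() and correlates them. The names pairs_cor, var_1 and var_2 are made up for illustration, and the sketch assumes that every variable in a group has the same number of observations, aligned row by row, and that each group holds at least two variables.

```r
library(dplyr)
library(tibble)

# Same values as the optional data above
var2   = rep(c("A","B","C","D","E","F","H","I"), each = 3)
group2 = rep(c("xx","xy","xz","xx"), each = 6)
y2 = c(-1.23, -0.983, 1.28, -0.268, -0.46, -1.23,
       1.87, 0.416, -1.99, 0.289, 1.7, -0.455,
       -0.648, 0.376, -0.887, 0.534, -0.679, -0.923,
       0.987, 0.324, -0.783, -0.679, 0.326, 0.998)
data2 = tibble(var2, group2, y2)

pairs_cor = data2 %>%
  group_by(group2) %>%
  group_modify(~ {
    # Every unordered pair of variables in this group (2 x n_pairs matrix)
    combos = combn(unique(.x$var2), 2)
    tibble(
      var_1 = combos[1, ],
      var_2 = combos[2, ],
      # Assumption: both series are equally long and aligned row by row
      cor = apply(combos, 2, function(p) {
        cor(.x$y2[.x$var2 == p[1]], .x$y2[.x$var2 == p[2]])
      })
    )
  }) %>%
  ungroup()

pairs_cor
```

The same pattern should carry over to BA by using Category in place of group2, symbol in place of var2 and close in place of y2, provided every symbol has a closing price for the same set of dates.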