I have a panel data set of companies:
df <- structure(list(id = c("00127264", "00127264", "00127264", "00127264",
"00127264", "00127264", "00127264", "00127264", "00127264", "00127264",
"00127264", "00127264", "00127264", "00127264", "00127264", "00128538",
"00128538", "00128538", "00128538", "00128538", "00128538", "00128538",
"00128538", "00128538", "00128538", "00129879", "00129879", "00129879",
"00129879", "00129879", "00129879", "00129879", "00129879", "00129879",
"00129879", "00132241", "00132241", "00132241", "00132241", "00132241",
"00132241", "00132241", "00132241", "00132241", "00132241", "00132241",
"00132241", "00132241", "00132241", "00132241"), time = c(2003L,
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L,
2013L, 2014L, 2015L, 2016L, 2017L, 2008L, 2009L, 2010L, 2011L,
2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2003L, 2004L, 2005L,
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2003L, 2004L,
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L,
2014L, 2015L, 2016L, 2017L), sales = c(18778913, 26246705, 24577605,
20555975, 22803119, 30493587, 47409381, 39648917, 24164698, 26667934,
36939340, 37303488, 36095594, 47863204, 81470728, 17082948, 19218374,
17775729, 18719393, 17682127, 17648132, 19868021, 20034845, 20291386,
28511274, 23842198, 33364335, 38006554, 44051316, 41017519, 44559215,
38096697, 39532944, 32250063, 20456725, 36737613, 36788480, 34432314,
45703706, 51318203, 57966879, 57314960, 69108257, 83337772, 95862115,
78796350, 73897366, 122529286, 114051176, 140727472), costs = c(2776879,
6661626, 7383728, 8148280, 6965171, 15952938, 28537059, 20336344,
8049578, 8313115, 17175621, 17864169, 17323966, 25772512, 56918048,
13617240, 14974971, 13919060, 14317811, 13879155, 14374214, 14607183,
14718348, 15511957, 22142396, 21523985, 30354647, 33001065, 38699618,
35369730, 50308253, 37174212, 38743973, 28852158, 16476830, 31420842,
30050214, 28193685, 35918673, 40847638, 45944119, 44448831, 56898404,
70216220, 80454840, 63808983, 60155914, 106046623, 96525104,
119211752)), row.names = c(NA, -50L), class = c("tbl_df", "tbl",
"data.frame"))
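As you can see, it has four columns: id, time, sales, and costs.
I want to calculate the correlation between sales and costs across all companies. For example, I want the correlation between the sales of the company with ID 00127264 and the costs of each of the other companies ("00128538", "00129879", "00132241"). The correlation should take the time dimension into account. The panel data set is unbalanced.
I found a similar question with a solution here:
Correlation matrix in panel data in R
But the widyr package can only calculate the correlation for a single value variable: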
widyr::pairwise_cor(sample, id, year, sales)
I need something like
widyr::pairwise_cor(sample, id, year, c(sales, costs))
which is not possible.
Expected output (the correlations are just random numbers):
from    to      corr
127264  128538  0.54
127264  129879  0.68
127264  132241  0.78
128538  127264  0.43
128538  129879  0.48
128538  132241  0.17
129879  127264  0.57
129879  128538  0.36
129879  132241  0.89
132241  127264  0.15
132241  128538  0.6
132241  129879  0.8
Alternatively, it could be a correlation matrix, like the one in the question I mentioned.
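To make the desired computation concrete, here is a minimal base R sketch of what I am after, not a widyr solution. The helper name `sales_cost_cor` is hypothetical, the toy data is made up for illustration, and two choices are my own assumptions: each pair is correlated only over the years both companies report (to cope with the unbalanced panel), and pairs with fewer than three overlapping years get NA.

```r
# Hypothetical helper (not part of widyr): for every ordered pair of distinct
# companies, correlate the sales of `from` with the costs of `to`.
sales_cost_cor <- function(data, min_years = 3) {
  data <- as.data.frame(data)
  ids <- unique(data$id)

  # All ordered pairs of distinct companies
  pairs <- expand.grid(from = ids, to = ids, stringsAsFactors = FALSE)
  pairs <- pairs[pairs$from != pairs$to, ]
  pairs <- pairs[order(pairs$from, pairs$to), ]

  pairs$corr <- mapply(function(f, g) {
    a <- data[data$id == f, c("time", "sales")]
    b <- data[data$id == g, c("time", "costs")]
    # Inner join on time keeps only the years both companies report
    m <- merge(a, b, by = "time")
    # Assumption: too few overlapping years gives NA instead of a correlation
    if (nrow(m) < min_years) return(NA_real_)
    cor(m$sales, m$costs)
  }, pairs$from, pairs$to, USE.NAMES = FALSE)

  rownames(pairs) <- NULL
  pairs
}

# Made-up toy panel: companies A and B overlap in 2002-2004 only
toy <- data.frame(
  id    = rep(c("A", "B"), each = 4),
  time  = c(2001:2004, 2002:2005),
  sales = c(1, 2, 3, 4, 10, 20, 30, 40),
  costs = c(2, 4, 6, 8, 4, 3, 2, 1)
)

res <- sales_cost_cor(toy)
print(res)
```

On the data above the call would be `sales_cost_cor(df)`, which returns one row per ordered pair of companies in the from/to/corr layout shown in the expected output.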