
Calculating a correlation matrix for panel data in R

panel-data correlation r


Mislav · 6 years ago

I have a panel data set of firms:

df <- structure(list(id = c("00127264", "00127264", "00127264", "00127264", 
"00127264", "00127264", "00127264", "00127264", "00127264", "00127264", 
"00127264", "00127264", "00127264", "00127264", "00127264", "00128538", 
"00128538", "00128538", "00128538", "00128538", "00128538", "00128538", 
"00128538", "00128538", "00128538", "00129879", "00129879", "00129879", 
"00129879", "00129879", "00129879", "00129879", "00129879", "00129879", 
"00129879", "00132241", "00132241", "00132241", "00132241", "00132241", 
"00132241", "00132241", "00132241", "00132241", "00132241", "00132241", 
"00132241", "00132241", "00132241", "00132241"), time = c(2003L, 
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 
2013L, 2014L, 2015L, 2016L, 2017L, 2008L, 2009L, 2010L, 2011L, 
2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2003L, 2004L, 2005L, 
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2003L, 2004L, 
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 
2014L, 2015L, 2016L, 2017L), sales = c(18778913, 26246705, 24577605, 
20555975, 22803119, 30493587, 47409381, 39648917, 24164698, 26667934, 
36939340, 37303488, 36095594, 47863204, 81470728, 17082948, 19218374, 
17775729, 18719393, 17682127, 17648132, 19868021, 20034845, 20291386, 
28511274, 23842198, 33364335, 38006554, 44051316, 41017519, 44559215, 
38096697, 39532944, 32250063, 20456725, 36737613, 36788480, 34432314, 
45703706, 51318203, 57966879, 57314960, 69108257, 83337772, 95862115, 
78796350, 73897366, 122529286, 114051176, 140727472), costs = c(2776879, 
6661626, 7383728, 8148280, 6965171, 15952938, 28537059, 20336344, 
8049578, 8313115, 17175621, 17864169, 17323966, 25772512, 56918048, 
13617240, 14974971, 13919060, 14317811, 13879155, 14374214, 14607183, 
14718348, 15511957, 22142396, 21523985, 30354647, 33001065, 38699618, 
35369730, 50308253, 37174212, 38743973, 28852158, 16476830, 31420842, 
30050214, 28193685, 35918673, 40847638, 45944119, 44448831, 56898404, 
70216220, 80454840, 63808983, 60155914, 106046623, 96525104, 
119211752)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", 
"data.frame"))

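As you can see, it has four columns: id, time, sales, and costs. I want to compute the correlation between sales and costs across all firms. For example, I want the correlation between the sales of the firm with id 00127264 and the costs of each of the other firms ("00128538", "00129879", "00132241"). The correlation should take the time dimension into account. The panel is unbalanced.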
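One possible way to get this without widyr, sketched in base R: reshape sales and costs into separate time-by-firm matrices and pass both to `cor()`, which correlates every column of the first matrix with every column of the second. The toy `panel` data and the helper `to_wide()` are made up for illustration (they are not the data set from the question), and `use = "pairwise.complete.obs"` is an assumed way of handling the unbalanced years: each pair of firms is compared only over the years in which both report.

```r
# Toy unbalanced panel (hypothetical values): firm B has no 2001 row
panel <- data.frame(
  id    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
  time  = c(2001, 2002, 2003, 2004, 2002, 2003, 2004, 2001, 2002, 2003, 2004),
  sales = c(10, 12, 15, 11, 5, 7, 6, 20, 18, 25, 22),
  costs = c(4, 6, 7, 5, 3, 4, 5, 15, 12, 20, 16),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reshape one value column into a time x firm matrix.
# Firm-years that are absent from the panel become NA.
to_wide <- function(data, value) {
  tapply(data[[value]], list(data$time, data$id), FUN = function(x) x[1])
}

sales_wide <- to_wide(panel, "sales")
costs_wide <- to_wide(panel, "costs")

# cor(x, y) correlates each column of x with each column of y.
# Rows of the result: firm whose sales are used; columns: firm whose costs are used.
cross_cor <- cor(sales_wide, costs_wide, use = "pairwise.complete.obs")
print(round(cross_cor, 2))
```

The resulting matrix is generally not symmetric, because sales of A against costs of B is a different quantity from sales of B against costs of A. The diagonal holds each firm's own sales-costs correlation and can be ignored if only cross-firm pairs are of interest.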

I found a similar question and a solution here: Correlation matrix in panel data in R. However, the widyr package can only compute correlations for a single value variable:

widyr::pairwise_cor(df, id, time, sales)

I need something like:

widyr::pairwise_cor(df, id, time, c(sales, costs))

which is not possible. Expected output (the correlations are just random numbers):

from    to      corr
127264  128538  0,54
127264  129879  0,68
127264  132241  0,78
128538  127264  0,43
128538  129879  0,48
128538  132241  0,17
129879  127264  0,57
129879  128538  0,36
129879  132241  0,89
132241  127264  0,15
132241  128538  0,6
132241  129879  0,8

Alternatively, it could be a correlation matrix, as in the question I mentioned.
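For the long from/to/corr layout, one option is a self-join on time followed by a per-pair correlation. The following is a minimal base R sketch under stated assumptions: the toy `panel` data is hypothetical, the column names `id_from`, `id_to` and `n_years` are introduced here for illustration, and only years present for both firms survive the join, which is one way to deal with the unbalanced panel.

```r
# Toy unbalanced panel (hypothetical values): firm B has no 2001 row
panel <- data.frame(
  id    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
  time  = c(2001, 2002, 2003, 2004, 2002, 2003, 2004, 2001, 2002, 2003, 2004),
  sales = c(10, 12, 15, 11, 5, 7, 6, 20, 18, 25, 22),
  costs = c(4, 6, 7, 5, 3, 4, 5, 15, 12, 20, 16),
  stringsAsFactors = FALSE
)

# Pair every firm's sales with every firm's costs in the same year.
# The shared "id" column is disambiguated as id_from (sales) and id_to (costs).
pairs <- merge(panel[, c("id", "time", "sales")],
               panel[, c("id", "time", "costs")],
               by = "time", suffixes = c("_from", "_to"))

# Keep cross-firm pairs only
pairs <- pairs[pairs$id_from != pairs$id_to, ]

# One correlation per ordered (from, to) pair, over the years both firms share
groups <- split(pairs, list(pairs$id_from, pairs$id_to), drop = TRUE)
result <- do.call(rbind, lapply(groups, function(g) {
  data.frame(from = g$id_from[1], to = g$id_to[1],
             corr = cor(g$sales, g$costs), n_years = nrow(g),
             stringsAsFactors = FALSE)
}))
result <- result[order(result$from, result$to), ]
rownames(result) <- NULL
print(result)
```

The `n_years` column records how many overlapping years each correlation is based on, which is worth checking in an unbalanced panel since pairs with very few shared years give unreliable estimates.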
