
Plotting predicted values from lm

  •  1
  • Anastasia  · 7 years ago

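    I am new to visualizing regression results and need help producing a plot that shows the predicted values from a linear regression model.

    The structure of the data, obtained with dput: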

    `> dput(head(dat,4))
    structure(list(X = c(809L, 3L, 1L, 2L), cntry = structure(c(1L, 
    1L, 1L, 1L), .Label = c("AT", "BE", "CH", "CZ", "DE", "DK", "ES", 
    "FI", "GB", "GR", "HU", "IE", "IL", "NL", "NO", "PL", "PT", "SE", 
    "SI", "EE", "IS", "LU", "SK", "TR", "UA", "BG", "CY", "FR", "RU", 
    "HR", "LV", "RO", "LT", "AL", "IT", "XK"), class = "factor"), 
    ipshabt = c(4L, 2L, 3L, 2L), ipsuces = c(3L, 2L, 3L, 1L), 
    imprich = c(3L, 3L, 3L, 3L), iprspot = c(3L, 3L, 4L, 4L), 
    impsafe = c(3L, 3L, 2L, 4L), ipstrgv = c(2L, 2L, 2L, 3L), 
    ipfrule = c(3L, 2L, 1L, 6L), ipbhprp = c(3L, 2L, 4L, 4L), 
    ipmodst = c(3L, 3L, 2L, 5L), imptrad = c(2L, 2L, 1L, 6L), 
    ipeqopt = c(1L, 2L, 1L, 1L), ipudrst = c(1L, 2L, 3L, 3L), 
    impenv = c(3L, 2L, 2L, 1L), iphlppl = c(1L, 2L, 1L, 4L), 
    iplylfr = c(2L, 2L, 2L, 3L), ipcrtiv = c(2L, 2L, 2L, 1L), 
    impfree = c(2L, 3L, 1L, 1L), impdiff = c(2L, 3L, 3L, 1L), 
    ipadvnt = c(6L, 4L, 3L, 1L), ipgdtim = c(2L, 2L, 1L, 1L), 
    impfun = c(6L, 5L, 1L, 3L), gndr = c(2L, 2L, 1L, 1L), agea = c(69L, 
    63L, 54L, 50L), hincfel = c(1L, 2L, 1L, 3L), educ = c(1L, 
    2L, 3L, 3L), year = c(2002L, 2002L, 2002L, 2002L), Achievement = c(3.5, 
    2, 3, 1.5), Power = c(3, 3, 3.5, 3.5), Security = c(2.5, 
    2.5, 2, 3.5), Conformity = c(3, 2, 2.5, 5), Tradition = c(2.5, 
    2.5, 1.5, 5.5), Universalism = c(1.66666666666667, 2, 2, 
    1.66666666666667), Benevolence = c(1.5, 2, 1.5, 3.5), SelfDirection = c(2, 2.5, 1.5, 1), Stimulation = c(4, 3.5, 3, 1), Hedonism = c(4, 
    3.5, 1, 2), SelfEnh = c(3.25, 2.5, 3.25, 2.5), SelfTran = c(1.6, 
    2, 1.8, 2.4), Cons = c(2.66666666666667, 2.33333333333333, 
    2, 4.66666666666667), Open = c(3.33333333333333, 3.16666666666667, 
    1.83333333333333, 1.33333333333333), SelfTranNet = c(-1.65, 
    -0.5, -1.45, -0.1), OpenNet = c(0.666666666666667, 0.833333333333333, 
    -0.166666666666667, -3.33333333333333), east = c(0, 0, 0, 
    0), eastyear = c(0, 0, 0, 0), income = c(1L, 2L, 1L, 3L), 
    year2002 = c(1, 1, 1, 1), eastyear2002 = c(0, 0, 0, 0), year2004 = c(0, 
    0, 0, 0), eastyear2004 = c(0, 0, 0, 0), year2006 = c(0, 0, 
    0, 0), eastyear2006 = c(0, 0, 0, 0), year2008 = c(0, 0, 0, 
    0), eastyear2008 = c(0, 0, 0, 0), year2010 = c(0, 0, 0, 0
    ), eastyear2010 = c(0, 0, 0, 0), year2012 = c(0, 0, 0, 0), 
    eastyear2012 = c(0, 0, 0, 0), year2014 = c(0, 0, 0, 0), eastyear2014 = c(0, 
    0, 0, 0), year2016 = c(0, 0, 0, 0), eastyear2016 = c(0, 0, 
    0, 0)), .Names = c("X", "cntry", "ipshabt", "ipsuces", "imprich", 
    "iprspot", "impsafe", "ipstrgv", "ipfrule", "ipbhprp", "ipmodst", 
    "imptrad", "ipeqopt", "ipudrst", "impenv", "iphlppl", "iplylfr", 
    "ipcrtiv", "impfree", "impdiff", "ipadvnt", "ipgdtim", "impfun", 
    "gndr", "agea", "hincfel", "educ", "year", "Achievement", "Power", 
    "Security", "Conformity", "Tradition", "Universalism", "Benevolence", 
    "SelfDirection", "Stimulation", "Hedonism", "SelfEnh", "SelfTran", 
    "Cons", "Open", "SelfTranNet", "OpenNet", "east", "eastyear", 
    "income", "year2002", "eastyear2002", "year2004", "eastyear2004", 
    "year2006", "eastyear2006", "year2008", "eastyear2008", "year2010", 
    "eastyear2010", "year2012", "eastyear2012", "year2014", "eastyear2014", 
    "year2016", "eastyear2016"), row.names = c(NA, 4L), class =     "data.frame")`
    

    My linear regression model:

    modelAchievement <- lm(
      Achievement ~ east + year +
        year2002 + eastyear2002 + year2004 + eastyear2004 +
        year2006 + eastyear2006 + year2008 + eastyear2008 +
        year2010 + eastyear2010 + year2012 + eastyear2012 +
        year2014 + eastyear2014 + year2016 + eastyear2016 +
        agea + gndr + income + educ,
      data = dat
    )

    I don't know how to proceed. I tried ggplot(modelAchievement, aes(y = Achievement, x = year)), but it gives an empty plot.

    Any suggestions would be greatly appreciated.

    Link to the full data: data

    1 answer  |  last active 7 years ago
  •  3
  •   Weihuang Wong    7 years ago

    Judging from your formula, it looks like you are interacting east with each level of year, which we can express more concisely as

    fit <- lm(Achievement ~ east * factor(year) + agea + gndr + income + educ, data = dat)
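
The claim that the compact formula matches the hand-built dummies can be checked on made-up data. The following is a minimal sketch, not the asker's data: `sim`, `long_fit`, and `short_fit` are hypothetical names, the outcome is simulated, and only one control (`agea`) is included to keep it short.

```r
# Simulated stand-in for the survey data (assumption: all names and values are illustrative)
set.seed(1)
n <- 400
years <- seq(2002, 2016, 2)
sim <- data.frame(
  year = sample(years, n, replace = TRUE),
  east = rbinom(n, 1, 0.5),
  agea = round(runif(n, 18, 90))
)
sim$Achievement <- 3 + 0.2 * sim$east + 0.01 * sim$agea + rnorm(n)

# Build the year dummies and east-by-year dummies by hand, as in the question
for (y in years) {
  sim[[paste0("year", y)]] <- as.numeric(sim$year == y)
  sim[[paste0("eastyear", y)]] <- sim$east * sim[[paste0("year", y)]]
}

# Long specification: east, numeric year, every dummy, every interaction dummy
dummy_terms <- c("east", "year", paste0("year", years),
                 paste0("eastyear", years), "agea")
long_fit <- lm(reformulate(dummy_terms, response = "Achievement"), data = sim)

# Compact specification using an interaction with factor(year)
short_fit <- lm(Achievement ~ east * factor(year) + agea, data = sim)

# Compare the fitted values of the two specifications
print(all.equal(unname(fitted(long_fit)), unname(fitted(short_fit))))
```

Because the hand-built dummies are linearly dependent on the intercept, year, and east, lm() reports NA for the redundant coefficients in the long version, while the compact version drops a reference level automatically.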
    

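    To compute predicted outcomes for different values of east and year, we first have to choose values for the other 4 variables: agea, gndr, income, and educ. I set them to their sample means, although you can use any values you like.

    library(dplyr)
    # Average the four control variables over the whole sample (gives a one-row data frame)
    new_dat <- summarise_at(dat, vars(agea, gndr, income, educ), mean)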
    #       agea     gndr   income    educ
    # 1 47.88262 1.536708 2.031206 3.16173
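
If dplyr is not available, the same one-row data frame of means can be built in base R. This is a sketch under assumptions: `toy` is a hypothetical data frame holding only the four control columns from the first four rows shown in the dput output, and `covariate_means` is an illustrative name.

```r
# The four control variables from the four rows shown in the question's dput output
toy <- data.frame(
  agea   = c(69, 63, 54, 50),
  gndr   = c(2, 2, 1, 1),
  income = c(1, 2, 1, 3),
  educ   = c(1, 2, 3, 3)
)

# colMeans() returns a named vector; transposing it gives a 1 x 4 matrix,
# which as.data.frame() turns into a one-row data frame
covariate_means <- as.data.frame(t(colMeans(toy, na.rm = TRUE)))
print(covariate_means)
```

Averaging a coded variable such as gndr yields a value no respondent actually has; fixing it at a specific category instead is equally valid, as the answer notes.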
    

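    We then combine this data frame with another data frame that contains every combination of year and east.

    # expand.grid() builds all 16 year-by-east combinations; the one-row means are recycled across them
    new_dat <- cbind(expand.grid(year = seq(2002, 2016, 2), east = 0:1), new_dat)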
    new_dat
    #    year east     agea     gndr   income    educ
    # 1  2002    0 47.88262 1.536708 2.031206 3.16173
    # 2  2004    0 47.88262 1.536708 2.031206 3.16173
    # 3  2006    0 47.88262 1.536708 2.031206 3.16173
    # 4  2008    0 47.88262 1.536708 2.031206 3.16173
    # 5  2010    0 47.88262 1.536708 2.031206 3.16173
    # 6  2012    0 47.88262 1.536708 2.031206 3.16173
    # 7  2014    0 47.88262 1.536708 2.031206 3.16173
    # 8  2016    0 47.88262 1.536708 2.031206 3.16173
    # 9  2002    1 47.88262 1.536708 2.031206 3.16173
    # 10 2004    1 47.88262 1.536708 2.031206 3.16173
    # 11 2006    1 47.88262 1.536708 2.031206 3.16173
    # 12 2008    1 47.88262 1.536708 2.031206 3.16173
    # 13 2010    1 47.88262 1.536708 2.031206 3.16173
    # 14 2012    1 47.88262 1.536708 2.031206 3.16173
    # 15 2014    1 47.88262 1.536708 2.031206 3.16173
    # 16 2016    1 47.88262 1.536708 2.031206 3.16173
    

    We then use predict to compute the predicted outcomes for this new data set:

    new_dat$predicted <- predict(fit, new_dat)
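
predict() on an lm fit can also return confidence bounds for each predicted mean, which is often useful when plotting. The following is a minimal sketch on simulated data; `sim`, `sim_fit`, and `grid` are hypothetical names and the model has only one control variable.

```r
# Simulated stand-in data: 20 observations per year-by-east cell (assumption)
set.seed(42)
years <- seq(2002, 2016, 2)
sim <- expand.grid(year = years, east = 0:1, rep = 1:20)
sim$agea <- round(runif(nrow(sim), 18, 90))
sim$Achievement <- 3 + 0.3 * sim$east + 0.01 * sim$agea + rnorm(nrow(sim))

sim_fit <- lm(Achievement ~ east * factor(year) + agea, data = sim)

# Prediction grid: every year-by-east combination, with agea held at its mean
grid <- expand.grid(year = years, east = 0:1)
grid$agea <- mean(sim$agea)

# interval = "confidence" returns a matrix with columns fit, lwr, upr
ci <- predict(sim_fit, newdata = grid, interval = "confidence", level = 0.95)
grid <- cbind(grid, ci)
print(head(grid))
```

The lwr and upr columns could then be passed to a ribbon or error-bar layer, for example geom_ribbon(aes(ymin = lwr, ymax = upr)) in ggplot2.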
    

    Now we can plot the predictions:

    library(ggplot2)
    ggplot(new_dat, aes(x = year, y = predicted, colour = factor(east), group = east)) +
      geom_line()
    

