
The set of variables being weighted up changes the estimates

  • Endre mr. fixit  · Tech Community  · 6 years ago

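    I compared the estimates and standard errors returned by svyby. Weighting up a single variable and weighting up two variables give the same estimate, but weighting up several variables at once gives an estimate that is markedly lower than the other two.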

    What causes this, and how can I avoid it?

    Link to the dataset: https://drive.google.com/open?id=1xqFxUBLZifaz57yvoNFOcvhBDGuHuSMq

    Here is my code:

    library(tidyverse)
    library(survey)
    
    load("des2004small.RData")
    
    weighUp <- function(variables) {
      svyby(formula = make.formula(variables), by = ~statefip, 
            design = des2004small,  
            FUN = svytotal, na.rm = TRUE)
    }
    
    # Weigh up a single variable:
    dfstate2004_singleVariable = weighUp(c("race_acs"))
    # Weigh up two variables:
    dfstate2004_twoVariables = weighUp(c("race_acs", "cvap_acs"))
    # Weigh up multiple variables:
    dfstate2004_multipleVariables = weighUp(c("race_acs", "cit_acs", 
                                              "educ_acs", "unemployed_acs", "labforce_acs", "poverty_acs", "cvap_acs"))
    
    # Compare the three different methods:
    comparison2004 = dfstate2004_singleVariable %>% 
      inner_join(dfstate2004_twoVariables, by = "statefip", suffix = c(".single", ".two")) %>%
      inner_join(dfstate2004_multipleVariables, by = "statefip", suffix = c("", ".multiple"))
    
    race_acswhite2004 = comparison2004 %>% 
      select(statefip, 
             single = race_acswhite.single, 
             two = race_acswhite.two, 
             multiple = race_acswhite)
    race_acswhite2004
    

    These are the resulting estimates of race_acswhite for each method:

    +-------------------------------------+
    |   statefip  single     two multiple |
    +-------------------------------------+
    | 1        1 3084123 3084123  2128346 |
    | 2        2  427008  427008   277075 |
    +-------------------------------------+
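A quick way to check whether missing values explain the gap is to count the NAs in each variable and see how many rows have no NA in any of them. Below is a minimal base-R sketch. The helper name `missing_report` and the toy data are hypothetical. For a survey design object the underlying data frame is stored in its `variables` component (here `des2004small$variables`), so that is what you would pass in instead of `toy`.

```r
# Hypothetical helper: summarise missingness for the variables passed to svyby().
missing_report <- function(df, vars) {
  na_counts <- colSums(is.na(df[vars]))           # number of NAs per variable
  complete_rows <- sum(complete.cases(df[vars]))  # rows with no NA in any of vars
  list(na_per_variable = na_counts,
       rows_total = nrow(df),
       rows_complete = complete_rows)
}

# Made-up toy data standing in for des2004small$variables.
toy <- data.frame(
  statefip = c(1, 1, 1, 2, 2),
  race_acs = factor(c("white", "black", "white", "white", "black")),
  cit_acs  = c(1, NA, 1, 1, 1),
  educ_acs = c(NA, 2, 3, 3, NA)
)

report <- missing_report(toy, c("race_acs", "cit_acs", "educ_acs"))
print(report$na_per_variable)  # race_acs 0, cit_acs 1, educ_acs 2
print(report$rows_complete)    # only 2 of the 5 rows are complete
```

If `rows_complete` is much smaller than `rows_total` for the longer variable list, the "multiple" totals are being computed on fewer rows.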
    
  • Thomas Lumley  · 6 years ago

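    Some of the variables in the "multiple" set have missing values. By default svytotal returns NA for any total that involves an NA. If you ask it to drop them with na.rm=TRUE, it discards the whole observation (every variable in that row), not just the missing value. When several variables share one formula, a row that is missing any one of them is therefore dropped from all the totals, including race_acswhite, and the totals shrink.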
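The effect can be reproduced without the survey package. Below is a minimal base-R sketch that computes weighted totals of 0/1 indicators in two ways. The first drops every row with an NA in any of the variables, which mimics whole-observation deletion. The second drops NAs separately for each variable. The function names, the weight column `w` and the data are hypothetical, and this only approximates the point estimates that svytotal computes. It does not reproduce its standard errors.

```r
# Made-up data: w = sampling weight, white/poverty = 0/1 indicators.
df <- data.frame(
  w       = c(10, 20, 30, 40),
  white   = c(1, 1, 0, 1),
  poverty = c(0, NA, 1, NA)
)

# (a) Joint: keep only rows observed on ALL variables, then total each one.
#     This mirrors putting every variable in one formula with na.rm = TRUE.
joint_totals <- function(df, vars, wt = "w") {
  keep <- complete.cases(df[vars])
  sapply(vars, function(v) sum(df[[wt]][keep] * df[[v]][keep]))
}

# (b) Separate: for each variable, drop only the rows missing THAT variable.
separate_totals <- function(df, vars, wt = "w") {
  sapply(vars, function(v) {
    ok <- !is.na(df[[v]])
    sum(df[[wt]][ok] * df[[v]][ok])
  })
}

joint_totals(df, c("white", "poverty"))     # white = 10, poverty = 30
separate_totals(df, c("white", "poverty"))  # white = 70, poverty = 30
```

The total for `white` drops from 70 to 10 only because `poverty` is missing in two rows, which is the same pattern as the "multiple" column above. In the question's setup, one possible remedy (an untested suggestion) is to call the question's `weighUp()` once per variable, for example `lapply(c("race_acs", "cit_acs", "educ_acs"), weighUp)`. Each svyby call then drops only the rows that are missing that one variable.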