
Matching. Data simulation and estimation with Matching and MatchIt. How to retrieve the true model?

economics matching linear-regression regression r

giac · Tech Community · 6 years ago

I am trying to simulate matching, but I am doing something wrong somewhere, because I cannot retrieve the true model using matching.

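I am generating 3 variables: x, d, which is the treatment variable (binary), and y, the outcome. d is associated with x. The idea of matching, as I understand it, is that once x is conditioned on, treatment assignment is as good as random.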

library(tidyverse)
library(Matching)
library(MatchIt)

N = 1000
# generate a normally distributed random variable #
x = rnorm(N, 0, 5)

Next, d (binary).

# generate treatment associated with x, with different probabilities above a certain threshold #
d = ifelse(x > 0.7, rbinom(0.7 * N, 1, 0.6) , rbinom( (1 - 0.7) * N, 1, 0.3) )
# beyond 0.7 the proba is 0.6 to receive treatment and below is 0.3 #

This seems correct to me, but if you have a better way, please let me know.
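One possible alternative, offered as a sketch rather than as the intended design: draw one Bernoulli variable per observation and let the success probability depend on x. In the call above, the two rbinom() vectors have lengths 0.7 * N and (1 - 0.7) * N, which ifelse() recycles to length N; those lengths have no connection to the threshold 0.7 on x. The name p and the seed are assumptions introduced for illustration.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
N <- 1000
x <- rnorm(N, 0, 5)

# per-observation treatment probability: 0.6 above the threshold, 0.3 otherwise
p <- ifelse(x > 0.7, 0.6, 0.3)

# one independent Bernoulli draw per observation
d <- rbinom(N, 1, p)

# empirical treatment shares below / above the threshold
tapply(d, x > 0.7, mean)
```

Each element of d is then an independent draw with its own probability, and the shares printed at the end should sit near 0.3 and 0.6, up to sampling noise.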

# adding a bit more randomness #
d[ sample(length(d), 100) ] <- rbinom(100, 1, 0.5)

# also adding a cut-off point for the treated #  
d[x < -10] <- 0
d[x > 10] <- 0

The ifelse for d looks right to me, but I may be wrong.

# generate outcome y, w/ polyn relationship with x and a Treatment effect of 15 # sd == 10 #
y = x*1 + x^2 + rnorm(N, ifelse(d == 1, 15, 0), 10)

#
df = cbind(x,d,y) %>% as.data.frame()
# check out the "common support"
df %>% ggplot(aes(x, y, colour = factor(d) )) + geom_point()
#
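Written out, the outcome equation implied by the code above is the following. The symbol \varepsilon_i is introduced here for the noise term and does not appear in the code.

```latex
y_i = x_i + x_i^2 + 15\, d_i + \varepsilon_i,
\qquad x_i \sim \mathcal{N}(0,\, 5^2),
\qquad \varepsilon_i \sim \mathcal{N}(0,\, 10^2)
```

The coefficient on d_i, 15, is the treatment effect the estimates are meant to recover.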

Now when I estimate the effect of d on y, omitting x or misspecifying the model gives a wrong estimate for d.

# omitted x #
lm(y ~ d, df) %>% summary()
# misspecification #
lm(y ~ d + x, df) %>% summary()
# true model #

With the correctly specified model I get 15 (the true effect of d).

lm(y ~ d + poly(x,2), df) %>% summary()
# we correctly retrieve 15 #

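Now I want to retrieve 15 (the true effect of d) with the matching packages.

Using the MatchIt package.

I tried Mahalanobis and propensity-score matching, as follows: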

m1 = matchit(d ~ x, df, distance = 'mahalanobis', method = 'genetic')
m2a = matchit(d ~ x, df, distance = 'logit', method = 'genetic')
m2b = matchit(d ~ x + I(x^2), df, distance = 'logit', method = 'genetic')

The matched data:

mat1 = match.data(m1)
mat2a = match.data(m2a)
mat2b = match.data(m2b)

# OLS #
lm(y ~ d, mat1) %>% summary()
lm(y ~ d, mat2a) %>% summary()
lm(y ~ d, mat2b) %>% summary()

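So here I do not retrieve 15. Why? Am I misinterpreting the results? I was under the impression that, when matching, you do not have to model polynomial terms and/or interactions. Is that wrong?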

lm(y ~ d + poly(x,2), mat1) %>% summary()
lm(y ~ d + poly(x,2), mat2a) %>% summary()
lm(y ~ d + poly(x,2), mat2b) %>% summary()

Because if I include the poly(x,2) term, as here, I do retrieve 15.

With the Matching package:

x1 = df$x
gl = glm(d ~ x + I(x^2), df, family = binomial)
x1 = gl$fitted.values

# I thought that it could be because OLS only gives ATE #
m0 = Match(Y = y, Tr = d, X = x1, estimand = 'ATE')
# but no 
m0$est

Any clue?

1 answer | last active 6 years ago

Roland · 6 years ago

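An important output of the matching procedure is the set of weights for the control observations. The weights are computed so that the propensity-score distributions of the treatment and control groups are similar (once the weights are applied).

In your case, this means (starting from your DGP and using your notation):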

lm(y ~ d, mat1, weights = weights) %>% summary()
lm(y ~ d, mat2a, weights = weights) %>% summary()
lm(y ~ d, mat2b, weights = weights) %>% summary()

And there we are: 15 is back (well, 14.9 actually)!
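To see the mechanism in isolation, here is a minimal base-R sketch. It is not MatchIt's genetic matching: it swaps in hand-rolled 1-nearest-neighbour matching on x with replacement, and it simplifies the question's DGP (d is drawn with a per-observation probability and the extra randomisation step is dropped). The names nearest, w and matched are made up for illustration.

```r
set.seed(1)                                   # arbitrary seed
N <- 1000
x <- rnorm(N, 0, 5)
d <- rbinom(N, 1, ifelse(x > 0.7, 0.6, 0.3))  # simplified treatment assignment
d[abs(x) > 10] <- 0                           # no treated units beyond |x| = 10
y <- x + x^2 + rnorm(N, ifelse(d == 1, 15, 0), 10)

treated  <- which(d == 1)
controls <- which(d == 0)

# for each treated unit, the index of the control with the closest x
# (with replacement: the same control can be picked several times)
nearest <- vapply(
  treated,
  function(i) controls[which.min(abs(x[controls] - x[i]))],
  integer(1)
)

# weight of a control = number of treated units it was matched to;
# every treated unit gets weight 1
w <- tabulate(nearest, nbins = N)
w[treated] <- 1

# matched sample: each used control appears once, carrying its weight
matched <- data.frame(x, d, y, w)[w > 0, ]

est_unweighted <- coef(lm(y ~ d, data = matched))[["d"]]
est_weighted   <- coef(lm(y ~ d, data = matched, weights = w))[["d"]]

c(unweighted = est_unweighted, weighted = est_weighted)
```

In the matched data a reused control shows up as a single row, so an unweighted regression ignores how often it was used. The weighted coefficient equals the average of treated-minus-matched-control differences, which is why it should land near 15 up to sampling noise.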
