I’ve got two examples here - one purely cross-sectional and one time-series cross-sectional. The TSCS one is a study by David Rueda (2008), which was replicated by Lall in his 2016 piece. The second is from the 2014 American National Election Study.
For the sake of this exercise, we’re going to replicate one of Rueda’s models and assess the quality of the imputations. You can read in and manage the data like below. This puts an object named x
in your workspace that contains the data.
f <- file("https://quantoid.net/files/reg3/rueda.rda")
load(f)
close(f)
tmp <- x[,c("country", "year", "govem", "govpart", "hkcorp",
"govpxcor", "open", "finop", "dccggx", "unemp",
"gdpgr", "deca70", "deca80", "deca85", "deca90", "count")]
tmp$count <- as.factor(tmp$count)
Next we can make the listwise deleted data. We can do this as follows:
tmpNA <- na.omit(tmp[,c("country", "year", "govem", "govpart", "hkcorp",
"govpxcor", "open", "finop", "dccggx", "unemp",
"gdpgr", "deca70", "deca80", "deca85", "deca90", "count")])
govem
(government employment) on govpart
(cabinet partisanship), hkcorp
(corporatism), govpart*hkcorp
(the interaction of cabinet partisanship and corporatism), open
(internal openness), finop
(financial openness), dccggx
(government debt), unemp
(unemployment), gdpgr
(gdp growth), deca70
, deca80
, deca85
, deca90
and count
(year and country variables).mod <- lm(govem ~ govpart*hkcorp + open + finop + dccggx +
unemp + gdpgr + deca70 + deca80 + deca85 + deca90 + count, data=tmpNA)
summary(mod)
##
## Call:
## lm(formula = govem ~ govpart * hkcorp + open + finop + dccggx +
## unemp + gdpgr + deca70 + deca80 + deca85 + deca90 + count,
## data = tmpNA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1283 -0.7209 -0.1021 0.9666 4.1376
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.998227 1.449952 15.172 < 2e-16 ***
## govpart 0.046705 0.012530 3.727 0.000227 ***
## hkcorp 0.238680 1.990129 0.120 0.904610
## open -0.021053 0.017769 -1.185 0.236935
## finop 0.215997 0.099397 2.173 0.030482 *
## dccggx -0.019114 0.857783 -0.022 0.982236
## unemp 0.382621 0.049026 7.805 7.86e-14 ***
## gdpgr -0.008315 0.041912 -0.198 0.842867
## deca70 -0.874399 0.516221 -1.694 0.091233 .
## deca80 0.004772 0.470755 0.010 0.991918
## deca85 0.202725 0.444762 0.456 0.648828
## deca90 -0.038592 0.444986 -0.087 0.930941
## count2 -4.724493 1.907703 -2.477 0.013764 *
## count3 -5.949257 2.080673 -2.859 0.004514 **
## count4 -6.933322 0.753173 -9.205 < 2e-16 ***
## count5 2.435253 1.522843 1.599 0.110739
## count6 -5.455875 1.706360 -3.197 0.001520 **
## count7 -5.119669 0.864316 -5.923 7.88e-09 ***
## count8 -5.810753 0.732881 -7.929 3.41e-14 ***
## count9 -11.300278 1.559554 -7.246 3.03e-12 ***
## count10 -11.994008 0.788833 -15.205 < 2e-16 ***
## count11 -17.961347 1.265935 -14.188 < 2e-16 ***
## count12 2.086153 1.913545 1.090 0.276414
## count13 -11.757483 1.707576 -6.885 2.89e-11 ***
## count14 6.404991 1.737182 3.687 0.000265 ***
## count15 -12.249213 1.403775 -8.726 < 2e-16 ***
## count16 -11.155238 0.667400 -16.714 < 2e-16 ***
## govpart:hkcorp -0.094657 0.023048 -4.107 5.05e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.547 on 332 degrees of freedom
## Multiple R-squared: 0.9464, Adjusted R-squared: 0.9421
## F-statistic: 217.3 on 27 and 332 DF, p-value: < 2.2e-16
Rueda used panel corrected standard errors so, I’ll show you how you can do that in parallel with the original analysis.
library(pcse)
pmod <- pcse(mod, tmpNA$count, tmpNA$year)
summary(pmod)
##
## Results:
##
## Estimate PCSE t value Pr(>|t|)
## (Intercept) 21.998226520 1.192064080 18.45389597 7.845121e-53
## govpart 0.046705048 0.010371734 4.50310908 9.283994e-06
## hkcorp 0.238679581 1.910250066 0.12494677 9.006413e-01
## open -0.021053453 0.009852876 -2.13678250 3.334628e-02
## finop 0.215996881 0.083545106 2.58539240 1.015266e-02
## dccggx -0.019113688 0.346879227 -0.05510185 9.560905e-01
## unemp 0.382621252 0.048028293 7.96658027 2.633303e-14
## gdpgr -0.008314584 0.032484946 -0.25595193 7.981465e-01
## deca70 -0.874399029 0.357386593 -2.44664754 1.493734e-02
## deca80 0.004772191 0.294733642 0.01619154 9.870913e-01
## deca85 0.202725294 0.274291112 0.73908809 4.603757e-01
## deca90 -0.038592453 0.269175077 -0.14337305 8.860825e-01
## count2 -4.724492690 1.708922768 -2.76460281 6.017581e-03
## count3 -5.949257193 1.476335851 -4.02974512 6.927911e-05
## count4 -6.933322327 0.724668003 -9.56758447 2.592777e-19
## count5 2.435252926 1.360125504 1.79046192 7.429071e-02
## count6 -5.455874505 1.657760839 -3.29111074 1.105195e-03
## count7 -5.119669164 0.883387089 -5.79549919 1.583891e-08
## count8 -5.810752832 0.601261040 -9.66427632 1.243382e-19
## count9 -11.300278490 1.372714862 -8.23206538 4.258659e-15
## count10 -11.994008065 0.737942417 -16.25331163 4.108030e-44
## count11 -17.961346545 1.188884033 -15.10773637 1.295701e-39
## count12 2.086153424 1.717416409 1.21470449 2.253420e-01
## count13 -11.757483100 1.211022629 -9.70872288 8.857538e-20
## count14 6.404991050 1.670019790 3.83527853 1.500767e-04
## count15 -12.249213181 1.046692012 -11.70278653 1.014701e-26
## count16 -11.155237811 0.686801608 -16.24230008 4.540244e-44
## govpart:hkcorp -0.094657272 0.017947206 -5.27420661 2.409373e-07
##
## ---------------------------------------------
##
## # Valid Obs = 360; # Missing Obs = 8; Degrees of Freedom = 332.
mice
with the default settings, mice
using random forests (make sure that the time and year variables are in the model) as the imputation method and amelia
. After doing that, re-estimate the models with the three different imputed datasets. How do the three models compare?library(mice)
library(Amelia)
library(mitools)
mi.imp <- mice(tmp[,c("govem", "govpart", "hkcorp", "govpxcor", "open", "finop", "dccggx", "unemp", "gdpgr", "deca70", "deca80", "deca85", "deca90", "count")], m=50, printFlag=FALSE)
I talked in class about a way to make sure that transformations are respected in imputations. Imagine we wanted to use the log of dccggx
and unemp
in the imputation models. We also want to make sure that the interaction gets imputed the right way. Imputing these variables is done through what is called ‘passive imputation’. Here, we set the imputation method to the transformation we want to execute. We could do that with the following:
tmp$logunemp <- log(tmp$unemp)
tmp$logdc <- log(tmp$dccggx)
## run mice one time to get the imputation methods for everything
tmp.imp <- mice(tmp[,c("govem", "govpart", "hkcorp", "govpxcor",
"open", "finop", "dccggx", "unemp", "gdpgr", "logunemp",
"logdc","deca70", "deca80", "deca85", "deca90", "year",
"count")], m=1, printFlag=FALSE)
pm <- tmp.imp$predictorMatrix
pm[,12:15] <- 0
imp.meth <- tmp.imp$method
imp.meth["logunemp"] <- "~log(unemp)"
imp.meth["logdc"] <- "~log(dccggx)"
imp.meth["govpxcor"] <- "~I(govpart*hkcorp)"
mi.imp2 <- mice(tmp[,c("govem", "govpart", "hkcorp", "govpxcor",
"open", "finop", "dccggx", "unemp", "gdpgr", "logunemp",
"logdc","deca70", "deca80", "deca85", "deca90",
"year", "count")],
m=50, predictorMatrix=pm, printFlag=FALSE)
Next, we could do the random forests imputation. For this we would have to make sure to add the year in, and we can put it in as a factor even if we want
# find variables with missing data
tmp$year <- as.factor(tmp$year)
tmp.imp <- mice(tmp[,c("govem", "govpart", "hkcorp", "govpxcor", "open",
"finop", "dccggx", "unemp", "gdpgr", "year", "count",
"deca70", "deca80", "deca85", "deca90")],
m=1, printFlag=FALSE)
## Warning: Number of logged events: 30
## since we've got all of the variables in propmiss, we simply want to set the
## imputation method for anything that has missing info to 'rf'
imp.methods <- tmp.imp$method
imp.methods <- gsub("pmm", "rf", imp.methods)
imp.methods["govpxcor"] <- "~I(govpart*hkcorp)"
pm <- tmp.imp$predictorMatrix
pm[,12:15] <- 0
rf.imp <- mice(tmp[,c("govem", "govpart", "hkcorp", "govpxcor", "open",
"finop", "dccggx", "unemp", "gdpgr", "year", "count",
"deca70", "deca80", "deca85", "deca90")],
methods=imp.methods, m=50, printFlag=FALSE,
predictorMatrix=pm)
Finally, we can use amelia
tmp.am <- x[,c("govem", "govpart", "hkcorp", "open", "finop",
"dccggx", "unemp", "gdpgr", "deca70", "deca80", "deca85",
"deca90", "year", "count")]
am.imp <- amelia(tmp.am, cs = "count", ts="year",
logs = c("dccggx", "unemp"), polytime=3, intercs=F, m=50, p2s=0,
idvars=c("deca70", "deca80", "deca85", "deca90"))
Now, we can re-estimate the models. I’m going to use MIcombine
for all of them, but you could use zelig
or lm.mids
, too.
# make the default mice datasets
mi.dats <- lapply(1:50, function(i)complete(mi.imp2, i))
rf.dats <- lapply(1:50, function(i)complete(rf.imp, i))
am.dats <- am.imp$imputations
mi.mods <- MIcombine(lapply(mi.dats, function(x)update(mod, data=x)))
rf.mods <- MIcombine(lapply(rf.dats, function(x)update(mod, data=x)))
am.mods <- MIcombine(lapply(am.dats, function(x)update(mod, data=x)))
If we wanted to do this with the panel corrected standard errors, it’s a touch more complicated
## first, we need to get the coefficients from all of the models
mi.b <- lapply(mi.dats, function(x)update(mod, data=x)$coef)
rf.b <- lapply(rf.dats, function(x)update(mod, data=x)$coef)
am.b <- lapply(am.dats, function(x)update(mod, data=x)$coef)
## then we need to get the PC variane-covariande matrices
mi.v <- lapply(1:50, function(x)pcse(update(mod, data=mi.dats[[x]]),
mi.dats[[x]]$count, mi.dats[[x]]$year)$vcov)
rf.v <- lapply(1:50, function(x)pcse(update(mod, data=rf.dats[[x]]),
rf.dats[[x]]$count, rf.dats[[x]]$year)$vcov)
am.v <- lapply(1:50, function(x)pcse(update(mod, data=am.dats[[x]]),
am.dats[[x]]$count, am.dats[[x]]$year)$vcov)
## finally, we can combine the coefficients and variances using Rubin's rules.
mi.pmods <- MIcombine(mi.b, mi.v)
rf.pmods <- MIcombine(rf.b, rf.v)
am.pmods <- MIcombine(am.b, am.v)
Next, we can summarize the models and compare to the original results
# summarize the models
summary(mod)
##
## Call:
## lm(formula = govem ~ govpart * hkcorp + open + finop + dccggx +
## unemp + gdpgr + deca70 + deca80 + deca85 + deca90 + count,
## data = tmpNA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1283 -0.7209 -0.1021 0.9666 4.1376
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.998227 1.449952 15.172 < 2e-16 ***
## govpart 0.046705 0.012530 3.727 0.000227 ***
## hkcorp 0.238680 1.990129 0.120 0.904610
## open -0.021053 0.017769 -1.185 0.236935
## finop 0.215997 0.099397 2.173 0.030482 *
## dccggx -0.019114 0.857783 -0.022 0.982236
## unemp 0.382621 0.049026 7.805 7.86e-14 ***
## gdpgr -0.008315 0.041912 -0.198 0.842867
## deca70 -0.874399 0.516221 -1.694 0.091233 .
## deca80 0.004772 0.470755 0.010 0.991918
## deca85 0.202725 0.444762 0.456 0.648828
## deca90 -0.038592 0.444986 -0.087 0.930941
## count2 -4.724493 1.907703 -2.477 0.013764 *
## count3 -5.949257 2.080673 -2.859 0.004514 **
## count4 -6.933322 0.753173 -9.205 < 2e-16 ***
## count5 2.435253 1.522843 1.599 0.110739
## count6 -5.455875 1.706360 -3.197 0.001520 **
## count7 -5.119669 0.864316 -5.923 7.88e-09 ***
## count8 -5.810753 0.732881 -7.929 3.41e-14 ***
## count9 -11.300278 1.559554 -7.246 3.03e-12 ***
## count10 -11.994008 0.788833 -15.205 < 2e-16 ***
## count11 -17.961347 1.265935 -14.188 < 2e-16 ***
## count12 2.086153 1.913545 1.090 0.276414
## count13 -11.757483 1.707576 -6.885 2.89e-11 ***
## count14 6.404991 1.737182 3.687 0.000265 ***
## count15 -12.249213 1.403775 -8.726 < 2e-16 ***
## count16 -11.155238 0.667400 -16.714 < 2e-16 ***
## govpart:hkcorp -0.094657 0.023048 -4.107 5.05e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.547 on 332 degrees of freedom
## Multiple R-squared: 0.9464, Adjusted R-squared: 0.9421
## F-statistic: 217.3 on 27 and 332 DF, p-value: < 2.2e-16
summary(mi.mods)
## Multiple imputation results:
## MIcombine.default(lapply(mi.dats, function(x) update(mod, data = x)))
## results se (lower upper) missInfo
## (Intercept) 22.45756839 1.54571064 19.428006821 25.48712995 2 %
## govpart 0.03228210 0.01313136 0.006545098 0.05801910 0 %
## hkcorp 0.40584934 2.13032246 -3.769510101 4.58120878 1 %
## open -0.02785975 0.01937942 -0.065845739 0.01012624 6 %
## finop 0.12132214 0.10515055 -0.084769332 0.32741362 1 %
## dccggx -0.07653779 1.03954428 -2.117131330 1.96405576 25 %
## unemp 0.36780414 0.05220229 0.265489034 0.47011925 1 %
## gdpgr -0.03337330 0.04458582 -0.120759904 0.05401330 0 %
## deca70 -1.18755104 0.55373546 -2.272872445 -0.10222964 3 %
## deca80 -0.13942052 0.50631053 -1.131778947 0.85293791 2 %
## deca85 0.11755473 0.47504671 -0.813519706 1.04862917 0 %
## deca90 -0.33079337 0.47380713 -1.259440229 0.59785350 1 %
## count2 -3.41656108 2.06349472 -7.461127898 0.62800573 4 %
## count3 -4.18277466 2.23450938 -8.562465368 0.19691605 4 %
## count4 -5.39131864 0.79045536 -6.940694533 -3.84194275 5 %
## count5 3.74509781 1.67306882 0.465412900 7.02478272 8 %
## count6 -4.28306465 1.86152303 -7.931911745 -0.63421756 6 %
## count7 -3.82540627 0.94268987 -5.673475022 -1.97733751 10 %
## count8 -4.43075905 0.77054622 -5.941079137 -2.92043896 5 %
## count9 -9.87967607 1.68792111 -13.188142109 -6.57121003 5 %
## count10 -10.68448352 0.81959814 -12.290866776 -9.07810027 0 %
## count11 -17.06131012 1.34660767 -19.700614228 -14.42200602 0 %
## count12 3.24004414 2.08145844 -0.839836812 7.31992509 5 %
## count13 -9.93210083 1.85210033 -13.562529387 -6.30167228 7 %
## count14 7.57114374 1.87252324 3.900951285 11.24133619 4 %
## count15 -11.02900340 1.52444506 -14.017362302 -8.04064450 8 %
## count16 -9.98472365 0.68710521 -11.331426158 -8.63802115 1 %
## govpart:hkcorp -0.07600218 0.02438433 -0.123794593 -0.02820976 0 %
## make the confidence intervals
ci.lwd <- confint(mod)
ci.mi <- summary(mi.mods)[,3:4]
## Multiple imputation results:
## MIcombine.default(lapply(mi.dats, function(x) update(mod, data = x)))
## results se (lower upper) missInfo
## (Intercept) 22.45756839 1.54571064 19.428006821 25.48712995 2 %
## govpart 0.03228210 0.01313136 0.006545098 0.05801910 0 %
## hkcorp 0.40584934 2.13032246 -3.769510101 4.58120878 1 %
## open -0.02785975 0.01937942 -0.065845739 0.01012624 6 %
## finop 0.12132214 0.10515055 -0.084769332 0.32741362 1 %
## dccggx -0.07653779 1.03954428 -2.117131330 1.96405576 25 %
## unemp 0.36780414 0.05220229 0.265489034 0.47011925 1 %
## gdpgr -0.03337330 0.04458582 -0.120759904 0.05401330 0 %
## deca70 -1.18755104 0.55373546 -2.272872445 -0.10222964 3 %
## deca80 -0.13942052 0.50631053 -1.131778947 0.85293791 2 %
## deca85 0.11755473 0.47504671 -0.813519706 1.04862917 0 %
## deca90 -0.33079337 0.47380713 -1.259440229 0.59785350 1 %
## count2 -3.41656108 2.06349472 -7.461127898 0.62800573 4 %
## count3 -4.18277466 2.23450938 -8.562465368 0.19691605 4 %
## count4 -5.39131864 0.79045536 -6.940694533 -3.84194275 5 %
## count5 3.74509781 1.67306882 0.465412900 7.02478272 8 %
## count6 -4.28306465 1.86152303 -7.931911745 -0.63421756 6 %
## count7 -3.82540627 0.94268987 -5.673475022 -1.97733751 10 %
## count8 -4.43075905 0.77054622 -5.941079137 -2.92043896 5 %
## count9 -9.87967607 1.68792111 -13.188142109 -6.57121003 5 %
## count10 -10.68448352 0.81959814 -12.290866776 -9.07810027 0 %
## count11 -17.06131012 1.34660767 -19.700614228 -14.42200602 0 %
## count12 3.24004414 2.08145844 -0.839836812 7.31992509 5 %
## count13 -9.93210083 1.85210033 -13.562529387 -6.30167228 7 %
## count14 7.57114374 1.87252324 3.900951285 11.24133619 4 %
## count15 -11.02900340 1.52444506 -14.017362302 -8.04064450 8 %
## count16 -9.98472365 0.68710521 -11.331426158 -8.63802115 1 %
## govpart:hkcorp -0.07600218 0.02438433 -0.123794593 -0.02820976 0 %
##identify the significant results
sig.lwd <- sign(ci.lwd[,1]) == sign(ci.lwd[,2])
sig.mi <- sign(ci.mi[,1]) == sign(ci.mi[,2])
## put the two significant results descriptions together.
cbind(sig.lwd, sig.mi)
## sig.lwd sig.mi
## (Intercept) TRUE TRUE
## govpart TRUE TRUE
## hkcorp FALSE FALSE
## open FALSE FALSE
## finop TRUE FALSE
## dccggx FALSE FALSE
## unemp TRUE TRUE
## gdpgr FALSE FALSE
## deca70 FALSE TRUE
## deca80 FALSE FALSE
## deca85 FALSE FALSE
## deca90 FALSE FALSE
## count2 TRUE FALSE
## count3 TRUE FALSE
## count4 TRUE TRUE
## count5 FALSE TRUE
## count6 TRUE TRUE
## count7 TRUE TRUE
## count8 TRUE TRUE
## count9 TRUE TRUE
## count10 TRUE TRUE
## count11 TRUE TRUE
## count12 FALSE FALSE
## count13 TRUE TRUE
## count14 TRUE TRUE
## count15 TRUE TRUE
## count16 TRUE TRUE
## govpart:hkcorp TRUE TRUE
If we were going to do this with the panel corrected standard errors:
library(lmtest)
ci.lwd <- coefci(mod, vcov.=pmod$vcov)
ci.mi <- summary(mi.pmods)[,3:4]
## Multiple imputation results:
## MIcombine.default(mi.b, mi.v)
## results se (lower upper)
## (Intercept) 22.45756839 1.61934257 19.283694056 25.631442715
## govpart 0.03228210 0.01335299 0.006110715 0.058453486
## hkcorp 0.40584934 2.40589853 -4.309628015 5.121326696
## open -0.02785975 0.01351665 -0.054360804 -0.001358697
## finop 0.12132214 0.10626639 -0.086956344 0.329600628
## dccggx -0.07653779 0.73919135 -1.534053954 1.380978383
## unemp 0.36780414 0.06055361 0.249120930 0.486487349
## gdpgr -0.03337330 0.04350026 -0.118632241 0.051885641
## deca70 -1.18755104 0.47201036 -2.112706374 -0.262395707
## deca80 -0.13942052 0.38991479 -0.903657022 0.624815986
## deca85 0.11755473 0.36379390 -0.595468213 0.830577677
## deca90 -0.33079337 0.37225366 -1.060401161 0.398814431
## count2 -3.41656108 2.25230787 -7.831150633 0.998028469
## count3 -4.18277466 1.91622019 -7.938707810 -0.426841503
## count4 -5.39131864 0.78355129 -6.927165796 -3.855471491
## count5 3.74509781 1.75362815 0.307589296 7.182606319
## count6 -4.28306465 2.09399546 -8.387451458 -0.178677848
## count7 -3.82540627 1.07500332 -5.932664372 -1.718148163
## count8 -4.43075905 0.52905508 -5.467926649 -3.393591445
## count9 -9.87967607 1.70601617 -13.223601429 -6.535750711
## count10 -10.68448352 0.87836058 -12.406038966 -8.962928081
## count11 -17.06131012 1.50058975 -20.002413137 -14.120207108
## count12 3.24004414 2.36556510 -1.396580834 7.876669111
## count13 -9.93210083 1.55934461 -12.988994574 -6.875207095
## count14 7.57114374 2.08928326 3.476141477 11.666145998
## count15 -11.02900340 1.43695912 -13.845990325 -8.212016474
## count16 -9.98472365 0.75979005 -11.473885552 -8.495561756
## govpart:hkcorp -0.07600218 0.02528089 -0.125551819 -0.026452533
## missInfo
## (Intercept) 2 %
## govpart 0 %
## hkcorp 0 %
## open 12 %
## finop 1 %
## dccggx 50 %
## unemp 1 %
## gdpgr 0 %
## deca70 4 %
## deca80 3 %
## deca85 0 %
## deca90 1 %
## count2 4 %
## count3 5 %
## count4 6 %
## count5 7 %
## count6 5 %
## count7 7 %
## count8 10 %
## count9 5 %
## count10 0 %
## count11 0 %
## count12 4 %
## count13 9 %
## count14 3 %
## count15 9 %
## count16 0 %
## govpart:hkcorp 0 %
##identify the significant results
sig.lwd <- sign(ci.lwd[,1]) == sign(ci.lwd[,2])
sig.mi <- sign(ci.mi[,1]) == sign(ci.mi[,2])
## put the two significant results descriptions together.
out <- cbind(sig.lwd, sig.mi[-c(1,28)])
rownames(out) <- names(sig.lwd)
out
## sig.lwd
## govpart TRUE TRUE
## hkcorp FALSE FALSE
## open TRUE TRUE
## finop TRUE FALSE
## dccggx FALSE FALSE
## unemp TRUE TRUE
## gdpgr FALSE FALSE
## deca70 TRUE TRUE
## deca80 FALSE FALSE
## deca85 FALSE FALSE
## deca90 FALSE FALSE
## count2 TRUE FALSE
## count3 TRUE TRUE
## count4 TRUE TRUE
## count5 FALSE TRUE
## count6 TRUE TRUE
## count7 TRUE TRUE
## count8 TRUE TRUE
## count9 TRUE TRUE
## count10 TRUE TRUE
## count11 TRUE TRUE
## count12 FALSE FALSE
## count13 TRUE TRUE
## count14 TRUE TRUE
## count15 TRUE TRUE
## count16 TRUE TRUE
Here, the coefficients of open
and finop
both become insignificant in the imputed data model. We could look at the same thing for the other two models. Here, I’ll just use the PCSE models.
ci.rf <- summary(rf.pmods)[,3:4]
## Multiple imputation results:
## MIcombine.default(rf.b, rf.v)
## results se (lower upper)
## (Intercept) 22.55512433 1.62711883 19.365961942 25.744286725
## govpart 0.03268335 0.01354147 0.006142508 0.059224188
## hkcorp 0.54387074 2.42080698 -4.200826008 5.288567494
## open -0.02387748 0.01369089 -0.050718679 0.002963721
## finop 0.12247198 0.10809579 -0.089392465 0.334336430
## dccggx -0.51577878 0.77533562 -2.043135917 1.011578363
## unemp 0.37241432 0.06094505 0.252963770 0.491864873
## gdpgr -0.03378632 0.04368057 -0.119398671 0.051826036
## deca70 -1.25758341 0.47195337 -2.182638183 -0.332528635
## deca80 -0.19427476 0.39140590 -0.961435227 0.572885712
## deca85 0.11686024 0.36620606 -0.600890462 0.834610950
## deca90 -0.29037389 0.37352715 -1.022479117 0.441731335
## count2 -3.79159259 2.27759585 -8.255734285 0.672549105
## count3 -4.54980194 1.95553809 -8.382795829 -0.716808057
## count4 -5.55753721 0.80562760 -7.136680179 -3.978394247
## count5 3.33004253 1.78619259 -0.171266844 6.831351894
## count6 -4.68060333 2.12398715 -8.843758485 -0.517448166
## count7 -4.08482476 1.09994268 -6.240984567 -1.928664957
## count8 -4.57782407 0.55176940 -5.659548711 -3.496099433
## count9 -10.20908865 1.74107549 -13.621719276 -6.796458030
## count10 -10.70265338 0.89074978 -12.448492111 -8.956814649
## count11 -17.14594141 1.51221611 -20.109832123 -14.182050705
## count12 2.81970580 2.38368170 -1.852412215 7.491823818
## count13 -10.34482258 1.60173306 -13.484737828 -7.204907329
## count14 7.26421949 2.11158191 3.125517260 11.402921728
## count15 -11.40950914 1.46027872 -14.272184923 -8.546833355
## count16 -10.00940791 0.77447486 -11.527355606 -8.491460213
## govpart:hkcorp -0.07577052 0.02537913 -0.125512742 -0.026028306
## missInfo
## (Intercept) 3 %
## govpart 1 %
## hkcorp 0 %
## open 11 %
## finop 1 %
## dccggx 46 %
## unemp 1 %
## gdpgr 0 %
## deca70 4 %
## deca80 3 %
## deca85 0 %
## deca90 2 %
## count2 4 %
## count3 5 %
## count4 6 %
## count5 7 %
## count6 5 %
## count7 8 %
## count8 10 %
## count9 5 %
## count10 1 %
## count11 0 %
## count12 4 %
## count13 9 %
## count14 3 %
## count15 9 %
## count16 1 %
## govpart:hkcorp 1 %
ci.am <- summary(am.pmods)[,3:4]
## Multiple imputation results:
## MIcombine.default(am.b, am.v)
## results se (lower upper) missInfo
## (Intercept) 36.36639556 3.428704430 29.64623175 43.08655937 1 %
## govpart -0.09135295 0.030688913 -0.15150261 -0.03120329 2 %
## hkcorp 0.10918964 0.891579497 -1.63829163 1.85667092 2 %
## open 0.02581194 0.005009416 0.01599247 0.03563142 7 %
## finop -1.25552436 0.257208591 -1.75964422 -0.75140450 0 %
## dccggx -6.89735924 0.851255590 -8.56670489 -5.22801359 15 %
## unemp 0.58444326 0.063871832 0.45925385 0.70963267 3 %
## gdpgr -0.14084036 0.144109469 -0.42328973 0.14160902 0 %
## deca70 -5.48061161 0.919559137 -7.28292048 -3.67830274 1 %
## deca80 -3.53968864 0.845824622 -5.19747552 -1.88190175 1 %
## deca85 -1.53130767 0.752320724 -3.00582923 -0.05678612 0 %
## deca90 0.47801675 0.708648656 -0.91091944 1.86695295 2 %
## count -0.27723414 0.029614499 -0.33527965 -0.21918863 4 %
## govpart:hkcorp 0.19579678 0.053592265 0.09075768 0.30083588 1 %
sig.rf <- sign(ci.rf[,1]) == sign(ci.rf[,2])
sig.am <- sign(ci.am[,1]) == sign(ci.am[,2])
cbind(out, sig.rf[-c(1,28)], sig.am[-c(1,28)])
## sig.lwd
## govpart TRUE TRUE TRUE TRUE
## hkcorp FALSE FALSE FALSE FALSE
## open TRUE TRUE FALSE TRUE
## finop TRUE FALSE FALSE TRUE
## dccggx FALSE FALSE FALSE TRUE
## unemp TRUE TRUE TRUE TRUE
## gdpgr FALSE FALSE FALSE FALSE
## deca70 TRUE TRUE TRUE TRUE
## deca80 FALSE FALSE FALSE TRUE
## deca85 FALSE FALSE FALSE TRUE
## deca90 FALSE FALSE FALSE FALSE
## count2 TRUE FALSE FALSE TRUE
## count3 TRUE TRUE TRUE TRUE
## count4 TRUE TRUE TRUE TRUE
## count5 FALSE TRUE FALSE FALSE
## count6 TRUE TRUE TRUE TRUE
## count7 TRUE TRUE TRUE TRUE
## count8 TRUE TRUE TRUE TRUE
## count9 TRUE TRUE TRUE TRUE
## count10 TRUE TRUE TRUE FALSE
## count11 TRUE TRUE TRUE TRUE
## count12 FALSE FALSE FALSE TRUE
## count13 TRUE TRUE TRUE TRUE
## count14 TRUE TRUE TRUE FALSE
## count15 TRUE TRUE TRUE TRUE
## count16 TRUE TRUE TRUE TRUE
Notice that the largest set of substantive variables is statistically significant using the amelia
imputations.
mice
method (with the default imputation methods) and with amelia
three methods. Is there any difference in what you find across the methods? Pick one missing data point from each variable with missing data and plot out the posterior distribution of the missing observations for the three different methods.First, we can do this for mice
. There’s an easier way of doing this than the way we talked about in class, at least with the mice
object:
## identify the variables with missing observations
missvars <- c("govem", "dccggx", "unemp")
# make the padded out data frame:
w <- is.na(tmp[,c("govem", "govpart", "hkcorp",
"govpxcor", "open", "finop", "dccggx", "unemp", "gdpgr", "logunemp", "logdc",
"deca70", "deca80", "deca85", "deca90", "year", "count")])
w[,missvars] <- TRUE
mi.ppd <- update(mi.imp2, where=w, m=500, printFlag=F)
mi.comp <- lapply(1:500, function(x)complete(mi.ppd, x))
Now, we can make the ppds for amelia
newdat <- tmp[,c("govem", "govpart", "hkcorp",
"govpxcor", "open", "finop", "dccggx", "unemp", "gdpgr", "logunemp", "logdc",
"deca70", "deca80", "deca85", "deca90", "year", "count")]
for(i in 1:length(missvars)){
tmp.dat <- tmp[,c("govem", "govpart", "hkcorp",
"govpxcor", "open", "finop", "dccggx", "unemp", "gdpgr", "logunemp", "logdc",
"deca70", "deca80", "deca85", "deca90", "year", "count")]
tmp.dat[[missvars[i]]] <- NA
newdat <- rbind(newdat, tmp.dat)
}
am.ppd <- amelia(newdat, cs = "count", ts="year",
logs = c("dccggx", "unemp"), polytime=3, intercs=F, m=500, p2s=0,
idvars=c("deca70", "deca80", "deca85", "deca90"))
## Amelia Error Code: 37
## The following variable(s) are 'factors':
## year
## You may have wanted to set this as a ID variable to remove it
## from the imputation model or as an ordinal or nominal
## variable to be imputed. Please set it as either and
## try again.
d1 <- 1:368
d2 <- 369:736
d3 <- 737:1104
d4 <- 1105:1472
am.comp <- list()
for(i in 1:500){
tmp.comp <- am.ppd$imputations[[i]][d1, ]
tmp.comp[[missvars[1]]] <- am.ppd$imputations[[i]][d2, missvars[1]]
tmp.comp[[missvars[2]]] <- am.ppd$imputations[[i]][d3, missvars[2]]
tmp.comp[[missvars[3]]] <- am.ppd$imputations[[i]][d4, missvars[3]]
am.comp[[i]] <- tmp.comp
}
Next, we can estimate the models.
mi.orig <- update(mi.imp2, m=500)
mi.orig.comp <- lapply(1:500, function(i)complete(mi.orig, i))
l1 <- lapply(mi.orig.comp, function(x)update(mod, data=x))
l2 <- lapply(mi.comp, function(x)update(mod, data=x))
## save the coefficients from each model
b1 <- sapply(l1, coef)
b2 <- sapply(l2, coef)
## fin pval as the proportion of times that b1 > b2
diff <- t(b1) - t(b2)
pval <- apply(diff, 2, function(x)mean(x > 0))
## replace one-sided p-value with two-sided p-value
pval <- 2*ifelse(pval > .5, 1-pval, pval)
pval
## (Intercept) govpart hkcorp open finop
## 0.420 0.748 0.752 0.604 0.804
## dccggx unemp gdpgr deca70 deca80
## 0.344 0.760 0.876 0.392 0.388
## deca85 deca90 count2 count3 count4
## 0.300 0.920 0.928 0.884 0.520
## count5 count6 count7 count8 count9
## 0.932 0.964 0.660 0.520 0.936
## count10 count11 count12 count13 count14
## 0.944 0.936 0.808 0.840 0.980
## count15 count16 govpart:hkcorp
## 0.756 0.780 0.792
Next, we could do the omnibus test
coef <- colMeans(diff)[c(1:8,28)]
v <- var(diff[,c(1:8,28)])
library(car)
linearHypothesis(model=list(df.residual=NULL),
hypothesis.matrix=diag(length(coef)),
vcov.=v, coef.=coef, rhs=rep(0, length(coef)))
## Linear hypothesis test
##
## Hypothesis:
## (Intercept) = 0
## govpart = 0
## hkcorp = 0
## open = 0
## finop = 0
## dccggx = 0
## unemp = 0
## gdpgr = 0
## govpart:hkcorp = 0
##
## Model 1: restricted model
## Model 2: list(df.residual = NULL)
##
## Note: Coefficient covariance matrix supplied.
##
## Df Chisq Pr(>Chisq)
## 1
## 2 9 2.6501 0.9766
Now, we could do the same thing for amelia
am.orig.comp <- lapply(1:500, function(i)am.ppd$imputations[[i]][d1, ])
l1 <- lapply(am.orig.comp, function(x)update(mod, data=x))
## Error in eval(predvars, data, env): object 'govem' not found
l2 <- lapply(am.comp, function(x)update(mod, data=x))
## save the coefficients from each model
b1 <- sapply(l1, coef)
b2 <- sapply(l2, coef)
## fin pval as the proportion of times that b1 > b2
diff <- t(b1) - t(b2)
## Error in t(b1) - t(b2): non-numeric argument to binary operator
pval <- apply(diff, 2, function(x)mean(x > 0))
## replace one-sided p-value with two-sided p-value
pval <- 2*ifelse(pval > .5, 1-pval, pval)
pval
## (Intercept) govpart hkcorp open finop
## 0.420 0.748 0.752 0.604 0.804
## dccggx unemp gdpgr deca70 deca80
## 0.344 0.760 0.876 0.392 0.388
## deca85 deca90 count2 count3 count4
## 0.300 0.920 0.928 0.884 0.520
## count5 count6 count7 count8 count9
## 0.932 0.964 0.660 0.520 0.936
## count10 count11 count12 count13 count14
## 0.944 0.936 0.808 0.840 0.980
## count15 count16 govpart:hkcorp
## 0.756 0.780 0.792
Notice, we do quite a bit better here - the quality of these imputations is higher.
coef <- colMeans(diff)[c(1:8,28)]
v <- var(diff[,c(1:8,28)])
linearHypothesis(model=list(df.residual=NULL),
hypothesis.matrix=diag(length(coef)),
vcov.=v, coef.=coef, rhs=rep(0, length(coef)))
## Linear hypothesis test
##
## Hypothesis:
## (Intercept) = 0
## govpart = 0
## hkcorp = 0
## open = 0
## finop = 0
## dccggx = 0
## unemp = 0
## gdpgr = 0
## govpart:hkcorp = 0
##
## Model 1: restricted model
## Model 2: list(df.residual = NULL)
##
## Note: Coefficient covariance matrix supplied.
##
## Df Chisq Pr(>Chisq)
## 1
## 2 9 2.6501 0.9766
The omnibus result, though is the same. This would probably suggest trying to fit a better imputation model. The rf
model might work, but it will take a pretty long time to run. If you want, try this at home when you’ve got some time to let your computer run.
Using lab3b_data.dta
, do multiple imputation (with 5 imputations) on all of the variables in the dataset. You can download the data with:
library(haven)
dat <- read_dta('http://www.quantoid.net/files/reg3/lab3b_data.dta')
searchVarLabels(dat, "")
## ind
## indsocial 1
## indspend 2
## dem_edugroup 3
## dem_agegrp 4
## gender_respondent 5
## libcpre_self 6
## inc_incgroup_pre 7
## relig_chmember 8
## label
## indsocial Index of social liberalism-conservatism (higher values more conservative)
## indspend Index of economic liberalism-conservatism (higher values more conservative)
## dem_edugroup Education group of respondent
## dem_agegrp Age group of respondent
## gender_respondent Respondent's Gender
## libcpre_self libcpre_self
## inc_incgroup_pre Income group (pre-election survey)
## relig_chmember Member of a religious congregation/denomination
dat$dem_edugroup <- as_factor(dat$dem_edugroup)
levels(dat$dem_edugroup) <- c("ltHS", "HS", "Some Coll", "BA", "Grad Deg")
dat$dem_agegrp <- as_factor(dat$dem_agegrp)
dat$dem_agegrp <- car:::recode(dat$dem_agegrp,
'c("17-20", "21-24", "25-29") = "17-30"; c("30-34", "35-39", "40-44", "45-49") = "30-49";
c("50-54", "55-59", "60-64", "65-69") = "50-69"; c("70-74", "Over 75") = "70+"',
as.factor = TRUE)
dat$gender_respondent <- as_factor(dat$gender_respondent)
dat <- as.data.frame(dat)
mice
, mice
with random forest imputation and amelia
each with 5 imputations. Are there differences between the models?mod <- lm(libcpre_self ~ ., data=dat)
summary(mod)
##
## Call:
## lm(formula = libcpre_self ~ ., data = dat)
##
## Residuals:
## <Labelled double>
## Min 1Q Median 3Q Max
## -4.8691 -0.7613 0.0114 0.7620 4.7738
##
## Labels:
## value label
## 1 ex liberal
## 2 liberal
## 3 slightly liberal
## 4 moderate
## 5 slightly conservative
## 6 conservative
## 7 ex conservative
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5559259 0.1583679 28.768 < 2e-16 ***
## indsocial 1.0665836 0.0463205 23.026 < 2e-16 ***
## indspend 0.8219730 0.1041089 7.895 4.74e-15 ***
## dem_edugroupHS 0.0789380 0.1007610 0.783 0.433475
## dem_edugroupSome Coll 0.0503617 0.0992494 0.507 0.611912
## dem_edugroupBA -0.0302927 0.1109651 -0.273 0.784887
## dem_edugroupGrad Deg -0.2344944 0.1193196 -1.965 0.049522 *
## dem_agegrp30-49 0.1565359 0.0797136 1.964 0.049700 *
## dem_agegrp50-69 0.2867809 0.0768760 3.730 0.000196 ***
## dem_agegrp70+ 0.4007806 0.1062326 3.773 0.000166 ***
## gender_respondent3 -0.1360355 0.0520209 -2.615 0.008990 **
## inc_incgroup_pre 0.0009367 0.0034512 0.271 0.786098
## relig_chmember -0.2389988 0.0526998 -4.535 6.10e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.135 on 1994 degrees of freedom
## (3909 observations deleted due to missingness)
## Multiple R-squared: 0.3513, Adjusted R-squared: 0.3474
## F-statistic: 89.97 on 12 and 1994 DF, p-value: < 2.2e-16
Now, we can do the imputations
nes.mi <- mice(dat, m=5, printFlag=F)
## Warning: Number of logged events: 175
meths <- nes.mi$method
meths <- gsub("[a-z]*", "rf", meths)
nes.rf <- mice(dat, m=5, method=meths, printFlag=F)
Next, we can do amelia
nes.am <- amelia(dat, m=5, noms=c("dem_edugroup", "dem_agegrp", "gender_respondent", "relig_chmember"))
nes.mi.dats <- lapply(1:5, function(i)complete(nes.mi, i))
nes.rf.dats <- lapply(1:5, function(i)complete(nes.rf, i))
nes.am.dats <- nes.am$imputations
nes.mi.mods <- MIcombine(lapply(nes.mi.dats, function(x)update(mod, data=x)))
nes.rf.mods <- MIcombine(lapply(nes.rf.dats, function(x)update(mod, data=x)))
nes.am.mods <- MIcombine(lapply(nes.am.dats, function(x)update(mod, data=x)))
summary(nes.mi.mods)
## Multiple imputation results:
## MIcombine.default(lapply(nes.mi.dats, function(x) update(mod,
## data = x)))
## results se (lower upper)
## (Intercept) 4.476488151 0.153786682 4.135676792 4.817299511
## indsocial 1.018084613 0.027768925 0.963209026 1.072960200
## indspend 0.835239750 0.063007374 0.711504886 0.958974613
## dem_edugroupHS 0.154267321 0.064955362 0.025333865 0.283200777
## dem_edugroupSome Coll 0.226252140 0.063547646 0.100046033 0.352458248
## dem_edugroupBA 0.219961128 0.080848033 0.054653387 0.385268869
## dem_edugroupGrad Deg -0.056836794 0.078740418 -0.212749651 0.099076062
## dem_agegrp30-49 0.177983590 0.062185655 0.050525790 0.305441390
## dem_agegrp50-69 0.272021943 0.059727734 0.149768489 0.394275397
## dem_agegrp70+ 0.309883486 0.067836466 0.175597639 0.444169332
## gender_respondent3 -0.071787623 0.033538213 -0.137710556 -0.005864690
## inc_incgroup_pre 0.001328402 0.002502264 -0.003658249 0.006315052
## relig_chmember -0.211864312 0.053930061 -0.332501870 -0.091226754
## missInfo
## (Intercept) 68 %
## indsocial 18 %
## indspend 8 %
## dem_edugroupHS 22 %
## dem_edugroupSome Coll 22 %
## dem_edugroupBA 41 %
## dem_edugroupGrad Deg 20 %
## dem_agegrp30-49 42 %
## dem_agegrp50-69 41 %
## dem_agegrp70+ 19 %
## gender_respondent3 10 %
## inc_incgroup_pre 25 %
## relig_chmember 70 %
summary(nes.rf.mods)
## Multiple imputation results:
## MIcombine.default(lapply(nes.rf.dats, function(x) update(mod,
## data = x)))
## results se (lower upper)
## (Intercept) 4.419201308 0.149555613 4.089746973 4.748655643
## indsocial 0.955566023 0.027057161 0.902351769 1.008780277
## indspend 0.766254857 0.064762962 0.638795109 0.893714604
## dem_edugroupHS 0.119456480 0.064959336 -0.009231225 0.248144184
## dem_edugroupSome Coll 0.173367971 0.066451081 0.040375778 0.306360164
## dem_edugroupBA 0.148272323 0.072370921 0.004645913 0.291898734
## dem_edugroupGrad Deg -0.138182158 0.082555452 -0.303039173 0.026674858
## dem_agegrp30-49 0.157002599 0.053358372 0.051779053 0.262226145
## dem_agegrp50-69 0.242098388 0.054821695 0.132586805 0.351609971
## dem_agegrp70+ 0.290117904 0.068432180 0.154536719 0.425699089
## gender_respondent3 -0.035480157 0.034063542 -0.102531245 0.031570930
## inc_incgroup_pre 0.001159069 0.002716066 -0.004372040 0.006690177
## relig_chmember -0.131327074 0.064862031 -0.284506896 0.021852749
## missInfo
## (Intercept) 66 %
## indsocial 11 %
## indspend 12 %
## dem_edugroupHS 20 %
## dem_edugroupSome Coll 29 %
## dem_edugroupBA 22 %
## dem_edugroupGrad Deg 27 %
## dem_agegrp30-49 15 %
## dem_agegrp50-69 27 %
## dem_agegrp70+ 20 %
## gender_respondent3 13 %
## inc_incgroup_pre 39 %
## relig_chmember 80 %
summary(nes.am.mods)
## Multiple imputation results:
## MIcombine.default(lapply(nes.am.dats, function(x) update(mod,
## data = x)))
## results se (lower upper)
## (Intercept) 4.43989793 0.136654033 4.145925781 4.733870082
## indsocial 1.01875665 0.026988829 0.965603446 1.071909861
## indspend 0.83001335 0.069207972 0.691818911 0.968207783
## dem_edugroupHS 0.14265378 0.073133182 -0.007106265 0.292413827
## dem_edugroupSome Coll 0.21334760 0.077760351 0.049843393 0.376851814
## dem_edugroupBA 0.20984383 0.081835736 0.041686163 0.378001506
## dem_edugroupGrad Deg -0.06787562 0.087081292 -0.244728675 0.108977436
## dem_agegrp30-49 0.16693824 0.056292182 0.054410531 0.279465944
## dem_agegrp50-69 0.27262715 0.049318996 0.175675040 0.369579267
## dem_agegrp70+ 0.32197343 0.061943746 0.200515046 0.443431815
## gender_respondent3 -0.05637915 0.035159033 -0.126082669 0.013324366
## inc_incgroup_pre 0.00198976 0.002297375 -0.002528912 0.006508431
## relig_chmember -0.19424941 0.086571314 -0.412583695 0.024084873
## missInfo
## (Intercept) 60 %
## indsocial 13 %
## indspend 27 %
## dem_edugroupHS 42 %
## dem_edugroupSome Coll 52 %
## dem_edugroupBA 43 %
## dem_edugroupGrad Deg 37 %
## dem_agegrp30-49 28 %
## dem_agegrp50-69 10 %
## dem_agegrp70+ 4 %
## gender_respondent3 21 %
## inc_incgroup_pre 11 %
## relig_chmember 90 %
mi.mi <- as.numeric(gsub(" %", "", summary(nes.mi.mods)[,5]))
## Multiple imputation results:
## MIcombine.default(lapply(nes.mi.dats, function(x) update(mod,
## data = x)))
## results se (lower upper)
## (Intercept) 4.476488151 0.153786682 4.135676792 4.817299511
## indsocial 1.018084613 0.027768925 0.963209026 1.072960200
## indspend 0.835239750 0.063007374 0.711504886 0.958974613
## dem_edugroupHS 0.154267321 0.064955362 0.025333865 0.283200777
## dem_edugroupSome Coll 0.226252140 0.063547646 0.100046033 0.352458248
## dem_edugroupBA 0.219961128 0.080848033 0.054653387 0.385268869
## dem_edugroupGrad Deg -0.056836794 0.078740418 -0.212749651 0.099076062
## dem_agegrp30-49 0.177983590 0.062185655 0.050525790 0.305441390
## dem_agegrp50-69 0.272021943 0.059727734 0.149768489 0.394275397
## dem_agegrp70+ 0.309883486 0.067836466 0.175597639 0.444169332
## gender_respondent3 -0.071787623 0.033538213 -0.137710556 -0.005864690
## inc_incgroup_pre 0.001328402 0.002502264 -0.003658249 0.006315052
## relig_chmember -0.211864312 0.053930061 -0.332501870 -0.091226754
## missInfo
## (Intercept) 68 %
## indsocial 18 %
## indspend 8 %
## dem_edugroupHS 22 %
## dem_edugroupSome Coll 22 %
## dem_edugroupBA 41 %
## dem_edugroupGrad Deg 20 %
## dem_agegrp30-49 42 %
## dem_agegrp50-69 41 %
## dem_agegrp70+ 19 %
## gender_respondent3 10 %
## inc_incgroup_pre 25 %
## relig_chmember 70 %
rf.mi <- as.numeric(gsub(" %", "", summary(nes.rf.mods)[,5]))
## Multiple imputation results:
## MIcombine.default(lapply(nes.rf.dats, function(x) update(mod,
## data = x)))
## results se (lower upper)
## (Intercept) 4.419201308 0.149555613 4.089746973 4.748655643
## indsocial 0.955566023 0.027057161 0.902351769 1.008780277
## indspend 0.766254857 0.064762962 0.638795109 0.893714604
## dem_edugroupHS 0.119456480 0.064959336 -0.009231225 0.248144184
## dem_edugroupSome Coll 0.173367971 0.066451081 0.040375778 0.306360164
## dem_edugroupBA 0.148272323 0.072370921 0.004645913 0.291898734
## dem_edugroupGrad Deg -0.138182158 0.082555452 -0.303039173 0.026674858
## dem_agegrp30-49 0.157002599 0.053358372 0.051779053 0.262226145
## dem_agegrp50-69 0.242098388 0.054821695 0.132586805 0.351609971
## dem_agegrp70+ 0.290117904 0.068432180 0.154536719 0.425699089
## gender_respondent3 -0.035480157 0.034063542 -0.102531245 0.031570930
## inc_incgroup_pre 0.001159069 0.002716066 -0.004372040 0.006690177
## relig_chmember -0.131327074 0.064862031 -0.284506896 0.021852749
## missInfo
## (Intercept) 66 %
## indsocial 11 %
## indspend 12 %
## dem_edugroupHS 20 %
## dem_edugroupSome Coll 29 %
## dem_edugroupBA 22 %
## dem_edugroupGrad Deg 27 %
## dem_agegrp30-49 15 %
## dem_agegrp50-69 27 %
## dem_agegrp70+ 20 %
## gender_respondent3 13 %
## inc_incgroup_pre 39 %
## relig_chmember 80 %
am.mi <- as.numeric(gsub(" %", "", summary(nes.am.mods)[,5]))
## Multiple imputation results:
## MIcombine.default(lapply(nes.am.dats, function(x) update(mod,
## data = x)))
## results se (lower upper)
## (Intercept) 4.43989793 0.136654033 4.145925781 4.733870082
## indsocial 1.01875665 0.026988829 0.965603446 1.071909861
## indspend 0.83001335 0.069207972 0.691818911 0.968207783
## dem_edugroupHS 0.14265378 0.073133182 -0.007106265 0.292413827
## dem_edugroupSome Coll 0.21334760 0.077760351 0.049843393 0.376851814
## dem_edugroupBA 0.20984383 0.081835736 0.041686163 0.378001506
## dem_edugroupGrad Deg -0.06787562 0.087081292 -0.244728675 0.108977436
## dem_agegrp30-49 0.16693824 0.056292182 0.054410531 0.279465944
## dem_agegrp50-69 0.27262715 0.049318996 0.175675040 0.369579267
## dem_agegrp70+ 0.32197343 0.061943746 0.200515046 0.443431815
## gender_respondent3 -0.05637915 0.035159033 -0.126082669 0.013324366
## inc_incgroup_pre 0.00198976 0.002297375 -0.002528912 0.006508431
## relig_chmember -0.19424941 0.086571314 -0.412583695 0.024084873
## missInfo
## (Intercept) 60 %
## indsocial 13 %
## indspend 27 %
## dem_edugroupHS 42 %
## dem_edugroupSome Coll 52 %
## dem_edugroupBA 43 %
## dem_edugroupGrad Deg 37 %
## dem_agegrp30-49 28 %
## dem_agegrp50-69 10 %
## dem_agegrp70+ 4 %
## gender_respondent3 21 %
## inc_incgroup_pre 11 %
## relig_chmember 90 %
# show the missing info for the mi and rf models relative to amelia
mi.mi/am.mi
## [1] 1.1333333 1.3846154 0.2962963 0.5238095 0.4230769 0.9534884 0.5405405
## [8] 1.5000000 4.1000000 4.7500000 0.4761905 2.2727273 0.7777778
rf.mi/am.mi
## [1] 1.1000000 0.8461538 0.4444444 0.4761905 0.5576923 0.5116279 0.7297297
## [8] 0.5357143 2.7000000 5.0000000 0.6190476 3.5454545 0.8888889
mean(mi.mi/am.mi)
## [1] 1.471681
mean(rf.mi/am.mi)
## [1] 1.38115
This suggests that with either of the mice
impuation methods, the amelia
imputations generate lower amounts of missing information. If you looked at mi.mi/rf.mi
you would find that the average is in favor of the default mice
model.
Repeat above with m=25
.
Using the data above, estimate a finite mixture model where you assume that indsocial
and indspend
operationalize the two different theories (don’t use relig_chmember
and inc_incgroup_pre
in the models, yet.) Evaluate the resulting model against the linear model where there is an interaction between indsocial
and indspend
including the other controls.
libcpre_self_num
on indsocial
(a composite of social policy attitudes) and indspend
(a composite of spending policy attitudes), their interaction and the controls gender_respondent
, dem_edugroup
and dem_agegrp_num
.library(flexmix)
dat <- read.dta('http://www.quantoid.net/files/reg3/lab3_data.dta')
dat$libcpre_self_num <- as.numeric(dat$libcpre_self)
dat$dem_agegrp_num <- as.numeric(dat$dem_agegrp)
mod <- lm(libcpre_self_num ~ indsocial*indspend + gender_respondent + dem_edugroup + dem_agegrp_num, data=dat)
DAintfun2(mod, c("indsocial", "indspend"), hist=T, scale.hist=.3)
indsocial
is the variable of interest in one component and indspend
is the variable of interest in the other. That is set the indspend
coefficient to zero in the model with indsocial
and indsocial
’s coefficient to zero in the model with indspend
. Fix the coefficients on the other regressors to be constant across the two components.model <- FLXMRglmfix(family = "gaussian", nested=list(k=c(1,1),
formula = c(~indsocial,
~indspend)), fixed= ~ dem_agegrp_num + gender_respondent + dem_edugroup)
out.a <- stepFlexmix(libcpre_self_num ~ 1 , k=2, model=model, data=dat, nrep=20)
## 2 : * * * * * * * * * * * * * * * * * * * *
mod.refit <- refit(out.a)
fit1 <- predict(out.a)$Comp.1[[1]]
fit2 <- predict(out.a)$Comp.2[[1]]
post <- out.a@posterior$scaled
predfix1 <- rowSums(cbind(fit1, fit2)*post)
## Correlations of fitted values and observed values from the mixture model
cor(predfix1, dat$libcpre_self)^2
## [1] 0.6004018
## Correlation using only the fits from best predicted theory
predfix2 <- fit1
predfix2[which(post[,2] > .5)] <- fit2[which(post[,2] > .5)]
cor(predfix2, dat$libcpre_self)^2
## [,1]
## [1,] 0.5591659
## correlations of fitted values and observed from the linear model
cor(mod$fitted, dat$libcpre_self)^2
## [1] 0.2213779
inc_incgroup_pre
) and church membership (a social predictor operationalized by relig_chmember
) might tell us something about the probabilities of being in one or the other group. Incorporate that information and see whether, in fact, they do provide information about group membership.out.b <- stepFlexmix(libcpre_self ~ 1 , k=2, model=model, data=dat, nrep=20,
concomitant = FLXPmultinom( ~ inc_incgroup_pre + relig_chmember))
## 2 : * * * * * * * * * * * * * * * * * * * *
out.b.refit <- refit(out.b)
out.b.refit@concomitant
## $Comp.2
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.558476 0.233169 -2.3952 0.01661 *
## inc_incgroup_pre -0.052821 0.010526 -5.0183 5.214e-07 ***
## relig_chmember2. No 0.298105 0.164661 1.8104 0.07023 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1