Missing Data and Multiple Imputation

I’ve got two examples here - one purely cross-sectional and one time-series cross-sectional. The TSCS one is a study by David Rueda (2008), which was replicated by Lall in his 2016 piece. The second is from the 2014 American National Election Study.

Rudea Replication

For the sake of this exercise, we’re going to replicate one of Rueda’s models and assess the quality of the imputations. You can read in and manage the data like below. This puts an object named x in your workspace that contains the data.

f <- file("https://quantoid.net/files/reg3/rueda.rda")
load(f)
close(f)
tmp <- x[,c("country", "year", "govem", "govpart", "hkcorp", 
"govpxcor", "open", "finop", "dccggx", "unemp", 
"gdpgr", "deca70", "deca80", "deca85", "deca90", "count")]
tmp$count <- as.factor(tmp$count)

Next we can make the listwise deleted data. We can do this as follows:

tmpNA <- na.omit(tmp[,c("country", "year", "govem", "govpart", "hkcorp", 
"govpxcor", "open", "finop", "dccggx", "unemp", 
"gdpgr", "deca70", "deca80", "deca85", "deca90", "count")])
  1. Estimate the model of govem (government employment) on govpart (cabinet partisanship), hkcorp (corporatism), govpart*hkcorp (the interaction of cabinet partisanship and corporatism), open (internal openness), finop (financial openness), dccggx (government debt), unemp (unemployment), gdpgr (gdp growth), deca70, deca80, deca85, deca90 and count (year and country variables).
mod <- lm(govem ~ govpart*hkcorp  + open + finop + dccggx + 
          unemp + gdpgr + deca70 + deca80 + deca85 + deca90 + count, data=tmpNA)
summary(mod)
## 
## Call:
## lm(formula = govem ~ govpart * hkcorp + open + finop + dccggx + 
##     unemp + gdpgr + deca70 + deca80 + deca85 + deca90 + count, 
##     data = tmpNA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.1283 -0.7209 -0.1021  0.9666  4.1376 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     21.998227   1.449952  15.172  < 2e-16 ***
## govpart          0.046705   0.012530   3.727 0.000227 ***
## hkcorp           0.238680   1.990129   0.120 0.904610    
## open            -0.021053   0.017769  -1.185 0.236935    
## finop            0.215997   0.099397   2.173 0.030482 *  
## dccggx          -0.019114   0.857783  -0.022 0.982236    
## unemp            0.382621   0.049026   7.805 7.86e-14 ***
## gdpgr           -0.008315   0.041912  -0.198 0.842867    
## deca70          -0.874399   0.516221  -1.694 0.091233 .  
## deca80           0.004772   0.470755   0.010 0.991918    
## deca85           0.202725   0.444762   0.456 0.648828    
## deca90          -0.038592   0.444986  -0.087 0.930941    
## count2          -4.724493   1.907703  -2.477 0.013764 *  
## count3          -5.949257   2.080673  -2.859 0.004514 ** 
## count4          -6.933322   0.753173  -9.205  < 2e-16 ***
## count5           2.435253   1.522843   1.599 0.110739    
## count6          -5.455875   1.706360  -3.197 0.001520 ** 
## count7          -5.119669   0.864316  -5.923 7.88e-09 ***
## count8          -5.810753   0.732881  -7.929 3.41e-14 ***
## count9         -11.300278   1.559554  -7.246 3.03e-12 ***
## count10        -11.994008   0.788833 -15.205  < 2e-16 ***
## count11        -17.961347   1.265935 -14.188  < 2e-16 ***
## count12          2.086153   1.913545   1.090 0.276414    
## count13        -11.757483   1.707576  -6.885 2.89e-11 ***
## count14          6.404991   1.737182   3.687 0.000265 ***
## count15        -12.249213   1.403775  -8.726  < 2e-16 ***
## count16        -11.155238   0.667400 -16.714  < 2e-16 ***
## govpart:hkcorp  -0.094657   0.023048  -4.107 5.05e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.547 on 332 degrees of freedom
## Multiple R-squared:  0.9464, Adjusted R-squared:  0.9421 
## F-statistic: 217.3 on 27 and 332 DF,  p-value: < 2.2e-16

Rueda used panel corrected standard errors so, I’ll show you how you can do that in parallel with the original analysis.

library(pcse)
pmod <- pcse(mod, tmpNA$count, tmpNA$year)
summary(pmod)
## 
##  Results: 
##  
##                     Estimate        PCSE      t value     Pr(>|t|)
## (Intercept)     21.998226520 1.192064080  18.45389597 7.845121e-53
## govpart          0.046705048 0.010371734   4.50310908 9.283994e-06
## hkcorp           0.238679581 1.910250066   0.12494677 9.006413e-01
## open            -0.021053453 0.009852876  -2.13678250 3.334628e-02
## finop            0.215996881 0.083545106   2.58539240 1.015266e-02
## dccggx          -0.019113688 0.346879227  -0.05510185 9.560905e-01
## unemp            0.382621252 0.048028293   7.96658027 2.633303e-14
## gdpgr           -0.008314584 0.032484946  -0.25595193 7.981465e-01
## deca70          -0.874399029 0.357386593  -2.44664754 1.493734e-02
## deca80           0.004772191 0.294733642   0.01619154 9.870913e-01
## deca85           0.202725294 0.274291112   0.73908809 4.603757e-01
## deca90          -0.038592453 0.269175077  -0.14337305 8.860825e-01
## count2          -4.724492690 1.708922768  -2.76460281 6.017581e-03
## count3          -5.949257193 1.476335851  -4.02974512 6.927911e-05
## count4          -6.933322327 0.724668003  -9.56758447 2.592777e-19
## count5           2.435252926 1.360125504   1.79046192 7.429071e-02
## count6          -5.455874505 1.657760839  -3.29111074 1.105195e-03
## count7          -5.119669164 0.883387089  -5.79549919 1.583891e-08
## count8          -5.810752832 0.601261040  -9.66427632 1.243382e-19
## count9         -11.300278490 1.372714862  -8.23206538 4.258659e-15
## count10        -11.994008065 0.737942417 -16.25331163 4.108030e-44
## count11        -17.961346545 1.188884033 -15.10773637 1.295701e-39
## count12          2.086153424 1.717416409   1.21470449 2.253420e-01
## count13        -11.757483100 1.211022629  -9.70872288 8.857538e-20
## count14          6.404991050 1.670019790   3.83527853 1.500767e-04
## count15        -12.249213181 1.046692012 -11.70278653 1.014701e-26
## count16        -11.155237811 0.686801608 -16.24230008 4.540244e-44
## govpart:hkcorp  -0.094657272 0.017947206  -5.27420661 2.409373e-07
## 
##  --------------------------------------------- 
##  
## # Valid Obs = 360; # Missing Obs = 8; Degrees of Freedom = 332.
  1. Impute the data three different ways - mice with the default settings, mice using random forests (make sure that the time and year variables are in the model) as the imputation method and amelia. After doing that, re-estimate the models with the three different imputed datasets. How do the three models compare?
library(mice)
library(Amelia)
library(mitools)
mi.imp <- mice(tmp[,c("govem", "govpart", "hkcorp", "govpxcor", "open", "finop", "dccggx", "unemp", "gdpgr", "deca70", "deca80", "deca85", "deca90", "count")], m=50, printFlag=FALSE)

I talked in class about a way to make sure that transformations are respected in imputations. Imagine we wanted to use the log of dccggx and unemp in the imputation models. We also want to make sure that the interaction gets imputed the right way. Imputing these variables is done through what is called ‘passive imputation’. Here, we set the imputation method to the transformation we want to execute. We could do that with the following:

tmp$logunemp <- log(tmp$unemp)
tmp$logdc <- log(tmp$dccggx)
## run mice one time to get the imputation methods for everything
tmp.imp <- mice(tmp[,c("govem", "govpart", "hkcorp", "govpxcor", 
        "open", "finop", "dccggx", "unemp", "gdpgr", "logunemp",
        "logdc","deca70", "deca80", "deca85", "deca90", "year",
         "count")], m=1, printFlag=FALSE)
pm <- tmp.imp$predictorMatrix
pm[,12:15] <- 0
imp.meth <- tmp.imp$method
imp.meth["logunemp"] <- "~log(unemp)"
imp.meth["logdc"] <- "~log(dccggx)"
imp.meth["govpxcor"] <- "~I(govpart*hkcorp)"
mi.imp2 <- mice(tmp[,c("govem", "govpart", "hkcorp", "govpxcor", 
        "open", "finop", "dccggx", "unemp", "gdpgr", "logunemp",
        "logdc","deca70", "deca80", "deca85", "deca90", 
        "year", "count")], 
        m=50, predictorMatrix=pm, printFlag=FALSE)

Next, we could do the random forests imputation. For this we would have to make sure to add the year in, and we can put it in as a factor even if we want

# find variables with missing data
tmp$year <- as.factor(tmp$year)
tmp.imp <- mice(tmp[,c("govem", "govpart", "hkcorp", "govpxcor", "open",
        "finop", "dccggx", "unemp", "gdpgr", "year", "count", 
        "deca70", "deca80", "deca85", "deca90")], 
        m=1, printFlag=FALSE)
## Warning: Number of logged events: 30
## since we've got all of the variables in propmiss, we simply want to set the 
## imputation method for anything that has missing info to 'rf'
imp.methods <- tmp.imp$method
imp.methods <- gsub("pmm", "rf", imp.methods)
imp.methods["govpxcor"] <- "~I(govpart*hkcorp)"
pm <- tmp.imp$predictorMatrix
pm[,12:15] <- 0

rf.imp <- mice(tmp[,c("govem", "govpart", "hkcorp", "govpxcor", "open",
        "finop", "dccggx", "unemp", "gdpgr", "year", "count", 
        "deca70", "deca80", "deca85", "deca90")], 
        methods=imp.methods, m=50, printFlag=FALSE, 
        predictorMatrix=pm)

Finally, we can use amelia

tmp.am <- x[,c("govem", "govpart", "hkcorp", "open", "finop",         
        "dccggx", "unemp", "gdpgr", "deca70", "deca80", "deca85",
        "deca90", "year", "count")]
am.imp <- amelia(tmp.am, cs = "count", ts="year", 
        logs = c("dccggx", "unemp"), polytime=3, intercs=F, m=50, p2s=0, 
        idvars=c("deca70", "deca80", "deca85", "deca90"))

Now, we can re-estimate the models. I’m going to use MIcombine for all of them, but you could use zelig or lm.mids, too.

# make the default mice datasets
mi.dats <- lapply(1:50, function(i)complete(mi.imp2, i))
rf.dats <- lapply(1:50, function(i)complete(rf.imp, i))
am.dats <- am.imp$imputations

mi.mods <- MIcombine(lapply(mi.dats, function(x)update(mod, data=x)))
rf.mods <- MIcombine(lapply(rf.dats, function(x)update(mod, data=x)))
am.mods <- MIcombine(lapply(am.dats, function(x)update(mod, data=x)))

If we wanted to do this with the panel corrected standard errors, it’s a touch more complicated

## first, we need to get the coefficients from all of the models
mi.b <- lapply(mi.dats, function(x)update(mod, data=x)$coef)
rf.b <- lapply(rf.dats, function(x)update(mod, data=x)$coef)
am.b <- lapply(am.dats, function(x)update(mod, data=x)$coef)

## then we need to get the PC variane-covariande matrices
mi.v <- lapply(1:50, function(x)pcse(update(mod, data=mi.dats[[x]]),
        mi.dats[[x]]$count, mi.dats[[x]]$year)$vcov)
rf.v <- lapply(1:50, function(x)pcse(update(mod, data=rf.dats[[x]]),
        rf.dats[[x]]$count, rf.dats[[x]]$year)$vcov)
am.v <- lapply(1:50, function(x)pcse(update(mod, data=am.dats[[x]]),
        am.dats[[x]]$count, am.dats[[x]]$year)$vcov)

## finally, we can combine the coefficients and variances using Rubin's rules. 
mi.pmods <- MIcombine(mi.b, mi.v)
rf.pmods <- MIcombine(rf.b, rf.v)
am.pmods <- MIcombine(am.b, am.v)

Next, we can summarize the models and compare to the original results

# summarize the models
summary(mod)
## 
## Call:
## lm(formula = govem ~ govpart * hkcorp + open + finop + dccggx + 
##     unemp + gdpgr + deca70 + deca80 + deca85 + deca90 + count, 
##     data = tmpNA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.1283 -0.7209 -0.1021  0.9666  4.1376 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     21.998227   1.449952  15.172  < 2e-16 ***
## govpart          0.046705   0.012530   3.727 0.000227 ***
## hkcorp           0.238680   1.990129   0.120 0.904610    
## open            -0.021053   0.017769  -1.185 0.236935    
## finop            0.215997   0.099397   2.173 0.030482 *  
## dccggx          -0.019114   0.857783  -0.022 0.982236    
## unemp            0.382621   0.049026   7.805 7.86e-14 ***
## gdpgr           -0.008315   0.041912  -0.198 0.842867    
## deca70          -0.874399   0.516221  -1.694 0.091233 .  
## deca80           0.004772   0.470755   0.010 0.991918    
## deca85           0.202725   0.444762   0.456 0.648828    
## deca90          -0.038592   0.444986  -0.087 0.930941    
## count2          -4.724493   1.907703  -2.477 0.013764 *  
## count3          -5.949257   2.080673  -2.859 0.004514 ** 
## count4          -6.933322   0.753173  -9.205  < 2e-16 ***
## count5           2.435253   1.522843   1.599 0.110739    
## count6          -5.455875   1.706360  -3.197 0.001520 ** 
## count7          -5.119669   0.864316  -5.923 7.88e-09 ***
## count8          -5.810753   0.732881  -7.929 3.41e-14 ***
## count9         -11.300278   1.559554  -7.246 3.03e-12 ***
## count10        -11.994008   0.788833 -15.205  < 2e-16 ***
## count11        -17.961347   1.265935 -14.188  < 2e-16 ***
## count12          2.086153   1.913545   1.090 0.276414    
## count13        -11.757483   1.707576  -6.885 2.89e-11 ***
## count14          6.404991   1.737182   3.687 0.000265 ***
## count15        -12.249213   1.403775  -8.726  < 2e-16 ***
## count16        -11.155238   0.667400 -16.714  < 2e-16 ***
## govpart:hkcorp  -0.094657   0.023048  -4.107 5.05e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.547 on 332 degrees of freedom
## Multiple R-squared:  0.9464, Adjusted R-squared:  0.9421 
## F-statistic: 217.3 on 27 and 332 DF,  p-value: < 2.2e-16
summary(mi.mods)
## Multiple imputation results:
##       MIcombine.default(lapply(mi.dats, function(x) update(mod, data = x)))
##                     results         se        (lower       upper) missInfo
## (Intercept)     22.45756839 1.54571064  19.428006821  25.48712995      2 %
## govpart          0.03228210 0.01313136   0.006545098   0.05801910      0 %
## hkcorp           0.40584934 2.13032246  -3.769510101   4.58120878      1 %
## open            -0.02785975 0.01937942  -0.065845739   0.01012624      6 %
## finop            0.12132214 0.10515055  -0.084769332   0.32741362      1 %
## dccggx          -0.07653779 1.03954428  -2.117131330   1.96405576     25 %
## unemp            0.36780414 0.05220229   0.265489034   0.47011925      1 %
## gdpgr           -0.03337330 0.04458582  -0.120759904   0.05401330      0 %
## deca70          -1.18755104 0.55373546  -2.272872445  -0.10222964      3 %
## deca80          -0.13942052 0.50631053  -1.131778947   0.85293791      2 %
## deca85           0.11755473 0.47504671  -0.813519706   1.04862917      0 %
## deca90          -0.33079337 0.47380713  -1.259440229   0.59785350      1 %
## count2          -3.41656108 2.06349472  -7.461127898   0.62800573      4 %
## count3          -4.18277466 2.23450938  -8.562465368   0.19691605      4 %
## count4          -5.39131864 0.79045536  -6.940694533  -3.84194275      5 %
## count5           3.74509781 1.67306882   0.465412900   7.02478272      8 %
## count6          -4.28306465 1.86152303  -7.931911745  -0.63421756      6 %
## count7          -3.82540627 0.94268987  -5.673475022  -1.97733751     10 %
## count8          -4.43075905 0.77054622  -5.941079137  -2.92043896      5 %
## count9          -9.87967607 1.68792111 -13.188142109  -6.57121003      5 %
## count10        -10.68448352 0.81959814 -12.290866776  -9.07810027      0 %
## count11        -17.06131012 1.34660767 -19.700614228 -14.42200602      0 %
## count12          3.24004414 2.08145844  -0.839836812   7.31992509      5 %
## count13         -9.93210083 1.85210033 -13.562529387  -6.30167228      7 %
## count14          7.57114374 1.87252324   3.900951285  11.24133619      4 %
## count15        -11.02900340 1.52444506 -14.017362302  -8.04064450      8 %
## count16         -9.98472365 0.68710521 -11.331426158  -8.63802115      1 %
## govpart:hkcorp  -0.07600218 0.02438433  -0.123794593  -0.02820976      0 %
## make the confidence intervals
ci.lwd <- confint(mod)
ci.mi <- summary(mi.mods)[,3:4]
## Multiple imputation results:
##       MIcombine.default(lapply(mi.dats, function(x) update(mod, data = x)))
##                     results         se        (lower       upper) missInfo
## (Intercept)     22.45756839 1.54571064  19.428006821  25.48712995      2 %
## govpart          0.03228210 0.01313136   0.006545098   0.05801910      0 %
## hkcorp           0.40584934 2.13032246  -3.769510101   4.58120878      1 %
## open            -0.02785975 0.01937942  -0.065845739   0.01012624      6 %
## finop            0.12132214 0.10515055  -0.084769332   0.32741362      1 %
## dccggx          -0.07653779 1.03954428  -2.117131330   1.96405576     25 %
## unemp            0.36780414 0.05220229   0.265489034   0.47011925      1 %
## gdpgr           -0.03337330 0.04458582  -0.120759904   0.05401330      0 %
## deca70          -1.18755104 0.55373546  -2.272872445  -0.10222964      3 %
## deca80          -0.13942052 0.50631053  -1.131778947   0.85293791      2 %
## deca85           0.11755473 0.47504671  -0.813519706   1.04862917      0 %
## deca90          -0.33079337 0.47380713  -1.259440229   0.59785350      1 %
## count2          -3.41656108 2.06349472  -7.461127898   0.62800573      4 %
## count3          -4.18277466 2.23450938  -8.562465368   0.19691605      4 %
## count4          -5.39131864 0.79045536  -6.940694533  -3.84194275      5 %
## count5           3.74509781 1.67306882   0.465412900   7.02478272      8 %
## count6          -4.28306465 1.86152303  -7.931911745  -0.63421756      6 %
## count7          -3.82540627 0.94268987  -5.673475022  -1.97733751     10 %
## count8          -4.43075905 0.77054622  -5.941079137  -2.92043896      5 %
## count9          -9.87967607 1.68792111 -13.188142109  -6.57121003      5 %
## count10        -10.68448352 0.81959814 -12.290866776  -9.07810027      0 %
## count11        -17.06131012 1.34660767 -19.700614228 -14.42200602      0 %
## count12          3.24004414 2.08145844  -0.839836812   7.31992509      5 %
## count13         -9.93210083 1.85210033 -13.562529387  -6.30167228      7 %
## count14          7.57114374 1.87252324   3.900951285  11.24133619      4 %
## count15        -11.02900340 1.52444506 -14.017362302  -8.04064450      8 %
## count16         -9.98472365 0.68710521 -11.331426158  -8.63802115      1 %
## govpart:hkcorp  -0.07600218 0.02438433  -0.123794593  -0.02820976      0 %
##identify the significant results 
sig.lwd <- sign(ci.lwd[,1]) == sign(ci.lwd[,2])
sig.mi <- sign(ci.mi[,1]) == sign(ci.mi[,2])

## put the two significant results descriptions together. 
cbind(sig.lwd, sig.mi)
##                sig.lwd sig.mi
## (Intercept)       TRUE   TRUE
## govpart           TRUE   TRUE
## hkcorp           FALSE  FALSE
## open             FALSE  FALSE
## finop             TRUE  FALSE
## dccggx           FALSE  FALSE
## unemp             TRUE   TRUE
## gdpgr            FALSE  FALSE
## deca70           FALSE   TRUE
## deca80           FALSE  FALSE
## deca85           FALSE  FALSE
## deca90           FALSE  FALSE
## count2            TRUE  FALSE
## count3            TRUE  FALSE
## count4            TRUE   TRUE
## count5           FALSE   TRUE
## count6            TRUE   TRUE
## count7            TRUE   TRUE
## count8            TRUE   TRUE
## count9            TRUE   TRUE
## count10           TRUE   TRUE
## count11           TRUE   TRUE
## count12          FALSE  FALSE
## count13           TRUE   TRUE
## count14           TRUE   TRUE
## count15           TRUE   TRUE
## count16           TRUE   TRUE
## govpart:hkcorp    TRUE   TRUE

If we were going to do this with the panel corrected standard errors:

library(lmtest)
ci.lwd <- coefci(mod, vcov.=pmod$vcov)
ci.mi <- summary(mi.pmods)[,3:4]
## Multiple imputation results:
##       MIcombine.default(mi.b, mi.v)
##                     results         se        (lower        upper)
## (Intercept)     22.45756839 1.61934257  19.283694056  25.631442715
## govpart          0.03228210 0.01335299   0.006110715   0.058453486
## hkcorp           0.40584934 2.40589853  -4.309628015   5.121326696
## open            -0.02785975 0.01351665  -0.054360804  -0.001358697
## finop            0.12132214 0.10626639  -0.086956344   0.329600628
## dccggx          -0.07653779 0.73919135  -1.534053954   1.380978383
## unemp            0.36780414 0.06055361   0.249120930   0.486487349
## gdpgr           -0.03337330 0.04350026  -0.118632241   0.051885641
## deca70          -1.18755104 0.47201036  -2.112706374  -0.262395707
## deca80          -0.13942052 0.38991479  -0.903657022   0.624815986
## deca85           0.11755473 0.36379390  -0.595468213   0.830577677
## deca90          -0.33079337 0.37225366  -1.060401161   0.398814431
## count2          -3.41656108 2.25230787  -7.831150633   0.998028469
## count3          -4.18277466 1.91622019  -7.938707810  -0.426841503
## count4          -5.39131864 0.78355129  -6.927165796  -3.855471491
## count5           3.74509781 1.75362815   0.307589296   7.182606319
## count6          -4.28306465 2.09399546  -8.387451458  -0.178677848
## count7          -3.82540627 1.07500332  -5.932664372  -1.718148163
## count8          -4.43075905 0.52905508  -5.467926649  -3.393591445
## count9          -9.87967607 1.70601617 -13.223601429  -6.535750711
## count10        -10.68448352 0.87836058 -12.406038966  -8.962928081
## count11        -17.06131012 1.50058975 -20.002413137 -14.120207108
## count12          3.24004414 2.36556510  -1.396580834   7.876669111
## count13         -9.93210083 1.55934461 -12.988994574  -6.875207095
## count14          7.57114374 2.08928326   3.476141477  11.666145998
## count15        -11.02900340 1.43695912 -13.845990325  -8.212016474
## count16         -9.98472365 0.75979005 -11.473885552  -8.495561756
## govpart:hkcorp  -0.07600218 0.02528089  -0.125551819  -0.026452533
##                missInfo
## (Intercept)         2 %
## govpart             0 %
## hkcorp              0 %
## open               12 %
## finop               1 %
## dccggx             50 %
## unemp               1 %
## gdpgr               0 %
## deca70              4 %
## deca80              3 %
## deca85              0 %
## deca90              1 %
## count2              4 %
## count3              5 %
## count4              6 %
## count5              7 %
## count6              5 %
## count7              7 %
## count8             10 %
## count9              5 %
## count10             0 %
## count11             0 %
## count12             4 %
## count13             9 %
## count14             3 %
## count15             9 %
## count16             0 %
## govpart:hkcorp      0 %
##identify the significant results 
sig.lwd <- sign(ci.lwd[,1]) == sign(ci.lwd[,2])
sig.mi <- sign(ci.mi[,1]) == sign(ci.mi[,2])

## put the two significant results descriptions together. 
out <- cbind(sig.lwd, sig.mi[-c(1,28)])
rownames(out) <- names(sig.lwd)
out
##         sig.lwd      
## govpart    TRUE  TRUE
## hkcorp    FALSE FALSE
## open       TRUE  TRUE
## finop      TRUE FALSE
## dccggx    FALSE FALSE
## unemp      TRUE  TRUE
## gdpgr     FALSE FALSE
## deca70     TRUE  TRUE
## deca80    FALSE FALSE
## deca85    FALSE FALSE
## deca90    FALSE FALSE
## count2     TRUE FALSE
## count3     TRUE  TRUE
## count4     TRUE  TRUE
## count5    FALSE  TRUE
## count6     TRUE  TRUE
## count7     TRUE  TRUE
## count8     TRUE  TRUE
## count9     TRUE  TRUE
## count10    TRUE  TRUE
## count11    TRUE  TRUE
## count12   FALSE FALSE
## count13    TRUE  TRUE
## count14    TRUE  TRUE
## count15    TRUE  TRUE
## count16    TRUE  TRUE

Here, the coefficients of open and finop both become insignificant in the imputed data model. We could look at the same thing for the other two models. Here, I’ll just use the PCSE models.

ci.rf <- summary(rf.pmods)[,3:4]
## Multiple imputation results:
##       MIcombine.default(rf.b, rf.v)
##                     results         se        (lower        upper)
## (Intercept)     22.55512433 1.62711883  19.365961942  25.744286725
## govpart          0.03268335 0.01354147   0.006142508   0.059224188
## hkcorp           0.54387074 2.42080698  -4.200826008   5.288567494
## open            -0.02387748 0.01369089  -0.050718679   0.002963721
## finop            0.12247198 0.10809579  -0.089392465   0.334336430
## dccggx          -0.51577878 0.77533562  -2.043135917   1.011578363
## unemp            0.37241432 0.06094505   0.252963770   0.491864873
## gdpgr           -0.03378632 0.04368057  -0.119398671   0.051826036
## deca70          -1.25758341 0.47195337  -2.182638183  -0.332528635
## deca80          -0.19427476 0.39140590  -0.961435227   0.572885712
## deca85           0.11686024 0.36620606  -0.600890462   0.834610950
## deca90          -0.29037389 0.37352715  -1.022479117   0.441731335
## count2          -3.79159259 2.27759585  -8.255734285   0.672549105
## count3          -4.54980194 1.95553809  -8.382795829  -0.716808057
## count4          -5.55753721 0.80562760  -7.136680179  -3.978394247
## count5           3.33004253 1.78619259  -0.171266844   6.831351894
## count6          -4.68060333 2.12398715  -8.843758485  -0.517448166
## count7          -4.08482476 1.09994268  -6.240984567  -1.928664957
## count8          -4.57782407 0.55176940  -5.659548711  -3.496099433
## count9         -10.20908865 1.74107549 -13.621719276  -6.796458030
## count10        -10.70265338 0.89074978 -12.448492111  -8.956814649
## count11        -17.14594141 1.51221611 -20.109832123 -14.182050705
## count12          2.81970580 2.38368170  -1.852412215   7.491823818
## count13        -10.34482258 1.60173306 -13.484737828  -7.204907329
## count14          7.26421949 2.11158191   3.125517260  11.402921728
## count15        -11.40950914 1.46027872 -14.272184923  -8.546833355
## count16        -10.00940791 0.77447486 -11.527355606  -8.491460213
## govpart:hkcorp  -0.07577052 0.02537913  -0.125512742  -0.026028306
##                missInfo
## (Intercept)         3 %
## govpart             1 %
## hkcorp              0 %
## open               11 %
## finop               1 %
## dccggx             46 %
## unemp               1 %
## gdpgr               0 %
## deca70              4 %
## deca80              3 %
## deca85              0 %
## deca90              2 %
## count2              4 %
## count3              5 %
## count4              6 %
## count5              7 %
## count6              5 %
## count7              8 %
## count8             10 %
## count9              5 %
## count10             1 %
## count11             0 %
## count12             4 %
## count13             9 %
## count14             3 %
## count15             9 %
## count16             1 %
## govpart:hkcorp      1 %
ci.am <- summary(am.pmods)[,3:4]
## Multiple imputation results:
##       MIcombine.default(am.b, am.v)
##                    results          se      (lower      upper) missInfo
## (Intercept)    36.36639556 3.428704430 29.64623175 43.08655937      1 %
## govpart        -0.09135295 0.030688913 -0.15150261 -0.03120329      2 %
## hkcorp          0.10918964 0.891579497 -1.63829163  1.85667092      2 %
## open            0.02581194 0.005009416  0.01599247  0.03563142      7 %
## finop          -1.25552436 0.257208591 -1.75964422 -0.75140450      0 %
## dccggx         -6.89735924 0.851255590 -8.56670489 -5.22801359     15 %
## unemp           0.58444326 0.063871832  0.45925385  0.70963267      3 %
## gdpgr          -0.14084036 0.144109469 -0.42328973  0.14160902      0 %
## deca70         -5.48061161 0.919559137 -7.28292048 -3.67830274      1 %
## deca80         -3.53968864 0.845824622 -5.19747552 -1.88190175      1 %
## deca85         -1.53130767 0.752320724 -3.00582923 -0.05678612      0 %
## deca90          0.47801675 0.708648656 -0.91091944  1.86695295      2 %
## count          -0.27723414 0.029614499 -0.33527965 -0.21918863      4 %
## govpart:hkcorp  0.19579678 0.053592265  0.09075768  0.30083588      1 %
sig.rf <- sign(ci.rf[,1]) == sign(ci.rf[,2])
sig.am <- sign(ci.am[,1]) == sign(ci.am[,2])

cbind(out, sig.rf[-c(1,28)], sig.am[-c(1,28)])
##         sig.lwd                  
## govpart    TRUE  TRUE  TRUE  TRUE
## hkcorp    FALSE FALSE FALSE FALSE
## open       TRUE  TRUE FALSE  TRUE
## finop      TRUE FALSE FALSE  TRUE
## dccggx    FALSE FALSE FALSE  TRUE
## unemp      TRUE  TRUE  TRUE  TRUE
## gdpgr     FALSE FALSE FALSE FALSE
## deca70     TRUE  TRUE  TRUE  TRUE
## deca80    FALSE FALSE FALSE  TRUE
## deca85    FALSE FALSE FALSE  TRUE
## deca90    FALSE FALSE FALSE FALSE
## count2     TRUE FALSE FALSE  TRUE
## count3     TRUE  TRUE  TRUE  TRUE
## count4     TRUE  TRUE  TRUE  TRUE
## count5    FALSE  TRUE FALSE FALSE
## count6     TRUE  TRUE  TRUE  TRUE
## count7     TRUE  TRUE  TRUE  TRUE
## count8     TRUE  TRUE  TRUE  TRUE
## count9     TRUE  TRUE  TRUE  TRUE
## count10    TRUE  TRUE  TRUE FALSE
## count11    TRUE  TRUE  TRUE  TRUE
## count12   FALSE FALSE FALSE  TRUE
## count13    TRUE  TRUE  TRUE  TRUE
## count14    TRUE  TRUE  TRUE FALSE
## count15    TRUE  TRUE  TRUE  TRUE
## count16    TRUE  TRUE  TRUE  TRUE

Notice that the largest set of substantive variables is statistically significant using the amelia imputations.

  1. Do the posterior predictive checks to check the quality of the imputations using the original mice method (with the default imputation methods) and with amelia three methods. Is there any difference in what you find across the methods? Pick one missing data point from each variable with missing data and plot out the posterior distribution of the missing observations for the three different methods.

First, we can do this for mice. There’s an easier way of doing this than the way we talked about in class, at least with the mice object:

## identify the variables with missing observations
missvars <- c("govem", "dccggx", "unemp")

# make the padded out data frame: 
w <- is.na(tmp[,c("govem", "govpart", "hkcorp", 
"govpxcor", "open", "finop", "dccggx", "unemp", "gdpgr", "logunemp", "logdc",
"deca70", "deca80", "deca85", "deca90", "year", "count")])
w[,missvars] <- TRUE

mi.ppd <- update(mi.imp2, where=w, m=500, printFlag=F)
mi.comp <- lapply(1:500, function(x)complete(mi.ppd, x))

Now, we can make the ppds for amelia

newdat <- tmp[,c("govem", "govpart", "hkcorp", 
"govpxcor", "open", "finop", "dccggx", "unemp", "gdpgr", "logunemp", "logdc",
"deca70", "deca80", "deca85", "deca90", "year", "count")]
for(i in 1:length(missvars)){
        tmp.dat <- tmp[,c("govem", "govpart", "hkcorp", 
                "govpxcor", "open", "finop", "dccggx", "unemp", "gdpgr", "logunemp", "logdc",
                "deca70", "deca80", "deca85", "deca90", "year", "count")]
        tmp.dat[[missvars[i]]] <- NA
        newdat <- rbind(newdat, tmp.dat)
}
am.ppd <- amelia(newdat, cs = "count", ts="year", 
        logs = c("dccggx", "unemp"), polytime=3, intercs=F, m=500, p2s=0, 
        idvars=c("deca70", "deca80", "deca85", "deca90"))
## Amelia Error Code:  37 
##  The following variable(s) are 'factors': 
## year
## You may have wanted to set this as a ID variable to remove it
## from the imputation model or as an ordinal or nominal
## variable to be imputed.  Please set it as either and
## try again.
d1 <- 1:368
d2 <- 369:736
d3 <- 737:1104
d4 <- 1105:1472

am.comp <- list()
for(i in 1:500){
        tmp.comp <- am.ppd$imputations[[i]][d1, ]
        tmp.comp[[missvars[1]]] <- am.ppd$imputations[[i]][d2, missvars[1]]
        tmp.comp[[missvars[2]]] <- am.ppd$imputations[[i]][d3, missvars[2]]
        tmp.comp[[missvars[3]]] <- am.ppd$imputations[[i]][d4, missvars[3]]
        am.comp[[i]] <- tmp.comp   
}

Next, we can estimate the models.

mi.orig <- update(mi.imp2, m=500)
mi.orig.comp <- lapply(1:500, function(i)complete(mi.orig, i))
l1 <- lapply(mi.orig.comp, function(x)update(mod, data=x))
l2 <- lapply(mi.comp, function(x)update(mod, data=x))
## save the coefficients from each model
b1 <- sapply(l1, coef)
b2 <- sapply(l2, coef)
## fin pval as the proportion of times that b1 > b2
diff <- t(b1) - t(b2)
pval <- apply(diff, 2, function(x)mean(x > 0))
## replace one-sided p-value with two-sided p-value
pval <- 2*ifelse(pval > .5, 1-pval, pval)
pval
##    (Intercept)        govpart         hkcorp           open          finop 
##          0.420          0.748          0.752          0.604          0.804 
##         dccggx          unemp          gdpgr         deca70         deca80 
##          0.344          0.760          0.876          0.392          0.388 
##         deca85         deca90         count2         count3         count4 
##          0.300          0.920          0.928          0.884          0.520 
##         count5         count6         count7         count8         count9 
##          0.932          0.964          0.660          0.520          0.936 
##        count10        count11        count12        count13        count14 
##          0.944          0.936          0.808          0.840          0.980 
##        count15        count16 govpart:hkcorp 
##          0.756          0.780          0.792

Next, we could do the omnibus test

coef <- colMeans(diff)[c(1:8,28)]
v <- var(diff[,c(1:8,28)])
library(car)
linearHypothesis(model=list(df.residual=NULL),
  hypothesis.matrix=diag(length(coef)),
  vcov.=v, coef.=coef, rhs=rep(0, length(coef)))
## Linear hypothesis test
## 
## Hypothesis:
## (Intercept) = 0
## govpart = 0
## hkcorp = 0
## open = 0
## finop = 0
## dccggx = 0
## unemp = 0
## gdpgr = 0
## govpart:hkcorp = 0
## 
## Model 1: restricted model
## Model 2: list(df.residual = NULL)
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Df  Chisq Pr(>Chisq)
## 1                     
## 2  9 2.6501     0.9766

Now, we could do the same thing for amelia

am.orig.comp <- lapply(1:500, function(i)am.ppd$imputations[[i]][d1, ])
l1 <- lapply(am.orig.comp, function(x)update(mod, data=x))
## Error in eval(predvars, data, env): object 'govem' not found
l2 <- lapply(am.comp, function(x)update(mod, data=x))
## save the coefficients from each model
b1 <- sapply(l1, coef)
b2 <- sapply(l2, coef)
## fin pval as the proportion of times that b1 > b2
diff <- t(b1) - t(b2)
## Error in t(b1) - t(b2): non-numeric argument to binary operator
pval <- apply(diff, 2, function(x)mean(x > 0))
## replace one-sided p-value with two-sided p-value
pval <- 2*ifelse(pval > .5, 1-pval, pval)
pval
##    (Intercept)        govpart         hkcorp           open          finop 
##          0.420          0.748          0.752          0.604          0.804 
##         dccggx          unemp          gdpgr         deca70         deca80 
##          0.344          0.760          0.876          0.392          0.388 
##         deca85         deca90         count2         count3         count4 
##          0.300          0.920          0.928          0.884          0.520 
##         count5         count6         count7         count8         count9 
##          0.932          0.964          0.660          0.520          0.936 
##        count10        count11        count12        count13        count14 
##          0.944          0.936          0.808          0.840          0.980 
##        count15        count16 govpart:hkcorp 
##          0.756          0.780          0.792

Notice, we do quite a bit better here - the quality of these imputations is higher.

coef <- colMeans(diff)[c(1:8,28)]
v <- var(diff[,c(1:8,28)])
linearHypothesis(model=list(df.residual=NULL),
  hypothesis.matrix=diag(length(coef)),
  vcov.=v, coef.=coef, rhs=rep(0, length(coef)))
## Linear hypothesis test
## 
## Hypothesis:
## (Intercept) = 0
## govpart = 0
## hkcorp = 0
## open = 0
## finop = 0
## dccggx = 0
## unemp = 0
## gdpgr = 0
## govpart:hkcorp = 0
## 
## Model 1: restricted model
## Model 2: list(df.residual = NULL)
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Df  Chisq Pr(>Chisq)
## 1                     
## 2  9 2.6501     0.9766

The omnibus result, though is the same. This would probably suggest trying to fit a better imputation model. The rf model might work, but it will take a pretty long time to run. If you want, try this at home when you’ve got some time to let your computer run.

ANES 2014

Using lab3b_data.dta, do multiple imputation (with 5 imputations) on all of the variables in the dataset. You can download the data with:

library(haven)
dat <- read_dta('http://www.quantoid.net/files/reg3/lab3b_data.dta')
searchVarLabels(dat, "")
##                   ind
## indsocial           1
## indspend            2
## dem_edugroup        3
## dem_agegrp          4
## gender_respondent   5
## libcpre_self        6
## inc_incgroup_pre    7
## relig_chmember      8
##                                                                                         label
## indsocial           Index of social liberalism-conservatism (higher values more conservative)
## indspend          Index of economic liberalism-conservatism (higher values more conservative)
## dem_edugroup                                                    Education group of respondent
## dem_agegrp                                                            Age group of respondent
## gender_respondent                                                         Respondent's Gender
## libcpre_self                                                                     libcpre_self
## inc_incgroup_pre                                           Income group (pre-election survey)
## relig_chmember                                Member of a religious congregation/denomination
dat$dem_edugroup <- as_factor(dat$dem_edugroup)
levels(dat$dem_edugroup) <- c("ltHS", "HS", "Some Coll", "BA", "Grad Deg")
dat$dem_agegrp <- as_factor(dat$dem_agegrp)
dat$dem_agegrp <- car:::recode(dat$dem_agegrp, 
        'c("17-20", "21-24", "25-29") = "17-30"; c("30-34", "35-39", "40-44", "45-49") = "30-49"; 
         c("50-54", "55-59", "60-64", "65-69") = "50-69"; c("70-74", "Over 75") = "70+"', 
        as.factor = TRUE)
dat$gender_respondent <- as_factor(dat$gender_respondent)
dat <- as.data.frame(dat)
  1. See how the coefficients change in a model of on all the other variables in the dataset from listwise deletion to multiple imputation. Try mice, mice with random forest imputation and amelia each with 5 imputations. Are there differences between the models?
mod <- lm(libcpre_self ~ ., data=dat)
summary(mod)
## 
## Call:
## lm(formula = libcpre_self ~ ., data = dat)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -4.8691 -0.7613  0.0114  0.7620  4.7738 
## 
## Labels:
##  value                 label
##      1            ex liberal
##      2               liberal
##      3      slightly liberal
##      4              moderate
##      5 slightly conservative
##      6          conservative
##      7       ex conservative
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.5559259  0.1583679  28.768  < 2e-16 ***
## indsocial              1.0665836  0.0463205  23.026  < 2e-16 ***
## indspend               0.8219730  0.1041089   7.895 4.74e-15 ***
## dem_edugroupHS         0.0789380  0.1007610   0.783 0.433475    
## dem_edugroupSome Coll  0.0503617  0.0992494   0.507 0.611912    
## dem_edugroupBA        -0.0302927  0.1109651  -0.273 0.784887    
## dem_edugroupGrad Deg  -0.2344944  0.1193196  -1.965 0.049522 *  
## dem_agegrp30-49        0.1565359  0.0797136   1.964 0.049700 *  
## dem_agegrp50-69        0.2867809  0.0768760   3.730 0.000196 ***
## dem_agegrp70+          0.4007806  0.1062326   3.773 0.000166 ***
## gender_respondent3    -0.1360355  0.0520209  -2.615 0.008990 ** 
## inc_incgroup_pre       0.0009367  0.0034512   0.271 0.786098    
## relig_chmember        -0.2389988  0.0526998  -4.535 6.10e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.135 on 1994 degrees of freedom
##   (3909 observations deleted due to missingness)
## Multiple R-squared:  0.3513, Adjusted R-squared:  0.3474 
## F-statistic: 89.97 on 12 and 1994 DF,  p-value: < 2.2e-16

Now, we can do the imputations

nes.mi <- mice(dat, m=5, printFlag=F)
## Warning: Number of logged events: 175
meths <- nes.mi$method
meths <- gsub("[a-z]*", "rf", meths)
nes.rf <- mice(dat, m=5, method=meths, printFlag=F)

Next, we can do amelia

nes.am <- amelia(dat, m=5, noms=c("dem_edugroup", "dem_agegrp", "gender_respondent", "relig_chmember"))
nes.mi.dats <- lapply(1:5, function(i)complete(nes.mi, i))
nes.rf.dats <- lapply(1:5, function(i)complete(nes.rf, i))
nes.am.dats <- nes.am$imputations

nes.mi.mods <- MIcombine(lapply(nes.mi.dats, function(x)update(mod, data=x)))
nes.rf.mods <- MIcombine(lapply(nes.rf.dats, function(x)update(mod, data=x)))
nes.am.mods <- MIcombine(lapply(nes.am.dats, function(x)update(mod, data=x)))

summary(nes.mi.mods)
## Multiple imputation results:
##       MIcombine.default(lapply(nes.mi.dats, function(x) update(mod, 
##     data = x)))
##                            results          se       (lower       upper)
## (Intercept)            4.476488151 0.153786682  4.135676792  4.817299511
## indsocial              1.018084613 0.027768925  0.963209026  1.072960200
## indspend               0.835239750 0.063007374  0.711504886  0.958974613
## dem_edugroupHS         0.154267321 0.064955362  0.025333865  0.283200777
## dem_edugroupSome Coll  0.226252140 0.063547646  0.100046033  0.352458248
## dem_edugroupBA         0.219961128 0.080848033  0.054653387  0.385268869
## dem_edugroupGrad Deg  -0.056836794 0.078740418 -0.212749651  0.099076062
## dem_agegrp30-49        0.177983590 0.062185655  0.050525790  0.305441390
## dem_agegrp50-69        0.272021943 0.059727734  0.149768489  0.394275397
## dem_agegrp70+          0.309883486 0.067836466  0.175597639  0.444169332
## gender_respondent3    -0.071787623 0.033538213 -0.137710556 -0.005864690
## inc_incgroup_pre       0.001328402 0.002502264 -0.003658249  0.006315052
## relig_chmember        -0.211864312 0.053930061 -0.332501870 -0.091226754
##                       missInfo
## (Intercept)               68 %
## indsocial                 18 %
## indspend                   8 %
## dem_edugroupHS            22 %
## dem_edugroupSome Coll     22 %
## dem_edugroupBA            41 %
## dem_edugroupGrad Deg      20 %
## dem_agegrp30-49           42 %
## dem_agegrp50-69           41 %
## dem_agegrp70+             19 %
## gender_respondent3        10 %
## inc_incgroup_pre          25 %
## relig_chmember            70 %
summary(nes.rf.mods)
## Multiple imputation results:
##       MIcombine.default(lapply(nes.rf.dats, function(x) update(mod, 
##     data = x)))
##                            results          se       (lower      upper)
## (Intercept)            4.419201308 0.149555613  4.089746973 4.748655643
## indsocial              0.955566023 0.027057161  0.902351769 1.008780277
## indspend               0.766254857 0.064762962  0.638795109 0.893714604
## dem_edugroupHS         0.119456480 0.064959336 -0.009231225 0.248144184
## dem_edugroupSome Coll  0.173367971 0.066451081  0.040375778 0.306360164
## dem_edugroupBA         0.148272323 0.072370921  0.004645913 0.291898734
## dem_edugroupGrad Deg  -0.138182158 0.082555452 -0.303039173 0.026674858
## dem_agegrp30-49        0.157002599 0.053358372  0.051779053 0.262226145
## dem_agegrp50-69        0.242098388 0.054821695  0.132586805 0.351609971
## dem_agegrp70+          0.290117904 0.068432180  0.154536719 0.425699089
## gender_respondent3    -0.035480157 0.034063542 -0.102531245 0.031570930
## inc_incgroup_pre       0.001159069 0.002716066 -0.004372040 0.006690177
## relig_chmember        -0.131327074 0.064862031 -0.284506896 0.021852749
##                       missInfo
## (Intercept)               66 %
## indsocial                 11 %
## indspend                  12 %
## dem_edugroupHS            20 %
## dem_edugroupSome Coll     29 %
## dem_edugroupBA            22 %
## dem_edugroupGrad Deg      27 %
## dem_agegrp30-49           15 %
## dem_agegrp50-69           27 %
## dem_agegrp70+             20 %
## gender_respondent3        13 %
## inc_incgroup_pre          39 %
## relig_chmember            80 %
summary(nes.am.mods)
## Multiple imputation results:
##       MIcombine.default(lapply(nes.am.dats, function(x) update(mod, 
##     data = x)))
##                           results          se       (lower      upper)
## (Intercept)            4.43989793 0.136654033  4.145925781 4.733870082
## indsocial              1.01875665 0.026988829  0.965603446 1.071909861
## indspend               0.83001335 0.069207972  0.691818911 0.968207783
## dem_edugroupHS         0.14265378 0.073133182 -0.007106265 0.292413827
## dem_edugroupSome Coll  0.21334760 0.077760351  0.049843393 0.376851814
## dem_edugroupBA         0.20984383 0.081835736  0.041686163 0.378001506
## dem_edugroupGrad Deg  -0.06787562 0.087081292 -0.244728675 0.108977436
## dem_agegrp30-49        0.16693824 0.056292182  0.054410531 0.279465944
## dem_agegrp50-69        0.27262715 0.049318996  0.175675040 0.369579267
## dem_agegrp70+          0.32197343 0.061943746  0.200515046 0.443431815
## gender_respondent3    -0.05637915 0.035159033 -0.126082669 0.013324366
## inc_incgroup_pre       0.00198976 0.002297375 -0.002528912 0.006508431
## relig_chmember        -0.19424941 0.086571314 -0.412583695 0.024084873
##                       missInfo
## (Intercept)               60 %
## indsocial                 13 %
## indspend                  27 %
## dem_edugroupHS            42 %
## dem_edugroupSome Coll     52 %
## dem_edugroupBA            43 %
## dem_edugroupGrad Deg      37 %
## dem_agegrp30-49           28 %
## dem_agegrp50-69           10 %
## dem_agegrp70+              4 %
## gender_respondent3        21 %
## inc_incgroup_pre          11 %
## relig_chmember            90 %
mi.mi <- as.numeric(gsub(" %", "", summary(nes.mi.mods)[,5]))
## Multiple imputation results:
##       MIcombine.default(lapply(nes.mi.dats, function(x) update(mod, 
##     data = x)))
##                            results          se       (lower       upper)
## (Intercept)            4.476488151 0.153786682  4.135676792  4.817299511
## indsocial              1.018084613 0.027768925  0.963209026  1.072960200
## indspend               0.835239750 0.063007374  0.711504886  0.958974613
## dem_edugroupHS         0.154267321 0.064955362  0.025333865  0.283200777
## dem_edugroupSome Coll  0.226252140 0.063547646  0.100046033  0.352458248
## dem_edugroupBA         0.219961128 0.080848033  0.054653387  0.385268869
## dem_edugroupGrad Deg  -0.056836794 0.078740418 -0.212749651  0.099076062
## dem_agegrp30-49        0.177983590 0.062185655  0.050525790  0.305441390
## dem_agegrp50-69        0.272021943 0.059727734  0.149768489  0.394275397
## dem_agegrp70+          0.309883486 0.067836466  0.175597639  0.444169332
## gender_respondent3    -0.071787623 0.033538213 -0.137710556 -0.005864690
## inc_incgroup_pre       0.001328402 0.002502264 -0.003658249  0.006315052
## relig_chmember        -0.211864312 0.053930061 -0.332501870 -0.091226754
##                       missInfo
## (Intercept)               68 %
## indsocial                 18 %
## indspend                   8 %
## dem_edugroupHS            22 %
## dem_edugroupSome Coll     22 %
## dem_edugroupBA            41 %
## dem_edugroupGrad Deg      20 %
## dem_agegrp30-49           42 %
## dem_agegrp50-69           41 %
## dem_agegrp70+             19 %
## gender_respondent3        10 %
## inc_incgroup_pre          25 %
## relig_chmember            70 %
rf.mi <- as.numeric(gsub(" %", "", summary(nes.rf.mods)[,5]))
## Multiple imputation results:
##       MIcombine.default(lapply(nes.rf.dats, function(x) update(mod, 
##     data = x)))
##                            results          se       (lower      upper)
## (Intercept)            4.419201308 0.149555613  4.089746973 4.748655643
## indsocial              0.955566023 0.027057161  0.902351769 1.008780277
## indspend               0.766254857 0.064762962  0.638795109 0.893714604
## dem_edugroupHS         0.119456480 0.064959336 -0.009231225 0.248144184
## dem_edugroupSome Coll  0.173367971 0.066451081  0.040375778 0.306360164
## dem_edugroupBA         0.148272323 0.072370921  0.004645913 0.291898734
## dem_edugroupGrad Deg  -0.138182158 0.082555452 -0.303039173 0.026674858
## dem_agegrp30-49        0.157002599 0.053358372  0.051779053 0.262226145
## dem_agegrp50-69        0.242098388 0.054821695  0.132586805 0.351609971
## dem_agegrp70+          0.290117904 0.068432180  0.154536719 0.425699089
## gender_respondent3    -0.035480157 0.034063542 -0.102531245 0.031570930
## inc_incgroup_pre       0.001159069 0.002716066 -0.004372040 0.006690177
## relig_chmember        -0.131327074 0.064862031 -0.284506896 0.021852749
##                       missInfo
## (Intercept)               66 %
## indsocial                 11 %
## indspend                  12 %
## dem_edugroupHS            20 %
## dem_edugroupSome Coll     29 %
## dem_edugroupBA            22 %
## dem_edugroupGrad Deg      27 %
## dem_agegrp30-49           15 %
## dem_agegrp50-69           27 %
## dem_agegrp70+             20 %
## gender_respondent3        13 %
## inc_incgroup_pre          39 %
## relig_chmember            80 %
am.mi <- as.numeric(gsub(" %", "", summary(nes.am.mods)[,5]))
## Multiple imputation results:
##       MIcombine.default(lapply(nes.am.dats, function(x) update(mod, 
##     data = x)))
##                           results          se       (lower      upper)
## (Intercept)            4.43989793 0.136654033  4.145925781 4.733870082
## indsocial              1.01875665 0.026988829  0.965603446 1.071909861
## indspend               0.83001335 0.069207972  0.691818911 0.968207783
## dem_edugroupHS         0.14265378 0.073133182 -0.007106265 0.292413827
## dem_edugroupSome Coll  0.21334760 0.077760351  0.049843393 0.376851814
## dem_edugroupBA         0.20984383 0.081835736  0.041686163 0.378001506
## dem_edugroupGrad Deg  -0.06787562 0.087081292 -0.244728675 0.108977436
## dem_agegrp30-49        0.16693824 0.056292182  0.054410531 0.279465944
## dem_agegrp50-69        0.27262715 0.049318996  0.175675040 0.369579267
## dem_agegrp70+          0.32197343 0.061943746  0.200515046 0.443431815
## gender_respondent3    -0.05637915 0.035159033 -0.126082669 0.013324366
## inc_incgroup_pre       0.00198976 0.002297375 -0.002528912 0.006508431
## relig_chmember        -0.19424941 0.086571314 -0.412583695 0.024084873
##                       missInfo
## (Intercept)               60 %
## indsocial                 13 %
## indspend                  27 %
## dem_edugroupHS            42 %
## dem_edugroupSome Coll     52 %
## dem_edugroupBA            43 %
## dem_edugroupGrad Deg      37 %
## dem_agegrp30-49           28 %
## dem_agegrp50-69           10 %
## dem_agegrp70+              4 %
## gender_respondent3        21 %
## inc_incgroup_pre          11 %
## relig_chmember            90 %
# show the missing info for the mi and rf models relative to amelia
mi.mi/am.mi
##  [1] 1.1333333 1.3846154 0.2962963 0.5238095 0.4230769 0.9534884 0.5405405
##  [8] 1.5000000 4.1000000 4.7500000 0.4761905 2.2727273 0.7777778
rf.mi/am.mi
##  [1] 1.1000000 0.8461538 0.4444444 0.4761905 0.5576923 0.5116279 0.7297297
##  [8] 0.5357143 2.7000000 5.0000000 0.6190476 3.5454545 0.8888889
mean(mi.mi/am.mi)
## [1] 1.471681
mean(rf.mi/am.mi)
## [1] 1.38115

This suggests that with either of the mice impuation methods, the amelia imputations generate lower amounts of missing information. If you looked at mi.mi/rf.mi you would find that the average is in favor of the default mice model.

  1. How do the results change if you have use 25 imputations instead of 5?

Repeat above with m=25.

Finite Mixture Models

Using the data above, estimate a finite mixture model where you assume that indsocial and indspend operationalize the two different theories (don’t use relig_chmember and inc_incgroup_pre in the models, yet.) Evaluate the resulting model against the linear model where there is an interaction between indsocial and indspend including the other controls.

  1. Run an OLS regression of libcpre_self_num on indsocial (a composite of social policy attitudes) and indspend (a composite of spending policy attitudes), their interaction and the controls gender_respondent, dem_edugroup and dem_agegrp_num.
library(flexmix)
dat <- read.dta('http://www.quantoid.net/files/reg3/lab3_data.dta')
dat$libcpre_self_num <- as.numeric(dat$libcpre_self)
dat$dem_agegrp_num <- as.numeric(dat$dem_agegrp)
mod <- lm(libcpre_self_num ~ indsocial*indspend + gender_respondent + dem_edugroup + dem_agegrp_num, data=dat)
DAintfun2(mod, c("indsocial", "indspend"), hist=T, scale.hist=.3)

plot of chunk unnamed-chunk-24

  1. Estimate a finite mixture model where indsocial is the variable of interest in one component and indspend is the variable of interest in the other. That is set the indspend coefficient to zero in the model with indsocial and indsocial’s coefficient to zero in the model with indspend. Fix the coefficients on the other regressors to be constant across the two components.
model <- FLXMRglmfix(family = "gaussian", nested=list(k=c(1,1), 
    formula = c(~indsocial,
         ~indspend)), fixed= ~ dem_agegrp_num + gender_respondent + dem_edugroup)
out.a <- stepFlexmix(libcpre_self_num ~ 1 , k=2, model=model, data=dat, nrep=20)
## 2 : * * * * * * * * * * * * * * * * * * * *
mod.refit <- refit(out.a)
  1. How do the two different models fit? Which do you think it better?
fit1 <- predict(out.a)$Comp.1[[1]]
fit2 <- predict(out.a)$Comp.2[[1]]
post <- out.a@posterior$scaled
predfix1 <- rowSums(cbind(fit1, fit2)*post)
## Correlations of fitted values and observed values from the mixture model
cor(predfix1, dat$libcpre_self)^2
## [1] 0.6004018
## Correlation using only the fits from best predicted theory
predfix2 <- fit1
predfix2[which(post[,2] > .5)] <- fit2[which(post[,2] > .5)]
cor(predfix2, dat$libcpre_self)^2
##           [,1]
## [1,] 0.5591659
## correlations of fitted values and observed from the linear model 
cor(mod$fitted, dat$libcpre_self)^2
## [1] 0.2213779
  1. Consider that income (an economic predictor operationalized by inc_incgroup_pre) and church membership (a social predictor operationalized by relig_chmember) might tell us something about the probabilities of being in one or the other group. Incorporate that information and see whether, in fact, they do provide information about group membership.
out.b <- stepFlexmix(libcpre_self ~ 1 , k=2, model=model, data=dat, nrep=20, 
    concomitant = FLXPmultinom( ~ inc_incgroup_pre + relig_chmember))
## 2 : * * * * * * * * * * * * * * * * * * * *
out.b.refit <- refit(out.b)
out.b.refit@concomitant
## $Comp.2
##                      Estimate Std. Error z value  Pr(>|z|)    
## (Intercept)         -0.558476   0.233169 -2.3952   0.01661 *  
## inc_incgroup_pre    -0.052821   0.010526 -5.0183 5.214e-07 ***
## relig_chmember2. No  0.298105   0.164661  1.8104   0.07023 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1