ICPSR Summer Program Teaching

Regression III

The Regression III course takes a considerably different form from the first two regression courses at the Summer Program. This course will, we hope, prepare you for the issues you will encounter when you (attempt to) publish quantitative work using linear models and more complicated ones.

Introductory linear model courses focus on the assumptions and theoretical considerations of linear models and generally walk you through estimation and interpretation. Good courses also deal with diagnostics, though these often get less time than they should. Further, it is not always obvious what violating these assumptions means in practical terms.

This course will provide you with a systematic approach to assessing, fixing, and presenting your linear model results. Though we focus almost exclusively on the linear model (we will allude to nonlinear models occasionally), the logic we follow will also help in dealing with nonlinear models. More details can be found in the syllabus.

Dave’s Office Hours: 10:00-11:30AM M-F

Teaching Assistants:
  • Chris Schwarz (NYU, Political Science)
    Office Hours: 12:30-2PM M-F

  • Nick Davis (UW-Milwaukee, Political Science)
    Office Hours: 12:30-2PM M-F

1 Introduction

2-3 Effective Model Presentation

  • Slides pdf
  • Code R
  • To install the most recent version of the DAMisc package, you can do the following:

4 Lab 1: Factors and Interactions (Angell Hall, Computer Classrooms B and C)

Homework 1: Linear Model Presentation

The posted homework was updated on Saturday 6/30 at 4PM. If you’ve already done it, there is no need to redo it. If not, this new version clarifies exactly what we are asking for on question 1.

  • Instructions pdf (Updated)
  • Data rda

5-6 Linearity

Homework 2: Transformations and Polynomials

7 Resampling Methods

8 Model Selection/Multi-model Inference

9-11 Flexible Methods for Non-linearity: Splines, Smoothing, GAMs, KRLS

  • Slides pdf
  • Code R

12 Regression Trees

  • Slides pdf
  • Code R

13 DSS and Regression Diagnostics

  • Slides pdf
  • Code R

Homework 3: Non-parametric Models

  • Handout pdf
  • Data rda
  • To install the version of polywog needed for diagFun to work:

14 Lab 2: Non-linearity

For this lab, you’ll need to have the following packages installed: xgboost, earth, rpart, randomForest, pdp, ICEbox, and bartMachine. All of these should install with a simple call to install.packages() in the usual way. Installing polywog from my GitHub is a bit more complicated. I tried this on the UMich computing site, and here’s what worked:


At this point, I was asked whether I wanted to install Rtools, because polywog needs the build tools it provides in order to compile. I said yes. The polywog install then failed, but the Rtools installation continued. After Rtools finishes installing, do the following:


If that fails, you can try installing this Windows binary or this macOS binary. To do so, download the file, set R’s working directory to the folder where you saved it (e.g., where you downloaded polywog.zip), and run the line for your platform:

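# Windows: install the downloaded binary from the working directory
install.packages("polywog.zip", repos=NULL)
# macOS: install the downloaded binary from the working directory
install.packages("polywog_0.4-1.tgz", repos=NULL)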
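For the CRAN packages listed for Lab 2, a minimal R sketch that installs only the ones you are missing might look like the following. The names lab2_cran and missing_pkgs() are hypothetical helpers, not part of the course code; the check relies on base R's requireNamespace(), and the install step is wrapped in interactive() so that sourcing the script in a batch run does not start downloads.

```r
# Packages the Lab 2 instructions list as installable from CRAN
lab2_cran <- c("xgboost", "earth", "rpart", "randomForest",
               "pdp", "ICEbox", "bartMachine")

# Hypothetical helper: return the elements of `pkgs` whose namespace
# cannot be loaded (i.e., packages that are not installed)
missing_pkgs <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

to_install <- missing_pkgs(lab2_cran)

# Only download when run interactively, so sourcing this file in a
# batch job does not trigger installs
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# polywog comes from GitHub or a local binary, not from this CRAN call
if (length(missing_pkgs("polywog")) > 0) {
  message("polywog is not installed yet; use the GitHub or binary install steps.")
}
```

If to_install comes back empty, everything on the CRAN list is already available; polywog still has to be installed separately.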

15 Regression Discontinuity Designs

16 Missing Data and Multiple Imputation

  • Slides pdf
  • Code R
  • Code (SensMice) R
  • SensMice Source Code tar.gz

Homework 4 (optional)

  • Instructions pdf

17 Mixture Models

  • Slides pdf
  • Code R
  • Mixture Tools R
  • World Shapefiles zip
  • Data (Shapefile, Mixture Tools, Data) zip

18 Lab 3: Multiple Imputation, Mixture Models

Conclusion: Outliers, Heteroskedasticity, Inference (not covered in lecture)

  • Heteroskedasticity and Non-normality: pdf R
  • Outliers and Influential Data: pdf R
  • Robust Regression: pdf R
  • Critiques of Common Practice: pdf R