Regression III: Lab 2

Both of the questions in this lab use state repression as the dependent variable. In a general sense, state repression is the violation of human rights by the state. In this case, the focus is on the set of “physical integrity rights”: the rights to be free from torture, political imprisonment, extrajudicial killing, and forced disappearance.

Question 1

This question uses the q1data.rda file (which will put an object in your workspace called q1data). You can get this file either by downloading it from the course website https://quantoid.net/teachicpsr/regression3 or by doing the following in R:

q1 <- file("https://quantoid.net/files/reg3/q1data.rda")
load(q1)
close(q1)

This file has a number of variables. You can find a short description of each with:

searchVarLabels(q1data, "")

##                  ind                                             label
## ccode              1                                          COW code
## Year               2                                              Year
## polity2            3                                           Polity2
## pop                4                                        Population
## all_phones_pc      5    All Telephones, including Cellular, Per Capita
## gdppc_mp           6 Gross National Product Per Capita (Market Prices)
## revols             7                                       Revolutions
## riots              8                                             Riots
## agdems             9                    Anti-Government Demonstrations
## terror_incidents  10                     Number of terrorist incidents
## physint           11                     CIRI Physical Integrity Index

The dependent variable is physint, the CIRI physical integrity rights index. All of the other variables, except for ccode and Year, will be the independent variables.
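One way to build that specification without typing every variable name is sketched below. It uses a made-up stand-in data frame (`toy`, an assumption for illustration) that only shares its column names with q1data, so the numbers mean nothing; with the real data you would replace `toy` with `q1data`.

```r
# Toy stand-in for q1data: same column names, made-up values.
set.seed(1)
vars <- c("ccode", "Year", "polity2", "pop", "all_phones_pc", "gdppc_mp",
          "revols", "riots", "agdems", "terror_incidents", "physint")
toy <- as.data.frame(matrix(rnorm(50 * length(vars)), ncol = length(vars)))
names(toy) <- vars

# Everything except the identifiers and the DV goes on the right-hand side.
ivs <- setdiff(names(toy), c("ccode", "Year", "physint"))
f <- reformulate(ivs, response = "physint")
print(f)

# The formula can then be passed to lm() as a starting linear model.
mod <- lm(f, data = toy)
```

Building the formula from the column names keeps the model in sync with the data if you later drop or transform a variable.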

Use alsosDV in the DAMisc package to figure out whether the 9-category (0-8) variable could be treated as an interval-level variable and modeled with OLS without other modifications.
Once you’ve done that, diagnose problems with non-linearity using the conventional methods we talked about early last week (e.g., CR Plots, splines, polynomials, transformations). Implement simple fixes to the problems if they exist. What model would you present?
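As a rough illustration of the non-linearity checks mentioned above, here is a minimal sketch on simulated data (the data frame `sim` and its variables are hypothetical, not part of q1data). It compares a linear fit against a polynomial, a natural spline, and a log transformation, and computes the partial residuals that a CR plot displays. It is one possible workflow, not necessarily the one used in class.

```r
library(splines)  # ns() for natural cubic splines; ships with R

# Simulated data (hypothetical): y is a concave function of x plus noise.
set.seed(123)
n <- 500
sim <- data.frame(x = runif(n, 0, 10))
sim$y <- 2 * log(sim$x + 1) + rnorm(n, sd = 0.5)

m_lin  <- lm(y ~ x, data = sim)              # linear baseline
m_poly <- lm(y ~ poly(x, 2), data = sim)     # quadratic polynomial
m_ns   <- lm(y ~ ns(x, df = 4), data = sim)  # natural cubic spline
m_log  <- lm(y ~ log(x + 1), data = sim)     # simple transformation

# The linear model is nested in the polynomial and spline models,
# so incremental F-tests indicate whether the extra flexibility is needed.
p_poly <- anova(m_lin, m_poly)[["Pr(>F)"]][2]
p_ns   <- anova(m_lin, m_ns)[["Pr(>F)"]][2]

# AIC lets us compare non-nested candidates (e.g., spline vs. log).
aics <- AIC(m_lin, m_poly, m_ns, m_log)
print(aics)

# Partial (component-plus-residual) values for x from the linear model;
# plotting these against x is what a CR plot shows.
part_res <- residuals(m_lin) + coef(m_lin)["x"] * sim$x
```

If a simple transformation fits about as well as the spline, it is usually the easier model to present. With a fitted model from the real data, `car::crPlots()` draws the CR plots for all predictors at once.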

Question 2

We’re going to continue to investigate repression in this question with a different dependent variable and potentially many other independent variables. This question uses the q2data.rda file (which will put an object in your workspace called q2data). You can get this file either by downloading it from the course website https://quantoid.net/teachicpsr/regression3 or by doing the following in R:

q2 <- file("https://quantoid.net/files/reg3/q2data.rda")
load(q2)
close(q2)

There are too many variables to print out the list, but you can generate it with:

searchVarLabels(q2data, "")

I have organized the data so that the country identifiers come first, followed by the DV (fariss_repress), then the variables related to democracy and rights, then the variables related to conflict, and finally the variables related to other characteristics of the country. Before you start your investigation, I want you to take 33% of your data out and save it for later. You can do that as follows:

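set.seed(1234)  # arbitrary seed (an assumption); makes the split reproducible
samps <- sample(1:nrow(q2data), floor(.33*nrow(q2data)), replace=FALSE)  # rows to hold out
keep <- setdiff(1:nrow(q2data), samps)  # all remaining rows
dat66 <- q2data[keep, ]  # working data (about two thirds)
hold <- q2data[samps, ]  # hold-out data (about one third)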

We’ll do the preliminary investigation on one half of the dat66 object and the preliminary testing on the other half. Only once we have settled on a final model will we evaluate it on the hold data object.
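The further split of dat66 into two halves could look something like the sketch below. It runs on a toy data frame (`dat66_toy`, a hypothetical stand-in) so that it is self-contained; the object names `explore` and `testing` and the seed are also just illustrative choices.

```r
set.seed(2024)  # arbitrary seed so the split can be reproduced

# Toy stand-in for dat66 (hypothetical values).
dat66_toy <- data.frame(id = 1:101, x = rnorm(101))

n <- nrow(dat66_toy)
# Randomly pick half of the rows for the preliminary investigation.
explore_idx <- sample(seq_len(n), floor(n / 2), replace = FALSE)

explore <- dat66_toy[explore_idx, ]   # preliminary investigation
testing <- dat66_toy[-explore_idx, ]  # preliminary testing

# The two halves should be disjoint and together cover all rows.
c(nrow(explore), nrow(testing))
```

Fixing the seed matters here: without it, rerunning the script would move observations between the halves and blur the line between exploration and testing.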

Much of the work on repression suggests that conflict increases the demand for repression, while democratic institutions and behaviors tend to reduce it. Estimate a parametric model that captures this idea along with whatever controls you think might be important.
Try some of the machine learning algorithms (CART, random forest, XGBoost, BART) and evaluate the PDPs and/or ICE plots. What do these plots say about the nature of the relationship between the independent variables and repression? Is there any evidence of an interaction?
Use the diagFun function that we talked about in class to diagnose problems with your parametric model. What, if any, problems exist?
Now, put all of the possible independent variables into the machine learning algorithms above. How do the results of these models compare to the parametric model and to the results from the machine learning model on a subset of the variables?
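To make concrete what PDPs and ICE plots compute, here is a minimal by-hand sketch using a CART model from the rpart package on simulated data. The variables `conflict`, `democ`, and `repress` are hypothetical stand-ins, not columns of q2data, and the data are generated with a built-in interaction so you can see what one looks like. In practice you would likely use a dedicated plotting package rather than this loop.

```r
library(rpart)  # CART; a recommended package installed with most R setups

# Simulated data (hypothetical): conflict raises repression only in
# non-democracies, i.e., there is an interaction by construction.
set.seed(42)
n <- 400
sim <- data.frame(conflict = runif(n, 0, 1),
                  democ    = rbinom(n, 1, 0.5))
sim$repress <- 2 * sim$conflict * (1 - sim$democ) + rnorm(n, sd = 0.2)

fit <- rpart(repress ~ conflict + democ, data = sim)

# ICE: for each grid value g, set conflict = g for every observation and
# predict. Rows are observations, columns are grid values.
grid <- seq(0, 1, length.out = 20)
ice <- sapply(grid, function(g) {
  tmp <- sim
  tmp$conflict <- g
  predict(fit, newdata = tmp)
})

# PDP: the average of the ICE curves at each grid value.
pdp <- colMeans(ice)

# If ICE curves differ in shape across observations, that suggests an
# interaction. Here, compare how much each curve moves, by regime type.
ice_range <- apply(ice, 1, function(r) diff(range(r)))
range_by_democ <- tapply(ice_range, sim$democ, mean)
print(range_by_democ)
```

A PDP alone can hide an interaction because it averages over the ICE curves; curves that are steep for one group and flat for another are the visual signal to look for.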