---
title: "Regression III: Lab2"
output: html_document
---

```{r setup, echo=FALSE, include=FALSE, message=FALSE, warning=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message = FALSE)
library(DAMisc)
```

Both of the questions in this lab use state repression as the dependent variable. In a general sense, state repression is the violation of human rights by the state. In this case, the focus is on the set of "physical integrity rights": the rights to be free from torture, political imprisonment, extrajudicial killing and forced disappearance.

## Question 1

This question uses the `q1data.rda` file (which will put an object in your workspace called `q1data`). You can get this file by either downloading it from the course website https://quantoid.net/teachicpsr/regression3 or by doing the following in R:

```{r}
q1 <- file("https://quantoid.net/files/reg3/q1data.rda")
load(q1)
close(q1)
```

This file has a number of variables. You can find a short description of each with:

```{r}
searchVarLabels(q1data, "")
```

The dependent variable is going to be `physint`, the CIRI physical integrity rights index. All of the other variables, except for `ccode` and `Year`, will be the independent variables.

1. Use `alsosDV` in the `DAMisc` package to figure out whether the 9-category (0-8) variable could be treated as an interval-level variable and modeled with OLS without other modifications.
2. Once you've done that, diagnose problems with non-linearity using the conventional methods we talked about early last week (e.g., CR plots, splines, polynomials, transformations). Implement simple fixes to the problems if they exist. What model would you present?

## Question 2

We're going to continue to investigate repression in this question with a different dependent variable and potentially many other independent variables.

This question uses the `q2data.rda` file (which will put an object in your workspace called `q2data`).
You can get this file by either downloading it from the course website https://quantoid.net/teachicpsr/regression3 or by doing the following in R:

```{r}
q2 <- file("https://quantoid.net/files/reg3/q2data.rda")
load(q2)
close(q2)
```

There are too many variables to print out the list, but you can generate it with:

```{r, eval=FALSE}
searchVarLabels(q2data, "")
```

I have organized the data so the country identifiers are first, the DV `fariss_repress` is next, then variables related to democracy and rights, then variables related to conflict, and finally variables related to other characteristics of the country.

Before you start your investigation, I want you to take 33% of your data out and save it for later. You can do that as follows:

```{r}
# Randomly select 33% of the rows to hold out
samps <- sample(1:nrow(q2data), floor(.33*nrow(q2data)), replace=FALSE)
# The remaining rows are used for investigation and testing
keep <- setdiff(1:nrow(q2data), samps)
dat66 <- q2data[keep, ]
hold <- q2data[samps, ]
```

We'll do the preliminary investigation on half of the `dat66` object and the preliminary testing on the other half. Only once we have a final model that we want to try will we use the `hold` data object.

1. Much of the work on repression suggests that conflict increases demand for repression and that democratic institutions and behaviors tend to reduce it. Estimate a parametric model that captures this idea along with whatever controls you think might be important.
2. Try some of the machine learning algorithms (CART, random forest, XGBoost, BART) and evaluate the PDPs and/or ICE plots. What do those things say about the nature of the relationship between the independent variables and repression? Is there any evidence of an interaction?
3. Use the `diagFun` function that we talked about in class to diagnose your parametric model. What, if any, problems exist?
4. Now, put all of the possible independent variables into the machine learning algorithms above.
   How do the results of these models compare to the parametric model and to the results from the machine learning models fit to a subset of the variables?
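
The split of `dat66` into an investigation half and a testing half, mentioned in the setup for Question 2, could be done along the following lines. This is only a minimal sketch: the names `explore_rows`, `test_rows`, `dat_explore` and `dat_test` and the seed value are assumptions made for illustration, and the small simulated data frame is only a stand-in that is used when `dat66` does not already exist.

```{r}
# Stand-in for `dat66` so the sketch runs on its own; in the lab, `dat66`
# already exists from the chunk above and this branch is skipped.
if (!exists("dat66")) {
  dat66 <- data.frame(id = 1:101, fariss_repress = rnorm(101))
}

# Arbitrary seed (an assumption), set only so the split is reproducible.
set.seed(2024)

# Randomly pick half of the rows for the preliminary investigation;
# the remaining rows are kept for the preliminary testing.
n <- nrow(dat66)
explore_rows <- sample(seq_len(n), floor(n / 2), replace = FALSE)
test_rows <- setdiff(seq_len(n), explore_rows)

dat_explore <- dat66[explore_rows, , drop = FALSE]
dat_test <- dat66[test_rows, , drop = FALSE]

# Sanity check: the two halves are disjoint and together cover `dat66`.
stopifnot(length(intersect(explore_rows, test_rows)) == 0)
stopifnot(nrow(dat_explore) + nrow(dat_test) == n)
```

Using `setdiff()` mirrors the way `keep` is built from `samps` earlier, and it behaves sensibly even if the sampled half happens to be empty.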