---
title: "Lecture 2 Exercises"
author: "Your Name"
output: html_document
---

```{r setup, message = FALSE, warning = FALSE, echo=FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE, dev="svg",
                      tidy=TRUE,
                      tidy.opts = list(only.comment=TRUE, width.cutoff=80))
library(summarytools)
library(dplyr)
```

First, we need to load in the data. (It's also available locally in the RStudio.cloud instance or from [my website](https://quantoid.net/files/reg3/counties.rda))

```{r}
load(file("https://quantoid.net/files/reg3/counties.rda"))
```

This makes an object in your workspace called `counties`. We can see the summary of the data with the following:

```{r, results='asis'}
counties %>%
  select(-c("fips", "NAME", "date", "state", "st")) %>%
  dfSummary(., plain.ascii=FALSE)
```

I want you to use the data to answer the following questions.

### Question 1

There are two categorical variables - `region` and `urban_rural`. Build a model of `cases` (or its log+1 if you prefer) that uses at least one of these variables.

- Use the methods we talked about in class to visualize the pairwise differences for the categorical variable you used.
- Try some of the different p-value adjustments to see how your results change.

### Question 2

Pick the two categorical variables plus at least three of the other variables as covariates.

- Standardize the continuous variables and re-run the regression. Using this metric, which variable do you think has the biggest impact? (hint, you can use the `scaleDataFrame()` function from the `{DAMisc}` package to get all of the quantitative variables standardized).
- Rescale all of the variables into the range [0-1] and re-evaluate which variable has the biggest impact. Hint, you could do the rescaling with the function below.

```{r}
counties_rescale <- counties %>%
  mutate_if(is.numeric, scales::rescale)
```

- Use the `relimp` function to evaluate which variable has the biggest impact.
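
To see why the standardized and [0-1] metrics in Question 2 can rank variables differently, here is a minimal base-R sketch on simulated data. Everything in it is an assumption made for illustration: the data frame `sim`, its columns `income` and `pct_old`, and the helper `rescale01()` are hypothetical and are not part of `counties`. Only the predictors are rescaled here, which may differ from what `scaleDataFrame()` does.

```{r}
# Simulated stand-in data: these names are hypothetical, NOT columns of `counties`
set.seed(123)
n <- 200
sim <- data.frame(
  income  = rnorm(n, mean = 50000, sd = 12000),  # predictor measured in large units
  pct_old = runif(n, min = 5, max = 30)          # predictor measured in small units
)
sim$y <- 2 + 0.0001 * sim$income + 0.2 * sim$pct_old + rnorm(n)

# Hypothetical helper: min-max rescaling of a numeric vector to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

predictors <- c("income", "pct_old")

# Version 1: z-scores (mean 0, sd 1) for the predictors only
sim_std <- sim
sim_std[predictors] <- lapply(sim[predictors], function(x) as.numeric(scale(x)))

# Version 2: predictors rescaled to the [0, 1] range
sim_01 <- sim
sim_01[predictors] <- lapply(sim[predictors], rescale01)

# Same model fitted on each version of the predictors
fit_raw <- lm(y ~ income + pct_old, data = sim)
fit_std <- lm(y ~ income + pct_old, data = sim_std)
fit_01  <- lm(y ~ income + pct_old, data = sim_01)

# Coefficients side by side: raw units, per standard deviation, per full range
round(cbind(raw = coef(fit_raw), std = coef(fit_std), unit = coef(fit_01)), 4)
```

The three fits give identical fitted values; only the units of the slopes change. Each standardized slope is the raw slope times that predictor's standard deviation, and each [0-1] slope is the raw slope times that predictor's range, so a variable with a long-tailed distribution can look more important on the [0-1] metric than on the standardized one.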