---
title: "Lecture 2 Exercises"
author: "Dave Armstrong"
output: html_document
---

```{r setup, message = FALSE, warning = FALSE, echo=FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE, dev="svg", tidy=TRUE, tidy.opts = list(only.comment=TRUE, width.cutoff=100))
library(summarytools)
library(dplyr)
```

Before you start, you'll need a number of packages. You should update the `{DAMisc}` package using the code below.

```{r, eval=FALSE}
# install {remotes} only if it is not already available
if(!requireNamespace("remotes")){
  install.packages("remotes")
}
remotes::install_github("davidaarmstrong/damisc")
```

You can run the script below to install the other packages that you don't already have. R may ask whether you want to update other packages that are dependencies of the ones you're installing. You can do this or not as you like. If you do update packages, you may also be asked whether to update packages that need compilation. Unless you've installed the tools needed to compile R packages (C and Fortran compilers), you should say "No" to that question. If you want to install those tools, [here](https://thecoatlessprofessor.com/programming/cpp/r-compiler-tools-for-rcpp-on-macos/) is a good guide for doing so on Macs and [here](https://cran.r-project.org/bin/windows/Rtools/) is a guide for installing the tools on PCs.

```{r eval=FALSE}
# install {remotes} only if it is not already available
if(!requireNamespace("remotes")){
  install.packages("remotes")
}
# packages installed from GitHub
remotes::install_github("davidaarmstrong/factorplot")
remotes::install_github("davidaarmstrong/daviz@pre-will")
remotes::install_github("davidaarmstrong/psre")
# packages installed from CRAN
install.packages(c("multcomp", "tibble", "ggplot2", "car", "relimp"))
```

First, we need to load in the data.

```{r}
load(file("https://quantoid.net/files/reg3/l3/counties.rda"))
# build a codebook from the "label" attribute of each variable
codebook <- as.data.frame(sapply(counties, function(x)attr(x, "label"))) %>%
  as_tibble(rownames = "varname") %>%
  setNames(c("varname", "description"))
library(DT)
datatable(codebook)
```

I want you to use the data to answer the following questions.

### Question

There are three categorical variables: `bpr`, `place_type` and `urban_rural`. Build a model of cases (`cases_per100k`) that uses at least one of these variables.

- Use the methods we talked about in class to visualize the pairwise differences for the categorical variable you used.

```{r}
m <- lm(cases_per100k ~ urban_rural + repvote + black_pop, data=counties)
ocv <- optCL(m, varname="urban_rural", add_ref=TRUE, grid_range=c(.5,.99))
# convert the two-sided confidence level into the quantile probability for qt()
clev <- 1-(1-mean(ocv$opt_levels))/2
# positions of the urban_rural coefficients (every level except the reference)
idx <- 2:length(levels(counties$urban_rural))
plot_dat <- tibble(
  urban_rural = factor(idx,
                       levels=1:length(levels(counties$urban_rural)),
                       labels=levels(counties$urban_rural)),
  b = coef(m)[idx],
  # standard errors are the square roots of the diagonal of the covariance matrix
  se = sqrt(diag(vcov(m)))[idx],
  # interval is estimate +/- critical value times standard error
  lwr = b-qt(clev, m$df.residual)*se,
  upr = b+qt(clev, m$df.residual)*se
)
```
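
The data frame built above holds a coefficient, a standard error and interval bounds for each non-reference level, which is the shape a coefficient plot needs. Below is a minimal sketch of that last plotting step. It is not the solution to the exercise: it uses the built-in `chickwts` data as a stand-in for `counties`, the names `m_ex`, `ex_dat`, `clev_ex` and `p` are hypothetical, and the confidence level is fixed at an assumed value rather than taken from `optCL()`.

```{r}
# Stand-in example: `chickwts` ships with R and is NOT the counties data.
library(ggplot2)

m_ex <- lm(weight ~ feed, data = chickwts)

# positions of the dummy coefficients for the factor `feed`
idx_ex <- grep("^feed", names(coef(m_ex)))

# assumed quantile probability, for illustration only
clev_ex <- 0.975
crit <- qt(clev_ex, m_ex$df.residual)

ex_dat <- data.frame(
  # non-reference levels, keeping the full set of factor levels
  feed = factor(levels(chickwts$feed)[-1], levels = levels(chickwts$feed)),
  b    = unname(coef(m_ex)[idx_ex]),
  se   = unname(sqrt(diag(vcov(m_ex)))[idx_ex])
)
ex_dat$lwr <- ex_dat$b - crit * ex_dat$se
ex_dat$upr <- ex_dat$b + crit * ex_dat$se

# point estimates with intervals; the dashed line marks the reference level (zero)
p <- ggplot(ex_dat, aes(x = feed, y = b)) +
  geom_pointrange(aes(ymin = lwr, ymax = upr)) +
  geom_hline(yintercept = 0, linetype = 2) +
  coord_flip() +
  labs(x = NULL, y = "Coefficient (difference from reference level)")
p
```

Each coefficient here is a difference from the reference level, so an interval that excludes zero points to a difference from that level at the assumed confidence level. Comparing two non-reference levels by eye needs the optimized level from `optCL()` rather than this fixed one.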