`factorplot`

is an R package that helps visualize pairwise comparisons. It is particularly useful as a post-estimation technique following a (G)LM, multinomial logistic regression or any other multiple comparison procedure done with multcomp. A more thorough discussion of the method and what it produces are here.

The basic idea is that often when variables with multiple categories are used in regression models, we want to know not only the difference between each category and the reference group, but we want to know the differences in all possible pairs of values of that variable. There are lots of methods out there for doing just that, but I developed the `factorplot`

package to be simple to decode relative to its competitors. Here are a few examples of where the `factorplot`

package could be useful.

To show how the `factorplot`

function works for (G)LM coefficients, we are going to use the `Ornstein`

data from the `carData`

package. Here the outcome is interlocking firm memberships and the independent variables are the log of assets, the nation in which the firm resides and the sector in which the firm operates.

```
library(factorplot)
data(Ornstein, package="carData")
mod <- glm(interlocks ~ log(assets) + sector + nation,
data=Ornstein, family=poisson)
summary(mod)
#>
#> Call:
#> glm(formula = interlocks ~ log(assets) + sector + nation, family = poisson,
#> data = Ornstein)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -6.7111 -2.3159 -0.4595 1.2824 6.2849
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.83938 0.13664 -6.143 8.09e-10 ***
#> log(assets) 0.45145 0.01698 26.585 < 2e-16 ***
#> sectorBNK -0.16651 0.09575 -1.739 0.082036 .
#> sectorCON -0.48928 0.21320 -2.295 0.021736 *
#> sectorFIN -0.11161 0.07571 -1.474 0.140457
#> sectorHLD -0.01491 0.11924 -0.125 0.900508
#> sectorMAN 0.12187 0.07614 1.600 0.109489
#> sectorMER 0.06157 0.08670 0.710 0.477601
#> sectorMIN 0.24985 0.06888 3.627 0.000286 ***
#> sectorTRN 0.15181 0.07893 1.923 0.054453 .
#> sectorWOD 0.49825 0.07560 6.590 4.39e-11 ***
#> nationOTH -0.10699 0.07438 -1.438 0.150301
#> nationUK -0.38722 0.08951 -4.326 1.52e-05 ***
#> nationUS -0.77239 0.04963 -15.562 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 3737.0 on 247 degrees of freedom
#> Residual deviance: 1547.1 on 234 degrees of freedom
#> AIC: 2473.1
#>
#> Number of Fisher Scoring iterations: 5
```

The `factorplot`

function initially produces no output, but calculates all of the pairwise comparisons.

`f <- factorplot(mod, factor.var="sector")`

There are print, summary and plotting methods for the `factorplot`

object. The print method has an optional argument `sig`

, which if set to `TRUE`

only prints significant differences.

```
print(f, digits=3, sig=TRUE)
#> Difference SE p.val
#> AGR - CON 0.489 0.213 0.023
#> CON - HLD -0.474 0.235 0.045
#> BNK - MAN -0.288 0.102 0.005
#> CON - MAN -0.611 0.215 0.005
#> FIN - MAN -0.233 0.082 0.005
#> BNK - MER -0.228 0.106 0.032
#> CON - MER -0.551 0.220 0.013
#> AGR - MIN -0.250 0.069 0.000
#> BNK - MIN -0.416 0.084 0.000
#> CON - MIN -0.739 0.210 0.001
#> FIN - MIN -0.361 0.067 0.000
#> HLD - MIN -0.265 0.118 0.026
#> MER - MIN -0.188 0.085 0.029
#> BNK - TRN -0.318 0.082 0.000
#> CON - TRN -0.641 0.217 0.004
#> FIN - TRN -0.263 0.070 0.000
#> AGR - WOD -0.498 0.076 0.000
#> BNK - WOD -0.665 0.095 0.000
#> CON - WOD -0.988 0.215 0.000
#> FIN - WOD -0.610 0.077 0.000
#> HLD - WOD -0.513 0.121 0.000
#> MAN - WOD -0.376 0.080 0.000
#> MER - WOD -0.437 0.090 0.000
#> MIN - WOD -0.248 0.072 0.001
#> TRN - WOD -0.346 0.081 0.000
```

The summary method identifies the number of significant positive, significant negative and insignificant differences with other stimuli.

```
summary(f)
#> sig+ sig- insig
#> AGR 1 2 6
#> BNK 0 5 4
#> CON 0 7 2
#> FIN 0 4 5
#> HLD 1 2 6
#> MAN 3 1 5
#> MER 2 2 5
#> MIN 6 1 2
#> TRN 3 1 5
#> WOD 9 0 0
```

The plotting method colors each box according to whether and how each difference is significant. For example, the dark-gray box at the nexus of the `AGR`

and `CON`

row and column, respectively, indicates that the agriculture sector has significantly more interlocking board memberships than does the construction sector. The light-gray boxes indicate that the difference is significant in favor of the column rather than the row.

`plot(f)`

The default means for adjusting for multiplicity is to do nothing, but all of the options of `p.adjust`

are available.

```
f <- factorplot(mod, factor.var="sector", adjust.method="holm")
plot(f)
```

For those using the `glht()`

function from `multcomp`

, the `factorplot`

function works for these objects, too.

```
g <- glht(mod, linfct = mcp("sector" = "Tukey"))
s <- summary(g)
plot(factorplot(s))
```

One of the biggest problems with interpreting multinomial logit coefficients is that the reference category problem applies to all variables because it is a function not only of potentially some independent variables, but also of the dependent variable. Here the `factorplot`

function can help.

```
data("Chile", package="carData")
library(nnet)
mod <- multinom(vote ~ age + sex + education, data=Chile)
#> # weights: 24 (15 variable)
#> initial value 3494.848084
#> iter 10 value 3121.795602
#> iter 20 value 3077.986718
#> final value 3077.985852
#> converged
summary(mod)
#> Call:
#> multinom(formula = vote ~ age + sex + education, data = Chile)
#>
#> Coefficients:
#> (Intercept) age sexM educationPS educationS
#> N 1.0928913 0.00601010 0.58599491 0.3288071 -0.2398034
#> U 0.9496712 0.02041244 -0.24447512 -1.0294778 -0.6974096
#> Y 1.1850035 0.02082591 0.01426536 -0.4851798 -0.7827258
#>
#> Std. Errors:
#> (Intercept) age sexM educationPS educationS
#> N 0.3088481 0.006303204 0.1628414 0.2501653 0.1976728
#> U 0.3166930 0.006485748 0.1717826 0.2778410 0.2012233
#> Y 0.3059721 0.006264887 0.1640342 0.2536581 0.1954040
#>
#> Residual Deviance: 6155.972
#> AIC: 6185.972
```

In the above, the effect of age is the effect of `sexM`

is the effect on the binary choice between the reference (Abstain) and each non-reference level. We can see the effects for all pairs of levels with `factorplot`

.

```
f <- factorplot(mod, variable="sexM")
plot(f)
```

Notice that the factorplot shows that all of the differences between non-reference categories are statistically significant. As shown above, one way to specify the `factorplot`

for multinomial logit objects is to give the name of a single regressor. You can also provide the name of a term, for example `education`

, which has multiple regressors in the model. Here, there is a single heading for `Reference`

. The reference here stands in for not only the reference group of `education`

in the non-reference dependent variable categories, but also for all education categories in the dependent variable reference category. This makes sense because all of these parameters are set to 0 for identification purposes. Here is the plot for education:

```
f <- factorplot(mod, variable="education")
plot(f)
```

The default method for `factorplot`

will also take as the argument to `obj`

a vector of estimates and a variance-covariance matrix of the estimates as the `var`

argument. Using this method, you could plot factorplots for any estimates you might want.