Chapter 8 Time-Series Cross-Sectional Models

TSCS data have both cross-sectional (data on different units) and time-series (data over time) properties. As such:

  • For each unit (e.g., country), there are many time periods.
  • For each time period, there are many units.

8.1 Preliminaries and Assumptions

In the presence of such data structure, conventional cross-sectional results are wrong:

  • At best, estimates are inefficient and the standard errors are wrong.
  • Estimates could also be biased, depending on the particular nature of the data.

There are several issues that come up when we consider this particular dependence structure in the data.

  • Non-independence: the assumption that all observations are independent is almost certainly violated.
    • Observations close in time for the same unit are often more alike than observations far apart in time.
    • Consequence I: serial correlation (errors are correlated with each other).
    • Consequence II: model mis-specification - if \(y_{t}\) is a function of \(y_{t-1}\) and we don’t include it, then we have mis-specified the model.
  • Trending - series that tend to increase or decrease by roughly the same amount over time.
    • Two trending series will tend to show a strong relationship, but the relationship could be spurious, as both variables are a function of time.
    • Trending also violates the assumption of stationarity - finite, constant variance.
  • Structural breaks - rapid and immediate changes in mean or variance.
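A quick simulation illustrates the spurious-regression point: two series that trend for entirely unrelated reasons will still appear strongly related. This is a hypothetical base-R sketch, not data from this chapter.

```r
# Two independently generated series that share only a time trend
set.seed(7)
tt <- 1:200
y1 <- 0.05 * tt + rnorm(200)   # series 1: trend plus noise
y2 <- 0.05 * tt + rnorm(200)   # series 2: same trend, independent noise
# Regressing one on the other yields a strong "relationship"
# even though neither causes the other
summary(lm(y1 ~ y2))$r.squared
cor(y1, y2)
```

Detrending (or differencing) both series before modeling removes most of this apparent relationship.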

While there are lots of things we could care about with respect to time-series, for TSCS data, it’s almost always about autoregressive errors.

\[y_{t} = \alpha_{0} + \alpha_{1} y_{t-1} + \varepsilon_{t}\]

Here, the \(\alpha_{0}\) allows the long-run equilibrium to be non-zero. Assuming \(|\alpha_{1}| < 1\), the series will eventually, though not immediately, return to its long-term equilibrium after an intervention (without further interventions).

The correlation between temporally adjacent values of \(y\) is:

\[Corr\left(y_{t}, y_{t-1}\right) \equiv \rho \equiv \frac{E\left(\left(y_{t} - \mu_{t}\right)\left(y_{t-1}-\mu_{t-1}\right)\right)}{E\left(y_{t} - \mu_{t}\right)^{2}}\]

We can estimate \(\hat{\rho}\) by plugging \(\bar{y}\) in for \(\mu\).
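For intuition, here is a minimal base-R sketch of that plug-in estimate, using a simulated AR(1) series (all values hypothetical, not data from this chapter):

```r
# Simulate an AR(1) series with alpha_1 = 0.7 and mean zero
set.seed(42)
n <- 500
alpha1 <- 0.7
e <- rnorm(n)
y <- numeric(n)
for (t in 2:n) y[t] <- alpha1 * y[t - 1] + e[t]

# Plug the sample mean in for mu in the correlation formula
ybar <- mean(y)
rho_hat <- sum((y[-1] - ybar) * (y[-n] - ybar)) / sum((y - ybar)^2)
rho_hat   # should be close to alpha1 = 0.7
```

This is essentially what `acf(y, lag.max = 1)` computes for the first lag.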

There are a few assumptions that we should make sure to understand. The first is stationarity. Covariance stationarity is defined by:

  • \(E(y_{t})\) is constant
  • \(\text{Var}(y_{t})\) is constant
  • \(\text{Cov}(y_{k}, y_{k+h})\) depends only on \(h\) and not on \(k\).

Another is exogeneity. Strict exogeneity is defined as:

\[\begin{aligned} y_{t} &= \beta_{0} + \beta_{1}x_{t} + \varepsilon_{t}, \\ E(\varepsilon_{t} | \mathbf{X}) &= 0 \implies E(\varepsilon_{t} | x_{t + h}) = 0 \quad \forall h \end{aligned}\]

If exogeneity is violated, the results will be biased. A weaker version, contemporaneous exogeneity, will suffice if \(T\) is large:

\[E(\varepsilon_{t}|x_{t}) = 0\]

One way we can deal with some of these problems is through the Autoregressive Distributed Lag (ADL) model. In the simplest, purely autoregressive case, we assume:

\[y_{t} = \alpha_{0} + \alpha_{1}y_{t-1} + \varepsilon_{t}\]

where \(\alpha_{0}\) allows the series to have a non-zero mean. In the static process, the long-run equilibrium is \(\beta_{0} = \frac{\alpha_{0}}{1-\alpha_{1}}\). However, we will tend to focus on the lagged dependent variable (LDV) model:

\[y_{t} = \alpha_{0} + \alpha_{1}y_{t-1} + \beta_{1}x_{t} + \varepsilon_{t}\]

where the short-term (immediate) effect of \(x_{t}\) is \(\beta_{1}\) and the long-term effect of \(x_{t}\) is \(\frac{\beta_{1}}{1-\alpha_{1}}\).
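To make the short-run versus long-run distinction concrete, here is a small worked example with hypothetical coefficient values (not estimates from any model in this chapter):

```r
# Hypothetical LDV coefficients
alpha1 <- 0.5    # coefficient on y_{t-1}
beta1  <- 2      # immediate (short-run) effect of x_t

short_run <- beta1
long_run  <- beta1 / (1 - alpha1)

# A permanent one-unit increase in x accumulates as a geometric
# series: 2 + 2*0.5 + 2*0.25 + ... which converges to long_run
c(short_run = short_run, long_run = long_run)
```

With \(\alpha_{1} = 0.5\), the long-run effect (4) is twice the immediate effect (2); the closer \(\alpha_{1}\) gets to 1, the larger the long-run multiplier.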

Much of the discussion above is derived from Mark Pickup’s excellent primer on time-series models. That book, though, is really about single-series models. Here, we are interested in time-series cross-sectional models, which have multiple series. All of the issues mentioned above get much more complicated in TSCS data because there are, in effect, many different time-series that we’re trying to model simultaneously. Further, the parameters are often constrained to be the same across the different series, which may or may not be appropriate for the data.

8.2 Unit Effects

In the chapter discussing multilevel models, we already discussed between- and within-estimators. In the TSCS literature, this same idea often originates from a different concern - how to deal with unit effects. That is - what is the appropriate way to incorporate in our model the unobserved heterogeneity that comes from unobserved group-specific characteristics? Modeling unit effects refers to acknowledging that there may be differences in the average level of the dependent variable across groups.

\[y_{i} = \alpha_{j[i]} + \beta x_{i} + \varepsilon_{i}; \quad \varepsilon_{i}\sim N(0, \sigma_{y}^{2})\]

After controlling for \(x\), there may remain unexplained variation relating to groups in \(y\). That is what \(\alpha_{j[i]}\) is meant to capture. Failing to account for \(\alpha_{j[i]}\) will usually result in biased estimates of \(\beta\). There are two approaches to solving this problem - fixed effects and random effects. We would define the fixed-effects estimator as:

\[y_{i} = \sum_{j=1}^{N}\alpha_{j}z_{j[i]} + \beta x_{i} + \varepsilon_{i}; \quad \varepsilon_{i}\sim N(0, \sigma_{y}^{2})\]

Here, the unit effects are a systematic part of the model. That means that the unobserved characteristics captured by the different intercepts can be correlated with the other variables in the systematic component of the model. Because these different intercepts are estimated as part of the systematic component, it wouldn’t make sense to think about what other “new” groups might look like. So, this is really a method that is appropriate when the groups are meaningful in their own right and not just a sample from a much larger set of groups. Because the fixed effects are in the systematic part of the model, they would be perfectly collinear with any other variable that varies only by group, so those variables cannot be included in the model. It is also worth noting that the fixed-effects estimator is a within-estimator.
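The within-estimator point can be verified directly: the dummy-variable (fixed effects) estimate of \(\beta\) is identical to OLS on group-demeaned data. A base-R sketch on simulated data (all names and values hypothetical):

```r
# Simulate a panel where unit effects are correlated with x
set.seed(2)
N <- 30; T <- 8
id <- rep(1:N, each = T)
a  <- rep(rnorm(N, sd = 2), each = T)   # unit effects
x  <- a + rnorm(N * T)                  # x correlated with unit effects
y  <- a + 1.5 * x + rnorm(N * T)        # true beta = 1.5

# Fixed effects via unit dummies
b_dummies <- coef(lm(y ~ x + factor(id)))["x"]

# Within transformation: demean y and x by unit
yd <- y - ave(y, id)
xd <- x - ave(x, id)
b_within <- coef(lm(yd ~ xd - 1))["xd"]

all.equal(unname(b_dummies), unname(b_within))
```

Both routes give the same point estimate; the demeaned version is what `plm` computes under `model="within"`.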

The random effect model is defined as follows:

\[y_{i} = \alpha_{j[i]} + \beta x_{i} + \varepsilon_{i}; \quad \alpha_{j}\sim N(\mu_{\alpha}, \sigma_{\alpha}^{2}); \quad \varepsilon_{i}\sim N(0, \sigma_{y}^{2})\]

Here, the unit-specific effects are in the stochastic part of the model. As such, there is an assumption that they are uncorrelated with anything in the systematic part of the model. Bell and Jones suggest including all group-mean and within variables in the model, thereby purging the errors of any systematic relationship with the means of the covariates in the model. It is also worth noting here that we are not modeling the unit-specific effects directly; rather, the model accounts for the added variability that comes from the grouping. Specifically, it estimates the variance of the unit effects, which are assumed to be draws from a normal distribution. The random effects model, as we saw in the chapter on multilevel models, can incorporate group-specific variables and can accommodate both between and within relationships.
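The Bell and Jones suggestion can be illustrated with the classic Mundlak device: adding the group mean of \(x\) to a pooled regression recovers the within estimate of \(\beta\), even when the unit effects are correlated with \(x\). A self-contained sketch on simulated data (hypothetical values, not the chapter's data):

```r
# Simulate a panel where pooled OLS is biased
set.seed(3)
N <- 40; T <- 10
id <- rep(1:N, each = T)
a  <- rep(rnorm(N, sd = 2), each = T)   # unit effects
x  <- a + rnorm(N * T)                  # x correlated with unit effects
y  <- a + 1.5 * x + rnorm(N * T)        # true (within) beta = 1.5

xbar <- ave(x, id)                      # group means of x

# Pooled OLS is badly biased; adding the group mean purges the bias
b_pooled  <- coef(lm(y ~ x))["x"]
b_mundlak <- coef(lm(y ~ x + xbar))["x"]
c(pooled = unname(b_pooled), mundlak = unname(b_mundlak))
```

The same logic carries over to random effects models estimated in `lme4` or `plm`: with the group means included, the coefficient on \(x\) has a within interpretation.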

Clark and Linzer (PSRM) give the following advice about the choice:

  • In the standard case (non-sluggish variables), both RE and FE do equally well as regards bias; use whichever one you want.
  • Your hypothesis should be the guide - if you have a within hypothesis, you should use a within estimator.
  • With “sluggish” independent variables:
    • RE is better with few observations overall and few observations per unit.
    • FE is better with larger \(N\) when the correlation between the unit effects and the independent variable is moderate to high.
    • FE is better in all cases when the number of observations per unit (time periods, in our case) is bigger than 20.

The first-difference (FD) estimator also solves some problems - particularly related to stationarity. We would define the estimator as:

\[\Delta y_{it} = \beta \Delta x_{it} + \Delta \varepsilon_{it}\]

where:

  • \(\Delta y_{it} = y_{it} - y_{i,t-1}\)
  • \(\Delta x_{it} = x_{it} - x_{i,t-1}\)
  • \(\Delta \varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}\)

This model makes sense particularly if you’re interested in changes.
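A base-R sketch of the FD estimator on simulated data (hypothetical, not the chapter's data): differencing removes the unit effects, so OLS on the differenced data recovers \(\beta\) even though \(x\) is correlated with those effects.

```r
# Simulate a panel with unit effects correlated with x
set.seed(1)
N <- 50; T <- 10
id <- rep(1:N, each = T)
a  <- rep(rnorm(N, sd = 3), each = T)   # unit effects
x  <- a + rnorm(N * T)                  # correlated with unit effects
y  <- a + 2 * x + rnorm(N * T)          # true beta = 2

# First differences within each unit (NA for each unit's first period)
dy <- ave(y, id, FUN = function(v) c(NA, diff(v)))
dx <- ave(x, id, FUN = function(v) c(NA, diff(v)))

# No intercept: the unit effects have differenced out
b_fd <- coef(lm(dy ~ dx - 1))["dx"]
unname(b_fd)   # should be close to the true beta of 2
```

The same model can be estimated directly in `plm` with `model="fd"`.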

8.3 The plm Package

The plm package in R has a number of functions that will help us out here. We’ll load in some data to use below:

library(rio)
library(plm)
dat <- import("https://quantoid.net/files/9591/repress_data.dta")

The fixed effects model is estimated with the model="within" and effect="individual" arguments.

mod1.fe <- plm(repress ~ voice*veto +
              log(pop) + log(rgdpch) + iwar + cwar, data=dat, 
              model = "within", index= c("ccode", "year"), 
              effect="individual")
summary(mod1.fe)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = repress ~ voice * veto + log(pop) + log(rgdpch) + 
##     iwar + cwar, data = dat, effect = "individual", model = "within", 
##     index = c("ccode", "year"))
## 
## Unbalanced Panel: n = 159, T = 6-28, N = 3876
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -4.8942061 -0.6903648  0.0022614  0.7214781  4.4375575 
## 
## Coefficients:
##              Estimate Std. Error t-value  Pr(>|t|)    
## voice        0.063040   0.014638  4.3065 1.702e-05 ***
## veto         0.233100   0.067497  3.4535 0.0005595 ***
## log(pop)    -1.430969   0.146190 -9.7884 < 2.2e-16 ***
## log(rgdpch) -0.125093   0.089594 -1.3962 0.1627308    
## iwar        -0.205658   0.148735 -1.3827 0.1668359    
## cwar        -0.633611   0.106042 -5.9751 2.516e-09 ***
## voice:veto   0.013440   0.012657  1.0619 0.2883504    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5062.1
## Residual Sum of Squares: 4797.3
## R-Squared:      0.052307
## Adj. R-Squared: 0.010159
## F-statistic: 29.2526 on 7 and 3710 DF, p-value: < 2.22e-16

You Try It!

For the applied exercises in this chapter, we’re using data from:

Ross, Michael (2001) “Does Oil Hinder Democracy?” World Politics 53(3): 325-361. You can load the data with:

load("data/oil.rda")

This will make an object called oil in your workspace.

Estimate the linear, between, and one-way fixed effects models of regime1 on the five-year lags of regime1, oil, metal, and logl135, as well as the contemporaneous values of islam and oecd.

There is also a two-way fixed effects model that includes both unit and time effects. We can estimate that model with:

mod1.twfe <- plm(repress ~ voice*veto +
              log(pop) + log(rgdpch) + iwar + cwar, data=dat, 
              model = "within", index= c("ccode", "year"), 
              effect="twoway")
summary(mod1.twfe)
## Twoways effects Within Model
## 
## Call:
## plm(formula = repress ~ voice * veto + log(pop) + log(rgdpch) + 
##     iwar + cwar, data = dat, effect = "twoway", model = "within", 
##     index = c("ccode", "year"))
## 
## Unbalanced Panel: n = 159, T = 6-28, N = 3876
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -4.6429697 -0.6536746 -0.0081787  0.6951082  4.3962291 
## 
## Coefficients:
##              Estimate Std. Error t-value  Pr(>|t|)    
## voice        0.082430   0.014632  5.6334 1.899e-08 ***
## veto         0.298216   0.067789  4.3992 1.117e-05 ***
## log(pop)    -0.065765   0.230294 -0.2856   0.77522    
## log(rgdpch)  0.225734   0.108414  2.0822   0.03740 *  
## iwar        -0.229900   0.149270 -1.5402   0.12361    
## cwar        -0.589891   0.106838 -5.5214 3.596e-08 ***
## voice:veto   0.023783   0.012582  1.8902   0.05881 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    4852.3
## Residual Sum of Squares: 4633
## R-Squared:      0.045196
## Adj. R-Squared: -0.0045795
## F-statistic: 24.905 on 7 and 3683 DF, p-value: < 2.22e-16

You Try It!

Estimate the two-way fixed effects version of the model you estimated above.

We could estimate the random effects model in lme4 or brms as discussed above, but we could also estimate it with plm as follows:

mod1.re <- plm(repress ~ voice*veto +
              log(pop) + log(rgdpch) + iwar + cwar, data=dat, 
              model = "random", index= c("ccode", "year"), 
              effect="individual")
summary(mod1.re)
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = repress ~ voice * veto + log(pop) + log(rgdpch) + 
##     iwar + cwar, data = dat, effect = "individual", model = "random", 
##     index = c("ccode", "year"))
## 
## Unbalanced Panel: n = 159, T = 6-28, N = 3876
## 
## Effects:
##                  var std.dev share
## idiosyncratic 1.2931  1.1371 0.579
## individual    0.9409  0.9700 0.421
## theta:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5683  0.7837  0.7837  0.7718  0.7837  0.7837 
## 
## Residuals:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -4.5803 -0.7387  0.0145  0.0037  0.7474  4.3187 
## 
## Coefficients:
##              Estimate Std. Error  z-value  Pr(>|z|)    
## (Intercept)  3.822831   0.636985   6.0014 1.956e-09 ***
## voice        0.069137   0.014104   4.9020 9.486e-07 ***
## veto         0.203971   0.066453   3.0694 0.0021449 ** 
## log(pop)    -0.607796   0.048474 -12.5385 < 2.2e-16 ***
## log(rgdpch)  0.191204   0.053444   3.5777 0.0003467 ***
## iwar        -0.150622   0.149752  -1.0058 0.3145066    
## cwar        -0.774438   0.106113  -7.2983 2.915e-13 ***
## voice:veto   0.057605   0.011702   4.9226 8.541e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5628.6
## Residual Sum of Squares: 5150.5
## R-Squared:      0.084946
## Adj. R-Squared: 0.08329
## Chisq: 358.302 on 7 DF, p-value: < 2.22e-16

You Try It!

Estimate the random effect model (both one-way and two-way) similar to the models you estimated above.

One of the main benefits of using the plm package is access to a number of tests for residuals, as well as lag(), lead() and diff() functions that work as expected with time-series cross-sectional data. First, we can test for serial correlation in the errors with either the Breusch-Godfrey test:

pbgtest(mod1.fe, order=1)
## 
##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models
## 
## data:  repress ~ voice * veto + log(pop) + log(rgdpch) + iwar + cwar
## chisq = 731.08, df = 1, p-value < 2.2e-16
## alternative hypothesis: serial correlation in idiosyncratic errors

Or the Wooldridge test, which only works on fixed effects models.

pwartest(mod1.fe)
## 
##  Wooldridge's test for serial correlation in FE panels
## 
## data:  mod1.fe
## F = 340.26, df1 = 1, df2 = 3715, p-value < 2.2e-16
## alternative hypothesis: serial correlation

The Breusch-Godfrey test also works on random effects models.

pbgtest(mod1.re, order=1)
## 
##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models
## 
## data:  repress ~ voice * veto + log(pop) + log(rgdpch) + iwar + cwar
## chisq = 852.02, df = 1, p-value < 2.2e-16
## alternative hypothesis: serial correlation in idiosyncratic errors

You Try It!

Evaluate the two-way fixed effects model for serial correlation using the tests above. See whether a different lag structure solves the problem.

One way that we could attempt to solve the problem is by adding a lagged dependent variable to the model.

mod2.fe <- plm(repress ~ lag(repress, 1) + voice*veto +
              log(pop) + log(rgdpch) + iwar + cwar, data=dat, 
              model = "within", index= c("ccode", "year"), 
              effect="individual")
summary(mod2.fe)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = repress ~ lag(repress, 1) + voice * veto + log(pop) + 
##     log(rgdpch) + iwar + cwar, data = dat, effect = "individual", 
##     model = "within", index = c("ccode", "year"))
## 
## Unbalanced Panel: n = 159, T = 5-27, N = 3712
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -4.109986 -0.584769  0.012532  0.589705  3.923199 
## 
## Coefficients:
##                   Estimate Std. Error t-value  Pr(>|t|)    
## lag(repress, 1)  0.4563965  0.0145378 31.3937 < 2.2e-16 ***
## voice            0.0325649  0.0132895  2.4504 0.0143167 *  
## veto             0.1305343  0.0608242  2.1461 0.0319333 *  
## log(pop)        -0.7825166  0.1371398 -5.7060 1.252e-08 ***
## log(rgdpch)     -0.1098530  0.0820230 -1.3393 0.1805605    
## iwar            -0.0075908  0.1332022 -0.0570 0.9545587    
## cwar            -0.3415915  0.0936629 -3.6470 0.0002691 ***
## voice:veto       0.0027028  0.0113383  0.2384 0.8116001    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    4689.6
## Residual Sum of Squares: 3484.4
## R-Squared:      0.25698
## Adj. R-Squared: 0.22219
## F-statistic: 153.262 on 8 and 3545 DF, p-value: < 2.22e-16

We could then test for serial correlation again:

pbgtest(mod2.fe, order=1)
## 
##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models
## 
## data:  repress ~ lag(repress, 1) + voice * veto + log(pop) + log(rgdpch) +     iwar + cwar
## chisq = 65.318, df = 1, p-value = 6.375e-16
## alternative hypothesis: serial correlation in idiosyncratic errors
pwartest(mod2.fe)
## 
##  Wooldridge's test for serial correlation in FE panels
## 
## data:  mod2.fe
## F = 2.527, df1 = 1, df2 = 3551, p-value = 0.112
## alternative hypothesis: serial correlation

We could also test for cross-sectional dependence - the situation where there is contemporaneous correlation across groups. Below is Breusch and Pagan’s original LM test with test="lm" and Pesaran’s CD statistic with test="cd". These tests also work with random effects models.

pcdtest(mod2.fe, test="lm")
## 
##  Breusch-Pagan LM test for cross-sectional dependence in panels
## 
## data:  repress ~ lag(repress, 1) + voice * veto + log(pop) + log(rgdpch) +     iwar + cwar
## chisq = 16535, df = 12492, p-value < 2.2e-16
## alternative hypothesis: cross-sectional dependence
pcdtest(mod2.fe, test="cd")
## 
##  Pesaran CD test for cross-sectional dependence in panels
## 
## data:  repress ~ lag(repress, 1) + voice * veto + log(pop) + log(rgdpch) +     iwar + cwar
## z = 8.1655, p-value = 3.202e-16
## alternative hypothesis: cross-sectional dependence

There are robust standard errors we can use to correct for serial correlation and cross-sectional dependence. These are called Driscoll-Kraay standard errors and have only been shown to work with fixed effects estimators.

library(lmtest)
coeftest(mod2.fe, vcov.=vcovSCC)
## 
## t test of coefficients:
## 
##                   Estimate Std. Error t value  Pr(>|t|)    
## lag(repress, 1)  0.4563965  0.0395825 11.5303 < 2.2e-16 ***
## voice            0.0325649  0.0107890  3.0183  0.002560 ** 
## veto             0.1305343  0.0487847  2.6757  0.007491 ** 
## log(pop)        -0.7825166  0.1203529 -6.5019 9.043e-11 ***
## log(rgdpch)     -0.1098530  0.0781457 -1.4057  0.159888    
## iwar            -0.0075908  0.0956638 -0.0793  0.936760    
## cwar            -0.3415915  0.0543238 -6.2881 3.606e-10 ***
## voice:veto       0.0027028  0.0176934  0.1528  0.878598    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The Arellano standard errors can fix problems of serial correlation and heteroskedasticity.

coeftest(mod2.fe, vcov.=vcovHC, type="HC3", method="arellano")
## 
## t test of coefficients:
## 
##                   Estimate Std. Error t value  Pr(>|t|)    
## lag(repress, 1)  0.4563965  0.0261685 17.4407 < 2.2e-16 ***
## voice            0.0325649  0.0178315  1.8263 0.0678961 .  
## veto             0.1305343  0.0665615  1.9611 0.0499445 *  
## log(pop)        -0.7825166  0.2012610 -3.8881 0.0001029 ***
## log(rgdpch)     -0.1098530  0.1004767 -1.0933 0.2743288    
## iwar            -0.0075908  0.1320141 -0.0575 0.9541501    
## cwar            -0.3415915  0.1013826 -3.3693 0.0007616 ***
## voice:veto       0.0027028  0.0151234  0.1787 0.8581695    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You Try It!

How does the serial correlation robust standard error compare to the lag-structure modeling you did above?

We could also get the Beck and Katz panel-corrected standard errors (PCSEs) with:

coeftest(mod2.fe, vcov.=vcovBK(mod2.fe, type="HC3"))
## 
## t test of coefficients:
## 
##                   Estimate Std. Error t value  Pr(>|t|)    
## lag(repress, 1)  0.4563965  0.0177010 25.7836 < 2.2e-16 ***
## voice            0.0325649  0.0168745  1.9298 0.0537078 .  
## veto             0.1305343  0.0713440  1.8296 0.0673869 .  
## log(pop)        -0.7825166  0.1912979 -4.0906 4.399e-05 ***
## log(rgdpch)     -0.1098530  0.1114558 -0.9856 0.3243872    
## iwar            -0.0075908  0.1508133 -0.0503 0.9598603    
## cwar            -0.3415915  0.0937059 -3.6454 0.0002709 ***
## voice:veto       0.0027028  0.0141647  0.1908 0.8486821    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You Try It!

How do the results change if you use the BK PCSEs instead?