Fit "within-between" and several other regression variants for panel data in a multi-level modeling framework.
wbm(formula, data, id = NULL, wave = NULL, model = "w-b", detrend = FALSE,
  use.wave = FALSE, wave.factor = FALSE, min.waves = 2, family = gaussian,
  balance_correction = FALSE, dt_random = TRUE, dt_order = 1, pR2 = TRUE,
  pvals = TRUE, t.df = "Satterthwaite", weights = NULL, offset = NULL,
  scale = FALSE, scale.response = FALSE, n.sd = 1, ...)
Model formula. See details for crucial
The data, either a
Adjust within-subject effects for trends in the predictors? Default is FALSE, but some research suggests this is a better idea (see Curran and Bauer (2011) reference).
Should the wave be included as a predictor? Default is FALSE.
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE.
What is the minimum number of waves an individual must have participated in to be included in the analysis? Default is 2.
Use this to specify GLM link families. Default is gaussian.
Correct between-subject effects for unbalanced panels following the procedure in Curran and Bauer (2011)? Default is FALSE.
Should the detrending procedure be performed with a random slope for each entity? Default is TRUE but for short panels FALSE may be better, fitting a trend for all entities.
If detrending using
Calculate a pseudo R-squared? Default is TRUE, but in some cases may cause errors or add computation time.
Calculate p values? Default is TRUE but for some complex
linear models, this may take a long time to compute using the
For linear models only. User may choose the method for calculating the degrees of freedom in t-tests. Default is "Satterthwaite".
If using weights, either the name of the column in the data that contains the weights or a vector of the weights.
This can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be
Should the response variable also be rescaled? Default is FALSE.
How many standard deviations should you divide by for standardization? Default is 1, though some prefer 2.
wbm object, which inherits from
The within-between models, and multilevel panel models more generally,
distinguish between time-varying and time-invariant predictors. These are,
as they sound, variables that are either measured repeatedly (in every wave)
in the case of time-varying predictors or only once in the case of
time-invariant predictors. You need to specify these separately in the
formula to tell the model which variables you expect to change over time and
which will not. The primary way of doing so is via the
As an example, we can look at the WageData included in this
package. We will create a model that predicts the logarithm of the
individual's wages (lwage) with their union status (union), which can
change over time, and their race (blk; dichotomized as black or not),
which does not change throughout the period of study. Our formula will look
like this:
lwage ~ union | blk
We put time-varying variables before the first | and time-invariant
variables afterwards. You can specify lags like lag(union) for time-varying
variables; for more than 1 lag, include the number:

After the first | go the time-invariant variables. Note that if you put a
time-varying variable here, only the first wave measure will be used; in
some cases this will be what you want. You may also take a time-varying
variable, say weeks worked (wks), and use
include the individual's mean across all waves as a predictor while omitting
the per-wave measures.
There is also a place for a second |. Here you can specify cross-level
interactions (within-level interactions can be specified here as well).
If I wanted the interaction term for union and blk (to see whether
the effect of union status depended on one's race), I would specify the
formula this way:
lwage ~ union | blk | union * blk
Another use for the post-second | section of the formula is for changing
the random effects specification. By default, only a random intercept is
specified in the call to lme4::glmer(). If you would like
to specify other random slopes, include them here using the typical
lme4 syntax:
lwage ~ union | blk | (union | id)
Note that if your random slope term has non-alphanumeric characters (for
example, if you want a random slope for lag(union)), then for the random
effect specification only, you need to put that term in backticks. For example,
lwage ~ lag(union) | blk | (`lag(union)` | id)
This is just a limitation of the way the formulas are dealt with by
One last thing to know: If you want to use the second | but not the first,
put a 1 or 0 after the first, like this:
lwage ~ union | 1 | (union | id)
Of course, with no time-invariant variables, you need no | operators at
all.
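The formula layouts described above can be tried out directly. The following is a minimal sketch, assuming the package in question is panelr (the `library()` call is an assumption; the other calls follow the usage shown on this page). It fits the basic two-part formula and the three-part formula with a cross-level interaction.

```r
library(panelr)  # assumption: wbm(), panel_data() and WageData come from panelr

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

# Time-varying predictor before the first |, time-invariant predictor after it.
m_basic <- wbm(lwage ~ union | blk, data = wages)

# Third part of the formula: cross-level interaction between union and blk.
m_inter <- wbm(lwage ~ union | blk | union * blk, data = wages)

summary(m_inter)
```

Comparing `summary(m_basic)` with `summary(m_inter)` shows where the additional interaction term is reported.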
As a convenience, wbm does the heavy lifting for specifying the
within-between model correctly. Of course, as a side effect it only
takes a few easy tweaks to specify the model slightly differently. You
can change this behavior with the model argument.
By default, the argument is "w-b".
This means, for each time-varying predictor, you have two types of
variables in the model. The "between" effect is represented by the
individual-level mean for each entity (e.g., each respondent to a panel
survey). The "within" effect is represented by each wave's measure with
the individual-level mean subtracted. Some refer to this as "de-meaning."
Thinking in a Hausman test framework --- with the within-between model as
described here --- you should expect the within and between
coefficients to be the same if a random effects model were appropriate.
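To make the two components concrete, here is a small base R sketch of the de-meaning step on a made-up panel. The data frame `toy` and the column names `x_mean` and `x_within` are hypothetical and are not the names wbm uses internally.

```r
# Toy long-format panel: 3 entities observed over 3 waves (made-up values).
toy <- data.frame(
  id   = rep(1:3, each = 3),
  wave = rep(1:3, times = 3),
  x    = c(1, 2, 3, 4, 4, 7, 0, 2, 1)
)

# "Between" component: each entity's mean of x, repeated on every row.
toy$x_mean <- ave(toy$x, toy$id, FUN = mean)

# "Within" component: the wave's value minus the entity mean (de-meaning).
toy$x_within <- toy$x - toy$x_mean

toy
```

Within each entity the de-meaned values sum to zero, so `x_within` carries only over-time variation and `x_mean` carries only between-entity variation.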
The contextual model is very similar (use argument
some situations, this will be more intuitive to interpret. Empirically,
the only difference compared to the within-between specification is that
the contextual model does not subtract the individual-level means from the
wave-level measures. This also changes the interpretation of the
between-subject coefficients: In the contextual model, they are the
difference between the within and between effects. If there's no
difference between within and between effects, then the coefficients will
be 0.
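The relationship between the two specifications can be checked numerically. The sketch below uses simulated data and plain `lm()` rather than the mixed model that wbm fits, purely to illustrate the reparameterization; all object names are hypothetical.

```r
set.seed(1)
n_id <- 50
n_wave <- 4

# Simulated panel with a within effect of 0.5 and a between effect of 1.5.
d <- data.frame(id = rep(seq_len(n_id), each = n_wave))
d$x <- rnorm(nrow(d)) + rep(rnorm(n_id), each = n_wave)
d$x_mean <- ave(d$x, d$id)          # entity means (ave() defaults to mean)
d$x_within <- d$x - d$x_mean        # de-meaned values
d$y <- 0.5 * d$x_within + 1.5 * d$x_mean + rnorm(nrow(d))

# Within-between style: de-meaned x plus the entity mean.
wb <- lm(y ~ x_within + x_mean, data = d)

# Contextual style: raw x plus the entity mean.
ctx <- lm(y ~ x + x_mean, data = d)

within_wb  <- coef(wb)[["x_within"]]
between_wb <- coef(wb)[["x_mean"]]

# The coefficient on x is the within effect in both fits ...
all.equal(coef(ctx)[["x"]], within_wb)

# ... and the contextual coefficient on the mean is between minus within.
all.equal(coef(ctx)[["x_mean"]], between_wb - within_wb)
```

Both fits span the same set of predictors, which is why only the interpretation of the mean term changes.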
To fit a random effects model, use either
involves no de-meaning and no individual-level means whatsoever.
To fit a fixed effects model, use either
between-subjects terms in the formula will be ignored. The time-varying
variables will be de-meaned, but the individual-level mean is not included
in the model.
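As a rough illustration of why de-meaning yields a fixed effects estimate, the base R sketch below compares OLS on a de-meaned predictor with OLS that includes a dummy variable for every entity. It uses simulated data and `lm()`, not the fitting routine wbm uses, and the object names are hypothetical.

```r
set.seed(2)
n_id <- 40
n_wave <- 5

d <- data.frame(id = rep(seq_len(n_id), each = n_wave))
u <- rep(rnorm(n_id), each = n_wave)    # unobserved, time-invariant entity effect
d$x <- rnorm(nrow(d)) + u               # x is correlated with the entity effect
d$y <- 0.7 * d$x + 2 * u + rnorm(nrow(d))

# De-mean x within each entity; the entity mean itself is left out of the model.
d$x_within <- d$x - ave(d$x, d$id)

fe_demeaned <- lm(y ~ x_within, data = d)        # de-meaned predictor only
fe_dummies  <- lm(y ~ x + factor(id), data = d)  # one dummy per entity

coef(fe_demeaned)[["x_within"]]
coef(fe_dummies)[["x"]]
```

The two slopes agree. Their standard errors do not, because the de-meaned OLS fit does not account for the degrees of freedom used up by the entity means.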
Another option is what I'm calling "stability", which is a non-standard
term. This is another convenience, and one you could set up yourself
through the formula syntax. The idea is that while estimating the within
effect and predicting change is great, sometimes you want to really drill
down on how people who are generally high or low on the construct differ
from each
other. The "stability" specification creates interaction terms with the
individual level means and the time variable, giving you something like a
growth curve model but with the particular question of whether the growth
trend depends on the average level of a time-varying variable. This can be
particularly informative when you are concerned that your time-varying
variable changes so infrequently that there just isn't enough variation
to glean anything from the within effect.
Allison, P. (2009). Fixed effects regression models. Thousand Oaks, CA: SAGE Publications. https://doi.org/10.4135/9781412993869.d33
Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data. Political Science Research and Methods, 3, 133–153. https://doi.org/10.1017/psrm.2014.7
Curran, P. J., & Bauer, D. J. (2011). The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology, 62, 583–619. https://doi.org/10.1146/annurev.psych.093008.100356
wbm_stan() for a Bayesian estimation option.
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks | blk + fem | blk * lag(union),
  data = wages)
summary(model)
#> MODEL INFO:
#> Entities: 595
#> Time periods: 2-7
#> Dependent variable: lwage
#> Model type: Linear mixed effects
#> Specification: within-between
#>
#> MODEL FIT:
#> AIC = 1386.31, BIC = 1448.11
#> Pseudo-R² (fixed effects) = 0.13
#> Pseudo-R² (total) = 0.74
#> Entity ICC = 0.7
#>
#> WITHIN EFFECTS:
#> |            |  Est. | S.E. | t val. |    d.f. |    p |
#> |:-----------|------:|-----:|-------:|--------:|-----:|
#> | lag(union) |  0.06 | 0.03 |   2.28 | 2972.01 | 0.02 |
#> | wks        | -0.00 | 0.00 |  -1.51 | 2994.31 | 0.13 |
#>
#> BETWEEN EFFECTS:
#> |                   |  Est. | S.E. | t val. |   d.f. |    p |
#> |:------------------|------:|-----:|-------:|-------:|-----:|
#> | (Intercept)       |  6.60 | 0.23 |  28.53 | 589.99 | 0.00 |
#> | imean(lag(union)) | -0.03 | 0.03 |  -0.80 | 589.98 | 0.42 |
#> | imean(wks)        |  0.00 | 0.00 |   0.91 | 589.99 | 0.36 |
#> | blk               | -0.23 | 0.06 |  -3.85 | 589.98 | 0.00 |
#> | fem               | -0.44 | 0.05 |  -8.89 | 589.98 | 0.00 |
#>
#> CROSS-LEVEL INTERACTIONS:
#> |                |  Est. | S.E. | t val. |    d.f. |    p |
#> |:---------------|------:|-----:|-------:|--------:|-----:|
#> | lag(union):blk | -0.13 | 0.12 |  -1.03 | 2971.99 | 0.31 |
#>
#> p values calculated using Satterthwaite d.f.
#>
#> RANDOM EFFECTS:
#> |  Group   |  Parameter  | Std. Dev. |
#> |:--------:|:-----------:|:---------:|
#> |    id    | (Intercept) |   0.354   |
#> | Residual |             |  0.2326   |