Fit "within-between" and several other regression variants for panel data in a multilevel modeling framework.
wbm(
  formula,
  data,
  id = NULL,
  wave = NULL,
  model = "w-b",
  detrend = FALSE,
  use.wave = FALSE,
  wave.factor = FALSE,
  min.waves = 2,
  family = gaussian,
  balance.correction = FALSE,
  dt.random = TRUE,
  dt.order = 1,
  pR2 = TRUE,
  pvals = TRUE,
  t.df = "Satterthwaite",
  weights = NULL,
  offset = NULL,
  interaction.style = c("double-demean", "demean", "raw"),
  scale = FALSE,
  scale.response = FALSE,
  n.sd = 1,
  dt_random = dt.random,
  dt_order = dt.order,
  balance_correction = balance.correction,
  ...
)
formula  Model formula. See the "Formula syntax" section below for crucial information on how to specify it.
data  The data, either a 
id  If 
wave  If 
model  One of 
detrend  Adjust within-subject effects for trends in the predictors? Default is FALSE, but some research suggests detrending is the better choice (see Curran and Bauer, 2011).
use.wave  Should the wave be included as a predictor? Default is FALSE. 
wave.factor  Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. 
min.waves  What is the minimum number of waves an individual must have participated in to be included in the analysis? Default is 2.
family  Use this to specify GLM link families. Default is gaussian.
balance.correction  Correct between-subject effects for unbalanced panels following the procedure in Curran and Bauer (2011)? Default is FALSE.
dt.random  Should the detrending procedure be performed with a random slope for each entity? Default is TRUE, but for short panels FALSE may be better; it fits a single trend shared by all entities.
dt.order  If detrending using 
pR2  Calculate a pseudo R-squared? Default is TRUE, but in some cases this may cause errors or add computation time.
pvals  Calculate p values? Default is TRUE but for some complex
linear models, this may take a long time to compute using the 
t.df  For linear models only. User may choose the method for calculating the degrees of freedom in t-tests. Default is "Satterthwaite".
weights  If using weights, either the name of the column in the data that contains the weights or a vector of the weights. 
offset  this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be 
interaction.style  The best way to calculate interactions in within
models is in some dispute. The conventional way ( 
scale  If 
scale.response  Should the response variable also be rescaled? Default is FALSE.
n.sd  How many standard deviations should you divide by for standardization? Default is 1, though some prefer 2. 
dt_random  Deprecated. Equivalent to dt.random.
dt_order  Deprecated. Equivalent to dt.order.
balance_correction  Deprecated. Equivalent to balance.correction.
...  Additional arguments provided to 
A wbm object, which inherits from merMod.
Formula syntax
The within-between models, and multilevel panel models more generally,
distinguish between time-varying and time-invariant predictors. These are,
as they sound, variables that are either measured repeatedly (in every wave)
in the case of time-varying predictors or only once in the case of
time-invariant predictors. You need to specify these separately in the
formula to tell the model which variables you expect to change over time and
which will not. The primary way of doing so is via the | operator.
As an example, we can look at the WageData included in this package. We will
create a model that predicts the logarithm of the individual's wages (lwage)
with their union status (union), which can change over time, and their race
(blk; dichotomized as black or non-black), which does not change throughout
the period of study. Our formula will look like this:
lwage ~ union | blk
Put time-varying variables before the first | and time-invariant variables
afterwards. You can specify lags like lag(union) for time-varying variables;
for more than 1 lag, include the number: lag(union, 2).
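To make the meaning of lag(union) concrete, here is a minimal base R sketch of lagging a variable within each entity. It only illustrates the idea on a hypothetical toy data frame (toy, lag_by_one and union_lag1 are made-up names); it is not panelr's internal implementation, and it assumes the rows are already sorted by wave within each id.

```r
# Hypothetical toy panel: 2 entities observed over 3 waves,
# sorted by id and then by wave (an assumption of this sketch).
toy <- data.frame(
  id    = rep(1:2, each = 3),
  t     = rep(1:3, times = 2),
  union = c(0, 1, 1, 1, 0, 0)
)

# Shift values down by one wave; the first wave has no previous
# value, so it becomes NA.
lag_by_one <- function(x) c(NA, head(x, -1))

# Apply the shift separately within each id so that values never
# leak from one entity into the next.
toy$union_lag1 <- ave(toy$union, toy$id, FUN = lag_by_one)

print(toy)
```

Because each entity's first wave has no lagged value, a model that uses a one-period lag has one fewer usable wave per entity.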
After the first | go the time-invariant variables. Note that if you put a
time-varying variable here, what you get is the observed value rather than
one adjusted to isolate within-entity effects. You may also take a
time-varying variable, say weeks worked (wks), and use imean(wks) to include
the individual's mean across all waves as a predictor while omitting the
per-wave measures.
There is also a place for a second |. Here you can specify cross-level
interactions (within-level interactions can be specified here as well).
If I wanted the interaction term for union and blk, to see whether the
effect of union status depended on one's race, I would specify the
formula this way:
lwage ~ union | blk | union * blk
Another use for the post-second | section of the formula is for changing
the random effects specification. By default, only a random intercept is
specified in the call to lme4::lmer()/lme4::glmer(). If you would like
to specify other random slopes, include them here using the typical lme4
syntax:
lwage ~ union | blk | (union | id)
You can also include the wave variable in a random effects term to specify a latent growth curve model:
lwage ~ union | blk + t | (t | id)
One last thing to know: If you want to use the second | but not the first,
put a 1 or 0 after the first, like this:
lwage ~ union | 1 | (union | id)
Of course, with no time-invariant variables, you need no | operators at all.
Models
As a convenience, wbm does the heavy lifting for specifying the
within-between model correctly. As a side effect, it only takes a few easy
tweaks to specify the model slightly differently. You can change this
behavior with the model argument.
By default, the argument is "w-b" (equivalently, "within-between").
This means, for each time-varying predictor, you have two types of
variables in the model. The "between" effect is represented by the
individual-level mean for each entity (e.g., each respondent to a panel
survey). The "within" effect is represented by each wave's measure with
the individual-level mean subtracted. Some refer to this as "de-meaning."
Thinking in a Hausman test framework, with the within-between model as
described here, you should expect the within and between
coefficients to be the same if a random effects model were appropriate.
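The decomposition described above can be sketched in a few lines of base R. This is an illustration on a hypothetical toy data frame (toy, wks_between and wks_within are made-up names), not the code wbm runs internally.

```r
# Hypothetical toy panel: weeks worked for 2 entities over 3 waves.
toy <- data.frame(
  id  = rep(1:2, each = 3),
  wks = c(40, 44, 48, 30, 30, 36)
)

# "Between" component: each entity's own mean across its waves,
# repeated on every row belonging to that entity.
toy$wks_between <- ave(toy$wks, toy$id, FUN = mean)

# "Within" component: each wave's measure minus that entity's mean.
toy$wks_within <- toy$wks - toy$wks_between

print(toy)
```

By construction, the within component sums to zero inside each entity, so it carries only over-time variation, while the between component carries only differences across entities.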
The contextual model is very similar (use argument "contextual"). In
some situations, this will be more intuitive to interpret. Empirically,
the only difference compared to the within-between specification is that
the contextual model does not subtract the individual-level means from the
wave-level measures. This also changes the interpretation of the
between-subject coefficients: In the contextual model, they are the
difference between the between and within effects (between minus within).
If there's no difference between within and between effects, then, the
coefficients will be 0.
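The relationship between the two parameterizations can be checked numerically. The sketch below uses simulated data and plain lm() instead of a mixed model so that it runs in base R; that substitution is an assumption of this example, not how wbm fits its models. The identity itself is a reparameterization of the same predictors, so it does not depend on the random intercept. All object names are made up for illustration.

```r
set.seed(1)

# Simulated panel: 50 entities, 4 waves each (hypothetical data).
n_id   <- 50
n_wave <- 4
id <- rep(seq_len(n_id), each = n_wave)
u  <- rep(rnorm(n_id), each = n_wave)        # stable entity-level component
x  <- u + rnorm(n_id * n_wave)               # time-varying predictor
y  <- 1 + 0.5 * x + 1.0 * u + rnorm(n_id * n_wave)

# Entity means and de-meaned values of the predictor.
x_mean   <- ave(x, id, FUN = mean)
x_within <- x - x_mean

# Within-between style: de-meaned predictor plus entity mean.
wb_fit  <- lm(y ~ x_within + x_mean)
# Contextual style: raw predictor plus entity mean.
ctx_fit <- lm(y ~ x + x_mean)

within_eff     <- coef(wb_fit)[["x_within"]]
between_eff    <- coef(wb_fit)[["x_mean"]]
contextual_eff <- coef(ctx_fit)[["x_mean"]]

# The contextual coefficient on the mean equals between minus within,
# and the coefficient on the raw predictor equals the within effect.
print(all.equal(contextual_eff, between_eff - within_eff))
print(all.equal(coef(ctx_fit)[["x"]], within_eff))
```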
To fit a random effects model, use either "between" or "random". This
involves no de-meaning and no individual-level means whatsoever.
To fit a fixed effects model, use either "within" or "fixed". Any
between-subjects terms in the formula will be ignored. The time-varying
variables will be de-meaned, but the individual-level mean is not included
in the model.
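As a rough sketch of how the model argument might be used in practice, the snippet below fits the same formula under each specification described in this section. It assumes the panelr package and its bundled WageData are available, and it is wrapped in a guard so that it does nothing otherwise. The object names (wb_fit, ctx_fit, re_fit, fe_fit) are made up for illustration.

```r
# Guarded so the snippet is a no-op when panelr is not installed.
if (requireNamespace("panelr", quietly = TRUE)) {
  library(panelr)

  data("WageData")
  wages <- panel_data(WageData, id = id, wave = t)

  # Same predictors, different specifications selected via `model`.
  wb_fit  <- wbm(lwage ~ union + wks | blk, data = wages, model = "w-b")
  ctx_fit <- wbm(lwage ~ union + wks | blk, data = wages, model = "contextual")
  re_fit  <- wbm(lwage ~ union + wks | blk, data = wages, model = "between")

  # For the fixed effects model, the between-subjects term (blk)
  # is ignored, as described above.
  fe_fit  <- wbm(lwage ~ union + wks | blk, data = wages, model = "within")

  summary(fe_fit)
}
```

Comparing the within coefficients across wb_fit and fe_fit, and the coefficients on the individual-level means across wb_fit and ctx_fit, is one way to see the differences in interpretation discussed here.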
Allison, P. (2009). Fixed effects regression models. Thousand Oaks, CA: SAGE Publications. https://doi.org/10.4135/9781412993869.d33
Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data. Political Science Research and Methods, 3, 133–153. https://doi.org/10.1017/psrm.2014.7
Curran, P. J., & Bauer, D. J. (2011). The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology, 62, 583–619. https://doi.org/10.1146/annurev.psych.093008.100356
Giesselmann, M., & Schmidt-Catran, A. (2018). Interactions in fixed effects regression models (Discussion Papers of DIW Berlin No. 1748). DIW Berlin, German Institute for Economic Research. Retrieved from https://ideas.repec.org/p/diw/diwwpp/dp1748.html
Schunck, R., & Perales, F. (2017). Within- and between-cluster effects in generalized linear mixed models: A discussion of approaches and the xthybrid command. The Stata Journal, 17, 89–115. https://doi.org/10.1177/1536867X1701700106
See also wbm_stan() for a Bayesian estimation option.
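data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks | blk + fem | blk * lag(union),
             data = wages)
summary(model)
#> MODEL INFO:
#> Entities: 595
#> Time periods: 2-7
#> Dependent variable: lwage
#> Model type: Linear mixed effects
#> Specification: within-between
#>
#> MODEL FIT:
#> AIC = 1386.31, BIC = 1448.11
#> Pseudo-R² (fixed effects) = 0.13
#> Pseudo-R² (total) = 0.74
#> Entity ICC = 0.7
#>
#> WITHIN EFFECTS:
#> -------------------------------------------------------
#>                    Est.   S.E.   t val.      d.f.      p
#> ---------------- ------ ------ -------- --------- ------
#> lag(union)         0.06   0.03     2.28   2972.01   0.02
#> wks                0.00   0.00     1.51   2994.31   0.13
#> -------------------------------------------------------
#>
#> BETWEEN EFFECTS:
#> -------------------------------------------------------------
#>                           Est.   S.E.   t val.     d.f.      p
#> ----------------------- ------ ------ -------- -------- ------
#> (Intercept)               6.60   0.23    28.53   589.99   0.00
#> imean(lag(union))         0.03   0.03     0.80   589.98   0.42
#> imean(wks)                0.00   0.00     0.91   589.99   0.36
#> blk                       0.23   0.06     3.85   589.98   0.00
#> fem                       0.44   0.05     8.89   589.98   0.00
#> -------------------------------------------------------------
#>
#> CROSS-LEVEL INTERACTIONS:
#> ---------------------------------------------------------
#>                      Est.   S.E.   t val.      d.f.      p
#> ------------------ ------ ------ -------- --------- ------
#> lag(union):blk       0.13   0.12     1.03   2971.99   0.31
#> ---------------------------------------------------------
#>
#> p values calculated using Satterthwaite d.f.
#>
#> RANDOM EFFECTS:
#> ------------------------------------
#>   Group      Parameter    Std. Dev.
#> ---------- ------------- -----------
#>     id      (Intercept)     0.354
#>  Residual                   0.2326
#> ------------------------------------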