This function allows you to define a minimum number of waves/periods and exclude all individuals with fewer observations than that.
Arguments
- data
A panel_data() frame.
- ...
Optionally, unquoted variable names/expressions separated by commas to be passed to
dplyr::select(). Otherwise, all columns are included if formula and vars are also NULL.
- formula
A formula, like the one you'll be using to specify your model.
- vars
As an alternative to formula, a vector of variable names.
- min.waves
The minimum number of observations (waves) an individual must have in order to be kept. Default is
"all", but it can be any number.
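To make the idea of a minimum number of waves concrete, here is a conceptual sketch in base R. It is not the package's implementation: the toy data frame and the helper keep_min_waves() are hypothetical names introduced only for illustration. The sketch counts, for each individual, the rows that are complete on the selected columns, and keeps only individuals with at least that many. It takes a numeric threshold only and does not model the "all" default.

```r
# Toy long-format panel (hypothetical data): 3 individuals, 3 waves each
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  wave = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  x    = c(5, 6, 7, 1, NA, 3, NA, NA, 9),
  y    = c(2, 2, 3, 4, 4, 5, 6, 7, 8)
)

# Hypothetical helper, not part of any package: keep individuals that have
# at least `min_waves` rows with no missing values on `cols`
keep_min_waves <- function(df, cols, min_waves) {
  # TRUE for rows that are complete on the selected columns
  ok <- complete.cases(df[, cols, drop = FALSE])
  # Number of complete rows per individual
  n_ok <- tapply(ok, df$id, sum)
  # Ids that meet the threshold
  keep_ids <- names(n_ok)[n_ok >= min_waves]
  # This sketch keeps every row of a retained individual
  df[as.character(df$id) %in% keep_ids, , drop = FALSE]
}

kept <- keep_min_waves(toy, c("x", "y"), min_waves = 2)
unique(kept$id)
```

In the toy data, individual 3 has only one row that is complete on x and y, so it is the one excluded when the threshold is 2.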
Details
If ... (that is, unquoted variable name(s)) are included, then formula
and vars are ignored. Likewise, formula takes precedence over vars.
These are just different methods for selecting variables and you can choose
whichever you prefer/are comfortable with. ... corresponds with the
"tidyverse" way, formula is useful for programming or working with
model formulas, and vars is a "standard" evaluation method for when you
are working with strings.
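The three selection methods described above can be compared side by side. This is a sketch that assumes the panelr package is installed and that a simple two-sided formula is accepted by the formula argument; the object names by_dots, by_formula, and by_vars are illustrative only.

```r
library(panelr)

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

# 1. "tidyverse" style: unquoted variable names passed through ...
by_dots <- complete_data(wages, wks, lwage, min.waves = 3)

# 2. Formula style: the variables named in the formula are used
#    (assumes a plain two-sided formula is acceptable here)
by_formula <- complete_data(wages, formula = lwage ~ wks, min.waves = 3)

# 3. "Standard" evaluation: a character vector of variable names
by_vars <- complete_data(wages, vars = c("wks", "lwage"), min.waves = 3)

# All three calls select the same variables, so they should
# retain the same number of rows
c(nrow(by_dots), nrow(by_formula), nrow(by_vars))
```

If more than one method is supplied in the same call, the precedence rules described above decide which one is used.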
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
complete_data(wages, wks, lwage, min.waves = 3)
#> # Panel data: 4,165 × 14
#> # Entities: id [595]
#> # Wave variable: t [1, 2, 3, ... (7 waves)]
#>    id        t   exp   wks   occ   ind south  smsa    ms   fem union    ed   blk
#>    <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 1         1     3    32     0     0     1     0     1     0     0     9     0
#>  2 1         2     4    43     0     0     1     0     1     0     0     9     0
#>  3 1         3     5    40     0     0     1     0     1     0     0     9     0
#>  4 1         4     6    39     0     0     1     0     1     0     0     9     0
#>  5 1         5     7    42     0     1     1     0     1     0     0     9     0
#>  6 1         6     8    35     0     1     1     0     1     0     0     9     0
#>  7 1         7     9    32     0     1     1     0     1     0     0     9     0
#>  8 2         1    30    34     1     0     0     0     1     0     0    11     0
#>  9 2         2    31    27     1     0     0     0     1     0     0    11     0
#> 10 2         3    32    33     1     1     0     0     1     0     1    11     0
#> # ℹ 4,155 more rows
#> # ℹ 1 more variable: lwage <dbl>