This function allows you to define a minimum number of waves/periods and exclude all individuals with fewer observations than that.
Arguments
- data
A panel_data() frame.
- ...
Optionally, unquoted variable names/expressions separated by commas to be passed to dplyr::select(). Otherwise, all columns are included if formula and vars are also NULL.
- formula
A formula, like the one you'll be using to specify your model.
- vars
As an alternative to formula, a vector of variable names.
- min.waves
The minimum number of observations an individual must have in order to be kept. Default is "all", but it can be any number.
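To make the effect of min.waves concrete, here is a minimal base R sketch of the kind of filtering described above. It is not the package's implementation: keep_min_waves() and the toy data frame are hypothetical names invented for illustration, and the sketch assumes that an "observation" means a row with no missing values on the selected variables and that every row of a retained individual is kept.

```r
# Toy long-format panel: three individuals, three waves, some missing values.
toy <- data.frame(
  id   = rep(c("a", "b", "c"), each = 3),
  wave = rep(1:3, times = 3),
  x    = c(1, 2, 3, 4, NA, 6, NA, NA, 9)
)

# Hypothetical helper (not part of panelr): keep individuals that have at
# least `min_waves` rows with no missing values on the columns in `vars`.
keep_min_waves <- function(df, id, vars, min_waves) {
  # TRUE for rows that are complete on the selected variables
  complete <- stats::complete.cases(df[, vars, drop = FALSE])
  # Number of complete rows per individual
  n_complete <- tapply(complete, df[[id]], sum)
  # Individuals that meet the threshold
  keep_ids <- names(n_complete)[n_complete >= min_waves]
  # Assumption: all rows of retained individuals are returned
  df[df[[id]] %in% keep_ids, , drop = FALSE]
}

kept <- keep_min_waves(toy, id = "id", vars = "x", min_waves = 2)
unique(kept$id)
```

In this toy data, individual "c" has only one complete observation on x, so it is the one excluded when the threshold is 2.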
Details
If ... (that is, unquoted variable name(s)) are included, then formula and vars are ignored. Likewise, formula takes precedence over vars.
These are just different methods for selecting variables, and you can choose whichever you prefer or are most comfortable with. Using ... corresponds with the "tidyverse" way, formula is useful for programming or working with model formulas, and vars is a "standard" evaluation method for when you are working with strings.
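As a rough illustration of the three selection methods, the calls below request the same two variables in each style. This is a sketch that assumes the panelr package is installed. The particular formula (lwage ~ wks) is only an example chosen here, and the sketch assumes the formula is used solely to pick out its variables.

```r
library(panelr)

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

# 1. Unquoted names passed through ... (the "tidyverse" way)
by_dots <- complete_data(wages, wks, lwage, min.waves = 3)

# 2. A formula, e.g. the one you plan to use for your model
by_formula <- complete_data(wages, formula = lwage ~ wks, min.waves = 3)

# 3. A character vector of variable names
by_vars <- complete_data(wages, vars = c("wks", "lwage"), min.waves = 3)

# Compare how many rows each call retains
c(dots = nrow(by_dots), formula = nrow(by_formula), vars = nrow(by_vars))
```

Because all three calls name the same variables, they would be expected to retain the same individuals.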
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
complete_data(wages, wks, lwage, min.waves = 3)
#> # Panel data: 4,165 × 14
#> # Entities: id [595]
#> # Wave variable: t [1, 2, 3, ... (7 waves)]
#> id t exp wks occ ind south smsa ms fem union ed blk
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 3 32 0 0 1 0 1 0 0 9 0
#> 2 1 2 4 43 0 0 1 0 1 0 0 9 0
#> 3 1 3 5 40 0 0 1 0 1 0 0 9 0
#> 4 1 4 6 39 0 0 1 0 1 0 0 9 0
#> 5 1 5 7 42 0 1 1 0 1 0 0 9 0
#> 6 1 6 8 35 0 1 1 0 1 0 0 9 0
#> 7 1 7 9 32 0 1 1 0 1 0 0 9 0
#> 8 2 1 30 34 1 0 0 0 1 0 0 11 0
#> 9 2 2 31 27 1 0 0 0 1 0 0 11 0
#> 10 2 3 32 33 1 1 0 0 1 0 1 11 0
#> # ℹ 4,155 more rows
#> # ℹ 1 more variable: lwage <dbl>