This function allows you to define a minimum number of waves/periods and exclude all individuals with fewer observations than that.

complete_data(data, ..., formula = NULL, vars = NULL,
  min.waves = "all")

Arguments

data

A panel_data() frame.

...

Optionally, unquoted variable names/expressions separated by commas to be passed to dplyr::select(). If nothing is supplied here and formula and vars are both NULL, all columns are included.

formula

A formula, like the one you'll be using to specify your model.

vars

As an alternative to formula, a vector of variable names.

min.waves

The minimum number of complete observations (waves) an individual must have to be kept. Default is "all", but it can be any number.

Value

A panel_data frame.

Details

If ... (that is, one or more unquoted variable names) is supplied, then formula and vars are ignored. Likewise, formula takes precedence over vars. These are just different methods for selecting variables, and you can use whichever you are most comfortable with. ... corresponds with the "tidyverse" way, formula is useful for programming or working with model formulas, and vars is a "standard" evaluation method for when you are working with strings.
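To make the filtering rule concrete, here is a minimal base R sketch of the idea: count the complete observations for each individual and keep only those who reach the minimum. The helper keep_complete() and the toy data are hypothetical and written only for illustration; this is not panelr's implementation. The "all" default is not modeled, and the sketch keeps every row of a retained individual, which is an assumption of the sketch.

```r
# Toy long-format panel: three individuals observed over up to three waves.
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3),
  wave = c(1, 2, 3, 1, 2, 3, 1, 2),
  x    = c(5, 6, 7, 1, NA, 3, 4, NA),
  y    = c(2, 3, 4, 9, 8, 7, 6, 5)
)

# Hypothetical helper (NOT part of panelr): keep only the ids that have at
# least `min_waves` rows in which every column named in `vars` is non-missing.
keep_complete <- function(df, id, vars, min_waves) {
  # TRUE for rows with no missing values in the selected variables
  ok <- stats::complete.cases(df[, vars, drop = FALSE])
  # Number of complete rows per individual
  n_ok <- tapply(ok, df[[id]], sum)
  # Ids that meet the threshold
  keep_ids <- names(n_ok)[n_ok >= min_waves]
  # Assumption of this sketch: return all rows for the retained ids
  df[as.character(df[[id]]) %in% keep_ids, , drop = FALSE]
}

kept <- keep_complete(toy, id = "id", vars = c("x", "y"), min_waves = 2)
unique(kept$id)  # ids 1 and 2 remain; id 3 has only one complete row
```

With min_waves = 3, only id 1 would be retained in this toy data, since it is the only individual with three complete rows.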

Examples

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
complete_data(wages, wks, lwage, min.waves = 3)
#> # Panel data:    4,165 x 14
#> # entities:      id [595]
#> # wave variable: t [1, 2, 3, ... (7 waves)]
#>    id        t   exp   wks   occ   ind south  smsa    ms   fem union    ed   blk
#>    <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 1         1     3    32     0     0     1     0     1     0     0     9     0
#>  2 1         2     4    43     0     0     1     0     1     0     0     9     0
#>  3 1         3     5    40     0     0     1     0     1     0     0     9     0
#>  4 1         4     6    39     0     0     1     0     1     0     0     9     0
#>  5 1         5     7    42     0     1     1     0     1     0     0     9     0
#>  6 1         6     8    35     0     1     1     0     1     0     0     9     0
#>  7 1         7     9    32     0     1     1     0     1     0     0     9     0
#>  8 2         1    30    34     1     0     0     0     1     0     0    11     0
#>  9 2         2    31    27     1     0     0     0     1     0     0    11     0
#> 10 2         3    32    33     1     1     0     0     1     0     1    11     0
#> # ... with 4,155 more rows, and 1 more variable: lwage <dbl>
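The three selection styles described under Details can be compared side by side. The following is a sketch that assumes the panelr package is installed and that the simple formula lwage ~ wks picks out the same two variables as the other two calls. The object names by_dots, by_formula, and by_vars are illustrative only.

```r
library(panelr)

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

# "tidyverse" style: unquoted variable names passed through ...
by_dots <- complete_data(wages, wks, lwage, min.waves = 3)

# Formula style: handy when a model formula is already at hand
by_formula <- complete_data(wages, formula = lwage ~ wks, min.waves = 3)

# Standard evaluation: variable names supplied as strings
by_vars <- complete_data(wages, vars = c("wks", "lwage"), min.waves = 3)
```

Because ... takes precedence over formula, and formula over vars, supplying more than one of them in the same call means only the highest-priority one is used.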