This function takes wide format panels as input and converts them to long format.
long_panel(data, prefix = NULL, suffix = NULL, begin = NULL, end = NULL,
  id = "id", wave = "wave", periods = NULL,
  label_location = c("end", "beginning"), as_panel_data = TRUE,
  match = ".*", use.regex = FALSE, check.varying = TRUE)
The wide data frame.
What character(s) go before the period indicator? If none, set this argument to NULL.
What character(s) go after the period indicator? If none, set this argument to NULL.
What is the label for the first period? Could be
What is the label for the final period? Could be
The name of the ID variable as a string. If there is no ID variable, then this will be the name of the newly-created ID variable.
This will be the name of the newly-created wave variable.
If your period indicator does not lie in a sequence or is
not understood by the function, then you can supply the periods as a vector
instead. For instance, you could give
Where does the period label go on the variable?
If the variables are labeled like
Should the return object be a
The regex that will match the part of the variable names other
than the wave indicator. By default it will match any character any
number of times. Sometimes you might know that the variable names should
start with a digit, for instance, and you might use
Should the function check to make sure that every variable in the wide data with a wave indicator is actually time-varying? Default is TRUE, meaning that a constant like "race_W1" only measured in wave 1 will be defined in each wave in the long data. With very large datasets, however, sometimes setting this to FALSE can save memory.
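To illustrate what it means for a wave-labeled variable to be a constant, here is a minimal base R sketch. It is not the package's implementation; the data frame `wide`, the `_W` labeling scheme, and the helper names (`wave_vars`, `stems`, `constant_stems`) are assumptions made up for this example. A stem that appears in only one wave is treated as a constant and copied into every wave of the long data.

```r
# Hypothetical wide data: income measured in two waves, race only in wave 1
wide <- data.frame(
  id = 1:2,
  inc_W1 = c(5, 6),
  inc_W2 = c(7, 8),
  race_W1 = c("a", "b"),
  stringsAsFactors = FALSE
)

# Find variables carrying a wave label and strip the label to get the stem
wave_vars <- grep("_W[0-9]+$", names(wide), value = TRUE)
stems <- sub("_W[0-9]+$", "", wave_vars)

# Stems observed in exactly one wave are treated as constants (assumption)
n_waves <- table(stems)
constant_stems <- names(n_waves)[as.vector(n_waves) == 1]
varying_vars <- wave_vars[!stems %in% constant_stems]

# Reshape only the truly time-varying variables
long <- stats::reshape(
  wide[c("id", varying_vars)],
  direction = "long",
  varying = varying_vars,
  sep = "_W",
  idvar = "id",
  timevar = "wave"
)

# Define the constant in every wave by matching on the ID
long$race <- wide$race_W1[match(long$id, wide$id)]
long <- long[order(long$id, long$wave), ]
```

In this sketch `race` ends up with a value in both waves for each ID, which is the outcome the check.varying description refers to.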
There is no easy way to convert panel data from wide to long format because both formats are basically non-standard for other applications. This function can handle the common case in which the wide data frame has a regular labeling system for each period. The key thing is providing enough information for the function to understand the pattern.
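As a concrete picture of a "regular labeling system", the sketch below builds a small wide data frame whose variables follow a stem, separator, period pattern and converts it with base R. The data frame and its column names are hypothetical, and the call shown is plain stats::reshape(), not long_panel().

```r
# Hypothetical wide data: each variable is <stem>_<period>
wide <- data.frame(
  id = 1:3,
  wage_1 = c(10, 12, 9),
  wage_2 = c(11, 13, 10),
  hours_1 = c(40, 35, 20),
  hours_2 = c(38, 36, 25)
)

# With a regular pattern, reshape() can split names on the separator
# to recover the variable stems and the period labels
long <- stats::reshape(
  wide,
  direction = "long",
  varying = c("wage_1", "wage_2", "hours_1", "hours_2"),
  sep = "_",
  idvar = "id",
  timevar = "wave"
)

# Sort so that each ID's waves are adjacent
long <- long[order(long$id, long$wave), ]
```

The result has one row per ID and wave, with `wage` and `hours` as ordinary columns.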
In the end, this function calls
stats::reshape() but should be easier
to use and able to handle more situations, such as when the label occurs
at the beginning of the variable name. Just as important, this
function has built-in utilities to handle unbalanced data, that is, when
variables occur more than once but not in every single period, which breaks
## We need a wide data frame, so we will make one from the long-format
## data included in the package.

# Convert WageData to panel_data object
wages <- panel_data(WageData, id = id, wave = t)
# Convert wages to wide format
wide_wages <- widen_panel(wages)

# Note: wide_wages has variables in the following format:
# var1_1, var1_2, var1_3, var2_1, var2_2, var2_3, etc.
long_wages <- long_panel(wide_wages, prefix = "_", begin = 1, end = 7,
                         id = "id", label_location = "end")
# Note that in this case, the prefix and label_location arguments are
# the defaults but are included just for clarity.
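
For the unbalanced case described in the details, one common manual workaround in base R is to add all-NA columns for the missing variable-period combinations before reshaping. The sketch below shows that idea only; it is not how long_panel() works internally, and the data frame and the names `stems`, `periods`, and `expected` are assumptions for illustration.

```r
# Hypothetical unbalanced wide data: hours was not measured in period 2
wide <- data.frame(
  id = 1:2,
  wage_1 = c(10, 12),
  wage_2 = c(11, 13),
  hours_1 = c(40, 35)
)

# Every stem/period combination a balanced layout would contain
stems <- c("wage", "hours")
periods <- 1:2
expected <- as.vector(outer(stems, periods, paste, sep = "_"))

# Pad the combinations that are absent with NA columns
missing_cols <- setdiff(expected, names(wide))
wide[missing_cols] <- NA_real_

# Now every stem has a column for every period, so reshape() can proceed
long <- stats::reshape(
  wide,
  direction = "long",
  varying = expected,
  sep = "_",
  idvar = "id",
  timevar = "wave"
)
long <- long[order(long$id, long$wave), ]
```

Here `hours` is simply NA in wave 2 for every ID, which keeps the long data rectangular.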