Once all waves are collected into a single unhomogenized_panel object, this will homogenize variable names and, where applicable, categorical codings according to a panel mapping.
homogenize_panel(panel, mapping = NULL, ...)
bind_waves(panel, allow_issues = FALSE, ...)
panel: An unhomogenized panel
mapping: A panel mapping. If NULL, a panel mapping must be attached to the panel object using add_mapping()
...: Parameters to be used for context, usually for defining a panel schema
allow_issues: If TRUE, will allow waves to be bound together even if there are identified issues. Use caution with this!
An unhomogenized_panel that is ready to be homogenized using bind_waves()
bind_waves(): Bind waves into a homogenized panel after successful homogenization
The first part of the homogenization process is to harmonize wave variable names to the homogenized name. An error is thrown if a mapping entry has a homogenized name but is missing wave variable names, or has wave variable names but no homogenized name. The original version of panelcleaner included a notion of "issues" that allowed harmonization to proceed despite errors, but after continued use of panelcleaner, this behavior is deprecated in favor of halting the harmonization process altogether.
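The name-harmonization rule above can be illustrated with a minimal base R sketch. This is not panelcleaner's implementation: the mapping columns (wave_1_name, homogenized_name), the sample data, and the helper rename_wave() are all hypothetical and exist only to show the "both present or both missing" check followed by the rename.

```r
# Hypothetical mapping: one row per variable (column names are assumptions).
mapping <- data.frame(
  wave_1_name = c("q1", "q2_age"),
  homogenized_name = c("consent", "age"),
  stringsAsFactors = FALSE
)

# Hypothetical raw data for a single wave.
wave_1 <- data.frame(q1 = c(1, 0), q2_age = c(34, 51))

# Rename one wave's columns to the homogenized names, halting on mismatches.
rename_wave <- function(wave, wave_names, homogenized_names) {
  # A name present on one side but missing on the other is an error.
  mismatch <- xor(is.na(wave_names), is.na(homogenized_names))
  if (any(mismatch)) {
    stop(
      "Mapping rows with mismatched names: ",
      paste(which(mismatch), collapse = ", ")
    )
  }

  # Build a lookup from wave name to homogenized name and apply it.
  keep <- !is.na(wave_names)
  lookup <- setNames(homogenized_names[keep], wave_names[keep])
  hits <- names(wave) %in% names(lookup)
  names(wave)[hits] <- unname(lookup[names(wave)[hits]])
  wave
}

renamed <- rename_wave(wave_1, mapping$wave_1_name, mapping$homogenized_name)
names(renamed)
```

Calling rename_wave() with a wave name of NA next to a non-NA homogenized name stops with an error instead of silently skipping the row, mirroring the halting behavior described above.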
The next step is to harmonize the codings for categorical data. As panelcleaner was intended to be used in a data processing pipeline before analysis was conducted in Stata, the desired behavior of panelcleaner is to separate values from labels, unlike R's factor class. Codings are written using rcoder::coding(). The harmonization process is similar to that for names: an error is thrown unless the wave codings and the homogenized coding are either both present or both missing, and the codings in all waves are recoded to the homogenized coding.
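As a rough illustration of what recoding to a homogenized coding means, here is a base R sketch. It deliberately avoids rcoder and is not how panelcleaner performs the recode: the named vectors stand in for codings (label to value), and recode_values() is a hypothetical helper.

```r
# Hypothetical codings expressed as named vectors: label -> value.
wave_coding <- c("Yes" = 1, "No" = 2)
homogenized_coding <- c("No" = 0, "Yes" = 1)

# Recode values from a wave coding to the homogenized coding.
recode_values <- function(x, from, to) {
  # Both codings must be present, or both missing.
  if (is.null(from) != is.null(to)) {
    stop("Wave coding and homogenized coding must both be present or both be missing")
  }
  if (is.null(from)) {
    return(x)
  }

  # Find each value's label in the wave coding, then that label's
  # value in the homogenized coding. Unmatched values become NA.
  labels <- names(from)[match(x, from)]
  unname(to[labels])
}

recode_values(c(1, 2, 2, NA), wave_coding, homogenized_coding)
```

The values stay plain numerics throughout, with labels kept in the coding rather than in the data, which is the separation from R's factor class that the paragraph above describes.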
The last step is to harmonize variable descriptions. This part is optional. It will only happen if the homogenized_description column (or a custom name specified with a custom panelcleaner schema) is present. The same types of errors can occur for descriptions. The only thing different about harmonizing descriptions is that it doesn't affect the data: it operates by assigning the bpr.description attribute to variables. This feature is really only useful if you intend your data to be used in a blueprintr project.
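To show why description harmonization leaves the data untouched, the following base R sketch sets the bpr.description attribute by hand. The data frame and the description text are made up for illustration; panelcleaner would take the description from the panel mapping.

```r
# Hypothetical homogenized data with a single variable.
panel_data <- data.frame(age = c(34, 51))

# Attach a description as an attribute; the values themselves are unchanged.
attr(panel_data$age, "bpr.description") <- "Respondent age in years"

# Read the description back.
attr(panel_data$age, "bpr.description")

# The underlying values are still the original numbers.
as.numeric(panel_data$age)
```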
In some cases the default behavior of panelcleaner is too restrictive, especially during the beginning of data collection. Often, APIs or general data exports don't include variables that don't have any submissions yet, but you still want to keep those variables in your input data. These parameters lift some restrictions on panelcleaner's behavior:
drop_na_homogenized: If TRUE, any NA entries in the homogenized_name column will be ignored, as if the row in the panel mapping doesn't exist.
ignored_missing_codings: If TRUE, waves with NA codings but with non-NA homogenized codings will not have their values homogenized.
ignored_missing_homogenized_codings: If TRUE, any variables that have defined wave codings but no homogenized coding will not have their codings homogenized.
error_missing_raw_variables: If FALSE, raw variables that should be present in the data, given the panel mapping, but aren't will not throw an error. Instead, they'll be added to the list of issues.
replace_missing_with_na: If TRUE, raw variables that should be present in the data, given the panel mapping, but are not will be created and filled with NA values. A message will list all the variables where this action was applied. This value supersedes error_missing_raw_variables.
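The effect of replace_missing_with_na can be approximated in base R as follows. This is a sketch of the described behavior only, not panelcleaner's code: fill_missing_variables() and the sample wave are hypothetical.

```r
# Create any expected-but-absent variables, fill them with NA,
# and report which variables were added.
fill_missing_variables <- function(wave, expected) {
  missing_vars <- setdiff(expected, names(wave))
  if (length(missing_vars) > 0) {
    message("Filled with NA: ", paste(missing_vars, collapse = ", "))
    wave[missing_vars] <- NA
  }
  wave
}

# Hypothetical early export where "age" has no submissions yet.
wave_2 <- data.frame(Q1 = c(1, 1, 0))
filled <- fill_missing_variables(wave_2, expected = c("Q1", "age"))
names(filled)
```

In panelcleaner itself, the parameters listed above are presumably supplied through the `...` argument of homogenize_panel(); check the package documentation for the exact way they are passed.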