Homogenize all waves to consistent structure — homogenize

Once all waves are collected into a single unhomogenized_panel object, homogenize_panel() homogenizes variable names and, where applicable, categorical codings according to a panel mapping.

homogenize_panel(panel, mapping = NULL, ...)

bind_waves(panel, allow_issues = FALSE, ...)

Arguments

panel: An unhomogenized panel
mapping: A panel mapping. If NULL, a panel mapping must already be attached to the panel object using add_mapping().
...: Parameters to be used for context, usually for defining a panel schema
allow_issues: If TRUE, will allow waves to be bound together even if there are identified issues. Use caution with this!

Value

An unhomogenized_panel whose waves are ready to be bound into a homogenized panel using bind_waves()

Functions

bind_waves(): Bind waves into a homogenized panel after successful homogenization
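
The two functions are meant to be called in sequence. A minimal sketch of that sequence, using only the signatures shown above; the wrapper name homogenize_and_bind is hypothetical, and it assumes panel is an unhomogenized_panel built elsewhere:

```r
# Hypothetical convenience wrapper: homogenize first, then bind the waves.
# `panel` is assumed to be an unhomogenized_panel; `mapping` may be NULL
# if a mapping was already attached with add_mapping().
homogenize_and_bind <- function(panel, mapping = NULL, ...) {
  homogenized <- panelcleaner::homogenize_panel(panel, mapping = mapping, ...)
  # allow_issues is left at its default (FALSE), so binding stops on issues
  panelcleaner::bind_waves(homogenized)
}
```

Keeping the two calls separate in your own pipeline lets you inspect the result of homogenize_panel() before binding.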

Homogenization Steps

The first part of the homogenization process is to harmonize wave variable names to the homogenized name. An error is thrown if a mapping entry has a homogenized name but is missing wave variable names, or has wave variable names but is missing the homogenized name. The original version of panelcleaner included a notion of "issues" that allowed harmonization to continue despite such errors, but after continued use of panelcleaner, this behavior is deprecated in favor of halting the harmonization process altogether.
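
The name check described above can be illustrated in base R. This is a sketch of the idea only, not panelcleaner's implementation; the wave_name column and the rename_wave helper are hypothetical:

```r
# Hypothetical mapping for one wave: raw name -> homogenized name
mapping <- data.frame(
  wave_name = c("q1", "q2_age"),
  homogenized_name = c("consent", "age"),
  stringsAsFactors = FALSE
)
wave <- data.frame(q1 = c(1, 0), q2_age = c(34, 51))

rename_wave <- function(wave, mapping) {
  # A row is invalid when exactly one of the two names is missing
  mismatched <- xor(is.na(mapping$wave_name), is.na(mapping$homogenized_name))
  if (any(mismatched)) {
    stop("Mapping rows with only one name present: ",
         paste(which(mismatched), collapse = ", "))
  }
  usable <- mapping[!is.na(mapping$wave_name), , drop = FALSE]
  idx <- match(usable$wave_name, names(wave))
  if (anyNA(idx)) {
    stop("Raw variables not found in wave: ",
         paste(usable$wave_name[is.na(idx)], collapse = ", "))
  }
  names(wave)[idx] <- usable$homogenized_name
  wave
}

renamed <- rename_wave(wave, mapping)
```

Halting on the first mismatch mirrors the stricter behavior that replaced "issues".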

The next step is to harmonize the codings for categorical data. As panelcleaner was intended to be used in a data processing pipeline before analysis was conducted in Stata, the desired behavior of panelcleaner is to separate values from labels, unlike R's factor class. Codings are written using rcoder::coding(). The harmonization process is similar to that for names: an error is thrown unless the wave coding and the homogenized coding are either both present or both missing, and the codings in all waves are recoded to the homogenized coding.
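
To show what recoding a wave to a homogenized coding means, here is a base R sketch that keeps values separate from labels. It deliberately uses named vectors instead of rcoder::coding(), so it is an illustration of the idea rather than the package's mechanism; recode_values is a hypothetical helper:

```r
# Each coding maps a label (the name) to a stored value
wave_coding        <- c(Yes = 1, No = 2)  # how this wave stored answers
homogenized_coding <- c(Yes = 1, No = 0)  # target coding for the panel

recode_values <- function(x, from, to) {
  # Translate value -> label using the wave coding,
  # then label -> value using the homogenized coding
  labels <- names(from)[match(x, from)]
  unname(to[labels])
}

recoded <- recode_values(c(1, 2, 2, NA), wave_coding, homogenized_coding)
```

The result stays a plain numeric vector; labels live in the coding, not in the data, which is the distinction from R's factor class noted above.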

Descriptions

The last step is to harmonize variable descriptions. This step is optional and only runs if the homogenized_description column (or a custom name specified with a custom panelcleaner schema) is present. The same kinds of errors can occur for descriptions. The only difference is that harmonizing descriptions doesn't affect the data: it operates by assigning the bpr.description attribute to variables. This feature is really only useful if you intend your data to be used in a blueprintr project.
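
Because descriptions are stored as an attribute, the values themselves are untouched. A small base R illustration, with a made-up variable and description text:

```r
age <- c(34, 51)
before <- age

# Attach a description the same way an attribute is set on any R object
attr(age, "bpr.description") <- "Respondent age in years"

# The underlying values are unchanged; only metadata was added
same_values <- all(as.vector(age) == before)
description <- attr(age, "bpr.description")
```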

Extra Parameters

In some cases the default behavior of panelcleaner is too restrictive, especially during the beginning of data collection. Often, APIs or general data exports don't include variables that don't have any submissions yet, but you still want to keep those variables in your input data. These parameters lift some restrictions on panelcleaner's behavior:

drop_na_homogenized: If TRUE, any NA entries in the homogenized_name column will be ignored, as if the row in the panel mapping doesn't exist.
ignored_missing_codings: If TRUE, waves with NA codings but with non-NA homogenized codings will not have their values homogenized.
ignored_missing_homogenized_codings: If TRUE, any variables that have defined wave codings but no homogenized coding will not have their codings homogenized.
error_missing_raw_variables: If FALSE, raw variables that should be present in the data, given the panel mapping, but aren't will not throw an error. Instead, they'll be added to the list of issues.
replace_missing_with_na: If TRUE, raw variables that should be present in the data, given the panel mapping, but are not will be created and filled with NA values. A message listing all variables where this action was applied will be displayed. This value supersedes error_missing_raw_variables.
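
As an illustration of what replace_missing_with_na describes, the following base R sketch creates expected but absent variables and fills them with NA. It is not panelcleaner's code; fill_missing_raw and the variable names are hypothetical:

```r
fill_missing_raw <- function(wave, expected) {
  # Variables the mapping expects but the export did not include
  missing_vars <- setdiff(expected, names(wave))
  for (v in missing_vars) {
    wave[[v]] <- NA  # recycled to one NA per row
  }
  if (length(missing_vars) > 0) {
    message("Created and filled with NA: ",
            paste(missing_vars, collapse = ", "))
  }
  wave
}

wave <- data.frame(q1 = c(1, 0))
filled <- fill_missing_raw(wave, expected = c("q1", "q2_age"))
```

This matches the early-collection scenario above, where an export omits variables that have no submissions yet.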