blueprint(
  "ch_scales",
  description = "Self-Regulation & Self-Regulated Learning",
  command =
    .TARGET("verified_child_data") %>%
      select(
        unique_id,
        c_id_01,
        starts_with("c_sr_"),
        starts_with("c_srl_")
      ) %>%
      shorten_domain_prefixes() %>%
      enumerator_regulation_score() %>%
      basic_number_renaming() %>%
      drop_underscore_in_vars(
        c("sr", "srl"),
        "^.*{var}_(\\d+)r?$"
      )
)
Project Structure
The Data Team uses a variety of technologies, mainly in the R ecosystem, to create its data projects. Each project has a similar file structure to maintain consistency and replicability across projects:
.github/            # All GitHub configuration [optional]
  workflows/        # All CI workflow definition files [optional]
blueprints/         # Contains all "blueprints" of datasets
codebooks/          # Contains exported codebooks of select datasets, depending on blueprint definition
config/
  environment.R     # Definitions of all environment variables used in this project
  packages.R        # Any `library()` for packages to be available across the project
R/                  # All definitions for custom functions employed in the pipeline
.Rprofile           # Supplemental file that primarily sets up `renv`
_targets.R          # The main workflow orchestration definition file
renv.lock           # Package dependency state capture file for `renv`
What is important to note from the outset is that data are not stored inside these projects. Each project is versioned with git and hosted on TIES’ GitHub organization page. Most of our data are sensitive to some degree, so our operational practice is to load data directly via APIs or read from NYU Box.
Project templates can be created with the internal tool dtproj.
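As a rough illustration of the layout above, the sketch below checks a directory for the non-optional paths from the listing. It uses base R only; `missing_project_paths()` and the throwaway directory are hypothetical names invented for this example, and nothing here reflects how dtproj itself works.

```r
# Paths taken from the listing above; the optional .github/ entries are left out.
expected_paths <- c(
  "blueprints", "codebooks",
  "config/environment.R", "config/packages.R",
  "R", ".Rprofile", "_targets.R", "renv.lock"
)

# Hypothetical helper: return the expected paths that are absent under `root`.
missing_project_paths <- function(root, paths = expected_paths) {
  paths[!file.exists(file.path(root, paths))]
}

# Demonstration on a throwaway directory that contains only part of the layout.
root <- tempfile("example-project-")
dir.create(file.path(root, "config"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "blueprints"), showWarnings = FALSE)
invisible(file.create(file.path(root, "_targets.R")))

absent <- missing_project_paths(root)
print(absent)
```

A check like this could run early in a CI workflow to catch a project that has drifted from the shared structure.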
blueprints
The Data Team uses its blueprintr package to build, test, and document datasets. blueprintr is akin to dbt, but it is designed to manage a whole host of metadata, a necessary task for dissemination of project findings and data publication. Moreover, blueprintr operates without a connection to a data warehouse, given the assumptions of low connectivity and technical availability under which the Data Team operates.
Each blueprint consists of two files:
- The blueprint definition file: an R script with a single blueprint() command. This file details how to generate the desired dataset and optionally includes arbitrary metadata at the dataset/table level.
- The blueprint metadata file: a CSV file with, at minimum, the columns name, type, and description. This file enumerates the variable-level metadata.
An example blueprint definition file, for the ch_scales blueprint, is shown above, ahead of the Project Structure overview.
And here is an example metadata CSV for the same blueprint:
name | type | title | description | coding | tests | scale | section |
---|---|---|---|---|---|---|---|
unique_id | character | | Survey submission ID | | | | IDs |
cid_1 | character | | Child ID | | has_no_duplicates() | | IDs |
csr4 | integer | csr4 | Some kids find it easy to sit still when they are bored BUT Other kids find it hard to sit still when they are bored Are you more like the kids that find it easy or hard? | coding(code(""Really easy"", 3), code(""Kind of easy"", 2), code(""Kind of hard"", 1), code(""Really hard"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulation | Child Self-Regulation |
csr5 | integer | csr5 | Some kids find it easy to remember what they are supposed to do BUT Other kids find it hard to remember what they are supposed to do Are you more like the kids that find it easy or hard? | coding(code(""Really easy"", 3), code(""Kind of easy"", 2), code(""Kind of hard"", 1), code(""Really hard"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulation | Child Self-Regulation |
csr8 | integer | csr8 | Some kids find it easy to obey rules BUT Other kids find it hard to obey rules Are you more like the kids that find it easy or hard? | coding(code(""Really easy"", 3), code(""Kind of easy"", 2), code(""Kind of hard"", 1), code(""Really hard"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulation | Child Self-Regulation |
csr10 | integer | csr10 | Some kids find it easy to be careful BUT Other kids find it hard to be careful Are you more like the kids that find it easy or hard? | coding(code(""Really easy"", 3), code(""Kind of easy"", 2), code(""Kind of hard"", 1), code(""Really hard"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulation | Child Self-Regulation |
csr12 | integer | csr12 | Some kids find it easy to think before they act BUT Other kids find it hard to think before they act Are you more like the kids that find it easy or hard? | coding(code(""Really easy"", 3), code(""Kind of easy"", 2), code(""Kind of hard"", 1), code(""Really hard"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulation | Child Self-Regulation |
csr13 | integer | csr13 | Some kids find it easy to pay attention to their schoolwork BUT Other kids find it hard to pay attention to their schoolwork Are you more like the kids that find it easy or hard? | coding(code(""Really easy"", 3), code(""Kind of easy"", 2), code(""Kind of hard"", 1), code(""Really hard"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulation | Child Self-Regulation |
csr15 | integer | csr15 | Some kids find it easy to focus on things that are important BUT Other kids find it hard to focus on things that are important Are you more like the kids that find it easy or hard? | coding(code(""Really easy"", 3), code(""Kind of easy"", 2), code(""Kind of hard"", 1), code(""Really hard"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulation | Child Self-Regulation |
csr16 | integer | csr16 | Some kids find it easy to work on a project until they are finished BUT Other kids find it hard to work on a project until they are finished Are you more like the kids that find it easy or hard? | coding(code(""Really easy"", 3), code(""Kind of easy"", 2), code(""Kind of hard"", 1), code(""Really hard"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulation | Child Self-Regulation |
csr17 | integer | csr17 | Some kids find it easy to concentrate on one thing for a long time BUT Other kids find it hard to concentrate on one thing for a long time Are you more like the kids that find it easy or hard? | coding(code(""Really easy"", 3), code(""Kind of easy"", 2), code(""Kind of hard"", 1), code(""Really hard"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulation | Child Self-Regulation |
csr18 | integer | csr18 | Some kids find it easy to stay focused on their goals BUT Other kids find it hard to stay focused on their goals Are you more like the kids that find it easy or hard? | coding(code(""Really easy"", 3), code(""Kind of easy"", 2), code(""Kind of hard"", 1), code(""Really hard"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulation | Child Self-Regulation |
csrl1 | integer | csrl1 | Some kids make sure no one disturbs them when they study at home. Would you say you are like them? [pause for response]. Now that you decided that you [ARE/ARE NOT] like them. [If Yes] Are you ""A lot"" or ""Kind of "" like them? [If No] Are you ""A little"" or ""Not at all "" like them? [pause for response] | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl3 | integer | csrl3 | Some kids try to find a quiet place to study at home. Would you say you are like them? [pause for response]. Now that you decided that you [ARE/ARE NOT] like them. [If Yes] Are you ""A lot"" or ""Kind of "" like them? [If No] Are you ""A little"" or ""Not at all "" like them? [pause for response] | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl4 | integer | csrl4 | Some kids ask their friends or family for help when they are struggling with their homework. Would you say you are like them? [pause for response]. Now that you decided that you [ARE/ARE NOT] like them. [If Yes] Are you ""A lot"" or ""Kind of "" like them? [If No] Are you ""A little"" or ""Not at all "" like them? [pause for response] | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl10 | integer | csrl10 | Some kids encourage themselves or tell themselves ""you can do it"" when they are struggling with their homework. Would you say you are like them? [pause for response]. Now that you decided that you [ARE/ARE NOT] like them. [If Yes] Are you ""A lot"" or ""Kind of "" like them? [If No] Are you ""A little"" or ""Not at all "" like them? [pause for response] | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl15 | integer | csrl15 | Some kids review the instructions before starting their homework. Would you say you are like them? | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl16 | integer | csrl16 | Some kids try to calm down or take a deep breath when they are struggling with their homework. Would you say you are like them? | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl17 | integer | csrl17 | Some kids try to ""take their time"" and do their homework with patience. Would you say you are like them? | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl18 | integer | csrl18 | Some kids take little breaks when working on challenging homework. Would you say you are like them? | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl19 | integer | csrl19 | Some kids gather their notebooks or any materials they need before they start their homework. Would you say you are like them? | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl20 | integer | csrl20 | Some kids look for information in their notes, videos, books, internet or exercises when they are struggling with their homework. Would you say you are like them? | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl21 | integer | csrl21 | Some kids prepare for an exam by reviewing their notes or making study materials. Would you say you are like them? | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl24 | integer | csrl24 | Some kids look to see how hard their homework is before deciding whether they will work on their homework or do something fun. Would you say you are like them? | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
csrl22 | integer | csrl22 | Some kids prepare for an exam by doing practice tests to see where they are having trouble. Would you say you are like them? | coding(code(""A lot like them"", 3), code(""Kind of like them"", 2), code(""A little like them"", 1), code(""Not at all like them"", 0)) | in_set(c(0:3, NA)) | Child Self-Regulated Learning | Child Self-Regulated Learning |
These metadata CSVs have three required fields:
- name: The variable name
- type: The variable type (usually “character”, “integer”, “double”, or “logical”)
- description: Description of the variable content. If the dataset corresponds to a survey, this is usually the question wording in English.
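The required fields can be checked with a few lines of base R. The following is a minimal sketch, not blueprintr's own validation: the two sample rows are abbreviated stand-ins for real metadata, and the set of allowed types is an assumption drawn from the "usually" list above. It also shows why quotes inside the coding column appear doubled in the raw CSV file.

```r
# Abbreviated, hypothetical metadata rows. Inside a quoted CSV field,
# a literal double quote is written as two double quotes.
csv_lines <- c(
  'name,type,description,coding',
  'unique_id,character,Survey submission ID,',
  'csr4,integer,Easy or hard to sit still when bored?,"coding(code(""Really easy"", 3), code(""Really hard"", 0))"'
)
meta <- read.csv(text = paste(csv_lines, collapse = "\n"), stringsAsFactors = FALSE)

# The three fields every metadata file must provide.
required <- c("name", "type", "description")
missing_cols <- setdiff(required, names(meta))

# Assumption: only the four usual types are accepted in this sketch.
allowed_types <- c("character", "integer", "double", "logical")
bad_types <- meta$name[!meta$type %in% allowed_types]

# After parsing, the doubled quotes collapse back to single quotes.
print(meta$coding[2])
```

Both `missing_cols` and `bad_types` are empty for these rows; a non-empty result would point at the column or variable that needs attention.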
There are often other columns:
- coding: If the variable is categorical, this contains the label-value mapping for the variable, written with rcoder syntax.
- tests: Any content tests on the data
- scale: If the variable belongs to a psychometric scale, the name of the scale. This is used for identifying variable groups for psychometric descriptive statistics.
- title: A shorter description for the variable, used as a variable label when exported to Stata
- section: Codebook section; if no section is assigned, the codebook will place the variable into the “Other” section
- section_description: Description for the section. Useful for providing extra context for the codebook section.
- group: Codebook subsection / variable group. Useful for grouping a collection of variables, e.g. a scale
- group_description: Description for the variable group. Useful for adding an introductory statement asked before each question in the group.
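To make the coding and tests columns from the example table more concrete, here is a base-R sketch of what entries such as in_set(c(0:3, NA)) and has_no_duplicates() express. The functions `in_set_check()` and `no_duplicates_check()` are hypothetical stand-ins written for this illustration; they are not the implementations used by blueprintr, and the sample vectors are invented.

```r
# Hypothetical stand-ins for the content tests named in the metadata table.
in_set_check <- function(x, allowed) all(x %in% allowed)
no_duplicates_check <- function(x) anyDuplicated(x) == 0L

# Invented responses for one self-regulation item; NA marks a missing answer.
csr4 <- c(3L, 0L, NA, 2L)
ok_values <- in_set_check(csr4, c(0:3, NA))        # every value is 0..3 or NA
bad_values <- in_set_check(c(3L, 4L), c(0:3, NA))  # 4 is outside the allowed set

# Invented child IDs with one repeated value.
ids_unique <- no_duplicates_check(c("a1", "a2", "a1"))

# The label-value mapping from the coding column, as a plain named vector.
codes <- c("Really easy" = 3, "Kind of easy" = 2, "Kind of hard" = 1, "Really hard" = 0)
labels <- names(codes)[match(csr4, codes)]
print(labels)
```

Keeping NA in the allowed set, as the table does, lets missing responses pass the range check while out-of-range codes still fail.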
codebooks
“Codebooks” are essentially data dictionaries, targeted for social science research. They commonly include enumerations of variables in a dataset, as well as their descriptions and (when applicable) categorical codings. Some codebooks also include methodology descriptions and other descriptive statistics of the data.
The codebooks folder contains HTML codebook exports of selected blueprints, as indicated by the presence of blueprintr::bp_export_codebook() in the blueprint definition file:
blueprint(
  "ch_scales",
  description = "Self-Regulation & Self-Regulated Learning",
  command =
    some_command()
) |>
  bp_export_codebook()
Unless otherwise agreed upon, these codebooks are for internal purposes only. They are mainly present to support TIES’ members in their research.
config
The config folder has two main R files:
- environment.R
- packages.R
Other project-specific files, like YAML configuration, may be stored in this folder.
environment.R
Sensitive information necessary for the pipeline to function, like API keys and passwords, must be stored as environment variables and never checked into version control. Environment variables are generally stored in a personal .Renviron file, but it is our practice to load them into global variables at the start of the pipeline to avoid unnecessary calls to Sys.getenv().
Here is an example environment.R:
BOX_PATH <- Sys.getenv("BOX_PATH", unset = NULL)
F_RUN_TESTS <- as.logical(Sys.getenv("F_RUN_TESTS", unset = "FALSE"))
F_NIGHTLY <- as.logical(Sys.getenv("F_NIGHTLY", unset = "FALSE"))
CACHE_PASSPHRASE <- Sys.getenv("CACHE_PASSPHRASE", unset = NULL)
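One detail in the flag variables above is worth spelling out: as.logical() returns NA for strings it does not recognize, so a typo in .Renviron can silently produce a missing flag. The sketch below shows one possible way to guard against that. `env_flag()` is a hypothetical helper, not part of the project template, and the Sys.setenv() calls only simulate values that would normally come from .Renviron.

```r
# Hypothetical helper: read a TRUE/FALSE flag from the environment,
# falling back to `default` when the variable is unset or empty.
env_flag <- function(name, default = FALSE) {
  raw <- Sys.getenv(name, unset = NA)
  if (is.na(raw) || !nzchar(raw)) {
    return(default)
  }
  value <- as.logical(raw)
  if (is.na(value)) {
    stop("Environment variable ", name, " is not a logical value: ", raw, call. = FALSE)
  }
  value
}

# Simulate the environment for this demonstration only.
Sys.setenv(F_RUN_TESTS = "TRUE")
Sys.unsetenv("F_NIGHTLY")

F_RUN_TESTS <- env_flag("F_RUN_TESTS")
F_NIGHTLY <- env_flag("F_NIGHTLY")
print(c(F_RUN_TESTS = F_RUN_TESTS, F_NIGHTLY = F_NIGHTLY))
```

Failing loudly on an unparseable value is a design choice for this sketch; a project could equally decide to fall back to the default and log a warning.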
packages.R
This file serves two purposes:
- Capture soft dependencies in the project code (i.e. packages that are required but not directly referred to in the code)
- Attach packages via library() to make those packages’ exported functions available across the entire pipeline
Our project structure uses renv to manage the specific versions of packages employed in our pipeline to improve replicability. renv is able to capture these dependencies via code inspection; however, a soft dependency (one that is not explicitly stated in the code) can sometimes occur, e.g. when a package depends on another for some plotting routine. To capture these dependencies, we place a reference to one of the package’s functions in packages.R so that renv can treat that package as a hard dependency.
Use of library() should be restricted to packages that are used extensively. As stated in @ref(rstyle-funcs-package-deps), it is preferred to use package::func() syntax in function writing; moreover, the same style is preferred throughout most of the pipeline definition as well, for clarity and long-term maintenance.
Example of packages.R:
# Retain suggested packages in renv
labelled::to_labelled
kableExtra::kable_as_image
styler::style_dir # Format-on-save capability in VSCode
languageserver::run # Necessary for VSCode to work in renv projects

# Attach packages used in the entire pipeline here
# Attach packages used in the entire pipeline here
library(targets)
library(tarchetypes)
library(tidytable)
library(blueprintr)
library(rcoder)
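The two halves of packages.R behave differently at run time, which the following sketch illustrates with the tools package that ships with R (chosen here only because it needs no installation; the file paths are invented). A bare `package::function` reference loads the namespace without attaching it, while library() attaches the package so its exports can be called unqualified.

```r
# A `::` reference evaluates to the function object. The namespace is loaded,
# but the package is typically not added to the search path.
ref <- tools::file_ext
namespace_loaded <- isNamespaceLoaded("tools")

# Qualified calls work without attaching anything.
ext_qualified <- tools::file_ext("codebooks/ch_scales.html")

# library() attaches the package, so its exports can be called unqualified.
library(tools)
attached <- "package:tools" %in% search()
ext_unqualified <- file_ext("blueprints/ch_scales.csv")

print(c(ext_qualified, ext_unqualified))
```

This is why a bare reference is enough for dependency capture, while library() is reserved for packages whose functions are used throughout the pipeline.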