blueprintr uses code inspection to identify and trace dataset dependencies. These macro functions signal a dependency to blueprintr and evaluate to symbols to be analyzed in the drake plan.

.TARGET(bp_name, .env = parent.frame())

.BLUEPRINT(bp_name, .env = parent.frame())

.META(bp_name, .env = parent.frame())

.SOURCE(dat_name)

mark_source(dat)

Arguments

bp_name

Character string of blueprint's name

.env

The environment in which to evaluate the macro. For internal use only!

dat_name

Character string of an object's name, used exclusively for marking "sources"

dat

A data.frame-like object

Functions

  • .TARGET(): Gets symbol of built and checked data

  • .BLUEPRINT(): Gets symbol of blueprint reference in plan

  • .META(): Gets symbol of metadata reference in plan

  • .SOURCE(): Gets a symbol for an object intended to be a "data source"

  • mark_source(): Mark a data.frame-like object as a source table

When to use

Generally speaking, the .BLUEPRINT and .META macros should be used for check functions, which frequently require context, e.g. in the form of configuration from the blueprint or coding expectations from the metadata. .TARGET is primarily used in blueprint commands, but there could be situations where a check depends on the content of another dataset.

It is important to note that the symbols generated by these macros are only understood in the context of a drake plan. The targets associated with the symbols are generated when blueprints are attached to a plan.

Sources

Sources provide a way to attach variable UUIDs to objects that are not built with blueprints. This is often the case when the source table comes from an intermittent HTTP query or a file on disk. Blueprints have limited ability to configure the underlying target behavior during the _initial phase, so it is often easier to do that sort of fetching and pre-processing before using blueprints. However, you lose the benefit of variable lineage when you don't use blueprints.

"Sources" are simply data.frame-like objects that carry a ".uuid" attribute on each variable, so that variable lineage can cover the full data lifetime. Use blueprintr::mark_source() to add the UUID attributes, and then use .SOURCE() in the blueprints so that lineage can be captured.
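To make the idea of a per-variable ".uuid" attribute concrete, here is a minimal base R sketch. It is not blueprintr's implementation: tag_variables() and random_id() are hypothetical helpers, and the random hex string only stands in for a real UUID.

```r
# Hypothetical stand-in for a UUID generator (base R only, not a real UUID).
random_id <- function() {
  paste(sample(c(0:9, letters[1:6]), 32, replace = TRUE), collapse = "")
}

# Hypothetical sketch: give every variable (column) its own ".uuid" attribute.
# This only illustrates the concept described above; blueprintr's
# mark_source() may work differently internally.
tag_variables <- function(dat) {
  for (col in names(dat)) {
    attr(dat[[col]], ".uuid") <- random_id()
  }
  dat
}

raw <- data.frame(id = 1:3, score = c(0.5, 0.7, 0.9))
tagged <- tag_variables(raw)

# Each column now carries an identifier that lineage tooling could follow.
uuids <- vapply(tagged, function(x) attr(x, ".uuid"), character(1))
print(uuids)
```

In actual blueprintr usage, the tagging step is handled by mark_source(), and the tagged object is then referenced by name with .SOURCE() inside a blueprint command.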

Examples

.TARGET("example_dataset")
#> example_dataset
.BLUEPRINT("example_dataset")
#> example_dataset_blueprint
.META("example_dataset")
#> example_dataset_meta

blueprint(
  "test_bp",
  description = "Blueprint with dependencies",
  command =
    .TARGET("parent1") %>%
      left_join(.TARGET("parent2"), by = "id") %>%
      filter(!is.na(id))
)
#> <blueprint: 'test_bp'>
#> 
#> Description: Blueprint with dependencies
#> Annotations: DISABLED
#> Metadata location: '/home/runner/work/blueprintr/blueprintr/blueprints/test_bp.csv'
#> 
#> -- Command --
#> Workflow command:
#> parent1 %>% left_join(parent2, by = "id") %>% filter(!is.na(id))
#> 
#> Raw command:
#> .TARGET("parent1") %>% left_join(.TARGET("parent2"), by = "id") %>% 
#>     filter(!is.na(id))
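
The difference between the "Raw command" and the "Workflow command" in the output above is that each macro call has been replaced by a plain symbol. The following base R sketch shows one way such a rewrite could work. It is an illustration under assumptions, not blueprintr's actual code: expand_macros() and macro_suffixes are hypothetical names, and the suffixes are taken from the example output on this page.

```r
# Hypothetical lookup: macro name -> suffix appended to the blueprint name,
# following the example output shown on this page.
macro_suffixes <- c(.TARGET = "", .BLUEPRINT = "_blueprint", .META = "_meta")

# Hypothetical sketch: walk a quoted expression and replace every macro call
# with the symbol it stands for.
expand_macros <- function(expr) {
  if (!is.call(expr)) {
    return(expr)  # symbols and constants are left unchanged
  }
  fn <- expr[[1]]
  if (is.symbol(fn) && as.character(fn) %in% names(macro_suffixes)) {
    # Replace the whole macro call with a bare symbol
    return(as.name(paste0(expr[[2]], macro_suffixes[[as.character(fn)]])))
  }
  # Otherwise rebuild the call from its recursively expanded parts
  as.call(lapply(expr, expand_macros))
}

raw_command <- quote(
  .TARGET("parent1") %>% left_join(.TARGET("parent2"), by = "id")
)
workflow_command <- expand_macros(raw_command)
print(workflow_command)
```

Because the command is only quoted and never evaluated, this sketch runs without dplyr or magrittr attached. It does not handle calls with empty arguments, such as x[, 1].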