blueprintr uses code inspection to identify and trace dataset dependencies.
These macro functions signal a dependency to blueprintr and evaluate to
symbols to be analyzed in the drake plan.
.TARGET(bp_name, .env = parent.frame())
.BLUEPRINT(bp_name, .env = parent.frame())
.META(bp_name, .env = parent.frame())
.SOURCE(dat_name)
mark_source(dat)
.TARGET()
: Gets symbol of built and checked data
.BLUEPRINT()
: Gets symbol of blueprint reference in plan
.META()
: Gets symbol of metadata reference in plan
.SOURCE()
: Gets a symbol for an object intended to be a "data source"
mark_source()
: Marks a data.frame-like object as a source table
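The naming pattern behind these symbols can be inferred from the examples on this page (`example_dataset`, `example_dataset_blueprint`, `example_dataset_meta`). The following is a minimal base-R sketch of that pattern only. The `sketch_*` helpers are hypothetical names introduced for illustration and are not blueprintr's implementation, which also accepts a `.env` argument that this sketch ignores.

```r
# Hypothetical helpers (not part of blueprintr) that mirror the naming
# pattern visible in this page's examples.
sketch_target    <- function(bp_name) as.name(bp_name)
sketch_blueprint <- function(bp_name) as.name(paste0(bp_name, "_blueprint"))
sketch_meta      <- function(bp_name) as.name(paste0(bp_name, "_meta"))

# Each helper returns an R symbol rather than a string or the data itself.
sym <- sketch_blueprint("example_dataset")
is.symbol(sym)
as.character(sym)
```

Because the result is a symbol and not a value, it only resolves to something once a target of that name exists in the surrounding workflow.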
Generally speaking, the .BLUEPRINT and .META macros should be used for
check functions, which frequently require context, e.g. in the form of
configuration from the blueprint or coding expectations from the metadata.
.TARGET is primarily used in blueprint commands, though a check may
occasionally depend on the content of another dataset.
It is important to note that the symbols generated by these macros are only
understood in the context of a drake plan. The targets associated with the
symbols are generated when blueprints are attached to a plan.
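To make the idea of code inspection concrete, here is a rough base-R sketch of how dependencies could be collected from a quoted command by walking its call tree. `find_target_deps()` is a hypothetical helper written for this illustration; it is not how blueprintr is confirmed to work internally, and it only looks for `.TARGET()` calls with a literal string argument.

```r
# Hypothetical helper: collect the names passed to .TARGET() in a quoted
# command. It does not handle empty arguments such as x[, 1].
find_target_deps <- function(expr) {
  if (!is.call(expr)) {
    return(character(0))
  }
  is_target_call <- identical(expr[[1]], as.name(".TARGET")) &&
    length(expr) >= 2 && is.character(expr[[2]])
  if (is_target_call) {
    return(expr[[2]])
  }
  # Recurse into the function position and every argument of the call.
  found <- lapply(as.list(expr), find_target_deps)
  as.character(unique(unlist(found, use.names = FALSE)))
}

# quote() keeps the command unevaluated, so neither %>% nor the parent
# datasets need to exist for the inspection to run.
cmd <- quote(
  .TARGET("parent1") %>%
    left_join(.TARGET("parent2"), by = "id") %>%
    filter(!is.na(id))
)

deps <- find_target_deps(cmd)
deps
```

Here `deps` contains `"parent1"` and `"parent2"`, the two datasets the command depends on. Nothing is evaluated during the walk, which is why the symbols only acquire meaning once the corresponding targets exist in a plan.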
Sources provide a way to attach variable UUIDs to objects that are not constructed
using blueprints. This is often the case when the sourced table derives from an
intermittent HTTP query or a file on disk. Blueprints have limited ability
to configure the underlying target behavior during the _initial phase, so it is often
easier to do that sort of fetching and pre-processing before using blueprints.
However, you lose the benefit of variable lineage when you don't use blueprints.
"Sources" are simply data.frame-like objects that have the ".uuid" attribute on each
variable so that variable lineage can cover the full data lifetime. Use blueprintr::mark_source()
to add the UUID attributes, and then use .SOURCE() in the blueprints so that lineage
can be captured.
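As an illustration of what "a .uuid attribute for each variable" means, the base-R sketch below attaches an identifier to every column of a data frame. `sketch_mark_source()` is a hypothetical stand-in, not the implementation of `mark_source()`, and the identifier is a placeholder string of random hex digits rather than a standards-compliant UUID.

```r
# Hypothetical stand-in for the idea behind mark_source(); blueprintr's
# real function may generate and store IDs differently.
sketch_mark_source <- function(dat) {
  for (nm in names(dat)) {
    # Placeholder ID: 32 random hex digits, not an RFC 4122 UUID.
    id <- paste(sample(c(0:9, letters[1:6]), 32, replace = TRUE), collapse = "")
    attr(dat[[nm]], ".uuid") <- id
  }
  dat
}

# A table fetched or read outside of any blueprint.
raw <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))
src <- sketch_mark_source(raw)

# Every variable now carries its own ".uuid" attribute.
vapply(src, function(col) nchar(attr(col, ".uuid")), integer(1))
```

In an actual workflow, the table would be passed through `mark_source()` and then referenced with `.SOURCE()` inside a blueprint command, following the signatures listed in the usage section.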
.TARGET("example_dataset")
#> example_dataset
.BLUEPRINT("example_dataset")
#> example_dataset_blueprint
.META("example_dataset")
#> example_dataset_meta
blueprint(
  "test_bp",
  description = "Blueprint with dependencies",
  command =
    .TARGET("parent1") %>%
      left_join(.TARGET("parent2"), by = "id") %>%
      filter(!is.na(id))
)
#> <blueprint: 'test_bp'>
#>
#> Description: Blueprint with dependencies
#> Annotations: DISABLED
#> Metadata location: '/home/runner/work/blueprintr/blueprintr/blueprints/test_bp.csv'
#>
#> -- Command --
#> Workflow command:
#> parent1 %>% left_join(parent2, by = "id") %>% filter(!is.na(id))
#>
#> Raw command:
#> .TARGET("parent1") %>% left_join(.TARGET("parent2"), by = "id") %>%
#> filter(!is.na(id))
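The printed blueprint shows a "Raw command" that still contains the macros and a "Workflow command" in which they have been replaced by plain symbols. A rewrite of that kind can be sketched in base R as follows. `expand_targets()` is a hypothetical helper for illustration, handles only `.TARGET()`, and is not claimed to match blueprintr's own translation step.

```r
# Hypothetical helper: replace each .TARGET("name") call in a quoted
# command with the bare symbol `name`, leaving everything else untouched.
expand_targets <- function(expr) {
  if (!is.call(expr)) {
    return(expr)
  }
  is_target_call <- identical(expr[[1]], as.name(".TARGET")) &&
    length(expr) == 2 && is.character(expr[[2]])
  if (is_target_call) {
    return(as.name(expr[[2]]))
  }
  # Rebuild the call from its rewritten parts; argument names are kept.
  as.call(lapply(as.list(expr), expand_targets))
}

raw_cmd <- quote(
  .TARGET("parent1") %>%
    left_join(.TARGET("parent2"), by = "id") %>%
    filter(!is.na(id))
)

workflow_cmd <- expand_targets(raw_cmd)
workflow_cmd
```

The rewritten expression refers to `parent1` and `parent2` directly, which matches the shape of the "Workflow command" line in the output above.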