DAGassist() validates a DAG + model specification, classifies node roles,
builds minimal and canonical adjustment sets, fits comparable models, and
renders a compact report in several formats (console, LaTeX fragment, DOCX,
XLSX, plain text). It also supports passing a single engine call (e.g.
feols(Y ~ X + Z | fe, data = df)) instead of a plain formula.
Arguments
- dag
A dagitty object (see dagitty::dagitty()).
- formula
Either (a) a standard model formula Y ~ X + ..., or (b) a single engine call such as feols(Y ~ X + Z | fe, data = df, ...). When an engine call is provided, engine, data, and extra arguments are automatically extracted from the call.
- data
A data.frame (or compatible, e.g. tibble). Optional if supplied via the engine call in formula.
- exposure
Optional character scalar; if missing/empty, inferred from the DAG (must be unique).
- outcome
Optional character scalar; if missing/empty, inferred from the DAG (must be unique).
- engine
Modeling function, default stats::lm. Ignored if formula is a single engine call (in that case the function is taken from the call).
- labels
Optional variable labels (named character vector or data.frame).
- verbose
Logical (default TRUE). Controls verbosity in the console printer (formulas + notes).
- type
Output type. One of "console" (default), "latex", "docx"/"word", "excel"/"xlsx", "text"/"txt".
- out
Output file path for the non-console types:
  - type="latex": a LaTeX fragment written to out (must end with .tex).
  - type="docx"/"word": a Word (.docx) file written to out.
  - type="excel"/"xlsx": an Excel (.xlsx) file written to out.
  - type="text"/"txt": a plain-text file written to out.
Ignored for type="console".
- imply
Logical; default FALSE. Evaluation scope.
  - If FALSE (default): restrict DAG evaluation to variables named in the formula (prune the DAG to exposure, outcome, and RHS terms). Roles/sets/bad-controls are computed on this pruned graph, and the roles table only shows those variables. This is most useful if you want to refine your specific call.
  - If TRUE: evaluate on the full DAG and allow DAG-implied controls in the minimal/canonical sets; the roles table shows all nodes. This is most useful if you want to refine your overall control variable selection.
- omit_intercept
Logical; drop intercept rows from the model comparison (default TRUE).
- omit_factors
Logical; drop factor-level rows from the model comparison (default TRUE).
- engine_args
Named list of extra arguments forwarded to engine(...). If formula is an engine call, arguments from the call are merged with engine_args (call values take precedence).
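As a rough illustration of the two accepted forms of formula, the sketch below builds a small DAG and a simulated data set and then calls DAGassist() both ways. The DAG, the variable names, and the simulation coefficients are assumptions made up for this example; only the argument names come from the list above. The DAGassist calls are guarded so the setup still runs when the package is not installed.

```r
library(dagitty)

# Hypothetical DAG: Z confounds X and Y, M mediates X -> Y, C is a collider.
g <- dagitty("dag {
  X [exposure]
  Y [outcome]
  Z -> X
  Z -> Y
  X -> M
  M -> Y
  X -> C
  Y -> C
}")

# Simulated data that follows the DAG (coefficients are arbitrary).
set.seed(1)
n <- 500
Z <- rnorm(n)
X <- 0.8 * Z + rnorm(n)
M <- 0.9 * X + rnorm(n)
Y <- 0.7 * M + 0.5 * X + 0.6 * Z + rnorm(n)
C <- 0.5 * X + 0.5 * Y + rnorm(n)
df <- data.frame(X, Y, Z, M, C)

if (requireNamespace("DAGassist", quietly = TRUE)) {
  # (a) plain formula; engine and data are passed as separate arguments
  DAGassist::DAGassist(dag = g, formula = Y ~ X + Z + C + M,
                       data = df, engine = stats::lm)

  # (b) single engine call; engine and data are taken from the call itself
  DAGassist::DAGassist(dag = g, formula = lm(Y ~ X + Z + C + M, data = df))
}
```

Because exposure and outcome are declared inside the DAG, neither call needs to pass them explicitly.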
Value
An object of class "DAGassist_report", returned invisibly for file outputs
and printed for type="console". The list contains:
- validation: result from validate_spec(...), which verifies acyclicity and X/Y declarations.
- roles: raw roles data.frame from classify_nodes(...) (logical columns).
- roles_display: roles grid after labeling/renaming for exporters.
- bad_in_user: variables in the user's RHS that are MED/COL/IO/DMed/DCol.
- controls_minimal: (legacy) one minimal set (character vector).
- controls_minimal_all: list of all minimal sets (character vectors).
- controls_canonical: canonical set (character vector; may be empty).
- formulas: list with original, minimal, minimal_list, canonical.
- models: list with fitted models original, minimal, minimal_list, canonical.
- verbose, imply: flags as provided.
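A short sketch of how the returned list might be inspected, assuming the element names listed above. The DAG and data are hypothetical and simulated only for this example, and the DAGassist call is guarded in case the package is not installed.

```r
library(dagitty)

# Hypothetical DAG and simulated data, for illustration only.
g <- dagitty("dag {
  X [exposure]
  Y [outcome]
  Z -> X
  Z -> Y
  X -> M
  M -> Y
}")
set.seed(1)
n <- 300
Z <- rnorm(n)
X <- 0.8 * Z + rnorm(n)
M <- 0.9 * X + rnorm(n)
Y <- 0.7 * M + 0.6 * Z + rnorm(n)
df <- data.frame(X, Y, Z, M)

if (requireNamespace("DAGassist", quietly = TRUE)) {
  report <- DAGassist::DAGassist(dag = g, formula = Y ~ X + Z + M,
                                 data = df, verbose = FALSE)

  report$bad_in_user            # RHS variables flagged as bad controls
  report$controls_minimal_all   # list of all minimal adjustment sets
  report$formulas$canonical     # formula used for the canonical model
  report$models$minimal         # fitted minimal model
}
```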
Details
Engine-call parsing. If formula is a call (e.g., feols(Y ~ X | fe, data=df)),
DAGassist extracts the engine function, formula, data argument, and any additional
engine arguments directly from that call; these are merged with the engine/engine_args
you pass explicitly (call arguments win).
Fixest tails. For engines like fixest that use | to denote FE/IV parts,
DAGassist preserves any | ... tail when constructing minimal/canonical formulas
(e.g., Y ~ X + controls | fe | iv(...)).
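The idea of keeping the | ... tail while swapping the control set can be sketched in base R. The helper swap_controls() below is hypothetical and is not DAGassist's internal code; it works on the deparsed formula text and treats everything from the first | onward as the tail, so it would misfire on a | that appears inside a function call such as I(a | b).

```r
# Hypothetical helper (not part of DAGassist): rebuild the right-hand side
# as exposure + controls while leaving any "| ..." tail untouched.
swap_controls <- function(f, exposure, controls = character(0)) {
  txt   <- paste(deparse(f, width.cutoff = 500L), collapse = " ")
  sides <- strsplit(txt, "~", fixed = TRUE)[[1]]
  lhs   <- trimws(sides[1])
  # Re-join in case the tail itself contains "~" (e.g. an IV part).
  rhs   <- trimws(paste(sides[-1], collapse = "~"))

  # Everything from the first "|" onward is kept verbatim as the tail.
  bar  <- regexpr("|", rhs, fixed = TRUE)[1]
  tail <- if (bar > 0) trimws(substring(rhs, bar)) else ""

  out <- paste(lhs, "~", paste(c(exposure, controls), collapse = " + "))
  if (nzchar(tail)) out <- paste(out, tail)
  out
}

swap_controls(Y ~ X + Z + C + M | fe, exposure = "X", controls = "Z")
swap_controls(Y ~ X + Z + M, exposure = "X", controls = "Z")
```

The helper returns a string rather than a formula object so the result is easy to inspect; stats::as.formula() can convert it when a formula is needed.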
Roles grid. The roles table displays short headers:
- X (exposure)
- Y (outcome)
- CON (confounder)
- MED (mediator)
- COL (collider)
- IO (intermediate outcome = proper descendant of Y)
- DMed (proper descendant of any mediator)
- DCol (proper descendant of any collider)
Descendants are proper (exclude the node itself) and can be any distance downstream.
The internal is_descendant_of_exposure is retained for logic but hidden in displays.
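The descendant-based roles can be approximated directly with dagitty, which helps when checking a roles table by hand. This is a rough sketch under the definitions given above, not the logic of classify_nodes(); the DAG is hypothetical, and the collider rule follows this page's description (a common child of X and Y).

```r
library(dagitty)

# Hypothetical DAG for illustration.
g <- dagitty("dag {
  X [exposure]
  Y [outcome]
  Z -> X
  Z -> Y
  X -> M
  M -> Y
  X -> C
  Y -> C
}")

# Drop the start node explicitly so the result is always "proper",
# regardless of whether descendants()/ancestors() include it.
proper_desc <- function(g, v) setdiff(descendants(g, v), v)
proper_anc  <- function(g, v) setdiff(ancestors(g, v), v)

# Proper descendants of any node in a set, at any distance downstream.
desc_of_any <- function(g, nodes) {
  as.character(unique(unlist(lapply(nodes, function(v) proper_desc(g, v)))))
}

x <- exposures(g)
y <- outcomes(g)

med  <- intersect(proper_desc(g, x), proper_anc(g, y))  # on a directed X -> Y path
col  <- intersect(children(g, x), children(g, y))       # common child of X and Y
io   <- proper_desc(g, y)                               # downstream of the outcome
dmed <- desc_of_any(g, med)                             # downstream of a mediator
dcol <- desc_of_any(g, col)                             # downstream of a collider
```

A node can carry several flags at once: in this DAG, C is a collider, an intermediate outcome, and a descendant of the mediator M.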
Bad controls. For total-effect estimation, DAGassist flags as bad controls
any variables that are MED, COL, IO, DMed, or DCol. These trigger a warning in
the console and are omitted from the model-comparison table. Valid confounders (pre-treatment)
are eligible for minimal/canonical adjustment sets.
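Given a role for each variable, flagging bad controls in a user formula is a simple lookup. The sketch below uses a hand-written role vector, which is an assumption for illustration; in practice the roles would come from the DAG.

```r
# Hypothetical role lookup for the controls in a formula.
roles     <- c(Z = "CON", M = "MED", C = "COL")
bad_roles <- c("MED", "COL", "IO", "DMed", "DCol")

f        <- Y ~ X + Z + C + M
exposure <- "X"

# Right-hand-side variables, minus the exposure, are the user's controls.
controls <- setdiff(all.vars(f[[3]]), exposure)

bad  <- controls[roles[controls] %in% bad_roles]  # flagged for total effects
good <- setdiff(controls, bad)                    # eligible for adjustment sets

bad
good
```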
Output types.
- console prints roles, sets, formulas (if verbose), and a compact model comparison with {modelsummary} if available (falls back gracefully otherwise).
- latex writes a LaTeX fragment you can \input{} into a paper.
- docx/word writes a Word doc (uses options(DAGassist.ref_docx=...) if set).
- excel/xlsx writes an Excel workbook with tidy tables.
- text/txt writes a plain-text report for logs/notes.
Dependencies. Core requires {dagitty}. Optional enhancements: {modelsummary}
(pretty tables), {broom} (fallback tidying), {rmarkdown} + Pandoc (DOCX),
{writexl} (XLSX).
Interpreting the output
ROLES. Variables in your formula are classified by DAG-based causal role:
- X: treatment / exposure.
- Y: outcome / dependent variable.
- CON: confounder (common cause of X and Y); adjust for these.
- MED: mediator (on a path from X to Y); do not adjust when estimating total effects.
- COL: collider (direct descendant of X and Y); adjusting opens a spurious path, so do not adjust.
- IO: intermediate outcome (descendant of Y); do not adjust.
- DMed: descendant of a mediator; do not adjust when estimating total effects.
- DCol: descendant of a collider; adjusting opens a spurious path, so do not adjust.
- other: safe, non-confounding predictors (e.g., affect Y only). Included in the canonical model but omitted from the minimal set because they're not required for identification.
MODEL COMPARISON.
- Minimal: the smallest adjustment set that blocks all back-door paths (confounders only).
- Canonical: the largest permissible set; includes all controls that are not MED, COL, IO, DMed, or DCol. other variables may appear here.
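The difference between the two sets can be seen with dagitty's adjustmentSets(), which supports both type = "minimal" and type = "canonical". The DAG below is hypothetical; W affects only Y, so it plays the "other" role. Whether DAGassist computes its sets exactly this way is an assumption.

```r
library(dagitty)

# Hypothetical DAG: Z is a confounder, M a mediator, W affects Y only.
g <- dagitty("dag {
  X [exposure]
  Y [outcome]
  Z -> X
  Z -> Y
  X -> M
  M -> Y
  W -> Y
}")

# Smallest set(s) that close every back-door path from X to Y.
min_sets <- adjustmentSets(g, type = "minimal", effect = "total")

# Single canonical set: every permissible control, including W.
can_set <- adjustmentSets(g, type = "canonical", effect = "total")

min_sets
can_set
```

Here the minimal set contains only Z, while the canonical set also includes W: adjusting for W is not needed for identification but does no harm.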
Errors and edge cases
If exposure/outcome cannot be inferred uniquely, the function stops with a clear message.
Fitting errors (e.g., FE collinearity) are captured and displayed in comparisons without aborting the whole pipeline.
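One common way to get this behavior is to wrap each fit in tryCatch() and keep the condition object in place of the model. The sketch below is an assumption about the general pattern, not DAGassist's code, and it uses a missing column as the failure instead of FE collinearity to stay within base R.

```r
set.seed(1)
df <- data.frame(X = rnorm(50), Z = rnorm(50))
df$Y <- df$X + df$Z + rnorm(50)

formulas <- list(
  original = Y ~ X + Z,
  broken   = Y ~ X + not_a_column  # fails: no such variable
)

# Fit every formula; on error, store the condition instead of stopping.
fits <- lapply(formulas, function(f) {
  tryCatch(lm(f, data = df), error = function(e) e)
})

# Which fits failed, and with what message?
failed <- vapply(fits, inherits, logical(1), what = "error")
failed
conditionMessage(fits$broken)
```

The successful fits remain usable, and the failed ones carry their error message so a report can show it next to the models that did fit.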
See also
print.DAGassist_report() for the console printer, and the helper
exporters in report_* modules.
Examples
# generate a console DAGassist report
DAGassist(dag = g, formula = lm(Y ~ X + Z + C + M, data = df))
#> DAGassist Report:
#>
#> Roles:
#> variable role X Y conf med col IO dMed dCol
#> X exposure x
#> Y outcome x x
#> Z confounder x
#> M mediator x
#> C collider x x x
#>
#> (!) Bad controls in your formula: {C, M}
#> Minimal controls 1: {Z}
#> Canonical controls: {Z}
#>
#> Formulas:
#> original: Y ~ X + Z + C + M
#>
#> Model comparison:
#>
#> +---+----------+-----------+-----------+
#> | | Original | Minimal 1 | Canonical |
#> +===+==========+===========+===========+
#> | X | 0.467*** | 1.306*** | 1.306*** |
#> +---+----------+-----------+-----------+
#> | | (0.122) | (0.098) | (0.098) |
#> +---+----------+-----------+-----------+
#> | Z | 0.185+ | 0.235+ | 0.235+ |
#> +---+----------+-----------+-----------+
#> | | (0.102) | (0.127) | (0.127) |
#> +---+----------+-----------+-----------+
#> | C | 0.368*** | | |
#> +---+----------+-----------+-----------+
#> | | (0.076) | | |
#> +---+----------+-----------+-----------+
#> | M | 0.512*** | | |
#> +---+----------+-----------+-----------+
#> | | (0.077) | | |
#> +===+==========+===========+===========+
#> | + p < 0.1, * p < 0.05, ** p < 0.01, |
#> | *** p < 0.001 |
#> +===+==========+===========+===========+
# generate a LaTeX DAGassist report
DAGassist(dag = g, formula = lm(Y ~ X + Z + C + M, data = df),
          type = "latex", out = file.path(tempdir(), "frag.tex"))