Skip to contents

DAGassist() validates a DAG + model specification, classifies node roles, builds minimal and canonical adjustment sets, fits comparable models, and renders a compact report in several formats (console, LaTeX fragment, DOCX, XLSX, plain text). It also supports passing a single engine call (e.g. feols(Y ~ X + Z | fe, data = df)) instead of a plain formula.

Usage

DAGassist(
  dag,
  formula,
  data,
  exposure,
  outcome,
  engine = stats::lm,
  labels = NULL,
  verbose = TRUE,
  type = c("console", "latex", "word", "docx", "excel", "xlsx", "text", "txt"),
  out = NULL,
  imply = FALSE,
  omit_intercept = TRUE,
  omit_factors = TRUE,
  engine_args = list()
)

Arguments

dag

A dagitty object (see dagitty::dagitty()).

formula

Either (a) a standard model formula Y ~ X + ..., or (b) a single engine call such as feols(Y ~ X + Z | fe, data = df, ...). When an engine call is provided, engine, data, and extra arguments are automatically extracted from the call.

data

A data.frame (or compatible, e.g. tibble). Optional if supplied via the engine call in formula.

exposure

Optional character scalar; if missing/empty, inferred from the DAG (must be unique).

outcome

Optional character scalar; if missing/empty, inferred from the DAG (must be unique).

engine

Modeling function, default stats::lm. Ignored if formula is a single engine call (in that case the function is taken from the call).

labels

Optional variable labels (named character vector or data.frame).

verbose

Logical (default TRUE). Controls verbosity in the console printer (formulas + notes).

type

Output type. One of "console" (default), "latex"/"docx"/"word", "excel"/"xlsx", "text"/"txt".

out

Output file path for the non-console types:

  • type="latex": a LaTeX fragment written to out (must end with .tex).

  • type="docx"/"word": a Word (.docx) file written to out.

  • type="excel"/"xlsx": an Excel (.xlsx) file written to out.

  • type="text"/"txt": a plain-text file written to out. Ignored for type="console".

imply

Logical; default FALSE. Evaluation scope.

  • If FALSE (default): restrict DAG evaluation to variables named in the formula (prune the DAG to exposure, outcome, and RHS terms). Roles/sets/bad-controls are computed on this pruned graph, and the roles table only shows those variables. This is most useful if you want to refine your specific call.

  • If TRUE: evaluate on the full DAG and allow DAG-implied controls in the minimal/canonical sets; roles table shows all nodes. This is most useful if you want to refine your overall control variable selection.

omit_intercept

Logical; drop intercept rows from the model comparison (default TRUE).

omit_factors

Logical; drop factor-level rows from the model comparison (default TRUE).

engine_args

Named list of extra arguments forwarded to engine(...). If formula is an engine call, arguments from the call are merged with engine_args (call values take precedence).

Value

An object of class "DAGassist_report", invisibly for file outputs, and printed for type="console". The list contains:

  • validation - result from validate_spec(...) which verifies acyclicity and X/Y declarations.

  • roles - raw roles data.frame from classify_nodes(...) (logic columns).

  • roles_display - roles grid after labeling/renaming for exporters.

  • bad_in_user - variables in the user's RHS that are MED/COL/IO/DMed/DCol.

  • controls_minimal - (legacy) one minimal set (character vector).

  • controls_minimal_all - list of all minimal sets (character vectors).

  • controls_canonical - canonical set (character vector; may be empty).

  • formulas - list with original, minimal, minimal_list, canonical.

  • models - list with fitted models original, minimal, minimal_list, canonical.

  • verbose, imply - flags as provided.

Details

Engine-call parsing. If formula is a call (e.g., feols(Y ~ X | fe, data=df)), DAGassist extracts the engine function, formula, data argument, and any additional engine arguments directly from that call; these are merged with engine/engine_args you pass explicitly (call arguments win).

Fixest tails. For engines like fixest that use | to denote FE/IV parts, DAGassist preserves any | ... tail when constructing minimal/canonical formulas (e.g., Y ~ X + controls | fe | iv(...)).

Roles grid. The roles table displays short headers:

  • X (exposure), Y (outcome), CON (confounder), MED (mediator), COL (collider), IO (intermediate outcome = proper descendant of Y), DMed (proper descendant of any mediator), DCol (proper descendant of any collider). Descendants are proper (exclude the node itself) and can be any distance downstream. The internal is_descendant_of_exposure is retained for logic but hidden in displays.

Bad controls. For total-effect estimation, DAGassist flags as bad controls any variables that are MED, COL, IO, DMed, or DCol. These are warned in the console and omitted from the model-comparison table. Valid confounders (pre-treatment) are eligible for minimal/canonical adjustment sets.

Output types.

  • console prints roles, sets, formulas (if verbose), and a compact model comparison with {modelsummary} if available (falls back gracefully otherwise).

  • latex writes a LaTeX fragment you can \\input{} into a paper.

  • docx/word writes a Word doc (uses options(DAGassist.ref_docx=...) if set).

  • excel/xlsx writes an Excel workbook with tidy tables.

  • text/txt writes a plain-text report for logs/notes.

Dependencies. Core requires {dagitty}. Optional enhancements: {modelsummary} (pretty tables), {broom} (fallback tidying), {rmarkdown} + Pandoc (DOCX), {writexl} (XLSX).

Interpreting the output

ROLES. Variables in your formula are classified by DAG-based causal role:

  • X - treatment / exposure.

  • Y - outcome / dependent variable.

  • CON - confounder (common cause of X and Y); adjust for these.

  • MED - mediator (on a path from X to Y); do not adjust when estimating total effects.

  • COL - collider (direct descendant of X and Y); adjusting opens a spurious path, so do not adjust.

  • IO - intermediate outcome (descendant of Y); do not adjust.

  • DMed - descendant of a mediator; do not adjust when estimating total effects.

  • DCol - descendant of a collider; adjusting opens a spurious path, so do not adjust.

  • other - safe, non-confounding predictors (e.g., affect Y only). Included in the canonical model but omitted from the minimal set because they're not required for identification.

MODEL COMPARISON.

  • Minimal - the smallest adjustment set that blocks all back-door paths (confounders only).

  • Canonical - the largest permissible set: includes all controls that are not MED, COL, IO, DMed, or DCol. other variables may appear here.

Errors and edge cases

  • If exposure/outcome cannot be inferred uniquely, the function stops with a clear message.

  • Fitting errors (e.g., FE collinearity) are captured and displayed in comparisons without aborting the whole pipeline.

See also

print.DAGassist_report() for the console printer, and the helper exporters in report_* modules.

Examples

# generate a console DAGassist report
DAGassist(dag = g, formula = lm(Y ~ X + Z + C + M, data = df))
#> DAGassist Report: 
#> 
#> Roles:
#> variable  role        X  Y  conf  med  col  IO  dMed  dCol
#> X         exposure    x                                   
#> Y         outcome        x                      x         
#> Z         confounder        x                             
#> M         mediator                x                       
#> C         collider                     x    x   x         
#> 
#>  (!) Bad controls in your formula: {C, M}
#> Minimal controls 1: {Z}
#> Canonical controls: {Z}
#> 
#> Formulas:
#>   original:  Y ~ X + Z + C + M
#> 
#> Model comparison:
#> 
#> +---+----------+-----------+-----------+
#> |   | Original | Minimal 1 | Canonical |
#> +===+==========+===========+===========+
#> | X | 0.467*** | 1.306***  | 1.306***  |
#> +---+----------+-----------+-----------+
#> |   | (0.122)  | (0.098)   | (0.098)   |
#> +---+----------+-----------+-----------+
#> | Z | 0.185+   | 0.235+    | 0.235+    |
#> +---+----------+-----------+-----------+
#> |   | (0.102)  | (0.127)   | (0.127)   |
#> +---+----------+-----------+-----------+
#> | C | 0.368*** |           |           |
#> +---+----------+-----------+-----------+
#> |   | (0.076)  |           |           |
#> +---+----------+-----------+-----------+
#> | M | 0.512*** |           |           |
#> +---+----------+-----------+-----------+
#> |   | (0.077)  |           |           |
#> +===+==========+===========+===========+
#> | + p < 0.1, * p < 0.05, ** p < 0.01,  |
#> | *** p < 0.001                        |
#> +===+==========+===========+===========+ 

# generate a LaTeX DAGassist report
# \donttest{
DAGassist(dag = g, formula = lm(Y ~ X + Z + C + M, data = df),
          type = "latex", out = file.path(tempdir(), "frag.tex"))
# }