Currently, dml.sensemakr uses caret for
handling machine learning (ML) methods. The default ML method is the
random forest implementation provided by the package
ranger. This is not only fast, but also seems to provide
good results with minimal to no tuning. However, researchers can use any
ML method they prefer simply by changing the reg argument of
the dml() function. In this vignette we provide a few
examples illustrating how to use different ML methods.
Here we use the same data as before: our goal is to estimate the causal impact of 401(k) eligibility on net financial assets.
# loads package
library(dml.sensemakr)
# loads data
data("pension")
y <- pension$net_tfa
d <- pension$e401
x <- model.matrix(~ -1 + age + inc + educ + fsize + marr + twoearn + pira + hown, data = pension)
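For reference, calling dml() with no reg argument fits both regressions with the default random forest from ranger; a minimal sketch, assuming the defaults described above (the object name dml.ranger is illustrative):
# default: random forest via ranger, no tuning
dml.ranger <- dml(y, d, x, model = "npm")
summary(dml.ranger)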
Users can provide different ML methods to dml() using
the reg argument. If only the name of the method is
provided, no tuning is performed, and default parameters are used. For
instance, the code below runs DML using generalized additive models
(GAMs).
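Since reg accepts the method name directly, the call likely takes the following form (a sketch; the object name dml.gam is illustrative):
# generalized additive model
dml.gam <- dml(y, d, x, model = "npm", reg = "gam")
summary(dml.gam)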
And the code below uses gradient boosting machines (GBMs).
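Again a sketch, changing only the method name (dml.gbm is illustrative):
# gradient boosting machine
dml.gbm <- dml(y, d, x, model = "npm", reg = "gbm")
summary(dml.gbm)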
Above we used the same ML method for both the treatment and the
outcome regressions. Note, however, that you can use a different
method for each regression by specifying yreg and dreg separately.
For instance, the code below uses GAM for the outcome regression, and
GBM for the treatment regression.
# gam for the outcome regression, gbm for the treatment regression
dml.gam.gbm <- dml(y, d, x, model = "npm", yreg = "gam", dreg = "gbm")
summary(dml.gam.gbm)

Users can provide details such as the form of cross-validation and a
specific tuning grid by passing a named list of arguments via
reg. The arguments of reg should include all
relevant arguments of the train() function of the package
caret. The main arguments are: method,
trControl and tuneGrid or
tuneLength. See ?caret::train for further
information.
For instance, the code below performs 5-fold cross-validation to
search over a tuning grid of size 5 for GBM (the grid values are
chosen by caret).
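A sketch of such a call, following the template structure used below (gbm.cv.args and dml.gbm.cv are illustrative names):
# gbm with 5-fold cross-validation; tuneLength = 5 lets caret pick a grid of size 5
gbm.cv.args <- list(method = "gbm",
                    trControl = list(method = "cv", number = 5),
                    tuneLength = 5)
dml.gbm.cv <- dml(y, d, x, model = "npm", reg = gbm.cv.args)
summary(dml.gbm.cv)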
Below we provide some templates for other machine learning methods.
In all examples you may change trControl to your favorite
choice of cross-validation, for instance,
trControl = list(method = "cv", number = 5) for 5-fold
cross-validation, and also expand the parameters of the tuning grid
accordingly.
Template for using neural networks.
# Neural Net
nnet.args <- list(method = "nnet",
                  trControl = list(method = "none"),
                  tuneGrid = expand.grid(size = 8, decay = 0.01),
                  maxit = 1000, maxNWts = 10000)
dml.nnet <- dml(y, d, x, model = "npm", reg = nnet.args)
summary(dml.nnet)

Template for using lasso with a polynomial expansion of the
covariates x.
# creates polynomial expansion of x
xl <- model.matrix(~ -1 + (poly(age, 6, raw = TRUE) + poly(inc, 8, raw = TRUE) +
                             poly(educ, 4, raw = TRUE) + poly(fsize, 2, raw = TRUE) +
                             marr + twoearn + pira + hown)^2, data = pension)
# lasso args
lasso.args <- list(method = "glmnet",
                   trControl = list(method = "none"),
                   tuneGrid = expand.grid(alpha = 1, lambda = 0.002))
# fit dml
dml.glmnet <- dml(y, d, xl, model = "plm", reg = lasso.args)
summary(dml.glmnet)
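If one prefers to tune lambda by cross-validation rather than fixing it, the template extends as described above (the grid values here are illustrative):
# lasso with 5-fold cross-validation over a grid of lambda values
lasso.cv.args <- list(method = "glmnet",
                      trControl = list(method = "cv", number = 5),
                      tuneGrid = expand.grid(alpha = 1,
                                             lambda = seq(0.0005, 0.01, length.out = 10)))
dml.glmnet.cv <- dml(y, d, xl, model = "plm", reg = lasso.cv.args)
summary(dml.glmnet.cv)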