Currently, `dml.sensemakr` uses `caret` for handling machine learning (ML) methods. The default ML method is the random forest implementation provided by the package `ranger`. This is not only fast, but also seems to provide good results with minimal to no tuning. However, researchers can use any ML method they prefer by simply changing the `reg` argument of the `dml()` function. In this vignette we provide a few examples illustrating how to use different ML methods.
Here we use the same data as before: our goal is to estimate the causal impact of 401(k) eligibility on net financial assets.
```r
# loads package
library(dml.sensemakr)

# loads data
data("pension")
y <- pension$net_tfa
d <- pension$e401
x <- model.matrix(~ -1 + age + inc + educ + fsize + marr + twoearn + pira + hown,
                  data = pension)
```
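For reference, the default call needs no `reg` argument at all; the line below (shown for comparison, not run in this vignette) would fit both regressions with the `ranger` random forest:

```r
# default specification: random forest via ranger for both regressions
dml.ranger <- dml(y, d, x, model = "npm")
```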
Users can provide different ML methods to `dml()` using the `reg` argument. If only the name of the method is provided, no tuning is performed and default parameters are used. For instance, the code below runs DML using generalized additive models (GAMs).
```r
# generalized additive model
dml.gam <- dml(y, d, x, model = "npm", reg = "gam")
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gam), treatment (gam)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> select method
#> 1 TRUE GCV.Cp
#>
#> - Tuning Model for Y (non-parametric).
#> -- Best Tune:
#> select method
#> 1 TRUE GCV.Cp
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.gam)
#>
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gam, R2 = 28.511%), treatment (gam, R2 = 12.201%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 7161 1142 6.27 3.61e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses Generalized Additive Model using Splines from the R package mgcv; the treatment regression uses Generalized Additive Model using Splines from the R package mgcv.
```
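To work with the numbers rather than the printed summary, standard accessors should apply to the fitted object. A minimal sketch, assuming `dml` objects support the usual `coef()` and `confint()` methods:

```r
# point estimate and 95% confidence interval for the ATE
# (assumes coef() and confint() methods for dml objects)
coef(dml.gam)
confint(dml.gam)
```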
And the code below uses gradient boosting machines (GBMs).
```r
# gradient boosting machine
dml.gbm <- dml(y, d, x, model = "npm", reg = "gbm")
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gbm), treatment (gbm)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> interaction.depth n.trees shrinkage n.minobsinnode
#> 1 1 50 0.1 10
#>
#> - Tuning Model for Y (non-parametric).
#> -- Best Tune:
#> interaction.depth n.trees shrinkage n.minobsinnode
#> 1 1 50 0.1 10
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.gbm)
#>
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gbm, R2 = 22.658%), treatment (gbm, R2 = 11.67%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 7784 1163 6.693 2.18e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses Stochastic Gradient Boosting from the R package gbm; the treatment regression uses Stochastic Gradient Boosting from the R package gbm.
```
Above we used the same ML method for estimating both the treatment and the outcome regressions. Note, however, that you can use different methods for each regression by specifying `yreg` and `dreg` separately. For instance, the code below uses GAM for the outcome regression and GBM for the treatment regression.
```r
# gam for the outcome regression, gbm for the treatment regression
dml.gam.gbm <- dml(y, d, x, model = "npm", yreg = "gam", dreg = "gbm")
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gam), treatment (gbm)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> interaction.depth n.trees shrinkage n.minobsinnode
#> 1 1 50 0.1 10
#>
#> - Tuning Model for Y (non-parametric).
#> -- Best Tune:
#> select method
#> 1 TRUE GCV.Cp
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.gam.gbm)
#>
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gam, R2 = 27.421%), treatment (gbm, R2 = 11.861%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 7638 1118 6.83 8.51e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses Generalized Additive Model using Splines from the R package mgcv; the treatment regression uses Stochastic Gradient Boosting from the R package gbm.
```
Users can provide details such as the form of cross-validation and a specific tuning grid by passing a named list of arguments via `reg`. The arguments of `reg` should include all relevant arguments of the `train()` function of the package `caret`. The main arguments are `method`, `trControl`, and `tuneGrid` or `tuneLength`. See `?caret::train` for further information.
For instance, the code below performs 5-fold cross-validation to search parameters in a grid of size 5 for GBM (the values of the grid are chosen by `caret`).
```r
# gradient boosting machine, tuned by 5-fold cross-validation
gbm.args <- list(method = "gbm",
                 trControl = list(method = "cv", number = 5),
                 tuneLength = 5)
dml.gbm <- dml(y, d, x, model = "npm", reg = gbm.args)
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gbm), treatment (gbm)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> n.trees interaction.depth shrinkage n.minobsinnode
#> 7 100 2 0.1 10
#>
#> - Tuning Model for Y (non-parametric).
#> -- Best Tune:
#> n.trees interaction.depth shrinkage n.minobsinnode
#> 17 100 4 0.1 10
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.gbm)
#>
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gbm, R2 = 24.029%), treatment (gbm, R2 = 12.327%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 8098 1245 6.506 7.7e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses Stochastic Gradient Boosting from the R package gbm; the treatment regression uses Stochastic Gradient Boosting from the R package gbm.
```
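The list interface should also combine with `yreg` and `dreg`. A sketch, assuming both arguments accept the same list form that `reg` does:

```r
# cross-validated gbm for the treatment, plain gam for the outcome
# (assumes yreg/dreg accept the same named-list specification as reg)
dml.mix <- dml(y, d, x, model = "npm", yreg = "gam", dreg = gbm.args)
```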
Below we provide some templates for other machine learning methods. In all examples you may change `trControl` to your favorite choice of cross-validation, for instance, `trControl = list(method = "cv", number = 5)` for 5-fold cross-validation, and also expand the parameters of the tuning grid accordingly.
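The same pattern also covers the default method itself. Below is a sketch of a template for re-tuning the `ranger` random forest, here using `tuneLength` so that `caret` picks a small grid of candidate values (the grid size of 3 is just an illustrative choice):

```r
# random forest via ranger, tuned over a small caret-chosen grid
ranger.args <- list(method = "ranger",
                    trControl = list(method = "cv", number = 5),
                    tuneLength = 3)
dml.ranger.cv <- dml(y, d, x, model = "npm", reg = ranger.args)
```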
Template for using neural networks.
```r
# neural network
nnet.args <- list(method = "nnet",
                  trControl = list(method = "none"),
                  tuneGrid = expand.grid(size = 8, decay = 0.01),
                  maxit = 1000, maxNWts = 10000)
dml.nnet <- dml(y, d, x, model = "npm", reg = nnet.args)
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (nnet), treatment (nnet)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> size decay
#> 1 8 0.01
#>
#> - Tuning Model for Y (non-parametric).
#> -- Best Tune:
#> size decay
#> 1 8 0.01
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.nnet)
#>
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (nnet, R2 = 30.377%), treatment (nnet, R2 = 11.461%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 7567 1217 6.219 5e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses Neural Network from the R package nnet; the treatment regression uses Neural Network from the R package nnet.
```
Template for using lasso with a polynomial expansion of the covariates `x`.
```r
# creates polynomial expansion of x
xl <- model.matrix(~ -1 + (poly(age, 6, raw = TRUE) + poly(inc, 8, raw = TRUE) +
                             poly(educ, 4, raw = TRUE) + poly(fsize, 2, raw = TRUE) +
                             marr + twoearn + pira + hown)^2, data = pension)

# lasso args
lasso.args <- list(method = "glmnet",
                   trControl = list(method = "none"),
                   tuneGrid = expand.grid(alpha = 1, lambda = 0.002))
# fit dml
dml.glmnet <- dml(y, d, xl, model = "plm", reg = lasso.args)
#> Debiased Machine Learning
#>
#> Model: Partially Linear
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (glmnet), treatment (glmnet)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> alpha lambda
#> 1 1 0.002
#>
#> - Tuning Model for Y (partially linear).
#> -- Best Tune:
#> alpha lambda
#> 1 1 0.002
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.glmnet)
#>
#> Debiased Machine Learning
#>
#> Model: Partially Linear
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (glmnet, R2 = 28.772%), treatment (glmnet, R2 = 11.969%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 9159 1316 6.959 3.42e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses glmnet from the R package glmnet; the treatment regression uses glmnet from the R package glmnet.
```
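Finally, since all of the fits above target the average treatment effect, it can be instructive to put the point estimates side by side. A minimal sketch, again assuming a `coef()` method for `dml` objects:

```r
# compare ATE point estimates across ML methods
# (assumes coef() returns the point estimate of a dml fit)
fits <- list(gam = dml.gam, gbm = dml.gbm, nnet = dml.nnet, lasso = dml.glmnet)
sapply(fits, coef)
```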