Currently, `dml.sensemakr` uses `caret` for handling machine learning (ML) methods. The default ML method is the random forest implementation provided by the package `ranger`. This is not only fast, but also seems to provide good results with minimal to no tuning. However, researchers can use any ML method they prefer by simply changing the `reg` argument of the `dml()` function. In this vignette we provide a few examples illustrating how to use different ML methods.
Here we use the same data as before: our goal is to estimate the causal impact of 401(k) eligibility on net financial assets.
```r
# loads package
library(dml.sensemakr)

# loads data
data("pension")
y <- pension$net_tfa
d <- pension$e401
x <- model.matrix(~ -1 + age + inc + educ + fsize + marr + twoearn + pira + hown,
                  data = pension)
```
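For reference, the default call needs no `reg` argument at all; the line below (shown for comparison, not run in this vignette) would fit both regressions with the `ranger` random forest:

```r
# default specification: random forest via ranger for both regressions
dml.ranger <- dml(y, d, x, model = "npm")
```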
Users can provide different ML methods to `dml()` using the `reg` argument. If only the name of the method is provided, no tuning is performed and default parameters are used. For instance, the code below runs DML using generalized additive models (GAMs).
```r
# generalized additive model
dml.gam <- dml(y, d, x, model = "npm", reg = "gam")
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gam), treatment (gam)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> select method
#> 1 TRUE GCV.Cp
#>
#> - Tuning Model for Y (non-parametric).
#> -- Best Tune:
#> select method
#> 1 TRUE GCV.Cp
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.gam)
#>
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gam, R2 = 28.511%), treatment (gam, R2 = 12.201%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 7161 1142 6.27 3.61e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses Generalized Additive Model using Splines from the R package mgcv; the treatment regression uses Generalized Additive Model using Splines from the R package mgcv.
```
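To work with the numbers rather than the printed summary, standard accessors should apply to the fitted object. A minimal sketch, assuming `dml` objects support the usual `coef()` and `confint()` methods:

```r
# point estimate and 95% confidence interval for the ATE
# (assumes coef() and confint() methods for dml objects)
coef(dml.gam)
confint(dml.gam)
```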
And the code below uses gradient boosting machines (GBMs).
```r
# gradient boosting machine
dml.gbm <- dml(y, d, x, model = "npm", reg = "gbm")
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gbm), treatment (gbm)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> interaction.depth n.trees shrinkage n.minobsinnode
#> 1 1 50 0.1 10
#>
#> - Tuning Model for Y (non-parametric).
#> -- Best Tune:
#> interaction.depth n.trees shrinkage n.minobsinnode
#> 1 1 50 0.1 10
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.gbm)
#>
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gbm, R2 = 22.658%), treatment (gbm, R2 = 11.67%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 7784 1163 6.693 2.18e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses Stochastic Gradient Boosting from the R package gbm; the treatment regression uses Stochastic Gradient Boosting from the R package gbm.
```
Above we used the same ML method for estimating both the treatment and the outcome regressions. Note, however, that you can use different methods for each regression by specifying `yreg` and `dreg` separately. For instance, the code below uses GAM for the outcome regression and GBM for the treatment regression.
```r
# gam for the outcome regression, gbm for the treatment regression
dml.gam.gbm <- dml(y, d, x, model = "npm", yreg = "gam", dreg = "gbm")
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gam), treatment (gbm)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> interaction.depth n.trees shrinkage n.minobsinnode
#> 1 1 50 0.1 10
#>
#> - Tuning Model for Y (non-parametric).
#> -- Best Tune:
#> select method
#> 1 TRUE GCV.Cp
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.gam.gbm)
#>
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gam, R2 = 27.421%), treatment (gbm, R2 = 11.861%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 7638 1118 6.83 8.51e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses Generalized Additive Model using Splines from the R package mgcv; the treatment regression uses Stochastic Gradient Boosting from the R package gbm.
```
Users can provide details such as the form of cross-validation and a specific tuning grid by passing a named list of arguments via `reg`. The arguments of `reg` should include all relevant arguments of the `train()` function of the package `caret`. The main arguments are `method`, `trControl`, and `tuneGrid` or `tuneLength`. See `?caret::train` for further information.
For instance, the code below performs 5-fold cross-validation to search parameters in a grid of size 5 for GBM (the values of the grid are chosen by `caret`).
```r
# gradient boosting machine, tuned by 5-fold cross-validation
gbm.args <- list(method = "gbm",
                 trControl = list(method = "cv", number = 5),
                 tuneLength = 5)
dml.gbm <- dml(y, d, x, model = "npm", reg = gbm.args)
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gbm), treatment (gbm)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> n.trees interaction.depth shrinkage n.minobsinnode
#> 7 100 2 0.1 10
#>
#> - Tuning Model for Y (non-parametric).
#> -- Best Tune:
#> n.trees interaction.depth shrinkage n.minobsinnode
#> 17 100 4 0.1 10
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.gbm)
#>
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (gbm, R2 = 24.029%), treatment (gbm, R2 = 12.327%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 8098 1245 6.506 7.7e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses Stochastic Gradient Boosting from the R package gbm; the treatment regression uses Stochastic Gradient Boosting from the R package gbm.
```
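The list interface should also combine with `yreg` and `dreg`. A sketch, assuming both arguments accept the same list form that `reg` does:

```r
# cross-validated gbm for the treatment, plain gam for the outcome
# (assumes yreg/dreg accept the same named-list specification as reg)
dml.mix <- dml(y, d, x, model = "npm", yreg = "gam", dreg = gbm.args)
```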
Below we provide some templates for other machine learning methods. In all examples you may change `trControl` to your favorite choice of cross-validation, for instance, `trControl = list(method = "cv", number = 5)` for 5-fold cross-validation, and also expand the parameters of the tuning grid accordingly.
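The same pattern also covers the default method itself. Below is a sketch of a template for re-tuning the `ranger` random forest, here using `tuneLength` so that `caret` picks a small grid of candidate values (the grid size of 3 is just an illustrative choice):

```r
# random forest via ranger, tuned over a small caret-chosen grid
ranger.args <- list(method = "ranger",
                    trControl = list(method = "cv", number = 5),
                    tuneLength = 3)
dml.ranger.cv <- dml(y, d, x, model = "npm", reg = ranger.args)
```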
Template for using neural networks.
```r
# neural network
nnet.args <- list(method = "nnet",
                  trControl = list(method = "none"),
                  tuneGrid = expand.grid(size = 8, decay = 0.01),
                  maxit = 1000, maxNWts = 10000)
dml.nnet <- dml(y, d, x, model = "npm", reg = nnet.args)
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (nnet), treatment (nnet)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> size decay
#> 1 8 0.01
#>
#> - Tuning Model for Y (non-parametric).
#> -- Best Tune:
#> size decay
#> 1 8 0.01
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.nnet)
#>
#> Debiased Machine Learning
#>
#> Model: Nonparametric
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (nnet, R2 = 30.377%), treatment (nnet, R2 = 11.461%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 7567 1217 6.219 5e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses Neural Network from the R package nnet; the treatment regression uses Neural Network from the R package nnet.
```
Template for using lasso with a polynomial expansion of the covariates `x`.
```r
# creates polynomial expansion of x
xl <- model.matrix(~ -1 + (poly(age, 6, raw = TRUE) + poly(inc, 8, raw = TRUE) +
                             poly(educ, 4, raw = TRUE) + poly(fsize, 2, raw = TRUE) +
                             marr + twoearn + pira + hown)^2, data = pension)

# lasso args
lasso.args <- list(method = "glmnet",
                   trControl = list(method = "none"),
                   tuneGrid = expand.grid(alpha = 1, lambda = 0.002))
# fit dml
dml.glmnet <- dml(y, d, xl, model = "plm", reg = lasso.args)
#> Debiased Machine Learning
#>
#> Model: Partially Linear
#> Target: ate
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (glmnet), treatment (glmnet)
#> Tuning: dirty
#>
#>
#> ====================================
#> Tuning parameters using all the data
#> ====================================
#>
#> - Tuning Model for D.
#> -- Best Tune:
#> alpha lambda
#> 1 1 0.002
#>
#> - Tuning Model for Y (partially linear).
#> -- Best Tune:
#> alpha lambda
#> 1 1 0.002
#>
#>
#> ======================================
#> Repeating 5-fold cross-fitting 1 times
#> ======================================
#>
#> -- Rep 1 -- Folds: 1 2 3 4 5
summary(dml.glmnet)
#>
#> Debiased Machine Learning
#>
#> Model: Partially Linear
#> Cross-Fitting: 5 folds, 1 reps
#> ML Method: outcome (glmnet, R2 = 28.772%), treatment (glmnet, R2 = 11.969%)
#> Tuning: dirty
#>
#> Average Treatment Effect:
#>
#> Estimate Std. Error t value P(>|t|)
#> ate.all 9159 1316 6.959 3.42e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Note: DML estimates combined using the median method.
#>
#> Verbal interpretation of DML procedure:
#>
#> -- Average treatment effects were estimated using DML with 5-fold cross-fitting. In order to reduce the variance that stems from sample splitting, we repeated the procedure 1 times. Estimates are combined using the median as the final estimate, incorporating variation across experiments into the standard error as described in Chernozhukov et al. (2018). The outcome regression uses glmnet from the R package glmnet; the treatment regression uses glmnet from the R package glmnet.
```
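Finally, since all of the fits above target the average treatment effect, it can be instructive to put the point estimates side by side. A minimal sketch, again assuming a `coef()` method for `dml` objects:

```r
# compare ATE point estimates across ML methods
# (assumes coef() returns the point estimate of a dml fit)
fits <- list(gam = dml.gam, gbm = dml.gbm, nnet = dml.nnet, lasso = dml.glmnet)
sapply(fits, coef)
```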