
Carlos Cinelli, Andrew Forney, and Judea Pearl (2022+). "A Crash Course in Good and Bad Controls." Sociological Methods & Research.
[ abstract ]
[ preprint ]
[ journal ]
[ slides ]
[ r code ]
[ python code ]
Many students, especially in econometrics, express frustration with the way a problem known as “bad control” is evaded, if not mishandled, in the traditional literature. The problem arises when the addition of a variable to a regression equation produces an unintended discrepancy between the regression coefficient and the effect that the coefficient is expected to represent. Avoiding such discrepancies presents a challenge not only to practitioners of econometrics, but to all analysts in the data-intensive sciences. This note describes graphical tools for understanding, visualizing, and resolving the problem through a series of illustrative examples. We have found that the examples presented here can serve as a powerful instructional device to supplement formal discussions of the problem. By making this “crash course” accessible to instructors and practitioners, we hope to avail these tools to a broader community of scientists concerned with the causal interpretation of regression models.
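A classic instance of a "bad control" is adjusting for a collider, i.e., a variable caused by both treatment and outcome. The toy simulation below (a hypothetical sketch, not an example from the paper; all variable names and coefficients are made up) shows how adding such a control distorts an otherwise correct regression coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)   # true causal effect of x on y is 1.0
z = x + y + rng.normal(size=n)     # z is a collider: caused by both x and y

def ols(covariates, outcome):
    """Return OLS coefficients (intercept first) via least squares."""
    design = np.column_stack([np.ones(len(outcome))] + list(covariates))
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

good = ols([x], y)[1]     # no adjustment: recovers the causal effect (~1.0)
bad = ols([x, z], y)[1]   # "bad control": conditioning on the collider z
                          # opens a non-causal path and biases the coefficient
print(good, bad)
```

In this particular parametrization the coefficient on x after adjusting for z is driven all the way to zero, even though the true effect is 1, illustrating how a seemingly innocuous extra control can destroy the causal interpretation of the estimate.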

Lang Liu, Carlos Cinelli, and Zaid Harchaoui (2022). "Orthogonal Statistical Learning with Self-Concordant Loss." Annual Conference on Learning Theory (COLT).
[ abstract ]
[ preprint ]
[ journal ]
Orthogonal statistical learning and double machine learning have emerged as general frameworks for two-stage statistical prediction in the presence of a nuisance component. We establish non-asymptotic bounds on the excess risk of orthogonal statistical learning methods with a loss function satisfying a self-concordance property. Our bounds improve upon existing bounds by a dimension factor while lifting the assumption of strong convexity. We illustrate the results with examples from multiple treatment effect estimation and generalized partially linear modeling.
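The two-stage recipe underlying these frameworks can be illustrated with a minimal partialling-out sketch (a toy example under squared loss with linear nuisances and no cross-fitting; the data-generating process and all coefficients below are invented for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
d = 0.8 * x + rng.normal(size=n)            # treatment depends on the nuisance x
y = 0.5 * d + 1.5 * x + rng.normal(size=n)  # true target parameter is 0.5

# Stage 1: fit the nuisance components E[y|x] and E[d|x] (linear fits here).
def fit_predict(feature, target):
    design = np.column_stack([np.ones(n), feature])
    beta = np.linalg.lstsq(design, target, rcond=None)[0]
    return design @ beta

y_res = y - fit_predict(x, y)
d_res = d - fit_predict(x, d)

# Stage 2: regress residuals on residuals. The orthogonality of this second
# stage makes the estimate insensitive to first-order errors in stage 1.
theta = (d_res @ y_res) / (d_res @ d_res)
print(theta)
```

With a large sample, `theta` lands near the true value 0.5 even though the treatment is strongly entangled with the nuisance component.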

Carlos Cinelli, N. LaPierre, B. Hill, S. Sankararaman and E. Eskin (2022). "Robust Mendelian randomization in the presence of residual population stratification, batch effects and horizontal pleiotropy." Nature Communications.
[ abstract ]
[ preprint ]
[ journal ]
[ video ]
Mendelian Randomization (MR) exploits genetic variants as instrumental variables to estimate the causal effect of an "exposure" trait on an "outcome" trait from observational data. However, the validity of such studies is threatened by population stratification, batch effects, and horizontal pleiotropy. Although a variety of methods have been proposed to partially mitigate those problems, residual biases may still remain, leading to highly statistically significant false positives in large genetic databases. Here, we describe a suite of sensitivity analysis tools for MR that enable investigators to properly quantify the robustness of their findings against these (and other) unobserved validity threats. Specifically, we propose the routine reporting of sensitivity statistics that can be used to readily quantify the robustness of an MR result: (i) the partial R2 of the genetic instrument with the exposure and the outcome traits; and, (ii) the robustness value of both genetic associations. These statistics quantify the minimal strength of violations of the MR assumptions that would be necessary to explain away the MR causal effect estimate. We also provide intuitive displays to visualize the sensitivity of the MR estimate to any degree of violation, and formal methods to bound the worst-case bias caused by violations in terms of multiples of the observed strength of principal components, batch effects, as well as putative pleiotropic pathways. We demonstrate how these tools can aid researchers in distinguishing robust from fragile findings, by showing that the MR estimate of the causal effect of body mass index (BMI) on diastolic blood pressure is relatively robust, whereas the MR estimate of the causal effect of BMI on Townsend deprivation index is relatively fragile.

Carlos Cinelli and Judea Pearl (2021). "Generalizing Experimental Results by Leveraging Knowledge of Mechanisms." European Journal of Epidemiology.
[ abstract ]
[ preprint ]
[ journal ]
[ r package ]
We show how experimental results can be generalized across diverse populations by leveraging knowledge of local mechanisms that produce the outcome of interest, only some of which may differ in the target domain. We use Structural Causal Models (SCM) and a refined version of selection diagrams to represent such knowledge, and to decide whether it entails the invariance of probabilities of causation across populations, which then enables generalization. We further provide: (i) bounds for the target effect when some of these conditions are violated; (ii) new identification results for probabilities of causation and the transported causal effect when trials from multiple source domains are available; as well as (iii) a Bayesian approach for estimating the transported causal effect from finite samples. We illustrate these methods both with simulated data and with a real example that transports the effects of Vitamin A supplementation on childhood mortality across different regions.

Chi Zhang, Carlos Cinelli, Bryant Chen, and Judea Pearl (2021). "Exploiting Equality Constraints in Causal Inference." International Conference on Artificial Intelligence and Statistics (AISTATS).
[ abstract ]
[ preprint ]
[ journal ]
Assumptions about equality of effects are commonly made in causal inference tasks. For example, the well-known “difference-in-differences” method assumes that confounding remains constant across time periods. Similarly, it is not unreasonable to assume that causal effects apply equally to units undergoing interference. Finally, sensitivity analysis often hypothesizes equality among existing and unaccounted-for confounders. Despite the ubiquity of these “equality constraints,” modern identification methods have not leveraged their presence in a systematic way. In this paper, we develop a novel graphical criterion that extends the well-known method of generalized instrumental sets to exploit such additional constraints for causal identification in linear models. We further demonstrate how it solves many diverse problems found in the literature in a general way, including difference-in-differences, interference, as well as benchmarking in sensitivity analysis.
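The difference-in-differences equality constraint can be seen in a small simulation (a generic textbook-style sketch, not code from the paper; the data-generating process is hypothetical): because the unobserved confounder is constant over time, differencing within units cancels it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
u = rng.normal(size=n)                    # unobserved confounder, constant over time
treated = (u + rng.normal(size=n)) > 0    # selection into treatment depends on u
y_pre = 2.0 * u + rng.normal(size=n)
y_post = 2.0 * u + 1.0 * treated + 0.3 + rng.normal(size=n)  # effect 1.0 + time trend

# Naive post-period comparison: contaminated by the confounder u.
naive = y_post[treated].mean() - y_post[~treated].mean()

# Difference-in-differences: the time-constant confounding cancels out.
did = ((y_post[treated] - y_pre[treated]).mean()
       - (y_post[~treated] - y_pre[~treated]).mean())
print(naive, did)
```

Here `did` recovers the true effect of 1.0 while `naive` is badly inflated, precisely because confounding is assumed (and here constructed) to be equal across the two time periods.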

Carlos Cinelli and Chad Hazlett (2020). "Making Sense of Sensitivity: Extending Omitted Variable Bias." Journal of the Royal Statistical Society, Series B (Statistical Methodology).
[ abstract ]
[ preprint ]
[ journal ]
[ r package ]
[ shiny app ]
[ stata module ]
[ python package ]
We extend the omitted variable bias framework with a suite of tools for sensitivity analysis in regression models that: (i) does not require assumptions about the treatment assignment or the nature of confounders; (ii) naturally handles multiple confounders, possibly acting nonlinearly; (iii) exploits expert knowledge to bound sensitivity parameters; and, (iv) can be easily computed using only standard regression results. In particular, we introduce two novel sensitivity measures suited for routine reporting. The robustness value describes the minimum strength of association unobserved confounding would need to have, both with the treatment and the outcome, to change the research conclusions. The partial R2 of the treatment with the outcome shows how strongly confounders explaining all the residual outcome variation would have to be associated with the treatment to eliminate the estimated effect. Next, we offer graphical tools for elaborating on problematic confounders, examining the sensitivity of point estimates, t-values, as well as “extreme scenarios”. Finally, we describe problems with a common “benchmarking” practice and introduce a novel procedure to formally bound the strength of confounders based on comparison to observed covariates. We apply these methods to a running example that estimates the effect of exposure to violence on attitudes toward peace.
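Both sensitivity statistics can indeed be computed from standard regression output alone. The sketch below implements the published formulas, where f = |t|/sqrt(dof) is the partial Cohen's f of the treatment with the outcome; the numbers fed in at the end are hypothetical, chosen only to illustrate the calculation:

```python
import math

def partial_r2(t, dof):
    """Partial R2 of the treatment with the outcome, from its t-value
    and the residual degrees of freedom of the regression."""
    return t**2 / (t**2 + dof)

def robustness_value(t, dof, q=1.0):
    """RV_q: minimum strength of association (in partial R2, equally with
    treatment and outcome) that unobserved confounding needs in order to
    reduce the point estimate by 100*q percent."""
    f = q * abs(t) / math.sqrt(dof)
    return 0.5 * (math.sqrt(f**4 + 4 * f**2) - f**2)

# Hypothetical regression output: t-value of 4.0 with 780 residual dof.
print(partial_r2(4.0, 780), robustness_value(4.0, 780))
```

For these illustrative inputs the partial R2 is about 2% and the robustness value about 13%: confounders would need to explain roughly 13% of the residual variance of both treatment and outcome to fully explain away the estimate. The same quantities are computed by the accompanying sensemakr packages linked above.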

Daniel Kumor, Carlos Cinelli and Elias Bareinboim (2020). "Efficient Identification in Linear Structural Causal Models with Auxiliary Cutsets." International Conference on Machine Learning (ICML).
[ abstract ]
[ preprint ]
[ journal ]
We develop a new polynomial-time algorithm for identification of structural coefficients in linear causal models that subsumes previous state-of-the-art methods, unifying several disparate approaches to identification in this setting. Building on these results, we develop a procedure for identifying total causal effects in linear systems.

Carlos Cinelli, D. Kumor, B. Chen, J. Pearl and E. Bareinboim (2019). "Sensitivity Analysis of Linear Structural Causal Models." International Conference on Machine Learning (ICML).
[ abstract ]
[ preprint ]
[ journal ]
[ short video ]
Causal inference requires assumptions about the data generating process, many of which are unverifiable from the data. Given that some causal assumptions might be uncertain or disputed, formal methods are needed to quantify how sensitive research conclusions are to violations of those assumptions. Although an extensive literature exists on the topic, most results are limited to specific model structures, while a general-purpose algorithmic framework for sensitivity analysis is still lacking. In this paper, we develop a formal, systematic approach to sensitivity analysis for arbitrary linear Structural Causal Models (SCMs). We start by formalizing sensitivity analysis as a constrained identification problem. We then develop an efficient, graph-based identification algorithm that exploits non-zero constraints on both directed and bidirected edges. This allows researchers to systematically derive sensitivity curves for a target causal quantity with an arbitrary set of path coefficients and error covariances as sensitivity parameters. These results can be used to display the degree to which violations of causal assumptions affect the target quantity of interest, and to judge, on scientific grounds, whether problematic degrees of violations are plausible.
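The simplest instance of such a sensitivity curve is the single-confounder linear SCM, where the bias of the regression coefficient is an explicit function of the path coefficients. The sketch below (a toy, hand-picked parametrization; the general algorithm in the paper handles arbitrary linear SCMs) treats the confounder's path coefficients (a, b) as sensitivity parameters:

```python
import numpy as np

# Hypothetical linear SCM:  d = a*u + e_d,   y = lam*d + b*u + e_y,
# with u standardized and unobserved. Then Cov(d, y) = lam*Var(d) + a*b,
# so the OLS slope equals lam + a*b/Var(d), and for any assumed (a, b)
# the implied causal coefficient is  ols - a*b/Var(d): a sensitivity curve.
rng = np.random.default_rng(3)
n = 200_000
lam, a, b = 0.7, 0.5, 0.6
u = rng.normal(size=n)
d = a * u + rng.normal(size=n)
y = lam * d + b * u + rng.normal(size=n)

C = np.cov(d, y)
ols = C[0, 1] / C[0, 0]              # biased: lam + a*b/Var(d)
adjusted = ols - a * b / C[0, 0]     # recovers lam once the true (a, b) is assumed
print(ols, adjusted)
```

Sweeping (a, b) over a grid of plausible values and plotting `adjusted` traces out the sensitivity curve for this model; the paper's contribution is an algorithm that derives such curves for arbitrary sets of path coefficients and error covariances in any linear SCM.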

Carlos Cinelli and Judea Pearl (2018). "On the utility of Causal Diagrams for Modeling Attrition." Epidemiology.
[ abstract ]
[ preprint ]
[ journal ]
In a recent communication, Breskin, Cole and Hudgens aimed to demonstrate “how single-world intervention graphs can supplement traditional causal diagrams”. The example used in their demonstration involved selection bias due to attrition, namely, subjects dropping out from a randomized trial before the outcome is observed. Here we use the same example to demonstrate the opposite conclusion: the derivation presented by Breskin et al. is in fact longer and more complicated than the standard, three-step derivation facilitated by traditional causal diagrams. We further show that more natural solutions to attrition problems are obtained when viewed as missing-data problems encoded in causal diagrams.

Pre-PhD: Before turning my attention to causal and statistical methodology, I used to write about quite a different topic. Below you can find some of my pre-doctoral publications on the history of economic thought (most in Portuguese).

Carlos Cinelli and Rogerio Arthmar. "The debating tradition in Britain and the new political economy: William Thompson and John Stuart Mill at the London Cooperative Society in 1825." Nova Economia, v. 28(2), p. 609-636, 2018.

Rogerio Arthmar and Carlos Cinelli (in Portuguese). "The classical economics between laissez-faire and socialism." EconomiA, v. 14, p. 227-252, 2013.

Carlos Cinelli (in Portuguese). "Voluntary transfers and municipal corruption in Brazil: preliminary evidence from the irregular accounts registry of the Federal Court of Accounts." Revista Economia e Tecnologia, v. 7, p. 89-97, 2011.

Carlos Cinelli and Rogerio Arthmar (in Portuguese). "When the classical liberal and the socialist clash: Bastiat, Proudhon and capital rent." Nova Economia, v. 20, p. 509-541, 2010.