
Carlos Cinelli, Andrew Forney, and Judea Pearl (2022+). "A Crash Course in Good and Bad Controls." Sociological Methods & Research.
[ abstract ]
[ preprint ]
[ journal ]
[ slides ]
[ r code ]
[ python code ]
Many students, especially in econometrics, express frustration with the way a problem known as “bad control” is evaded, if not mishandled, in the traditional literature. The problem arises when the addition of a variable to a regression equation produces an unintended discrepancy between the regression coefficient and the effect that the coefficient is expected to represent. Avoiding such discrepancies presents a challenge not only to practitioners of econometrics, but to all analysts in the data-intensive sciences. This note describes graphical tools for understanding, visualizing, and resolving the problem through a series of illustrative examples. We have found that the examples presented here can serve as a powerful instructional device to supplement formal discussions of the problem. By making this “crash course” accessible to instructors and practitioners, we hope to avail these tools to a broader community of scientists concerned with the causal interpretation of regression models.
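A classic instance of a "bad control" is adjusting for a collider, i.e., a variable caused by both treatment and outcome. The toy simulation below (a hypothetical sketch, not an example from the paper; all variable names and coefficients are made up) shows how adding such a control distorts an otherwise correct regression coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)   # true causal effect of x on y is 1.0
z = x + y + rng.normal(size=n)     # z is a collider: caused by both x and y

def ols(covariates, outcome):
    """Return OLS coefficients (intercept first) via least squares."""
    design = np.column_stack([np.ones(len(outcome))] + list(covariates))
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

good = ols([x], y)[1]     # no adjustment: recovers the causal effect (~1.0)
bad = ols([x, z], y)[1]   # "bad control": conditioning on the collider z
                          # opens a non-causal path and biases the coefficient
print(good, bad)
```

In this particular parametrization the coefficient on x after adjusting for z is driven all the way to zero, even though the true effect is 1, illustrating how a seemingly innocuous extra control can destroy the causal interpretation of the estimate.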

Lang Liu, Carlos Cinelli, and Zaid Harchaoui (2022). "Orthogonal Statistical Learning with Self-Concordant Loss." Annual Conference on Learning Theory (COLT).
[ abstract ]
[ preprint ]
[ journal ]
Orthogonal statistical learning and double machine learning have emerged as general frameworks for two-stage statistical prediction in the presence of a nuisance component. We establish non-asymptotic bounds on the excess risk of orthogonal statistical learning methods with a loss function satisfying a self-concordance property. Our bounds improve upon existing bounds by a dimension factor while lifting the assumption of strong convexity. We illustrate the results with examples from multiple treatment effect estimation and generalized partially linear modeling.
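The two-stage recipe underlying these frameworks can be illustrated with a minimal partialling-out sketch (a toy example under squared loss with linear nuisances and no cross-fitting; the data-generating process and all coefficients below are invented for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
d = 0.8 * x + rng.normal(size=n)            # treatment depends on the nuisance x
y = 0.5 * d + 1.5 * x + rng.normal(size=n)  # true target parameter is 0.5

# Stage 1: fit the nuisance components E[y|x] and E[d|x] (linear fits here).
def fit_predict(feature, target):
    design = np.column_stack([np.ones(n), feature])
    beta = np.linalg.lstsq(design, target, rcond=None)[0]
    return design @ beta

y_res = y - fit_predict(x, y)
d_res = d - fit_predict(x, d)

# Stage 2: regress residuals on residuals. The orthogonality of this second
# stage makes the estimate insensitive to first-order errors in stage 1.
theta = (d_res @ y_res) / (d_res @ d_res)
print(theta)
```

With a large sample, `theta` lands near the true value 0.5 even though the treatment is strongly entangled with the nuisance component.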

Carlos Cinelli, N. LaPierre, B. Hill, S. Sankararaman and E. Eskin (2022). "Robust Mendelian randomization in the presence of residual population stratification, batch effects and horizontal pleiotropy." Nature Communications.
[ abstract ]
[ preprint ]
[ journal ]
[ video ]
Mendelian Randomization (MR) exploits genetic variants as instrumental variables to estimate the causal effect of an "exposure" trait on an "outcome" trait from observational data. However, the validity of such studies is threatened by population stratification, batch effects, and horizontal pleiotropy. Although a variety of methods have been proposed to partially mitigate those problems, residual biases may still remain, leading to highly statistically significant false positives in large genetic databases. Here, we describe a suite of sensitivity analysis tools for MR that enable investigators to properly quantify the robustness of their findings against these (and other) unobserved validity threats. Specifically, we propose the routine reporting of sensitivity statistics that can be used to readily quantify the robustness of an MR result: (i) the partial R2 of the genetic instrument with the exposure and the outcome traits; and, (ii) the robustness value of both genetic associations. These statistics quantify the minimal strength of violations of the MR assumptions that would be necessary to explain away the MR causal effect estimate. We also provide intuitive displays to visualize the sensitivity of the MR estimate to any degree of violation, and formal methods to bound the worst-case bias caused by violations in terms of multiples of the observed strength of principal components, batch effects, as well as putative pleiotropic pathways. We demonstrate how these tools can aid researchers in distinguishing robust from fragile findings, by showing that the MR estimate of the causal effect of body mass index (BMI) on diastolic blood pressure is relatively robust, whereas the MR estimate of the causal effect of BMI on Townsend deprivation index is relatively fragile.

Carlos Cinelli and Judea Pearl (2021). "Generalizing Experimental Results by Leveraging Knowledge of Mechanisms." European Journal of Epidemiology.
[ abstract ]
[ preprint ]
[ journal ]
[ r package ]
We show how experimental results can be generalized across diverse populations by leveraging knowledge of local mechanisms that produce the outcome of interest, only some of which may differ in the target domain. We use Structural Causal Models (SCM) and a refined version of selection diagrams to represent such knowledge, and to decide whether it entails the invariance of probabilities of causation across populations, which then enables generalization. We further provide: (i) bounds for the target effect when some of these conditions are violated; (ii) new identification results for probabilities of causation and the transported causal effect when trials from multiple source domains are available; as well as (iii) a Bayesian approach for estimating the transported causal effect from finite samples. We illustrate these methods both with simulated data and with a real example that transports the effects of Vitamin A supplementation on childhood mortality across different regions.

Chi Zhang, Carlos Cinelli, Bryant Chen, and Judea Pearl (2021). "Exploiting Equality Constraints in Causal Inference." International Conference on Artificial Intelligence and Statistics (AISTATS).
[ abstract ]
[ preprint ]
[ journal ]
Assumptions about equality of effects are commonly made in causal inference tasks. For example, the well-known “difference-in-differences” method assumes that confounding remains constant across time periods. Similarly, it is not unreasonable to assume that causal effects apply equally to units undergoing interference. Finally, sensitivity analysis often hypothesizes equality among existing and unaccounted-for confounders. Despite the ubiquity of these “equality constraints,” modern identification methods have not leveraged their presence in a systematic way. In this paper, we develop a novel graphical criterion that extends the well-known method of generalized instrumental sets to exploit such additional constraints for causal identification in linear models. We further demonstrate how it solves many diverse problems found in the literature in a general way, including difference-in-differences, interference, as well as benchmarking in sensitivity analysis.
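The difference-in-differences equality constraint can be seen in a small simulation (a generic textbook-style sketch, not code from the paper; the data-generating process is hypothetical): because the unobserved confounder is constant over time, differencing within units cancels it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
u = rng.normal(size=n)                    # unobserved confounder, constant over time
treated = (u + rng.normal(size=n)) > 0    # selection into treatment depends on u
y_pre = 2.0 * u + rng.normal(size=n)
y_post = 2.0 * u + 1.0 * treated + 0.3 + rng.normal(size=n)  # effect 1.0 + time trend

# Naive post-period comparison: contaminated by the confounder u.
naive = y_post[treated].mean() - y_post[~treated].mean()

# Difference-in-differences: the time-constant confounding cancels out.
did = ((y_post[treated] - y_pre[treated]).mean()
       - (y_post[~treated] - y_pre[~treated]).mean())
print(naive, did)
```

Here `did` recovers the true effect of 1.0 while `naive` is badly inflated, precisely because confounding is assumed (and here constructed) to be equal across the two time periods.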

Carlos Cinelli and Chad Hazlett (2020). "Making Sense of Sensitivity: Extending Omitted Variable Bias." Journal of the Royal Statistical Society, Series B (Statistical Methodology).
[ abstract ]
[ preprint ]
[ journal ]
[ r package ]
[ shiny app ]
[ stata module ]
[ python package ]
We extend the omitted variable bias framework with a suite of tools for sensitivity analysis in regression models that: (i) does not require assumptions about the treatment assignment or the nature of confounders; (ii) naturally handles multiple confounders, possibly acting nonlinearly; (iii) exploits expert knowledge to bound sensitivity parameters; and, (iv) can be easily computed using only standard regression results. In particular, we introduce two novel sensitivity measures suited for routine reporting. The robustness value describes the minimum strength of association unobserved confounding would need to have, both with the treatment and the outcome, to change the research conclusions. The partial R2 of the treatment with the outcome shows how strongly confounders explaining all the residual outcome variation would have to be associated with the treatment to eliminate the estimated effect. Next, we offer graphical tools for elaborating on problematic confounders, examining the sensitivity of point estimates, t-values, as well as “extreme scenarios”. Finally, we describe problems with a common “benchmarking” practice and introduce a novel procedure to formally bound the strength of confounders based on comparison to observed covariates. We apply these methods to a running example that estimates the effect of exposure to violence on attitudes toward peace.
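Both sensitivity statistics can indeed be computed from standard regression output alone. The sketch below implements the published formulas, where f = |t|/sqrt(dof) is the partial Cohen's f of the treatment with the outcome; the numbers fed in at the end are hypothetical, chosen only to illustrate the calculation:

```python
import math

def partial_r2(t, dof):
    """Partial R2 of the treatment with the outcome, from its t-value
    and the residual degrees of freedom of the regression."""
    return t**2 / (t**2 + dof)

def robustness_value(t, dof, q=1.0):
    """RV_q: minimum strength of association (in partial R2, equally with
    treatment and outcome) that unobserved confounding needs in order to
    reduce the point estimate by 100*q percent."""
    f = q * abs(t) / math.sqrt(dof)
    return 0.5 * (math.sqrt(f**4 + 4 * f**2) - f**2)

# Hypothetical regression output: t-value of 4.0 with 780 residual dof.
print(partial_r2(4.0, 780), robustness_value(4.0, 780))
```

For these illustrative inputs the partial R2 is about 2% and the robustness value about 13%: confounders would need to explain roughly 13% of the residual variance of both treatment and outcome to fully explain away the estimate. The same quantities are computed by the accompanying sensemakr packages linked above.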

Daniel Kumor, Carlos Cinelli and Elias Bareinboim (2020). "Efficient Identification in Linear Structural Causal Models with Auxiliary Cutsets." International Conference on Machine Learning (ICML).
[ abstract ]
[ preprint ]
[ journal ]
We develop a new polynomial-time algorithm for identification of structural coefficients in linear causal models that subsumes previous state-of-the-art methods, unifying several disparate approaches to identification in this setting. Building on these results, we develop a procedure for identifying total causal effects in linear systems.

Carlos Cinelli, D. Kumor, B. Chen, J. Pearl and E. Bareinboim (2019). "Sensitivity Analysis of Linear Structural Causal Models." International Conference on Machine Learning (ICML).
[ abstract ]
[ preprint ]
[ journal ]
[ short video ]
Causal inference requires assumptions about the data generating process, many of which are unverifiable from the data. Given that some causal assumptions might be uncertain or disputed, formal methods are needed to quantify how sensitive research conclusions are to violations of those assumptions. Although an extensive literature exists on the topic, most results are limited to specific model structures, while a general-purpose algorithmic framework for sensitivity analysis is still lacking. In this paper, we develop a formal, systematic approach to sensitivity analysis for arbitrary linear Structural Causal Models (SCMs). We start by formalizing sensitivity analysis as a constrained identification problem. We then develop an efficient, graph-based identification algorithm that exploits non-zero constraints on both directed and bidirected edges. This allows researchers to systematically derive sensitivity curves for a target causal quantity with an arbitrary set of path coefficients and error covariances as sensitivity parameters. These results can be used to display the degree to which violations of causal assumptions affect the target quantity of interest, and to judge, on scientific grounds, whether problematic degrees of violations are plausible.
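The simplest instance of such a sensitivity curve is the single-confounder linear SCM, where the bias of the regression coefficient is an explicit function of the path coefficients. The sketch below (a toy, hand-picked parametrization; the general algorithm in the paper handles arbitrary linear SCMs) treats the confounder's path coefficients (a, b) as sensitivity parameters:

```python
import numpy as np

# Hypothetical linear SCM:  d = a*u + e_d,   y = lam*d + b*u + e_y,
# with u standardized and unobserved. Then Cov(d, y) = lam*Var(d) + a*b,
# so the OLS slope equals lam + a*b/Var(d), and for any assumed (a, b)
# the implied causal coefficient is  ols - a*b/Var(d): a sensitivity curve.
rng = np.random.default_rng(3)
n = 200_000
lam, a, b = 0.7, 0.5, 0.6
u = rng.normal(size=n)
d = a * u + rng.normal(size=n)
y = lam * d + b * u + rng.normal(size=n)

C = np.cov(d, y)
ols = C[0, 1] / C[0, 0]              # biased: lam + a*b/Var(d)
adjusted = ols - a * b / C[0, 0]     # recovers lam once the true (a, b) is assumed
print(ols, adjusted)
```

Sweeping (a, b) over a grid of plausible values and plotting `adjusted` traces out the sensitivity curve for this model; the paper's contribution is an algorithm that derives such curves for arbitrary sets of path coefficients and error covariances in any linear SCM.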

Carlos Cinelli and Judea Pearl (2018). "On the utility of Causal Diagrams for Modeling Attrition." Epidemiology.
[ abstract ]
[ preprint ]
[ journal ]
In a recent communication, Breskin, Cole and Hudgens aimed to demonstrate “how single-world intervention graphs can supplement traditional causal diagrams”. The example used in their demonstration involved selection bias due to attrition, namely, subjects dropping out from a randomized trial before the outcome is observed. Here we use the same example to demonstrate the opposite conclusion: the derivation presented by Breskin et al. is in fact longer and more complicated than the standard, three-step derivation facilitated by traditional causal diagrams. We further show that more natural solutions to attrition problems are obtained when viewed as missing-data problems encoded in causal diagrams.

Pre-PhD: Before turning my attention to causal and statistical methodology, I used to write about quite a different topic. Below you can find some of my pre-doctoral publications on the history of economic thought (most in Portuguese).

Carlos Cinelli and Rogerio Arthmar. "The debating tradition in Britain and the new political economy: William Thompson and John Stuart Mill at the London Cooperative Society in 1825." Nova Economia, v. 28(2), p. 609-636, 2018.

Rogerio Arthmar and Carlos Cinelli (in Portuguese). "The classical economics between laissez-faire and socialism." EconomiA, v. 14, p. 227-252, 2013.

Carlos Cinelli (in Portuguese). "Voluntary transfers and municipal corruption in Brazil: preliminary evidence from the irregular accounts registry of the Federal Court of Accounts." Revista Economia e Tecnologia, v. 7, p. 89-97, 2011.

Carlos Cinelli and Rogerio Arthmar (in Portuguese). "When the classical liberal and the socialist clash: Bastiat, Proudhon and capital rent." Nova Economia, v. 20, p. 509-541, 2010.