
Carlos Cinelli, Andrew Forney, and Judea Pearl (2022+). "A Crash Course in Good and Bad Controls." Sociological Methods & Research.
[ abstract ]
[ preprint ]
[ journal ]
[ slides ]
[ r code ]
[ python code ]
Many students, especially in econometrics, express frustration with the way a problem known as “bad control” is evaded, if not mishandled, in the traditional literature. The problem arises when the addition of a variable to a regression equation produces an unintended discrepancy between the regression coefficient and the effect that the coefficient is expected to represent. Avoiding such discrepancies presents a challenge not only to practitioners of econometrics, but to all analysts in the data-intensive sciences. This note describes graphical tools for understanding, visualizing, and resolving the problem through a series of illustrative examples. We have found that the examples presented here can serve as a powerful instructional device to supplement formal discussions of the problem. By making this “crash course” accessible to instructors and practitioners, we hope to avail these tools to a broader community of scientists concerned with the causal interpretation of regression models.
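A classic instance of a "bad control" is adjusting for a collider, i.e., a variable caused by both treatment and outcome. The toy simulation below (a hypothetical sketch, not an example from the paper; all variable names and coefficients are made up) shows how adding such a control distorts an otherwise correct regression coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)   # true causal effect of x on y is 1.0
z = x + y + rng.normal(size=n)     # z is a collider: caused by both x and y

def ols(covariates, outcome):
    """Return OLS coefficients (intercept first) via least squares."""
    design = np.column_stack([np.ones(len(outcome))] + list(covariates))
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

good = ols([x], y)[1]     # no adjustment: recovers the causal effect (~1.0)
bad = ols([x, z], y)[1]   # "bad control": conditioning on the collider z
                          # opens a non-causal path and biases the coefficient
print(good, bad)
```

In this particular parametrization the coefficient on x after adjusting for z is driven all the way to zero, even though the true effect is 1, illustrating how a seemingly innocuous extra control can destroy the causal interpretation of the estimate.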

Lang Liu, Carlos Cinelli, and Zaid Harchaoui (2022). "Orthogonal Statistical Learning with Self-Concordant Loss." Annual Conference on Learning Theory (COLT).
[ abstract ]
[ preprint ]
[ journal ]
Orthogonal statistical learning and double machine learning have emerged as general frameworks for two-stage statistical prediction in the presence of a nuisance component. We establish non-asymptotic bounds on the excess risk of orthogonal statistical learning methods with a loss function satisfying a self-concordance property. Our bounds improve upon existing bounds by a dimension factor while lifting the assumption of strong convexity. We illustrate the results with examples from multiple treatment effect estimation and generalized partially linear modeling.
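The two-stage recipe underlying these frameworks can be illustrated with a minimal partialling-out sketch (a toy example under squared loss with linear nuisances and no cross-fitting; the data-generating process and all coefficients below are invented for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
d = 0.8 * x + rng.normal(size=n)            # treatment depends on the nuisance x
y = 0.5 * d + 1.5 * x + rng.normal(size=n)  # true target parameter is 0.5

# Stage 1: fit the nuisance components E[y|x] and E[d|x] (linear fits here).
def fit_predict(feature, target):
    design = np.column_stack([np.ones(n), feature])
    beta = np.linalg.lstsq(design, target, rcond=None)[0]
    return design @ beta

y_res = y - fit_predict(x, y)
d_res = d - fit_predict(x, d)

# Stage 2: regress residuals on residuals. The orthogonality of this second
# stage makes the estimate insensitive to first-order errors in stage 1.
theta = (d_res @ y_res) / (d_res @ d_res)
print(theta)
```

With a large sample, `theta` lands near the true value 0.5 even though the treatment is strongly entangled with the nuisance component.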

Carlos Cinelli, N. LaPierre, B. Hill, S. Sankararaman and E. Eskin (2022). "Robust Mendelian randomization in the presence of residual population stratification, batch effects and horizontal pleiotropy." Nature Communications.
[ abstract ]
[ preprint ]
[ journal ]
[ video ]
Mendelian Randomization (MR) exploits genetic variants as instrumental variables to estimate the causal effect of an "exposure" trait on an "outcome" trait from observational data. However, the validity of such studies is threatened by population stratification, batch effects, and horizontal pleiotropy. Although a variety of methods have been proposed to partially mitigate those problems, residual biases may still remain, leading to highly statistically significant false positives in large genetic databases. Here, we describe a suite of sensitivity analysis tools for MR that enable investigators to properly quantify the robustness of their findings against these (and other) unobserved validity threats. Specifically, we propose the routine reporting of sensitivity statistics that can be used to readily quantify the robustness of an MR result: (i) the partial R2 of the genetic instrument with the exposure and the outcome traits; and, (ii) the robustness value of both genetic associations. These statistics quantify the minimal strength of violations of the MR assumptions that would be necessary to explain away the MR causal effect estimate. We also provide intuitive displays to visualize the sensitivity of the MR estimate to any degree of violation, and formal methods to bound the worst-case bias caused by violations in terms of multiples of the observed strength of principal components, batch effects, as well as putative pleiotropic pathways. We demonstrate how these tools can aid researchers in distinguishing robust from fragile findings, by showing that the MR estimate of the causal effect of body mass index (BMI) on diastolic blood pressure is relatively robust, whereas the MR estimate of the causal effect of BMI on Townsend deprivation index is relatively fragile.

Carlos Cinelli and Judea Pearl (2021). "Generalizing Experimental Results by Leveraging Knowledge of Mechanisms." European Journal of Epidemiology.
[ abstract ]
[ preprint ]
[ journal ]
[ r package ]
We show how experimental results can be generalized across diverse populations by leveraging knowledge of local mechanisms that produce the outcome of interest, only some of which may differ in the target domain. We use Structural Causal Models (SCM) and a refined version of selection diagrams to represent such knowledge, and to decide whether it entails the invariance of probabilities of causation across populations, which then enables generalization. We further provide: (i) bounds for the target effect when some of these conditions are violated; (ii) new identification results for probabilities of causation and the transported causal effect when trials from multiple source domains are available; as well as (iii) a Bayesian approach for estimating the transported causal effect from finite samples. We illustrate these methods both with simulated data and with a real example that transports the effects of Vitamin A supplementation on childhood mortality across different regions.

Chi Zhang, Carlos Cinelli, Bryant Chen, and Judea Pearl (2021). "Exploiting Equality Constraints in Causal Inference." International Conference on Artificial Intelligence and Statistics (AISTATS).
[ abstract ]
[ preprint ]
[ journal ]
Assumptions about equality of effects are commonly made in causal inference tasks. For example, the well-known “difference-in-differences” method assumes that confounding remains constant across time periods. Similarly, it is not unreasonable to assume that causal effects apply equally to units undergoing interference. Finally, sensitivity analysis often hypothesizes equality among existing and unaccounted-for confounders. Despite the ubiquity of these “equality constraints,” modern identification methods have not leveraged their presence in a systematic way. In this paper, we develop a novel graphical criterion that extends the well-known method of generalized instrumental sets to exploit such additional constraints for causal identification in linear models. We further demonstrate how it solves many diverse problems found in the literature in a general way, including difference-in-differences, interference, as well as benchmarking in sensitivity analysis.
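The difference-in-differences equality constraint can be seen in a small simulation (a generic textbook-style sketch, not code from the paper; the data-generating process is hypothetical): because the unobserved confounder is constant over time, differencing within units cancels it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
u = rng.normal(size=n)                    # unobserved confounder, constant over time
treated = (u + rng.normal(size=n)) > 0    # selection into treatment depends on u
y_pre = 2.0 * u + rng.normal(size=n)
y_post = 2.0 * u + 1.0 * treated + 0.3 + rng.normal(size=n)  # effect 1.0 + time trend

# Naive post-period comparison: contaminated by the confounder u.
naive = y_post[treated].mean() - y_post[~treated].mean()

# Difference-in-differences: the time-constant confounding cancels out.
did = ((y_post[treated] - y_pre[treated]).mean()
       - (y_post[~treated] - y_pre[~treated]).mean())
print(naive, did)
```

Here `did` recovers the true effect of 1.0 while `naive` is badly inflated, precisely because confounding is assumed (and here constructed) to be equal across the two time periods.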

Carlos Cinelli and Chad Hazlett (2020). "Making Sense of Sensitivity: Extending Omitted Variable Bias." Journal of the Royal Statistical Society, Series B (Statistical Methodology).
[ abstract ]
[ preprint ]
[ journal ]
[ r package ]
[ shiny app ]
[ stata module ]
[ python package ]
We extend the omitted variable bias framework with a suite of tools for sensitivity analysis in regression models that: (i) does not require assumptions about the treatment assignment or the nature of confounders; (ii) naturally handles multiple confounders, possibly acting nonlinearly; (iii) exploits expert knowledge to bound sensitivity parameters; and, (iv) can be easily computed using only standard regression results. In particular, we introduce two novel sensitivity measures suited for routine reporting. The robustness value describes the minimum strength of association unobserved confounding would need to have, both with the treatment and the outcome, to change the research conclusions. The partial R2 of the treatment with the outcome shows how strongly confounders explaining all the residual outcome variation would have to be associated with the treatment to eliminate the estimated effect. Next, we offer graphical tools for elaborating on problematic confounders, examining the sensitivity of point estimates, t-values, as well as “extreme scenarios”. Finally, we describe problems with a common “benchmarking” practice and introduce a novel procedure to formally bound the strength of confounders based on comparison to observed covariates. We apply these methods to a running example that estimates the effect of exposure to violence on attitudes toward peace.
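Both sensitivity statistics can indeed be computed from standard regression output alone. The sketch below implements the published formulas, where f = |t|/sqrt(dof) is the partial Cohen's f of the treatment with the outcome; the numbers fed in at the end are hypothetical, chosen only to illustrate the calculation:

```python
import math

def partial_r2(t, dof):
    """Partial R2 of the treatment with the outcome, from its t-value
    and the residual degrees of freedom of the regression."""
    return t**2 / (t**2 + dof)

def robustness_value(t, dof, q=1.0):
    """RV_q: minimum strength of association (in partial R2, equally with
    treatment and outcome) that unobserved confounding needs in order to
    reduce the point estimate by 100*q percent."""
    f = q * abs(t) / math.sqrt(dof)
    return 0.5 * (math.sqrt(f**4 + 4 * f**2) - f**2)

# Hypothetical regression output: t-value of 4.0 with 780 residual dof.
print(partial_r2(4.0, 780), robustness_value(4.0, 780))
```

For these illustrative inputs the partial R2 is about 2% and the robustness value about 13%: confounders would need to explain roughly 13% of the residual variance of both treatment and outcome to fully explain away the estimate. The same quantities are computed by the accompanying sensemakr packages linked above.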

Daniel Kumor, Carlos Cinelli and Elias Bareinboim (2020). "Efficient Identification in Linear Structural Causal Models with Auxiliary Cutsets." International Conference on Machine Learning (ICML).
[ abstract ]
[ preprint ]
[ journal ]
We develop a new polynomial-time algorithm for identification of structural coefficients in linear causal models that subsumes previous state-of-the-art methods, unifying several disparate approaches to identification in this setting. Building on these results, we develop a procedure for identifying total causal effects in linear systems.

Carlos Cinelli, D. Kumor, B. Chen, J. Pearl and E. Bareinboim (2019). "Sensitivity Analysis of Linear Structural Causal Models." International Conference on Machine Learning (ICML).
[ abstract ]
[ preprint ]
[ journal ]
[ short video ]
Causal inference requires assumptions about the data generating process, many of which are unverifiable from the data. Given that some causal assumptions might be uncertain or disputed, formal methods are needed to quantify how sensitive research conclusions are to violations of those assumptions. Although an extensive literature exists on the topic, most results are limited to specific model structures, while a general-purpose algorithmic framework for sensitivity analysis is still lacking. In this paper, we develop a formal, systematic approach to sensitivity analysis for arbitrary linear Structural Causal Models (SCMs). We start by formalizing sensitivity analysis as a constrained identification problem. We then develop an efficient, graph-based identification algorithm that exploits non-zero constraints on both directed and bidirected edges. This allows researchers to systematically derive sensitivity curves for a target causal quantity with an arbitrary set of path coefficients and error covariances as sensitivity parameters. These results can be used to display the degree to which violations of causal assumptions affect the target quantity of interest, and to judge, on scientific grounds, whether problematic degrees of violations are plausible.
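The simplest instance of such a sensitivity curve is the single-confounder linear SCM, where the bias of the regression coefficient is an explicit function of the path coefficients. The sketch below (a toy, hand-picked parametrization; the general algorithm in the paper handles arbitrary linear SCMs) treats the confounder's path coefficients (a, b) as sensitivity parameters:

```python
import numpy as np

# Hypothetical linear SCM:  d = a*u + e_d,   y = lam*d + b*u + e_y,
# with u standardized and unobserved. Then Cov(d, y) = lam*Var(d) + a*b,
# so the OLS slope equals lam + a*b/Var(d), and for any assumed (a, b)
# the implied causal coefficient is  ols - a*b/Var(d): a sensitivity curve.
rng = np.random.default_rng(3)
n = 200_000
lam, a, b = 0.7, 0.5, 0.6
u = rng.normal(size=n)
d = a * u + rng.normal(size=n)
y = lam * d + b * u + rng.normal(size=n)

C = np.cov(d, y)
ols = C[0, 1] / C[0, 0]              # biased: lam + a*b/Var(d)
adjusted = ols - a * b / C[0, 0]     # recovers lam once the true (a, b) is assumed
print(ols, adjusted)
```

Sweeping (a, b) over a grid of plausible values and plotting `adjusted` traces out the sensitivity curve for this model; the paper's contribution is an algorithm that derives such curves for arbitrary sets of path coefficients and error covariances in any linear SCM.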

Carlos Cinelli and Judea Pearl (2018). "On the utility of Causal Diagrams for Modeling Attrition." Epidemiology.
[ abstract ]
[ preprint ]
[ journal ]
In a recent communication, Breskin, Cole and Hudgens aimed to demonstrate “how single-world intervention graphs can supplement traditional causal diagrams”. The example used in their demonstration involved selection bias due to attrition, namely, subjects dropping out from a randomized trial before the outcome is observed. Here we use the same example to demonstrate the opposite conclusion: the derivation presented by Breskin et al. is in fact longer and more complicated than the standard, three-step derivation facilitated by traditional causal diagrams. We further show that more natural solutions to attrition problems are obtained when viewed as missing-data problems encoded in causal diagrams.

Pre-PhD: Before turning my attention to causal and statistical methodology, I used to write about quite a different topic. Below you can find some of my pre-doctoral publications on the history of economic thought (most in Portuguese).

Carlos Cinelli and Rogerio Arthmar. "The debating tradition in Britain and the new political economy: William Thompson and John Stuart Mill at the London Cooperative Society in 1825." Nova Economia, v. 28(2), p. 609-636, 2018.

Rogerio Arthmar and Carlos Cinelli (in Portuguese). "The classical economics between laissez-faire and socialism." EconomiA, v. 14, p. 227-252, 2013.

Carlos Cinelli (in Portuguese). "Voluntary transfers and municipal corruption in Brazil: preliminary evidence from the irregular accounts registry of the Federal Court of Accounts." Revista Economia e Tecnologia, v. 7, p. 89-97, 2011.

Carlos Cinelli and Rogerio Arthmar (in Portuguese). "When the classical liberal and the socialist clash: Bastiat, Proudhon and capital rent." Nova Economia, v. 20, p. 509-541, 2010.