Machine Learning/Statistics Thesis Proposal

  • Ph.D. Student
  • Joint Ph.D. Program in Statistics & Machine Learning
  • Carnegie Mellon University

Causal Inference with Complex Data Structures and Non-Standard Effects

Causal inference is essential for answering many important questions in health, public policy, and across science. However, non-trivial complications arise outside of classical settings, which typically involve, for example, randomized trials, parametric models, and simple data structures. In my proposed work I explore how to effectively harness modern machine learning tools to overcome some of the methodological limitations of classical causal inference. The work falls into the following three sub-topics.

a.) Stochastic interventions for general longitudinal data. We extend novel "incremental" intervention effects to general longitudinal observational studies with many timepoints, repeated outcomes, and dropout. Importantly, our methods do not require any positivity or parametric assumptions. We show that the proposed method is less sensitive to the curse of dimensionality and yields estimators that incorporate flexible regression methods while still achieving fast root-n rates under weak conditions.

b.) Causal effects based on distributional distances. We have proposed a new form of non-standard causal effect that can identify nuanced and valuable information about causality, based on distances between counterfactual outcome distributions (e.g., L1 distance), rather than simple mean shifts. We consider single- and multi-source randomized studies, as well as observational studies, and analyze error bounds and asymptotic properties of each of the proposed estimators. Special difficulties arise due to the non-smoothness of the L1 distance functional.

c.) Causal clustering. We will extend modern clustering methods to identify heterogeneity of treatment effects. We aim to provide a general framework for causal clustering in which clustering methods are applied to counterfactual outcomes (or their conditional means) to identify subgroups with similar treatment effects, and to explore how the favorable theoretical properties of widely used clustering methods can be preserved.
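
For sub-topic (a), one common way to formalize an incremental intervention is to multiply the odds of receiving treatment by a factor delta instead of setting treatment deterministically. This form is an assumption here, since the abstract does not spell out the definition. The sketch below, with the hypothetical function name `incremental_propensity`, computes the resulting shifted propensity score.

```python
import numpy as np

def incremental_propensity(pi, delta):
    """Shifted propensity after multiplying the treatment odds by delta.

    pi    : observed propensity score(s), each in [0, 1]
    delta : odds multiplier, must be positive (delta = 1 leaves pi unchanged)
    """
    pi = np.asarray(pi, dtype=float)
    if delta <= 0:
        raise ValueError("delta must be positive")
    # odds(q) = delta * odds(pi)  <=>  q = delta*pi / (delta*pi + 1 - pi)
    return delta * pi / (delta * pi + 1.0 - pi)

# Illustrative propensities, including the boundary values 0 and 1.
pi_example = np.array([0.0, 0.2, 0.5, 0.9, 1.0])
shifted = incremental_propensity(pi_example, delta=2.0)
```

Because a propensity of 0 or 1 is left unchanged by this shift, the intervention never calls for treatment assignments that cannot occur in the data, which is the intuition behind dropping the positivity assumption.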
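
For sub-topic (b), the following is a minimal sketch of a plug-in estimate of the L1 distance between two outcome densities in a single randomized study, where the arm-specific outcome distributions identify the counterfactual ones. The histogram estimator, the function name `l1_density_distance`, and the bin count are illustrative assumptions, not the estimators analyzed in the proposal.

```python
import numpy as np

def l1_density_distance(y_treated, y_control, bins=20):
    """Histogram plug-in estimate of the integral of |p1(y) - p0(y)|.

    Both samples are binned on a shared grid so the two density
    estimates are comparable bin by bin. The result lies in [0, 2].
    """
    y_treated = np.asarray(y_treated, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    lo = min(y_treated.min(), y_control.min())
    hi = max(y_treated.max(), y_control.max())
    edges = np.linspace(lo, hi, bins + 1)
    p1, _ = np.histogram(y_treated, bins=edges, density=True)
    p0, _ = np.histogram(y_control, bins=edges, density=True)
    # Riemann sum of the absolute density difference over the shared bins.
    return float(np.sum(np.abs(p1 - p0) * np.diff(edges)))

# Simulated arms with the same mean but different spread (illustrative only):
# the mean difference is near zero while the L1 distance is not.
rng = np.random.default_rng(0)
y1 = rng.normal(loc=0.0, scale=2.0, size=5000)
y0 = rng.normal(loc=0.0, scale=1.0, size=5000)
mean_shift = y1.mean() - y0.mean()
l1_effect = l1_density_distance(y1, y0, bins=40)
```

The absolute value inside the sum is what makes the functional non-smooth wherever the two densities cross, which is the source of the special difficulties mentioned above.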
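
For sub-topic (c), the idea of clustering on conditional counterfactual means can be sketched as follows. Everything specific here is an assumption made for illustration: the linear outcome regressions fit separately in each arm, the plain Lloyd-style k-means with farthest-first initialization, and the hypothetical function names. The sketch also presumes no unmeasured confounding, so that arm-specific regressions estimate the counterfactual means.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.linalg.norm(points[:, None, :] - np.array(centers)[None, :, :], axis=2)
        centers.append(points[d.min(axis=1).argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center in place
                centers[j] = points[labels == j].mean(axis=0)
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1), centers

def causal_cluster(X, a, y, k=2):
    """Cluster units on estimated (E[Y^0 | X], E[Y^1 | X])."""
    mu0 = predict_linear(fit_linear(X[a == 0], y[a == 0]), X)
    mu1 = predict_linear(fit_linear(X[a == 1], y[a == 1]), X)
    embedding = np.column_stack([mu0, mu1])
    labels, centers = kmeans(embedding, k)
    return labels, centers, embedding
```

Here `X` is an n-by-p covariate array, `a` a 0/1 treatment vector, and `y` the outcome vector. Units that land in the same cluster have similar estimated counterfactual means under both treatment levels, and hence similar estimated treatment effects.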

Thesis Committee:
Edward H. Kennedy (Chair)
Barnabas Poczos
Larry A. Wasserman 
Sivaraman Balakrishnan
Ashley I. Naimi (University of Pittsburgh)
