Machine Learning Thesis Proposal

  • SHASHANK SINGH
  • Ph.D. Student
  • Machine Learning Department
  • Carnegie Mellon University

Estimating Probability Distributions and their Properties

This thesis studies several theoretical problems in nonparametric statistics and machine learning, mostly in the areas of estimating or generating samples from a probability distribution, estimating a real-valued functional of a probability distribution, or testing a hypothesis about a probability distribution, using IID samples from that distribution. For distribution estimation, we consider a large, novel class of losses, under which high-dimensional nonparametric distribution estimation is more tractable than under the usual $L^2$ loss. These losses have connections with recent methods such as generative adversarial modelling, helping to explain why these methods appear to perform well on problems that are intractable from traditional perspectives of nonparametric statistics. Our work on density functional estimation focuses on several types of integral functionals, such as information theoretic quantities (entropies, mutual informations, and divergences), measures of smoothness, and measures of (dis)similarity between distributions, which play important roles as subroutines elsewhere in statistics, machine learning, and signal processing. Finally, we propose to study some applications of these density functional estimators to classical hypothesis testing problems such as two-sample (homogeneity) or (conditional) independence testing. A consistent theme is that, although traditional nonparametric density estimation is intractable in high dimensions, several equally useful tasks may be tractable, even with similar or more realistic assumptions on the distribution.

Thesis Committee:
Barnabás Póczos (Chair)
Ryan Tibshirani
Larry Wasserman
Bharath Sriperumbudur (Pennsylvania State University)

Copy of Proposal Draft Document
