Machine Learning Thesis Defense

  • Ph.D. Student
  • Machine Learning Department
  • Carnegie Mellon University
Thesis Orals

Distribution and Histogram (DisH) Learning

Machine learning has made incredible advances in the last couple of decades. Nevertheless, much of this progress has been limited to basic point-estimation tasks. That is, the bulk of attention has gone to problems that take in a static finite vector and map it to another static finite vector. However, we do not navigate through life in a series of point-estimation problems, mapping x to y. Instead, we find broad patterns and gather a far-sighted understanding of data by considering collections of points such as sets, sequences, and distributions. Thus, contrary to what various billionaires, celebrity theoretical physicists, and sci-fi classics would lead you to believe, true machine intelligence is currently fairly far out of reach. To help bridge this gap, we have developed algorithms that understand data at an aggregate, holistic level.

This thesis pushes machine learning past the realm of operating over static finite vectors, toward reasoning routinely with complex, dynamic collections like sets and sequences. We develop algorithms that consider distributions as functional covariates/responses, and methods that use distributions as internal representations. We consider distributions because they are a straightforward characterization of many natural phenomena and, by detailing information at an aggregate level, provide a richer description than simple point data. Our approach may be seen as addressing two sides of the same coin: on one side, we adapt traditional machine learning algorithms to operate directly on inputs and outputs that are probability functions (and sample sets); on the other side, we develop better estimators for traditional tasks by making use of and adjusting internal distributions.

We begin by developing algorithms for traditional machine learning tasks in the case where one's input (and possibly output) is not a finite point, but is instead a distribution or a sample set drawn from a distribution. We develop a scalable nonparametric estimator for regressing a real-valued response on an input that is a distribution, a task we call distribution-to-real regression (DRR). Furthermore, we extend this work to the case where both the output response and the input covariate are distributions, a task we call distribution-to-distribution regression (DDR).
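To make the distribution-to-real setting concrete, here is a minimal sketch. It is not the estimator developed in the thesis: it assumes each input distribution is observed only through a sample set, summarizes each set by averaging random Fourier features of a unit-bandwidth Gaussian kernel, and fits ridge regression on those summaries. The toy task (recovering the mean of a Gaussian) and all names such as `mean_embedding` and `fit_ridge` are hypothetical and chosen for illustration.

```python
import numpy as np

def mean_embedding(sample, freqs, offsets):
    """Average random Fourier features over one sample set (n_points x dim)."""
    proj = sample @ freqs + offsets                      # (n_points, n_features)
    return np.sqrt(2.0 / freqs.shape[1]) * np.cos(proj).mean(axis=0)

def fit_ridge(features, targets, reg=1e-3):
    """Closed-form ridge regression weights (no intercept)."""
    d = features.shape[1]
    gram = features.T @ features + reg * np.eye(d)
    return np.linalg.solve(gram, features.T @ targets)

rng = np.random.default_rng(0)
dim, n_features = 1, 100
# Assumption: frequencies drawn from N(0, 1), i.e. a unit-bandwidth Gaussian kernel.
freqs = rng.normal(size=(dim, n_features))
offsets = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

def make_data(n_sets, n_points):
    """Toy data: each input is a sample set from N(mu, 1); the response is mu."""
    mus = rng.uniform(-2.0, 2.0, size=n_sets)
    sets = [rng.normal(mu, 1.0, size=(n_points, dim)) for mu in mus]
    return sets, mus

train_sets, train_y = make_data(200, 200)
test_sets, test_y = make_data(50, 200)

# Each sample set becomes one fixed-length feature vector.
X_train = np.stack([mean_embedding(s, freqs, offsets) for s in train_sets])
X_test = np.stack([mean_embedding(s, freqs, offsets) for s in test_sets])

weights = fit_ridge(X_train, train_y)
pred = X_test @ weights
rmse = float(np.sqrt(np.mean((pred - test_y) ** 2)))
print(f"test RMSE: {rmse:.3f}")
```

The key step is that a whole sample set is mapped to a single vector before regression, so the learner never treats individual points as separate inputs.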

Next, we look to expand the versatility and efficacy of traditional machine learning tasks through novel methods that operate with distributions of features. For example, we show that one may improve the performance of kernel learning tasks by learning a kernel's spectral distribution in a data-driven fashion using Bayesian nonparametric techniques. Moreover, we study how to perform sequential modeling by looking at summary statistics from past points. Lastly, we develop methods for high-dimensional density estimation that make use of flexible transformations of variables and autoregressive conditionals.
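The link between a kernel and its spectral distribution can be illustrated with a short sketch. It relies on Bochner's theorem, under which a shift-invariant kernel can be written as k(x, y) = E_w[cos(w · (x − y))] for frequencies w drawn from the spectral distribution. The code below does not learn anything and is not the thesis's method: it only shows, under that assumption, that changing the distribution over frequencies changes the kernel. The function name `rff_kernel` and the particular two-component mixture are hypothetical.

```python
import numpy as np

def rff_kernel(x, y, freqs):
    """Monte Carlo estimate of E_w[cos(w . (x - y))] from sampled frequencies."""
    return float(np.mean(np.cos(freqs @ (x - y))))

rng = np.random.default_rng(0)
dim, n_freqs = 3, 20000
x = rng.normal(size=dim)
y = x + 0.5 * rng.normal(size=dim)
diff = x - y

# Spectral distribution N(0, I) corresponds to the RBF kernel exp(-||x - y||^2 / 2).
gauss_freqs = rng.normal(size=(n_freqs, dim))
approx_rbf = rff_kernel(x, y, gauss_freqs)
exact_rbf = float(np.exp(-0.5 * diff @ diff))

# A symmetric two-component Gaussian mixture centered at +m and -m gives the
# kernel exp(-||x - y||^2 / 2) * cos(m . (x - y)) instead.
m = np.array([1.0, 0.0, -1.0])
signs = rng.choice([-1.0, 1.0], size=(n_freqs, 1))
mix_freqs = signs * m + rng.normal(size=(n_freqs, dim))
approx_mix = rff_kernel(x, y, mix_freqs)
exact_mix = float(np.exp(-0.5 * diff @ diff) * np.cos(m @ diff))

print(f"RBF kernel:     approx {approx_rbf:.3f}, exact {exact_rbf:.3f}")
print(f"mixture kernel: approx {approx_mix:.3f}, exact {exact_mix:.3f}")
```

Learning the spectral distribution from data, as the abstract describes, amounts to searching over distributions like these rather than fixing one in advance.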

Thesis Committee:
Barnabas Poczos (Co-Chair)
Jeff Schneider (Co-Chair)
Ruslan Salakhutdinov
Le Song (Georgia Institute of Technology)

Copy of Thesis Document
