Machine learning (ML) has become one of the most powerful classes of tools for artificial intelligence, personalized web services, and data science problems across fields. Within the field of machine learning itself, there have been quite a number of paradigm shifts caused by the explosion of data size, computing power, modeling tools, and the new ways people collect, share, and make use of data sets.
Data privacy, for instance, was much less of a problem before the availability of personal information online that could be used to identify users in anonymized data sets. Images, videos, and observations generated over social networks often have highly localized structures that cannot be captured by standard nonparametric models. Moreover, the "common task framework" adopted by many sub-disciplines of AI has made it possible for many people to work collaboratively and repeatedly on the same data set, leading to implicit overfitting on public benchmarks and questionable scientific discoveries.
This thesis presents technical results under a number of new mathematical frameworks that are designed to partially address these issues. These include differentially private learning, locally adaptive nonparametric regression, and sequential selective estimation (Gaussian adaptive data analysis). The talk will highlight a few aspects of the thesis's contributions to these problems relative to what was known in the literature.
Ryan Tibshirani (Chair)
Adam Smith (Boston University)