Machine Learning Thesis Defense
- Newell-Simon Hall
- HANZHANG HU
- Ph.D. Student
- Machine Learning Department
- Carnegie Mellon University
Anytime Predictions and Learning for the Balance between Computation and Accuracy
When choosing machine learning algorithms, one often has to balance between two opposing factors, the computational speed and the accuracy of predictors. The trade-off during testing is often difficult to balance, because the test-time computational budget may be agnostic at training, or the budget may vary during testing. Analogously, given a novel data-set, one often lacks prior knowledge in the appropriate predictor complexity and training computation, and furthermore, may want to interrupt or extend the training based on training results.
In this work, we address the trade-off between computation and accuracy via anytime prediction and learning. Anytime algorithms can be interrupted at any time and still produce valid solutions. Furthermore, the quality of the results improves with the consumed computation before interruption. With the versatility to adjust to any budget, anytime algorithms automatically utilize any agnostic computational budget to the maximum extent.
To address the test-time trade-off, we study anytime predictors, whose prediction computation can be interrupted during testing. We start with developing provably near-optimal anytime linear predictors, and derive a theoretical performance limitation for anytime predictors that are based on ensemble methods. Then we develop practical anytime predictions within individual neural networks via multi-objective optimization. Furthermore, leveraging these anytime predictors as weak learners, we circumvent the performance limitation on ensemble-based anytime predictors.
For the train-time trade-off, we consider the neural architecture search problem, where one seeks the optimal neural network structure for a data-set. We draw a parallel between this bi-level combinatorial optimization problem and the feature selection problem for linear prediction, and develop an iterative network growth algorithm that is inspired by a forward selection algorithm. We also consider the problem of training on large data-sets, and develop no-regret gradient boosting algorithms for stochastic data streams.
J. Andrew Bagnell (Co-Chair)
Martial Hebert (Co-Chair)
Rich Caruana (Microsoft)