Efficient ML Training and Inference with Dynamic Hyperparameter Optimization
Recent advances in machine learning have made deep neural networks (DNNs) a fundamental building block of deployed services and applications. However, training DNNs is time-consuming, and serving trained DNNs is computationally expensive. Tuning critical hyperparameters improves the efficiency of DNN training and serving, as well as the quality of the resulting model. Yet almost all of these hyperparameters are chosen once at the beginning of training and remain static. We argue that, instead of searching for a single best setting for a hyperparameter, practitioners can achieve superior results by making these hyperparameters adaptive, allowing them to change in response to changing conditions during training and deployment. This has already been shown for adaptive learning rates, which are now a standard component of state-of-the-art training regimes. In this thesis, we argue that this principle should be extended more generally. We provide evidence that using runtime information to dynamically adapt hyperparameters that are traditionally static, such as the emphasis placed on individual training examples, the augmentation applied to those examples, and the weights updated during transfer learning, can increase the accuracy and efficiency of ML training and inference.
Gregory R. Ganger (Chair)
David G. Andersen
Michael Kozuch (Intel Labs)
Padmanabhan Pillai (Intel Labs)
Rahul Sukthankar (Google)