Artificial Intelligence Seminar / Computer Science Speaking Skills Talk

  • Remote Access - Zoom
  • Virtual Presentation - ET
  • MISHA KHODAK
  • Ph.D. Student
  • Computer Science Department
  • Carnegie Mellon University

Factorized layers revisited: Compressing deep neural networks without playing the lottery

Machine learning models are rapidly growing in size, leading to increased training and deployment costs. While the most popular approach to training compressed models is trying to guess good "lottery tickets," or sparse subnetworks, we revisit the low-rank factorization approach, in which weight matrices are replaced by products of smaller matrices. We extend recent analyses of the optimization of deep networks to motivate simple initialization and regularization schemes that improve the training of these factorized layers. Empirically, these methods yield higher accuracy than popular pruning and lottery ticket approaches at the same compression level. We further demonstrate their usefulness in two settings beyond model compression: simplifying knowledge distillation and training Transformer-based architectures such as BERT. This is joint work with Neil Tenenholtz, Lester Mackey, and Nicolo Fusi.
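
For readers unfamiliar with the factorized-layer idea mentioned in the abstract, the sketch below illustrates it in PyTorch: a dense weight matrix is replaced by the product of two smaller matrices. This is a minimal illustration, not the speaker's implementation; the class name, rank choice, and initialization are assumptions made for the example.

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """Linear layer whose (out x in) weight matrix is replaced by a
    product U @ V of two smaller matrices with inner rank r << min(out, in)."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # V maps input down to rank dimensions; U maps back up to the output size.
        self.V = nn.Linear(in_features, rank, bias=False)
        self.U = nn.Linear(rank, out_features, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Computing U(V(x)) costs O(r * (in + out)) per example instead of O(in * out).
        return self.U(self.V(x))

# Example: a dense 1024 x 1024 layer has ~1.05M weights; with rank 64 the
# factorized version keeps 2 * 64 * 1024 = 131k weights, roughly 8x smaller.
layer = FactorizedLinear(1024, 1024, rank=64)
out = layer(torch.randn(8, 1024))
```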

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.

Zoom Participation. See announcement.
