Machine Learning Thesis Proposal
- Remote Access - Zoom
- Virtual Presentation - ET
- SHAOJIE BAI
- Ph.D. Student
- Machine Learning Department
- Carnegie Mellon University
Equilibrium Approaches to Modern Deep Learning
Deep learning architectures have become one of the most successful, widely adopted, and well-studied methods in modern artificial intelligence. Accompanying these successes are increasingly complex and diverse architectural designs, at the foundation of which lies a core concept: layers. In the forward pass, a deep network feeds the input through an explicit stack of L operators; to update the network, gradients are computed by backpropagating in reverse through these L transformations. However, this typical approach creates several major challenges. First, it is often the model architect's responsibility to choose the depth L, design each layer, and schedule how the layers are stacked. Second, each intermediate activation must be stored in memory, so deep networks incur a large memory footprint that inflates as the depth L grows. Third, layers render deep networks inelastic: they must follow the exact procedure prescribed by the computation graph, without any deviation (e.g., skipping some layer i).
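To make the memory cost concrete, here is a minimal, hypothetical sketch in plain Python (the toy scalar `layer` and all names are illustrative assumptions, not part of the proposal) of an explicit L-layer forward pass that retains every intermediate activation for backpropagation:

```python
import math

def layer(z, x):
    # A toy scalar "layer" (hypothetical stand-in for a real operator).
    return math.tanh(0.5 * z + x)

def forward_explicit(x, L):
    # Explicit stack of L identical layers: every intermediate
    # activation is kept so backpropagation can traverse the same
    # L steps in reverse -- memory grows linearly with the depth L.
    activations = [0.0]
    for _ in range(L):
        activations.append(layer(activations[-1], x))
    return activations
```

The list of stored activations is exactly the linear-in-L memory footprint the paragraph above describes.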
In this proposal, we revisit the fundamental concept of a "layer" and propose a new, layer-less perspective on deep architectures: the deep equilibrium (DEQ) model. In the first part of the thesis, we present the general formulation of DEQ models, whose output is defined by an (implicit) equilibrium state (i.e., a fixed-point condition). We discuss how these models represent an "infinite-level" neural network with just one layer. Importantly, we demonstrate how one can solve for, and then differentiate through, these equilibrium states directly with black-box solvers, eliminating the need to store intermediate activations or maintain an inelastic computation graph. In the second part, we discuss how the notion of hierarchy can be integrated into these equilibrium models even without the concept of layers. We present empirical evidence that modern structured layers can be embedded in this framework, and that the equilibrium approach to deep learning is both flexible and competitive. We devote the third part of the proposal to an in-depth analysis of the issues and challenges of such an approach, and shed light on how implicit models can be improved by regularizing their implicitness and stability.
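As a minimal sketch of the idea (assuming a toy scalar transformation and naive fixed-point iteration as a stand-in for a black-box solver; all names here are illustrative, not the proposal's implementation), the equilibrium z* = f(z*, x) can be solved for in the forward pass and then differentiated through via the implicit function theorem, with no intermediate iterates stored:

```python
import math

def f(z, x, w=0.5):
    # One transformation, applied at every "depth" -- a contraction
    # here (|df/dz| <= w < 1), so simple iteration converges.
    return math.tanh(w * z + x)

def forward(x, w=0.5, tol=1e-10):
    # Solve the fixed-point condition z* = f(z*, x) by iteration
    # (a stand-in for a black-box solver); only the current iterate
    # is kept, so memory is constant in the effective depth.
    z = 0.0
    while True:
        z_next = f(z, x, w)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next

def grad_x(x, w=0.5):
    # Implicit differentiation at the equilibrium: from
    # z* = f(z*, x), we get dz*/dx = (1 - df/dz)^{-1} * df/dx,
    # evaluated only at z* -- no backprop through the iterates.
    z_star = forward(x, w)
    s = 1.0 - math.tanh(w * z_star + x) ** 2  # d tanh(u)/du at u = w z* + x
    return s / (1.0 - w * s)
```

The gradient agrees with numerically differentiating the solver's output, which is the point: the backward pass depends only on the equilibrium itself, not on how it was reached.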
For our proposed work, we first plan to advance the study in the third part of the discussion. Because equilibrium models decouple their representational capacity (i.e., the implicit function) from the forward algorithm (i.e., the black-box solver), we can exploit extremely lightweight hypersolvers to accelerate forward fixed-point convergence. Going further, we aim to extend the success of implicit models to more realistic (e.g., large-batch) settings such as implicit representations and self-supervised learning.
Zico Kolter (Chair)
David Duvenaud (University of Toronto)
Vladlen Koltun (Intel Labs)
Zoom Participation. See announcement.