Language Technologies Ph.D. Thesis Defense
- Gates Hillman Centers
- ZHILIN YANG
- Ph.D. Student
- Language Technologies Institute
- Carnegie Mellon University
Generative Feature Learning
Because unlabeled data is more accessible and abundant than labeled data, it is crucial to use unlabeled data to improve learning. There are two major learning paradigms towards this goal: unsupervised pretraining trains a language model on unlabeled data and then finetunes it on downstream tasks, while semi-supervised learning jointly optimizes loss functions on labeled and unlabeled data. To unify these two paradigms and catalyze further advances, we propose the framework of generative feature learning, in which generative modeling serves as a tool to boost target-task performance with unlabeled data.
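The semi-supervised paradigm above can be pictured as a shared feature extractor with two heads: a supervised head trained on labeled data and a generative head trained on unlabeled data, with the two losses summed. The sketch below is purely illustrative; the matrices, the `lm_weight` coefficient, and all names are assumptions for exposition, not the thesis's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: feature dim, number of classes, vocabulary size.
d, k, v = 8, 3, 20
W = rng.normal(size=(d, d))   # shared feature extractor (hypothetical)
C = rng.normal(size=(d, k))   # supervised classification head
G = rng.normal(size=(d, v))   # generative (next-token) head

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def nll(probs, targets):
    # Mean negative log-likelihood of the target indices.
    return -np.log(probs[np.arange(len(targets)), targets]).mean()

def joint_loss(x_lab, y_lab, x_unl, y_next, lm_weight=0.5):
    # Supervised loss on labeled examples, through the shared features.
    sup = nll(softmax(x_lab @ W @ C), y_lab)
    # Generative loss: predict the next token from unlabeled examples,
    # through the same shared features.
    gen = nll(softmax(x_unl @ W @ G), y_next)
    return sup + lm_weight * gen
```

Both losses flow through the shared parameters `W`, which is how the unlabeled data shapes the features used by the supervised task.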
Under this framework, we present instantiations for different scenarios. For unsupervised pretraining, we propose a novel architecture, Transformer-XL, that substantially improves language modeling. Building on this success, we present XLNet, a new learning paradigm with which progress in language modeling can be translated into benefits for pretraining. In addition, we present three semi-supervised learning approaches targeting different use cases, based on generating questions, generating complement data, and generating random walk paths on graphs.
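One idea connecting language modeling to pretraining, popularized by XLNet, is permutation language modeling: rather than a single left-to-right factorization, the sequence log-likelihood is averaged over factorization orders, so each token learns to be predicted from varying contexts. The toy function below is a simplified sketch of that idea under stated assumptions (an abstract `cond_prob` oracle, exhaustive enumeration of orders), not the actual training objective or implementation.

```python
import itertools
import math

def permutation_log_likelihood(seq, cond_prob):
    """Average log-likelihood of `seq` over all factorization orders.

    `cond_prob(token, context_positions)` is a hypothetical model that
    returns p(token | tokens at the given context positions).
    """
    n = len(seq)
    orders = list(itertools.permutations(range(n)))
    total = 0.0
    for order in orders:
        ll = 0.0
        seen = set()
        for pos in order:
            # Predict the token at `pos` given only positions seen so far
            # in this factorization order.
            ll += math.log(cond_prob(seq[pos], frozenset(seen)))
            seen.add(pos)
        total += ll
    return total / len(orders)
```

In practice one samples a few orders per sequence instead of enumerating all n! of them; the sketch enumerates only to make the expectation explicit.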
Overall, the generative feature learning framework has driven development across a variety of research topics. Technically, it enables major breakthroughs in modeling long-range dependencies, bridging the gap between language modeling and target tasks, and fundamentally understanding GAN-based semi-supervised learning. As a result, the presented methods hold, or previously held, state-of-the-art results on more than 30 benchmarks spanning natural language inference, question answering, text classification, language modeling, and semi-supervised learning, demonstrating the effectiveness of both the overall framework and the individual technical advances.
Ruslan Salakhutdinov (Co-chair)
William W. Cohen (Co-chair)
Jason Weston (Facebook AI Research)