Language Technologies Ph.D. Thesis Proposal
- Remote Access - Zoom
- Virtual Presentation - ET
- XIANG KONG
- Ph.D. Student
- Language Technologies Institute
- Carnegie Mellon University
Towards Efficient Neural Machine Translation
Machine translation (MT), the automatic translation of text from one language to another, aims to overcome language barriers among people from different cultures. Recently, neural network-based machine translation (NMT) models have significantly narrowed the gap between machine and human translation in terms of accuracy. However, they also introduce new challenges, and efficiency is one of the most important. Specifically, with complex deep network structures, NMT models generally incur high space and computational costs, hindering their deployment in real-time applications with strict latency requirements or on devices with limited memory. In this proposal, we consider three efficiency-related challenges that NMT faces: (1) architecture efficiency: like other deep learning models, NMT employs a deep network structure with a large number of parameters and high model complexity, resulting in large storage footprints and slow training; (2) decoding efficiency: NMT has high inference latency, largely because its conventional decoding algorithm is autoregressive and generates only one token at a time; (3) efficiency in multilingual NMT: to better support translation between multiple languages, a popular strategy is to employ a deeper encoder and decoder with increased model capacity, but the extra latency and memory costs make this approach unacceptable for latency- or memory-constrained applications.
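The decoding-efficiency challenge above can be illustrated with a toy sketch (an illustration of the general idea, not the proposal's actual models): an autoregressive decoder must run one sequential step per output token because each token is conditioned on the prefix, while a non-autoregressive decoder predicts every position in a single parallel pass. The `step_fn` and `parallel_fn` stand-ins below are hypothetical.

```python
def autoregressive_decode(step_fn, length):
    """Generate tokens one at a time; each step is conditioned on the prefix,
    so the steps cannot be parallelized."""
    tokens, steps = [], 0
    for _ in range(length):
        tokens.append(step_fn(tokens))  # depends on previously generated tokens
        steps += 1
    return tokens, steps

def non_autoregressive_decode(parallel_fn, length):
    """Predict all positions in one parallel pass (a single decoder call)."""
    return parallel_fn(length), 1

# Hypothetical stand-in models that simply emit position indices.
ar_tokens, ar_steps = autoregressive_decode(lambda prefix: len(prefix), 5)
nar_tokens, nar_steps = non_autoregressive_decode(lambda n: list(range(n)), 5)
assert ar_tokens == nar_tokens == [0, 1, 2, 3, 4]
assert (ar_steps, nar_steps) == (5, 1)  # 5 sequential steps vs. 1 parallel pass
```

Both decoders produce the same five tokens here, but the autoregressive one needs five dependent steps where the non-autoregressive one needs a single pass; the research question is how to close the quality gap this independence assumption creates.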
This proposal consists of three parts, one for each challenge. First, to improve architecture efficiency, we introduce a better word encoding mechanism that significantly reduces the time and space consumption of the embedding and softmax layers, and we propose to reduce the computational complexity of the self-attention mechanism in the Transformer. Second, we explore non-autoregressive decoders, which generate tokens in parallel, and develop two methods to improve their translation quality without sacrificing their speed advantage. Finally, we investigate efficiency in the multilingual translation scenario: we first study the speed-accuracy trade-off for multilingual translation and achieve a better one through model capacity allocation, and we plan to explore improving one-to-many translation in a universal encoder-decoder architecture with a single shallow decoder through mixture-of-experts (MoE) layers.
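The mixture-of-experts idea can be sketched as follows; this is a minimal top-1 routing toy (an assumption for illustration, not the proposal's architecture), where total capacity grows with the number of experts while each token only pays the cost of the single expert it is routed to. The experts and gate below are hypothetical stand-ins.

```python
def moe_layer(token, experts, gate):
    """Route a token to the expert with the highest gate score (top-1 routing);
    only that one expert is evaluated."""
    scores = [gate(token, i) for i in range(len(experts))]
    best = scores.index(max(scores))
    return experts[best](token), best

# Hypothetical experts: each scales its input by a different factor.
experts = [lambda x, k=k: x * (k + 1) for k in range(4)]
# Hypothetical gate: route by token value modulo the number of experts.
gate = lambda x, i: 1.0 if x % 4 == i else 0.0

out, chosen = moe_layer(6, experts, gate)
assert chosen == 2   # 6 % 4 == 2, so the gate selects expert 2
assert out == 18     # expert 2 multiplies its input by 3
```

Under this scheme, adding experts increases model capacity without increasing per-token computation, which is why MoE is a natural fit when a single shallow decoder must serve many target languages.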
Eduard Hovy (Chair)
Alexander Rush (Cornell University)
Zoom Participation. See announcement.