End-to-end (E2E) models have emerged as a new paradigm in the ASR community in recent years. These models replace the acoustic, pronunciation, and language models of a conventional cloud-based ASR system with a single neural network at a fraction of the size, making them attractive for on-device applications. In this talk, I will outline various advances from our team toward improving the quality and latency of E2E models such that they surpass the performance of cloud-based models on both metrics.
Tara Sainath received her PhD in Electrical Engineering and Computer Science from MIT in 2009. The main focus of her PhD work was acoustic modeling for noise-robust speech recognition. After her PhD, she spent five years in the Speech and Language Algorithms group at IBM T.J. Watson Research Center before joining Google Research. She served as a Program Chair for ICLR in 2017 and 2018, and has co-organized numerous special sessions and workshops, including at Interspeech 2010, ICML 2013, Interspeech 2016, and ICML 2017. In addition, she has been a member of the IEEE Speech and Language Processing Technical Committee (SLTC) and an Associate Editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing. Her research interests are mainly in deep neural networks.
The LTI Colloquium is generously sponsored by Abridge.
In-person and Zoom participation. See announcement.