Language Technologies Thesis Proposal
- Gates Hillman Centers
- XUEZHE MA
- Ph.D. Student
- Language Technologies Institute
- Carnegie Mellon University
Neural Networks for Linguistic Structured Prediction and Its Interpretability
Linguistic structured prediction, such as sequence labeling, syntactic and semantic parsing, and coreference resolution, is one of the first stages in deep language understanding. Its importance has been well recognized in the natural language processing community, and it has been applied to a wide range of downstream tasks.
Most traditional high-performance linguistic structured prediction models are linear statistical models, including Hidden Markov Models (HMM) and Conditional Random Fields (CRF), which rely heavily on hand-crafted features and task-specific resources. However, such task-specific knowledge is costly to develop, making structured prediction models difficult to adapt to new tasks or new domains. In the past few years, non-linear neural networks that take distributed word representations, also known as word embeddings, as input have been broadly applied to NLP problems with great success. By using distributed representations as inputs, these systems are capable of learning hidden representations directly from data instead of relying on manually designed features. Despite the impressive empirical successes of applying neural networks to linguistic structured prediction tasks, there are at least two major problems: 1) there is no consistent architecture for different structured prediction tasks, or at least for some of their components, that can be trained in a truly end-to-end setting; 2) understanding the role of different parts of a deep neural network is difficult.
In this thesis, we discuss these two major problems in current neural models and attempt to provide solutions to them. In the first part of this thesis, we propose a consistent neural architecture for the encoding component, named BLSTM-CNNs, across different structured prediction tasks. It is a truly end-to-end model requiring no task-specific resources, feature engineering, or data pre-processing beyond word embeddings pre-trained on unlabeled corpora. Thus, our model can be easily applied to a wide range of structured prediction tasks across different languages and domains. We apply this encoding architecture to different tasks, including sequence labeling and graph-based and transition-based dependency parsing, combined with different structured output layers, achieving state-of-the-art performance. In the second part of this thesis, we investigate what kind of linguistic information is represented in deep neural models for natural languages, with dependency parsing as the test bed.
Eduard Hovy (Chair)
Joakim Nivre (Uppsala University)