Language Technologies Ph.D. Thesis Proposal

  • Remote Access - Zoom
  • Virtual Presentation - ET
  • Ph.D. Student
  • Language Technologies Institute
  • Carnegie Mellon University

Automated Extraction of Language Descriptions for Under-Resourced Languages

Creating a language description that illustrates the salient points of a language is important not only for understanding the language but also as an indispensable step in language documentation and preservation. In this thesis, we propose methods for automatically extracting language descriptions that cover linguistic phenomena in morphology, syntax and lexical semantics. As part of these descriptions, we aim to provide a set of guiding principles for understanding each phenomenon, in a format that both language experts and learners can readily use for their own goals. Many of these guiding principles are governed by syntactic, lexical and/or semantic features of the language. Advances in natural language processing (NLP) research make it possible to automate some of the processes involved in creating descriptions across different languages. However, most state-of-the-art methods require an abundance of labeled data, which is often not readily available for under-resourced languages. Therefore, in the first part of the thesis, we focus on improving NLP methods for automatically extracting such features for under-resourced languages, and in the second part, we propose methods that use these features to automatically extract descriptions.

In the second part of the thesis, we use the extracted features to automatically produce language descriptions and analyses covering aspects of morphology, syntax and lexical semantics. We propose a general framework for extracting these descriptions in a format that is both human- and machine-readable, and we design human and automatic evaluation methods to assess the extracted rules. We also create an online interface for visualizing and exploring the extracted descriptions. Finally, because the descriptions are machine-readable, they can also support downstream NLP applications.

Thesis Committee:
Graham Neubig (Chair) 
Alan Black
David R. Mortensen
Antonios Anastasopoulos (George Mason University) 
Isabelle Augenstein (University of Copenhagen)

Additional Information

Zoom Participation. See announcement.

For More Information, Please Contact: