While acoustic signals are continuous in nature, the ways that humans generate pitch in speech and music involve important discrete decisions. As a result, models of pitch must resolve a tension between continuous and combinatorial structure. Similarly, interpreting images of printed documents requires reasoning about both continuous pixels and discrete characters. Focusing on several tasks that involve human artifacts, I'll present probabilistic models designed to capture both kinds of structure.
First, I'll describe an approach to historical document recognition that uses a statistical model of the historical printing press to reason about images and, as a result, is able to decipher historical documents in an unsupervised fashion. Building on this approach, I'll also demonstrate a related model that accurately performs compositor attribution in the First Folio of Shakespeare. Next, I'll present an unsupervised system that transcribes acoustic piano music into a symbolic representation by jointly describing the discrete structure of sheet music and the continuous structure of piano sounds. Finally, I'll present a supervised method for predicting prosodic intonation from text that treats discrete prosodic decisions as latent variables, but directly models pitch in a continuous fashion.
Taylor Berg-Kirkpatrick joined the Language Technologies Institute at Carnegie Mellon University as an Assistant Professor in Fall 2016. Previously, he was a Research Scientist at Semantic Machines Inc. and, before that, completed his Ph.D. in computer science at the University of California, Berkeley. Taylor's research focuses on using machine learning to understand structured human data, including language but also sources like music, document images, and other complex artifacts.
Faculty Host/Instructor: Alex Hauptmann