AI Seminar

  • Michael Collins
  • AT&T Labs-Research

Statistical Models for Natural Language Parsing

This talk will discuss the problem of machine learning applied to natural language parsing: given a set of example sentence/tree pairs, the task is to learn a function from sentences to trees which generalizes well to new sentences.

In the first part of the talk I will review recent work on probabilistic, history-based approaches. Much of the recent success of these methods has been due to the incorporation of lexically conditioned parameters. I will discuss the importance of head words and dependency parameters, and also the use of estimation methods such as decision trees or maximum entropy methods.

While history-based models have several advantages, it can be awkward to encode some constraints within this framework. It is often easy to think of features which might be useful in discriminating between candidate trees for a sentence, but much more difficult to alter the model to take these features into account. In the second part of the talk I will review more recent work on learning methods which promise to be considerably more flexible in incorporating features. I will discuss how three such approaches -- boosting, support vector machines, and Markov random fields -- can be applied to parsing, and the similarities and relative advantages of the three approaches.

----------------------------------------------------------------------

Bio:

Michael Collins did his undergraduate studies in Electrical Engineering at Cambridge University, and went on to do a Masters in Speech and Language Processing, also at Cambridge. He received his PhD from the University of Pennsylvania in 1998, and has been at AT&T Labs-Research since January 1999, most recently in the AI department. His research interests are in machine-learning approaches to natural language processing.
For More Information, Please Contact:
Catherine Copetas, copetas@cs.cmu.edu