LTI Special Seminar

  • Sanda Harabagiu
  • Natural Language Processing Group, Southern Methodist Univ.

Boosting Knowledge for Open-Domain Answer Engines

The design of open-domain answer engines is guided by two thrusts. First, natural language processing (NLP) methods are used to derive the question's semantics, in order to identify the candidate answers in the text collections. These methods are integrated with specially crafted information retrieval (IR) techniques that return all text paragraphs of interest. Second, to be able to extract the correct answers, bag-of-words approaches are not always sufficient. They are replaced by surface-based NLP methods that are boosted with pragmatic knowledge that filters out incorrect answers. The boosting methodology relies on several new sources of pragmatic knowledge. First, we considered that it is likely that an answer engine would be presented with reformulations of previously posed questions. Thus we devised an approach for recognizing question reformulations and caching their corresponding answers. Second, we designed a new paragraph retrieval mechanism that enables keyword alternations, such that paraphrases of question concepts and even some related concepts are included in the search for the textual answer. Finally, instead of operating at word level, we have escalated our extraction methods to operate at the level of dependencies between words, thus better approximating the semantics of questions and answers. Without any loss of robustness and without downgrading the elegance of our answer engine, we enable the representation of questions and answers into semantic forms based on information brought forward by fast, wide-coverage probabilistic parsers. Furthermore, by translating the semantic forms into logical forms, we enable a justification option relying on minimal abductive knowledge. The proof mechanism is easily extensible for special domains or situations.

Sanda Harabagiu is an Assistant Professor in the Department of Computer Science and Engineering at Southern Methodist University, Dallas, TX. She received a PhD in Computer Engineering from the University of Southern California, Los Angeles in 1997 and a Doctorate in Computer Science from the University of Rome "Tor Vergata", Italy, in 1994. Prior to joining SMU, Dr. Harabagiu was a researcher in the Artificial Intelligence Center at SRI International, Menlo Park, California. Dr. Harabagiu is a recipient of the National Science Foundation CAREER award.

For More Information, Please Contact:
Catherine Copetas, copetas@cs.cmu.edu