Language Technologies Institute Colloquium

  • Brendan O'Connor
  • Assistant Professor
  • College of Information and Computer Sciences
  • University of Massachusetts Amherst

Inferring social events from language, and social bias in language analysis

What can text analysis tell us about society?  Enormous corpora of news, social media, and historical documents record events, beliefs, and culture.  Automated text analysis scales to large data sets, and can assist in discovering patterns and themes.

Time permitting, I will discuss two projects.  The first is on socially relevant event extraction from the news.  We tackle the surprising lack of systematic records on police killings of civilians in the U.S. by helping automate the extraction of these fatality events from news articles, in order to assist manual curation efforts.  Our methods make use of distant supervision and outperform extractors used in previous NLP research.

Second, in addition to using NLP to advance social understanding, findings from the social sciences can better inform the design of artificial intelligence.  We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter, using a demographically supervised model to identify AAE-like language associated with geo-located messages.  We verify that this language follows well-known AAE linguistic phenomena -- and furthermore, find that existing tools such as language identification, part-of-speech tagging, and dependency parsing fail on this AAE-like language more often than on text associated with white speakers.  We leverage our model to fix racial bias in some of these tools, and discuss future implications for fairness and artificial intelligence.

Brendan O'Connor is an assistant professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst, and works at the intersection of computational social science and natural language processing -- studying how social factors influence language technologies, and how to better understand social trends with text analysis.  For example, he investigates racial bias in NLP technologies, political events reported in news, and opinions and slang on Twitter.  His work recently received a workshop best paper award and has been featured in the New York Times and the Wall Street Journal.  He received his PhD in 2014 from Carnegie Mellon University's Machine Learning Department, advised by Noah Smith.  He has previously been a Visiting Fellow at the Harvard Institute for Quantitative Social Science, and has worked in industry in the Facebook Data Science group and at the crowdsourcing startup Crowdflower.  He started studying the intersection of AI and social science in Symbolic Systems (B.S./M.S.) at Stanford University.
