Societal Computing Thesis Proposal
- Gates Hillman Centers
- Reddy Conference Room 4405
- IAIN J. CRUICKSHANK
- Ph.D. Student
- Ph.D. Program in Societal Computing
- Institute for Software Research, Carnegie Mellon University
Unsupervised Graph Learning for Multi-modal and Dynamic Data
Real-world phenomena can be described by many different types of data; online social media users, for example, can be described by explicit networks as well as by user-emitted text. When different sets of data describe the same entities, the data may be said to be multi-view or multi-modal. A distinct advantage of multi-modal data is that different modes may better capture different areas of the underlying, latent structure of the data. Typically, each mode of the data is analyzed separately, as it is often unclear how to combine the modes into one, cohesive data model. However, analyzing each mode separately does not fully exploit the advantage of multi-modal data, as any one mode may not capture the underlying structure in the way all of the modes combined can. Furthermore, real-world phenomena can also be dynamic in nature. This dynamic nature produces data that changes over time, which in turn necessitates changes to any data model describing the phenomenon.
To create an updatable, cohesive, interpretable data model of multi-modal data, I propose the use of graph learning. The basic idea of graph learning is to find a 'best-fit' graph for a collection of data, leveraging the mathematical properties of a graph for a rich representation of the data. In this dissertation I will develop novel graph learning methods to fuse multi-modal data into a single, cohesive data model that can be incrementally updated. In particular, I propose to address three main problems with real-world, dynamic, multi-modal data. First, I propose to address the problem of graph learning on generic types of multi-modal data. Current methods have almost exclusively been applied to specific types of data, like image and genetic data, and it is unclear whether they extend to more general types of data, like text or case attributes. Second, I propose to address the problem of data that comes with explicit networks, such as online social media data. Current multi-modal graph learning techniques struggle to fuse non-network and network data. Third, I will address the problem of maintaining a faithful graph data model when the multi-modal data is dynamic. Current methods for dynamically updating graphs of data rely on strong assumptions, such as all of the vertices in the graph being out-degree regular.
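To make the graph-learning idea concrete, the following is a minimal, illustrative sketch (not the proposed method): two modes of features describing the same entities are converted to normalized pairwise distances, the distances are averaged across modes, and each entity is linked to its k nearest neighbors in the fused space. The data, the averaging rule, and the function names are all hypothetical simplifications.

```python
import math

# Toy multi-modal data: two feature modes describing the same 4 entities.
# These vectors are illustrative placeholders, not real data.
mode_a = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
mode_b = [[1.0], [0.9], [0.1], [0.0]]

def pairwise_dists(X):
    """Euclidean distance between every pair of entities in one mode."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

def normalize(D):
    """Scale a distance matrix into [0, 1] so modes are comparable."""
    m = max(max(row) for row in D) or 1.0
    return [[d / m for d in row] for row in D]

def fuse_knn_graph(modes, k=1):
    """Average normalized per-mode distances, then keep each entity's
    k nearest neighbors as undirected edges of the learned graph."""
    dists = [normalize(pairwise_dists(X)) for X in modes]
    n = len(modes[0])
    fused = [[sum(D[i][j] for D in dists) / len(dists) for j in range(n)]
             for i in range(n)]
    edges = set()
    for i in range(n):
        nearest = sorted((d, j) for j, d in enumerate(fused[i]) if j != i)[:k]
        for _, j in nearest:
            edges.add((min(i, j), max(i, j)))
    return edges

print(fuse_knn_graph([mode_a, mode_b], k=1))  # → {(0, 1), (2, 3)}
```

Because both modes agree that entities 0 and 1 (and 2 and 3) are similar, the fused graph recovers those two pairs; real graph learning methods replace the naive distance-averaging step with principled objectives.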
Combined, these novel techniques provide a single, cohesive framework for representing real-world, dynamic, multi-modal data for analysis. I will then demonstrate the use of different network science metrics to characterize uncertainty in the data and to indicate when the structure of the data is changing in meaningful ways. Together, these methods in graph learning and network analysis can be used to model large volumes of dynamic, unlabeled, multi-modal data and present it in a way that enables a human analyst to perceive trends and communities of interest, understand uncertainty, and ultimately apply meaning to the data. Enabling human analysts to better understand and reason about their data is critical to many important endeavors, such as intelligence analysis and research.
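As a simple illustration of using a network metric to flag structural change (a deliberately crude stand-in for the richer metrics the proposal refers to), the sketch below tracks graph density across snapshots of a toy dynamic graph and flags any snapshot where density jumps past a threshold. The snapshots, threshold, and function names are invented for illustration.

```python
# Toy dynamic-graph snapshots: edge sets over 5 entities.
snapshots = [
    {(0, 1), (1, 2), (0, 2)},           # one tight community
    {(0, 1), (1, 2), (0, 2), (3, 4)},   # a second component appears
    {(0, 3), (1, 4), (2, 3)},           # structure reorganizes
]

def density(edges, n):
    """Fraction of possible undirected edges that are present."""
    return len(edges) / (n * (n - 1) / 2)

def change_points(snaps, n, threshold=0.05):
    """Flag snapshot indices whose density shifts from the previous
    snapshot by more than the threshold."""
    flags = []
    prev = density(snaps[0], n)
    for t, edges in enumerate(snaps[1:], start=1):
        cur = density(edges, n)
        if abs(cur - prev) > threshold:
            flags.append(t)
        prev = cur
    return flags

print(change_points(snapshots, n=5))  # → [1, 2]
```

Both the arrival of a new component and the later rewiring move the density enough to be flagged; in practice one would monitor several metrics (e.g., clustering, component counts) rather than density alone.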
Kathleen M. Carley (Chair)
L. Rick Carley
Tanya Berger-Wolf (University of Illinois at Chicago)