Societal Computing Thesis Defense
- Remote Access Enabled - Zoom
- Virtual Presentation
- IAIN J. CRUICKSHANK
- Ph.D. Student
- Ph.D. Program in Societal Computing
- Institute for Software Research, Carnegie Mellon University
Multi-view Clustering of Social-based Data
In this thesis, I develop multi-view clustering techniques for social-based data and demonstrate their effectiveness on two real-world clustering problems. Many real-world phenomena produce various types, or views, of data that can be used to describe the structure present in the phenomena. When combined, the different views often give rise to better analyses, as each view captures different aspects of the underlying phenomena. However, combining this data for meaningful analysis is challenging due to differences between the various views and the presence of partial views, which are common in social-based scenarios. I explore the application of multi-view clustering to social-based data in order to exploit the different views of data that arise from complex social-based phenomena.
I first analyze techniques from multiple paradigms of multi-view clustering on a wide range of social-based multi-view data and propose a new hybrid paradigm for multi-view clustering. The results of the empirical tests show that while some existing techniques perform well in certain scenarios, they often perform poorly in others. Only two of the techniques, both proposed in this work, perform well across all of the social-based scenarios and are robust to data difficulties like inter- and intra-view variance, which are common features of social-based data. I use these techniques to analyze two real-world, multi-view, social-based data scenarios: hashtag usage on Twitter during the COVID-19 pandemic and samples of malware collected from different threat actors. The real-world case studies both provide meaningful clustering analyses and demonstrate the effectiveness of the proposed techniques at handling partially complete, large-scale, multi-view data. Altogether, in this thesis, I demonstrate the suitability of, and create techniques for, multi-view clustering of complex, multi-view, social-based data. This thesis advances practical clustering analyses of large-scale, noisy, social-based data and contributes to the field of multi-view clustering in general.
Kathleen M. Carley (Chair)
L. Rick Carley (ECE)
J. Zico Kolter
Tanya Berger-Wolf (Ohio State University)