Computer Science Thesis Proposal

  • Gates & Hillman Centers
  • Traffic 21 Classroom 6501
  • MIGUEL ARAUJO
  • Ph.D. Student
  • Computer Science Department
  • Carnegie Mellon University

Communities and Anomalies in Large Edge-Labeled Graphs

The identification of anomalies and communities of nodes in real-world graphs has applications in a wide range of domains, from the automatic categorization of Wikipedia articles or websites to bank fraud detection. While recent and ongoing research is supplying tools for the analysis of simple unlabeled data, it is still a challenge to find patterns and anomalies in large labeled datasets, such as time-evolving networks. What do real communities identified in big datasets look like? How is their structure affected by their size? How can we find realistic communities in labeled data?

The completed work of this proposal addresses three related problems in this area. First, we explore the shape and structure of real communities in large networks and introduce the concept of "hyperbolic communities", providing two different algorithms for finding such structures in large datasets. Second, we find communities in edge-labeled networks, where labels can be timesteps or any other categorical information, and we describe efficient algorithms for this task. Lastly, we study anomalies in bank transaction networks, where both nodes and edges are labeled. We describe parallel algorithms that automatically find locations where bank accounts were compromised in billion-scale networks.

We also detail future work (1) on the distributed detection of edge-labeled communities, (2) on forecasting how communities evolve, predicting which members are going to join and finding the most common community profiles, and (3) on the existence of hyperbolic communities in word networks, combining community detection with the known heavy-tailed distribution of word frequencies.

Thesis Committee:
Christos Faloutsos (Co-Chair)
Pedro Ribeiro (Co-Chair, University of Porto)
William Cohen
Aarti Singh
Tina Eliassi-Rad (Northeastern University)
Beatriz Santos (University of Aveiro)
Alexandre Francisco (University of Lisbon)
