While deep convolutional neural networks (CNNs) achieve state-of-the-art performance in visual recognition, they are also criticized as black boxes that lack interpretability. In this work, we propose a framework called Network Dissection to quantify the interpretability of the latent representations of CNNs. By evaluating the alignment between individual hidden units and a set of semantic concepts spanning objects, parts, scenes, textures, materials, and colors, we automatically associate each hidden unit with a label and an interpretability score. We first verify the hypothesis that the interpretability of units is not equivalent to that of random linear combinations of units. Then we apply our method to compare the latent representations of various networks trained on different supervised and self-supervised tasks, and trained under different regularizations. We show that the interpretability of CNNs sheds light on characteristics of networks that go beyond measurements of their classification power. More detail and the relevant paper are available.
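
As a rough illustration of what "alignment between a hidden unit and a semantic concept" could mean in practice, the sketch below thresholds one unit's activation maps and scores the overlap with per-pixel concept masks using intersection over union (IoU). It is a minimal sketch under stated assumptions, not the speaker's implementation: the function names, the `quantile` parameter and its default, and the premise that activation maps have already been resized to the mask resolution are all hypothetical choices made for this example.

```python
import numpy as np


def unit_concept_iou(activations, concept_masks, quantile=0.995):
    """Score one unit against each concept by IoU (illustrative only).

    activations:   float array of shape (n_images, H, W), one unit's maps,
                   assumed already resized to the mask resolution.
    concept_masks: dict mapping concept name -> bool array, same shape.
    quantile:      assumed cutoff; activations above this quantile count
                   as "the unit is on".
    """
    # One threshold for the unit, computed over all images and positions.
    threshold = np.quantile(activations, quantile)
    unit_mask = activations > threshold

    scores = {}
    for name, mask in concept_masks.items():
        intersection = np.logical_and(unit_mask, mask).sum()
        union = np.logical_or(unit_mask, mask).sum()
        # Guard against an empty union (unit never fires, concept absent).
        scores[name] = float(intersection / union) if union > 0 else 0.0
    return scores


def label_unit(activations, concept_masks, quantile=0.995):
    """Return (best concept name, its IoU) for one unit."""
    scores = unit_concept_iou(activations, concept_masks, quantile)
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Under these assumptions, the concept with the highest IoU serves as the unit's label and the IoU value itself as its interpretability score; a unit whose best score is near zero would be treated as having no clear concept.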

Bolei Zhou is a fifth-year Ph.D. candidate in the Computer Science and Artificial Intelligence Laboratory at MIT, working with Prof. Antonio Torralba. His research is on computer vision and machine learning, with particular interest in visual scene understanding and interpretable machine learning. He is a recipient of the Facebook Fellowship, the Microsoft Research Asia Fellowship, and the MIT Greater China Fellowship. His research has been covered by TechCrunch, Quartz, and MIT News.

Sponsored in part by Disney Research.

Speech is the dominant modality in human-human communication. It is supported in subtle ways by other communicative cues (e.g., gestures, eye gaze, and haptics). These cues, although subtle, play a major role in enriching human-human interaction by communicating complementary information. In this talk, I will present case studies that demonstrate the wide range of information that can be extracted from subtle cues, and will show examples of how human-computer interaction in general, and human-robot interaction in particular, can be enhanced with strategic use of subtle communicative cues. The examples will come from robot-assisted joint manipulation tasks (e.g., carrying a table with the help of a robot), conversational robotic agents, and multimodal interaction using eye-gaze tracking and pen input.

T. Metin Sezgin graduated summa cum laude with Honors from Syracuse University in 1999. He completed his MS in the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology in 2001 and received his PhD from the same institution in 2006. He subsequently joined the Rainbow group at the University of Cambridge Computer Laboratory as a Postdoctoral Research Associate. Dr. Sezgin is currently an Associate Professor in the College of Engineering at Koç University, Istanbul. His research interests include intelligent human-computer interfaces, multimodal sensor fusion, and HCI applications of machine learning. He is particularly interested in applying these technologies to build intelligent pen-based interfaces. Dr. Sezgin's research has been supported by international and national grants, including grants from the European Research Council and Turk Telekom. He is a recipient of the Career Award of the Scientific and Technological Research Council of Turkey.
