While deep convolutional neural networks (CNNs) achieve state-of-the-art performance in visual recognition, they are also criticized as black boxes that lack interpretability. In this work, we propose a framework called Network Dissection to quantify the interpretability of the latent representations of CNNs. By evaluating the alignment between individual hidden units and a set of semantic concepts spanning objects, parts, scenes, textures, materials, and colors, we automatically associate each hidden unit with a label and an interpretability score. We first verify that the interpretability of individual units is not equivalent to that of random linear combinations of units. We then apply our method to compare the latent representations of various networks trained on different supervised and self-supervised tasks, and under different training regularizations. We show that the interpretability of CNNs sheds light on characteristics of networks that go beyond measurements of their classification power. More details and the relevant paper are available.
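The unit-concept alignment described above can be illustrated with a minimal sketch: threshold a unit's activation maps at a high quantile and score each concept by the intersection-over-union (IoU) between the thresholded activation mask and the concept's segmentation mask. The function name, array shapes, and default quantile below are illustrative simplifications, not the paper's exact implementation.

```python
import numpy as np

def dissect_unit(activations, concept_masks, quantile=0.995):
    """Label one hidden unit with its best-matching concept (simplified sketch).

    activations: (N, H, W) array of the unit's activation maps over N images,
        assumed already upsampled to the resolution of the concept masks.
    concept_masks: dict mapping concept name -> (N, H, W) boolean masks.
    Returns (best concept label, its IoU score).
    """
    # Threshold the unit's activations at a high per-unit quantile so that
    # only its most strongly activated regions form the unit's binary mask.
    t = np.quantile(activations, quantile)
    unit_mask = activations > t

    best_label, best_iou = None, 0.0
    for label, mask in concept_masks.items():
        inter = np.logical_and(unit_mask, mask).sum()
        union = np.logical_or(unit_mask, mask).sum()
        iou = inter / union if union > 0 else 0.0
        if iou > best_iou:
            best_label, best_iou = label, iou
    return best_label, best_iou
```

A unit whose thresholded activations consistently overlap one concept's masks across the dataset receives that concept as its label; the IoU score serves as the unit's interpretability measure.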
Bolei Zhou is a 5th-year Ph.D. candidate in the Computer Science and Artificial Intelligence Laboratory at MIT, working with Prof. Antonio Torralba. His research is on computer vision and machine learning, with particular interest in visual scene understanding and interpretable machine learning. He is a recipient of the Facebook Fellowship, the Microsoft Research Asia Fellowship, and the MIT Greater China Fellowship. His research has been covered by TechCrunch, Quartz, and MIT News.
Sponsored in part by Disney Research.