Robotics Thesis Proposal

  • Remote Access - Zoom
  • Virtual Presentation - ET (New Date)
  • ADAM HARLEY
  • Ph.D. Student
  • Robotics Institute
  • Carnegie Mellon University

Self-Learning of Structured Visual Representations

Most computer vision models in deployment today are not learning. Instead, they are in a "test" mode, where they will behave the same way perpetually, until they are replaced by newer models. This is a problem, because it means the models may perform poorly as soon as their "test" environment becomes different from their "training" environment. As we work towards building models that can be useful in increasingly complex tasks and environments, we need to provide machines with the ability to learn and improve on their own. In this thesis, we investigate methods for computer vision architectures to self-improve on unlabelled data, by exploiting rich regularities of the natural world itself. As a starting point, we embrace the fact that the world is 3D, and design neural architectures that map RGB-D observations into 3D feature maps. This representation allows us to generate self-supervision objectives using other regularities: we know that two objects cannot be in the same location at once, and that multiple views can be related with geometry. We use these facts to train viewpoint-invariant 3D features (unsupervised), and yield improvements in object detection and tracking. We also explore the use of object-centric bottlenecks in our 3D architectures, encouraging the models to parse visual scenes in terms of a few familiar objects and a background, and further attempt to decompose individual objects into their structure and style. In ongoing and future work, we are exploring active methods of knowledge acquisition, where we maintain multi-hypothesis representations of scenes and videos, and attempt to resolve uncertainties with analysis-by-synthesis, as well as through direct physical interaction with a robot arm.
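The core geometric step described above, mapping RGB-D observations into a 3D feature map, can be illustrated with a minimal sketch: back-project each pixel's feature to a 3D point using the depth map and camera intrinsics, then scatter-average the features into a voxel grid. The function name, signature, and averaging scheme here are illustrative assumptions, not the thesis's actual architecture (which uses learned neural encoders).

```python
import numpy as np

def unproject_to_voxels(depth, feats, K, grid_min, grid_max, grid_res):
    """Scatter per-pixel features into a cubic voxel grid (camera frame).

    depth: (H, W) depth map in meters
    feats: (H, W, C) per-pixel features (e.g., from a 2D CNN)
    K: (3, 3) camera intrinsics
    grid_min, grid_max: scalar bounds of the cubic grid in meters
    grid_res: number of voxels per axis
    """
    H, W, C = feats.shape
    vox = np.zeros((grid_res, grid_res, grid_res, C))
    counts = np.zeros((grid_res, grid_res, grid_res, 1))

    # Back-project each pixel: X = depth * K^-1 @ [u, v, 1]^T
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T               # (H*W, 3) unit-depth rays
    pts = rays * depth.reshape(-1, 1)             # scale rays by depth

    # Map 3D points to voxel indices; keep points inside the grid
    idx = np.floor((pts - grid_min) / (grid_max - grid_min) * grid_res).astype(int)
    valid = np.all((idx >= 0) & (idx < grid_res), axis=1) & (depth.reshape(-1) > 0)
    idx, f = idx[valid], feats.reshape(-1, C)[valid]

    # Average the features of all pixels that land in the same voxel
    np.add.at(vox, (idx[:, 0], idx[:, 1], idx[:, 2]), f)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return vox / np.maximum(counts, 1.0)
```

Because the resulting grid lives in metric 3D space, feature maps from two camera viewpoints can be aligned with a rigid transform and compared voxel-by-voxel, which is what makes the multi-view consistency objectives mentioned in the abstract possible.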

Thesis Committee:
Katerina Fragkiadaki (Chair)
Deva Ramanan
Christopher G. Atkeson
Andrew Zisserman (University of Oxford)

Additional Information

Zoom Participation. See announcement.
