Robotics Thesis Proposal

  • Remote Access - Zoom
  • Virtual Presentation
  • Ph.D. Student
  • Robotics Institute
  • Carnegie Mellon University

Visual Recognition Towards Autonomy

Perception for autonomy presents a collection of compelling challenges for visual recognition. We focus on three key challenges in this thesis.

The first key challenge is learning representations for 2D data such as RGB images. 2D sensing brings unique challenges in scale variance and occlusion. Intuitively, the cues for recognizing a 3px tall object must be fundamentally different from those for recognizing a 300px tall one. We develop representations that effectively encode context and top-down feedback, and demonstrate their effectiveness in finding small faces and localizing keypoints under occlusion.

The second key challenge is learning representations for 3D data such as LiDAR point clouds. Many of the challenges in 2D processing, such as occlusion and scale, can be better modeled in 3D representations. For example, 3D representations need not model scale variation arising from perspective image projection. However, 3D sensor data is still constrained by line-of-sight visibility and occlusion. Fortunately, 3D sensors know where they do not know: raycasting to visible surfaces separates observed free space from the regions hidden behind those surfaces. We develop representations that embrace the notable properties of LiDAR geometry and visibility, and demonstrate their effectiveness in 3D object detection and grouping.
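To make the visibility idea concrete, here is a minimal sketch, not the representation developed in the thesis. It assumes a 2D integer grid, a sensor cell, and LiDAR returns already discretized to grid cells that lie inside the grid. The names visibility_grid and cells_on_ray and the three cell states are hypothetical. Cells a ray passes through are marked free, the cell where it ends is marked occupied, and everything else stays unknown.

```python
# Hypothetical sketch: names, grid layout, and cell states are assumptions.
UNKNOWN, FREE, OCCUPIED = -1, 0, 1

def cells_on_ray(start, end):
    """Grid cells from start to end (inclusive), using Bresenham's line algorithm."""
    (x0, y0), (x1, y1) = start, end
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 >= x0 else -1
    sy = 1 if y1 >= y0 else -1
    err = dx - dy
    cells = []
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return cells
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy

def visibility_grid(width, height, sensor, hits):
    """Label each cell FREE, OCCUPIED, or UNKNOWN from a set of ray endpoints."""
    grid = [[UNKNOWN] * width for _ in range(height)]
    for hit in hits:
        ray = cells_on_ray(sensor, hit)
        # Every cell before the endpoint was seen through, so it is free,
        # unless another ray already ended there.
        for x, y in ray[:-1]:
            if grid[y][x] != OCCUPIED:
                grid[y][x] = FREE
        hx, hy = hit
        grid[hy][hx] = OCCUPIED
    return grid

grid = visibility_grid(6, 5, sensor=(0, 2), hits=[(3, 2)])
```

In this toy case the cells behind the return at (3, 2) remain UNKNOWN rather than FREE, which is the distinction that a plain point cloud of hits does not expose.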

The third key challenge is supporting other modules within a full autonomy stack. The input to an ML-based perception module is sensor data that is typically supervised during an extensive offline training stage. But sensor data is collected much faster than it is annotated. How do we curate more labels at lower costs? Intuitively, only 1 bit of feedback is needed if annotation is framed as a binary quality assurance (QA) task at the appropriate granularity: “is the output of the current perception module correct?” We develop an active binary learning regime, where a recognition model must choose which example to label and which binary question to ask. In reality, it is crucial to select an ontology of labels that is useful to downstream modules in the autonomy stack. When the downstream module is a motion planner, we find that geometry plays as strong a role as semantic class labels. We embrace this perspective and explore self-supervised geometric representations (that can be learned without any human annotation) that directly support local motion planning via dynamic occupancy maps.
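The 1-bit framing of annotation mentioned above can be illustrated with a small sketch. It is a generic uncertainty heuristic and not necessarily the selection criterion used in the thesis. It assumes the model can estimate, for each candidate yes/no question, the probability that the annotator answers "yes". The names binary_entropy and pick_question are hypothetical. A noiseless binary answer carries at most 1 bit, and the expected information is the binary entropy of the predicted answer, so the sketch asks the question the model is least sure about.

```python
import math

def binary_entropy(p):
    """Expected information, in bits, of a yes/no answer with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def pick_question(candidates):
    """Return the question whose answer is expected to be most informative.

    candidates: dict mapping a question id (hypothetical) to the model's
    estimated probability that the annotator answers "yes".
    """
    return max(candidates, key=lambda q: binary_entropy(candidates[q]))

# Hypothetical candidate questions with the model's predicted P(yes).
candidates = {
    "is box 17 a pedestrian?": 0.99,
    "is box 42 a cyclist?": 0.55,
    "is box 8 a vehicle?": 0.10,
}
best = pick_question(candidates)
```

Under these assumed probabilities the question about box 42 is selected, since a prediction near 0.5 is the one where a single bit of feedback resolves the most uncertainty.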

Finally, we propose to explore the limitations of a purely geometric self-supervised interface between perception and planning and how to best complement such an interface with other supervision signals.

Thesis Committee:
Deva Ramanan (Chair)
David Held
Chris Atkeson
Drew Bagnell
Raquel Urtasun (Uber ATG Toronto | University of Toronto)

Additional Information

Zoom Participation. See announcement.
