Master of Science in Robotics Thesis Talk
- Remote Access - Zoom
- Virtual Presentation - ET
- SEUNGWAN (Sean) CHA
- Master's Student
- Robotics Institute
- Carnegie Mellon University
Retrieval-based Novel Activity Detection in Untrimmed Videos
Accurately detecting activities in untrimmed videos is a challenging task as systems need to handle variance in object scales, multiple viewpoints, and multiple types of activities. Furthermore, in a real-world scenario, activity detectors are often required to detect novel kinds of activities when the need arises from end-users. To address these issues, we propose a method that can detect novel activities from unseen scenes using visual and textual retrieval.
Given a handful of visual exemplars for each activity, we first run a sequence of object detection, optical flow, and hierarchical clustering to obtain spatiotemporal proposals, which serve as query proposals. Then, we run the same pipeline on a pool of untrimmed test videos to get gallery proposals. Penultimate-layer features from a Temporal Shift Module (TSM) network are extracted and stored for these proposals. The averaged features of the query proposals are compared against the pool of gallery proposals using cosine distance. Finally, after post-processing, the top-ranked proposals are selected as detected activity instances.
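The ranking step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `rank_gallery`, the `top_k` parameter, and the toy 3-dimensional features are assumptions made for illustration, and the proposal generation, feature extraction, and post-processing stages are omitted.

```python
import numpy as np

def rank_gallery(query_feats, gallery_feats, top_k=5):
    """Rank gallery proposals by cosine distance to the averaged query feature.

    query_feats:   (num_queries, dim) features of the query proposals
    gallery_feats: (num_gallery, dim) features of the gallery proposals
    Returns the indices of the top_k closest gallery proposals and their distances.
    """
    # Average the query proposals into a single prototype vector.
    query = query_feats.mean(axis=0)
    query = query / np.linalg.norm(query)

    # L2-normalize each gallery feature so a dot product gives cosine similarity.
    gallery = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)

    # Cosine distance = 1 - cosine similarity; smaller means more similar.
    distances = 1.0 - gallery @ query

    order = np.argsort(distances)[:top_k]
    return order, distances[order]

# Toy features standing in for stored proposal features (hypothetical values).
query_feats = np.array([[1.0, 0.0, 0.0],
                        [0.8, 0.2, 0.0]])
gallery_feats = np.array([[0.0, 1.0, 0.0],
                          [2.0, 0.1, 0.0],
                          [0.0, 0.0, 3.0],
                          [1.0, 1.0, 0.0]])

top_idx, top_dist = rank_gallery(query_feats, gallery_feats, top_k=2)
print(top_idx, top_dist)
```

Because both sides are normalized, the ranking depends only on the direction of each feature vector, not its magnitude.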
In addition to the vision-based retrieval system, we also explore a language-based retrieval system that can utilize textual descriptions of the unseen activities. To achieve this goal, a state-of-the-art image-text model called CLIP is used to extract textual and visual features from the given examples. Then, cosine distances between the textual features from the query proposals and the visual features from the gallery proposals are computed to rank and retrieve the given novel activity. Our proposed system ranked first on the Surprise Activity Leaderboard of the Activities in Extended Video (ActEV) Challenge 2021, outperforming the second-place system by 2%. We hope that the proposed system can help facilitate the successful deployment of activity detection in the real world.
Deva Ramanan (Advisor)
Aswin Sankaranarayanan (Advisor)
Kris M. Kitani
Zoom Participation. See announcement.