Context matters: Refining object detection in video with recurrent neural networks

Research output: Contribution to conference › Paper › Research › peer-review

Given the vast amounts of video available online and recent breakthroughs in object detection with static images, object detection in video offers a promising new frontier. However, motion blur and compression artifacts cause substantial frame-level variability, even in videos that appear smooth to the eye. Additionally, in video datasets, frames are typically sparsely annotated. We present a new framework for improving object detection in videos that captures temporal context and encourages consistency of predictions. First, we train a pseudo-labeler, i.e., a domain-adapted convolutional neural network for object detection, on the subset of labeled frames. We then apply it to provisionally label all frames, including those without labels. Finally, we train a recurrent neural network that takes as input sequences of pseudo-labeled frames and optimizes an objective that encourages both accuracy on the target frame and consistency across consecutive frames. The approach incorporates strong supervision on target frames, weak supervision on context frames, and regularization via a smoothness penalty. Our approach achieves a mean Average Precision (mAP) of 68.73, an improvement of 7.1 over the strongest image-based baselines on the Youtube-Video Objects dataset. Our experiments demonstrate that neighboring frames can provide valuable information, even in the absence of labels.
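The training objective sketched in the abstract combines three terms: strong supervision on the annotated target frame, weak supervision toward the pseudo-labels on context frames, and a smoothness penalty across consecutive predictions. Below is a minimal illustrative sketch of such a composite loss, assuming a PyTorch setting; the function name, the use of mean-squared-error terms, and the weights alpha and beta are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of a three-term refinement objective (not the paper's exact losses).
import torch
import torch.nn.functional as F

def refinement_loss(rnn_outputs, pseudo_labels, target_idx, target_label,
                    alpha=1.0, beta=0.1):
    """rnn_outputs:   (T, D) refined per-frame predictions from the recurrent model.
    pseudo_labels: (T, D) provisional labels from the pseudo-labeler.
    target_idx:    index of the frame that carries a ground-truth annotation.
    target_label:  (D,) ground-truth label for that frame."""
    # Strong supervision: match the ground truth on the annotated target frame.
    strong = F.mse_loss(rnn_outputs[target_idx], target_label)
    # Weak supervision: stay close to the pseudo-labels on unannotated context frames.
    weak = F.mse_loss(rnn_outputs, pseudo_labels)
    # Smoothness penalty: discourage abrupt changes between consecutive frames.
    smooth = (rnn_outputs[1:] - rnn_outputs[:-1]).pow(2).mean()
    return strong + alpha * weak + beta * smooth

# Toy usage with random tensors standing in for per-frame detection outputs.
T, D = 8, 5
outputs = torch.randn(T, D, requires_grad=True)
pseudo = torch.randn(T, D)
loss = refinement_loss(outputs, pseudo, target_idx=4, target_label=torch.randn(D))
loss.backward()
```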

Original language: English
Publication date: 2016
Number of pages: 12
DOIs
Publication status: Published - 2016
Externally published: Yes
Event: 27th British Machine Vision Conference, BMVC 2016 - York, United Kingdom
Duration: 19 Sep 2016 – 22 Sep 2016

Conference

Conference: 27th British Machine Vision Conference, BMVC 2016
Country: United Kingdom
City: York
Period: 19/09/2016 – 22/09/2016
Sponsors: ARM, Disney Research, et al., HP, Ocado Technology, OSRAM

Bibliographical note

Publisher Copyright:
© 2016. The copyright of this document resides with its authors.

ID: 301827993