
Occluded Video Instance Segmentation: A Benchmark

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

Documents

Full text
Publisher's published version, 4.91 MB, PDF document

Jiyang Qi
Yan Gao
Yao Hu
Xinggang Wang
Xiaoyu Liu
Xiang Bai
Serge Belongie
Alan Yuille
Philip H. S. Torr
Song Bai

Can our video understanding systems perceive objects when heavy occlusion is present in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, simultaneously detecting, segmenting, and tracking instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories in which object occlusions commonly occur. While the human visual system can understand such occluded instances through contextual reasoning and association, our experiments suggest that current video understanding systems cannot. On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which shows that we are still at a nascent stage in understanding objects, instances, and videos in real-world scenarios. We also present a simple plug-and-play module that performs temporal feature calibration to compensate for object cues lost to occlusion. When built into MaskTrack R-CNN and SipMask, the module yields a notable AP improvement on the OVIS dataset. The OVIS dataset and project code are available at http://songbai.site/ovis.
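To make the idea of temporal feature calibration more concrete, here is a minimal sketch in plain Python. It is a deliberately simplified stand-in, not the module described in the paper (whose exact design is in the paper and the released code): the function name `calibrate`, the `[H][W][C]` nested-list layout, the local search `radius`, and the softmax weighting over scaled dot-product similarities are all hypothetical choices made for illustration. For each position in the current frame, the sketch compares its feature with reference-frame features in a small neighborhood, turns the similarities into weights, and adds the weighted reference features back, so a position whose cues are weak because of occlusion can borrow evidence from an adjacent frame.

```python
import math
from typing import List

# Hypothetical layout: feature map indexed as [row][col][channel].
FeatureMap = List[List[List[float]]]


def _dot(a: List[float], b: List[float]) -> float:
    """Dot product of two channel vectors."""
    return sum(x * y for x, y in zip(a, b))


def calibrate(current: FeatureMap, reference: FeatureMap,
              radius: int = 1, temperature: float = 1.0) -> FeatureMap:
    """Illustrative correlation-weighted calibration (not the paper's module).

    For every position in `current`, reference features within a
    (2 * radius + 1)^2 window are weighted by a softmax over their scaled
    dot-product similarity to the current feature, averaged, and added
    to the current feature as a residual.
    """
    height, width = len(current), len(current[0])
    channels = len(current[0][0])
    scale = math.sqrt(channels) * temperature
    calibrated: FeatureMap = []
    for i in range(height):
        row = []
        for j in range(width):
            query = current[i][j]
            scores, neighbours = [], []
            # Local correlation: compare against nearby reference positions.
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < height and 0 <= x < width:
                        key = reference[y][x]
                        scores.append(_dot(query, key) / scale)
                        neighbours.append(key)
            # Numerically stable softmax over the similarity scores.
            peak = max(scores)
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            aggregated = [
                sum(w * f[c] for w, f in zip(weights, neighbours)) / total
                for c in range(channels)
            ]
            # Residual addition: complement the current cue with the
            # aggregated reference cue.
            row.append([query[c] + aggregated[c] for c in range(channels)])
        calibrated.append(row)
    return calibrated


if __name__ == "__main__":
    # A 2x2 map with 2 channels; the bottom-right cell is "occluded" (all zeros).
    cur = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 0.0]]]
    ref = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.2, 0.8]]]
    print(calibrate(cur, ref)[1][1])
```

With `radius=1`, each position looks at up to nine reference positions. A practical version would operate on batched tensors in a deep learning framework and would typically learn how strongly the calibrated signal is mixed in, rather than adding it with a fixed weight of one.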

Original language	English
Journal	International Journal of Computer Vision
Volume	130
Pages (from-to)	2022-2039
ISSN	0920-5691
DOI	https://doi.org/10.1007/s11263-022-01629-1
Status	Published - 2022

Bibliographic note

Funding Information:
This work is supported by Turing AI Fellowship EP/W002981/1.

Publisher Copyright:
© 2022, The Author(s).

ID: 344656097

Department of Computer Science
