Learning from noisy large-scale datasets with minimal supervision

Publication: Contribution to journal › Conference article › Research › peer-reviewed

Andreas Veit
Neil Alldrin
Gal Chechik
Ivan Krasin
Abhinav Gupta
Serge Belongie

We present an approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations. One common approach to combine clean and noisy data is to first pre-train a network using the large noisy dataset and then fine-tune with the clean dataset. We show this approach does not fully leverage the information contained in the clean set. Thus, we demonstrate how to use the clean annotations to reduce the noise in the large dataset before fine-tuning the network using both the clean set and the full set with reduced noise. The approach comprises a multi-task network that jointly learns to clean noisy annotations and to accurately classify images. We evaluate our approach on the recently released Open Images dataset, containing ∼9 million images, multiple annotations per image and over 6000 unique classes. For the small clean set of annotations we use a quarter of the validation set with ∼40k images. Our results demonstrate that the proposed approach clearly outperforms direct fine-tuning across all major categories of classes in the Open Images dataset. Further, our approach is particularly effective for a large number of classes with a wide range of noise in annotations (20-80% false positive annotations).
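The multi-task idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear heads, the residual-plus-clipping form of the cleaning step, the absolute-error cleaning loss, the unweighted sum of the two losses, and all names (`clean_labels`, `classify`, `multitask_loss`, `W_label`, `W_feat`, `W_cls`) are assumptions made for illustration. It only shows how a cleaning head and a classification head could share image features, with verified labels supervising the cleaning head and cleaned labels serving as classification targets where no verified labels exist.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def clean_labels(noisy, feats, W_label, W_feat):
    """Hypothetical cleaning head: predict a correction to the noisy labels.

    Assumption: the correction is a linear function of the noisy labels and
    the image features, added to the noisy labels and clipped to [0, 1].
    """
    residual = noisy @ W_label + feats @ W_feat
    return np.clip(noisy + residual, 0.0, 1.0)


def classify(feats, W_cls):
    """Hypothetical multi-label classifier: independent sigmoid per class."""
    return sigmoid(feats @ W_cls)


def multitask_loss(noisy, feats, verified, has_verified, params):
    """Joint loss over a batch mixing verified and unverified images.

    noisy:        (n, c) noisy labels in {0, 1}
    feats:        (n, d) image features from a shared backbone (assumed given)
    verified:     (n, c) verified labels; ignored where has_verified is False
    has_verified: (n,)   boolean, True for images from the small clean set
    """
    W_label, W_feat, W_cls = params
    cleaned = clean_labels(noisy, feats, W_label, W_feat)
    probs = classify(feats, W_cls)

    # Cleaning loss: absolute error, computed only on verified images.
    mask = has_verified[:, None]
    n_verified = max(int(has_verified.sum()), 1)
    clean_loss = (np.abs(cleaned - verified) * mask).sum() / n_verified

    # Classification targets: verified labels where available,
    # otherwise the output of the cleaning head.
    targets = np.where(mask, verified, cleaned)
    eps = 1e-7
    bce = -(targets * np.log(probs + eps)
            + (1.0 - targets) * np.log(1.0 - probs + eps))
    cls_loss = bce.mean()

    return clean_loss + cls_loss, cleaned, probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, c = 8, 16, 5  # toy batch size, feature size, class count
    feats = rng.normal(size=(n, d))
    noisy = rng.integers(0, 2, size=(n, c)).astype(float)
    verified = rng.integers(0, 2, size=(n, c)).astype(float)
    has_verified = np.arange(n) < 2  # pretend only two images are verified
    params = (
        0.01 * rng.normal(size=(c, c)),
        0.01 * rng.normal(size=(d, c)),
        0.01 * rng.normal(size=(d, c)),
    )
    loss, cleaned, probs = multitask_loss(
        noisy, feats, verified, has_verified, params
    )
    print("joint loss:", loss)
```

The sketch computes only a forward pass and a loss value; training the weights, and the choice of backbone that produces `feats`, are left out.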

Original language	English
Journal	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Pages (from-to)	6575-6583
Number of pages	9
DOI	https://doi.org/10.1109/CVPR.2017.696
Status	Published - 6 Nov 2017
Published externally	Yes
Event	30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, USA Duration: 21 Jul 2017 → 26 Jul 2017

Conference

Conference	30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Country	USA
City	Honolulu
Period	21/07/2017 → 26/07/2017

Bibliographical note

ID: 301826772

Department of Computer Science