Descriptive Attributes for Language-Based Object Keypoint Detection

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet’s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.
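The two ideas named in the abstract, treating keypoints as objects and enhancing keypoint names with descriptive attributes, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_prompt` and `keypoint_to_box`, the period-separated prompt format, the fixed box size, and the example attribute strings are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def build_prompt(keypoints: List[str],
                 attributes: Optional[Dict[str, str]] = None) -> str:
    """Join keypoint names into one text prompt.

    If an attribute is given for a keypoint, it is placed before the
    name (e.g. "short orange bill"). The ". " separator is an assumed
    prompt convention, not one confirmed by the source.
    """
    attributes = attributes or {}
    phrases = []
    for name in keypoints:
        attr = attributes.get(name)
        phrases.append(f"{attr} {name}" if attr else name)
    return ". ".join(phrases) + "."


def keypoint_to_box(x: float, y: float, half_size: float,
                    width: int, height: int) -> Tuple[float, float, float, float]:
    """Wrap a keypoint in a small square box clipped to the image.

    This turns a point annotation into a box target that an object
    detector can be trained on. The box size is a hypothetical choice.
    """
    x0 = max(0.0, float(x) - half_size)
    y0 = max(0.0, float(y) - half_size)
    x1 = min(float(width), float(x) + half_size)
    y1 = min(float(height), float(y) + half_size)
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    names = ["bill", "crown", "tail"]
    # Hypothetical attribute strings; "tail" has none and keeps its plain name.
    attrs = {"bill": "short orange", "crown": "black"}
    print(build_prompt(names))
    print(build_prompt(names, attrs))
    print(keypoint_to_box(5, 5, 10, 100, 100))
```

Keypoints without an attribute fall back to their plain name, which mirrors the abstract's distinction between results overall and results on keypoints with attributes.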

Original language	English
Title	Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings
Editors	Henrik I. Christensen, Peter Corke, Renaud Detry, Jean-Baptiste Weibel, Markus Vincze
Publisher	Springer
Publication date	2023
Pages	444-458
ISBN (Print)	9783031441363
DOI	https://doi.org/10.1007/978-3-031-44137-0_37
Status	Published - 2023
Event	14th International Conference on Computer Vision Systems, ICVS 2023 - Vienna, Austria Duration: 27 Sep 2023 → 29 Sep 2023

Conference

Conference	14th International Conference on Computer Vision Systems, ICVS 2023
Country	Austria
City	Vienna
Period	27/09/2023 → 29/09/2023

Series	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14253 LNCS
ISSN	0302-9743

Bibliographic note

Funding Information:
We thank Grant Van Horn for the mapping between NABirds and CUB, Jonathan M. Wells for helpful conversation, Vésteinn Snæbjarnarson for experimental assistance, and the reviewers for important feedback. This work was supported in part by the Pioneer Centre for AI, DNRF grant number P1.

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.

ID: 372615567

Department of Computer Science