Descriptive Attributes for Language-Based Object Keypoint Detection
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet’s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.
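As a rough illustration of what "enhancing the keypoint names with descriptive attributes" could look like on the language side, the sketch below builds a text prompt from keypoint names and optionally prefixes each name with an attribute. It is not the authors' implementation. The function `build_prompt`, the period-separated prompt layout, and the example attribute values are all assumptions made for illustration; the abstract does not specify how GLIP-KP formats its prompts.

```python
from typing import Mapping, Sequence


def build_prompt(keypoints: Sequence[str], attributes: Mapping[str, str]) -> str:
    """Join keypoint names into one caption, prefixing an attribute where one is given.

    Hypothetical helper: the period-separated layout is an assumption,
    not a format confirmed by the source.
    """
    phrases = []
    for name in keypoints:
        attr = attributes.get(name)
        # Keypoints without a known attribute keep their plain name.
        phrases.append(f"{attr} {name}" if attr else name)
    return ". ".join(phrases) + "."


# Illustrative keypoint names and made-up attribute values.
keypoints = ["bill", "crown", "belly"]
attributes = {"crown": "red", "belly": "white"}

plain = build_prompt(keypoints, {})             # names only
described = build_prompt(keypoints, attributes)  # names with attributes

print(plain)      # bill. crown. belly.
print(described)  # bill. red crown. white belly.
```

Keeping the attribute lookup optional per keypoint mirrors the abstract's distinction between keypoints with attributes and keypoints without them.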
Original language | English |
---|---|
Title of host publication | Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings |
Editors | Henrik I. Christensen, Peter Corke, Renaud Detry, Jean-Baptiste Weibel, Markus Vincze |
Publisher | Springer |
Publication date | 2023 |
Pages | 444-458 |
ISBN (Print) | 9783031441363 |
DOIs | |
Publication status | Published - 2023 |
Event | 14th International Conference on Computer Vision Systems, ICVS 2023 - Vienna, Austria. Duration: 27 Sep 2023 → 29 Sep 2023 |
Conference
Conference | 14th International Conference on Computer Vision Systems, ICVS 2023 |
---|---|
Country | Austria |
City | Vienna |
Period | 27/09/2023 → 29/09/2023 |
Series | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14253 LNCS |
ISSN | 0302-9743 |
Bibliographical note
Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
Research areas
- Attributes, Keypoint detection, Vision & language models
ID: 372615567