Descriptive Attributes for Language-Based Object Keypoint Detection

Publication: Contribution to book/anthology/report, conference contribution in proceedings, research, peer-reviewed

Standard

Descriptive Attributes for Language-Based Object Keypoint Detection. / Weinman, Jerod; Belongie, Serge; Frank, Stella.

Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. ed. / Henrik I. Christensen; Peter Corke; Renaud Detry; Jean-Baptiste Weibel; Markus Vincze. Springer, 2023. pp. 444-458 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 14253 LNCS).

Harvard

Weinman, J, Belongie, S & Frank, S 2023, Descriptive Attributes for Language-Based Object Keypoint Detection. in HI Christensen, P Corke, R Detry, J-B Weibel & M Vincze (eds), Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. Springer, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14253 LNCS, pp. 444-458, 14th International Conference on Computer Vision Systems, ICVS 2023, Vienna, Austria, 27/09/2023. https://doi.org/10.1007/978-3-031-44137-0_37

APA

Weinman, J., Belongie, S., & Frank, S. (2023). Descriptive Attributes for Language-Based Object Keypoint Detection. In H. I. Christensen, P. Corke, R. Detry, J-B. Weibel, & M. Vincze (Eds.), Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings (pp. 444-458). Springer. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 14253 LNCS https://doi.org/10.1007/978-3-031-44137-0_37

Vancouver

Weinman J, Belongie S, Frank S. Descriptive Attributes for Language-Based Object Keypoint Detection. In Christensen HI, Corke P, Detry R, Weibel J-B, Vincze M, editors, Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. Springer. 2023. p. 444-458. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 14253 LNCS). https://doi.org/10.1007/978-3-031-44137-0_37

Author

Weinman, Jerod ; Belongie, Serge ; Frank, Stella. / Descriptive Attributes for Language-Based Object Keypoint Detection. Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. ed. / Henrik I. Christensen ; Peter Corke ; Renaud Detry ; Jean-Baptiste Weibel ; Markus Vincze. Springer, 2023. pp. 444-458 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 14253 LNCS).

Bibtex

@inproceedings{7d94e5c6ff6a465a800c82107058a88f,
title = "Descriptive Attributes for Language-Based Object Keypoint Detection",
abstract = "Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet{\textquoteright}s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.",
keywords = "Attributes, Keypoint detection, Vision \& language models",
author = "Jerod Weinman and Serge Belongie and Stella Frank",
note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.; 14th International Conference on Computer Vision Systems, ICVS 2023 ; Conference date: 27-09-2023 Through 29-09-2023",
year = "2023",
doi = "10.1007/978-3-031-44137-0_37",
language = "English",
isbn = "9783031441363",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "14253",
publisher = "Springer",
pages = "444--458",
editor = "Christensen, {Henrik I.} and Peter Corke and Renaud Detry and Jean-Baptiste Weibel and Markus Vincze",
booktitle = "Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings",
address = "Switzerland",

}

RIS

TY - GEN

T1 - Descriptive Attributes for Language-Based Object Keypoint Detection

AU - Weinman, Jerod

AU - Belongie, Serge

AU - Frank, Stella

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.

PY - 2023

Y1 - 2023

AB - Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet’s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.

KW - Attributes

KW - Keypoint detection

KW - Vision & language models

UR - http://www.scopus.com/inward/record.url?scp=85174519994&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-44137-0_37

DO - 10.1007/978-3-031-44137-0_37

M3 - Article in proceedings

AN - SCOPUS:85174519994

SN - 9783031441363

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 444

EP - 458

BT - Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings

A2 - Christensen, Henrik I.

A2 - Corke, Peter

A2 - Detry, Renaud

A2 - Weibel, Jean-Baptiste

A2 - Vincze, Markus

PB - Springer

T2 - 14th International Conference on Computer Vision Systems, ICVS 2023

Y2 - 27 September 2023 through 29 September 2023

ER -