Intentonomy: A dataset and study towards human intent understanding

Publication: Contribution to journal › Conference article › Research › Peer-reviewed

Standard

Intentonomy: A dataset and study towards human intent understanding. / Jia, Menglin; Wu, Zuxuan; Reiter, Austin; Cardie, Claire; Belongie, Serge; Lim, Ser Nam.

In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021, pp. 12981-12991.


Harvard

Jia, M, Wu, Z, Reiter, A, Cardie, C, Belongie, S & Lim, SN 2021, 'Intentonomy: A dataset and study towards human intent understanding', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 12981-12991. https://doi.org/10.1109/CVPR46437.2021.01279

APA

Jia, M., Wu, Z., Reiter, A., Cardie, C., Belongie, S., & Lim, S. N. (2021). Intentonomy: A dataset and study towards human intent understanding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 12981-12991. https://doi.org/10.1109/CVPR46437.2021.01279

Vancouver

Jia M, Wu Z, Reiter A, Cardie C, Belongie S, Lim SN. Intentonomy: A dataset and study towards human intent understanding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2021;12981-12991. https://doi.org/10.1109/CVPR46437.2021.01279

Author

Jia, Menglin ; Wu, Zuxuan ; Reiter, Austin ; Cardie, Claire ; Belongie, Serge ; Lim, Ser Nam. / Intentonomy: A dataset and study towards human intent understanding. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2021 ; pp. 12981-12991.

Bibtex

@inproceedings{177fd35df4a2440e90740bbce388264b,
title = "Intentonomy: A dataset and study towards human intent understanding",
abstract = "An image is worth a thousand words, conveying information that goes beyond the mere visual content therein. In this paper, we study the intent behind social media images with an aim to analyze how visual information can facilitate recognition of human intent. Towards this goal, we introduce an intent dataset, Intentonomy, comprising 14K images covering a wide range of everyday scenes. These images are manually annotated with 28 intent categories derived from a social psychology taxonomy. We then systematically study whether, and to what extent, commonly used visual information, i.e., object and context, contribute to human motive understanding. Based on our findings, we conduct further study to quantify the effect of attending to object and context classes as well as textual information in the form of hashtags when training an intent classifier. Our results quantitatively and qualitatively shed light on how visual and textual information can produce observable effects when predicting intent.",
author = "Menglin Jia and Zuxuan Wu and Austin Reiter and Claire Cardie and Serge Belongie and Lim, {Ser Nam}",
note = "Funding Information: Acknowledgement We thank Luke Chesser and Timothy Carbone from Unsplash for providing the images, Kimberly Wilber and Bor-chun Chen for tips and suggestions about the annotation interface and annotator management, Kevin Musgrave for the general discussion, and anonymous reviewers for their valuable feedback. This work is supported by a Facebook AI research grant awarded to Cornell University. Publisher Copyright: {\textcopyright} 2021 IEEE; 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 ; Conference date: 19-06-2021 Through 25-06-2021",
year = "2021",
doi = "10.1109/CVPR46437.2021.01279",
language = "English",
pages = "12981--12991",
booktitle = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
issn = "1063-6919",
publisher = "Institute of Electrical and Electronics Engineers",
}

RIS

TY - GEN

T1 - Intentonomy: A dataset and study towards human intent understanding

T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021

AU - Jia, Menglin

AU - Wu, Zuxuan

AU - Reiter, Austin

AU - Cardie, Claire

AU - Belongie, Serge

AU - Lim, Ser Nam

N1 - Funding Information: Acknowledgement We thank Luke Chesser and Timothy Carbone from Unsplash for providing the images, Kimberly Wilber and Bor-chun Chen for tips and suggestions about the annotation interface and annotator management, Kevin Musgrave for the general discussion, and anonymous reviewers for their valuable feedback. This work is supported by a Facebook AI research grant awarded to Cornell University. Publisher Copyright: © 2021 IEEE

PY - 2021

Y1 - 2021

N2 - An image is worth a thousand words, conveying information that goes beyond the mere visual content therein. In this paper, we study the intent behind social media images with an aim to analyze how visual information can facilitate recognition of human intent. Towards this goal, we introduce an intent dataset, Intentonomy, comprising 14K images covering a wide range of everyday scenes. These images are manually annotated with 28 intent categories derived from a social psychology taxonomy. We then systematically study whether, and to what extent, commonly used visual information, i.e., object and context, contribute to human motive understanding. Based on our findings, we conduct further study to quantify the effect of attending to object and context classes as well as textual information in the form of hashtags when training an intent classifier. Our results quantitatively and qualitatively shed light on how visual and textual information can produce observable effects when predicting intent.

AB - An image is worth a thousand words, conveying information that goes beyond the mere visual content therein. In this paper, we study the intent behind social media images with an aim to analyze how visual information can facilitate recognition of human intent. Towards this goal, we introduce an intent dataset, Intentonomy, comprising 14K images covering a wide range of everyday scenes. These images are manually annotated with 28 intent categories derived from a social psychology taxonomy. We then systematically study whether, and to what extent, commonly used visual information, i.e., object and context, contribute to human motive understanding. Based on our findings, we conduct further study to quantify the effect of attending to object and context classes as well as textual information in the form of hashtags when training an intent classifier. Our results quantitatively and qualitatively shed light on how visual and textual information can produce observable effects when predicting intent.

UR - http://www.scopus.com/inward/record.url?scp=85123160639&partnerID=8YFLogxK

U2 - 10.1109/CVPR46437.2021.01279

DO - 10.1109/CVPR46437.2021.01279

M3 - Conference article

AN - SCOPUS:85123160639

SP - 12981

EP - 12991

JO - IEEE Conference on Computer Vision and Pattern Recognition. Proceedings

JF - IEEE Conference on Computer Vision and Pattern Recognition. Proceedings

SN - 1063-6919

Y2 - 19 June 2021 through 25 June 2021

ER -
