MULTIFIN: A Dataset for Multilingual Financial NLP

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

MULTIFIN : A Dataset for Multilingual Financial NLP. / Jørgensen, Rasmus Kær; Brandt, Oliver; Hartmann, Mareike; Dai, Xiang; Igel, Christian; Elliott, Desmond.

EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023. Association for Computational Linguistics (ACL), 2023. p. 864-879.

Harvard

Jørgensen, RK, Brandt, O, Hartmann, M, Dai, X, Igel, C & Elliott, D 2023, MULTIFIN: A Dataset for Multilingual Financial NLP. in EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023. Association for Computational Linguistics (ACL), pp. 864-879, 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023, Dubrovnik, Croatia, 02/05/2023. <https://aclanthology.org/2023.findings-eacl.66>

APA

Jørgensen, R. K., Brandt, O., Hartmann, M., Dai, X., Igel, C., & Elliott, D. (2023). MULTIFIN: A Dataset for Multilingual Financial NLP. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 (pp. 864-879). Association for Computational Linguistics (ACL). https://aclanthology.org/2023.findings-eacl.66

Vancouver

Jørgensen RK, Brandt O, Hartmann M, Dai X, Igel C, Elliott D. MULTIFIN: A Dataset for Multilingual Financial NLP. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023. Association for Computational Linguistics (ACL). 2023. p. 864-879

Author

Jørgensen, Rasmus Kær ; Brandt, Oliver ; Hartmann, Mareike ; Dai, Xiang ; Igel, Christian ; Elliott, Desmond. / MULTIFIN : A Dataset for Multilingual Financial NLP. EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023. Association for Computational Linguistics (ACL), 2023. pp. 864-879

Bibtex

@inproceedings{34b74f3f45774eab9cf59341d398c271,

title = "MULTIFIN: A Dataset for Multilingual Financial NLP",

abstract = "Financial information is generated and distributed across the world, resulting in a vast amount of domain-specific multilingual data. Multilingual models adapted to the financial domain would ease deployment when an organization needs to work with multiple languages on a regular basis. For the development and evaluation of such models, there is a need for multilingual financial language processing datasets. We describe MULTIFIN, a publicly available financial dataset consisting of real-world article headlines covering 15 languages across different writing systems and language families. The dataset has a hierarchical label structure providing two classification tasks: multi-label and multi-class. We develop our annotation schema based on a real-world application and annotate our dataset using both {\textquoteleft}label by native-speaker{\textquoteright} and {\textquoteleft}translate-then-label{\textquoteright} approaches. The evaluation of several popular multilingual models, e.g., mBERT, XLM-R, and mT5, shows that although decent accuracy can be achieved in high-resource languages, there is substantial room for improvement in low-resource languages.",

author = "J{\o}rgensen, {Rasmus K{\ae}r} and Oliver Brandt and Mareike Hartmann and Xiang Dai and Christian Igel and Desmond Elliott",

note = "Publisher Copyright: {\textcopyright} 2023 Association for Computational Linguistics.; 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023 ; Conference date: 02-05-2023 Through 06-05-2023",

year = "2023",

language = "English",

pages = "864--879",

booktitle = "EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023",

publisher = "Association for Computational Linguistics (ACL)",

address = "United States",

}

RIS

TY - GEN

T1 - MULTIFIN: A Dataset for Multilingual Financial NLP

T2 - 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023

AU - Jørgensen, Rasmus Kær

AU - Brandt, Oliver

AU - Hartmann, Mareike

AU - Dai, Xiang

AU - Igel, Christian

AU - Elliott, Desmond

PY - 2023

Y1 - 2023

AB - Financial information is generated and distributed across the world, resulting in a vast amount of domain-specific multilingual data. Multilingual models adapted to the financial domain would ease deployment when an organization needs to work with multiple languages on a regular basis. For the development and evaluation of such models, there is a need for multilingual financial language processing datasets. We describe MULTIFIN, a publicly available financial dataset consisting of real-world article headlines covering 15 languages across different writing systems and language families. The dataset has a hierarchical label structure providing two classification tasks: multi-label and multi-class. We develop our annotation schema based on a real-world application and annotate our dataset using both ‘label by native-speaker’ and ‘translate-then-label’ approaches. The evaluation of several popular multilingual models, e.g., mBERT, XLM-R, and mT5, shows that although decent accuracy can be achieved in high-resource languages, there is substantial room for improvement in low-resource languages.

UR - http://www.scopus.com/inward/record.url?scp=85159859314&partnerID=8YFLogxK

M3 - Article in proceedings

AN - SCOPUS:85159859314

SP - 864

EP - 879

BT - EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023

PB - Association for Computational Linguistics (ACL)

Y2 - 2 May 2023 through 6 May 2023

ER -

ID: 355143987

Department of Computer Science