A Cross-lingual Comparison of Human and Model Relative Word Importance

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Felix Morger
Stephanie Brandl
Lisa Beinborn
Nora Hollenstein

Relative word importance is a key metric for natural language processing. In this work, we compare human and model relative word importance to investigate whether pretrained neural language models focus on the same words as humans cross-lingually. We perform an extensive study using several importance metrics (gradient-based saliency and attention-based metrics) in monolingual and multilingual models, together with eye-tracking corpora from four languages (German, Dutch, English, and Russian). We find that gradient-based saliency, first-layer attention, and attention flow correlate strongly with human eye-tracking data across all four languages. We further analyze the role of word length and word frequency in determining relative importance and find that importance strongly correlates with both; however, the mechanisms behind these non-linear relations remain elusive. We obtain a cross-lingual approximation of the similarity between human and computational language processing and insights into the usability of several importance metrics.

Original language	English
Title of host publication	Proceedings of the 2022 CLASP Conference on (Dis)embodiment
Number of pages	13
Publisher	Association for Computational Linguistics (ACL)
Publication date	2022
Pages	11-23
Publication status	Published - 2022
Event	2022 CLASP Conference on (Dis)embodiment - Gothenburg / Online, Sweden. Duration: 15 Sep 2022 → 16 Sep 2022

Conference

Conference	2022 CLASP Conference on (Dis)embodiment
Country	Sweden
City	Gothenburg / Online
Period	15/09/2022 → 16/09/2022

Department of Computer Science