Revisiting Transformer-based Models for Long Document Classification

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Documents

  • Fulltext

    Final published version, 552 KB, PDF document

The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs). In real-world applications, multi-page multi-paragraph documents are common and they cannot be efficiently encoded by vanilla Transformer-based models. We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla transformers to encode much longer text, namely sparse attention and hierarchical encoding methods. We examine several aspects of sparse attention (e.g., size of the local attention window, use of global attention) and hierarchical (e.g., document splitting strategy) transformers on four document classification datasets covering different domains. We observe a clear benefit from being able to process longer text, and, based on our results, we derive practical advice on applying Transformer-based models to long document classification tasks.
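
To make the hierarchical encoding idea concrete, the sketch below shows one plausible document-splitting strategy: a long token sequence is cut into overlapping fixed-size chunks, each short enough for a vanilla Transformer encoder, and the resulting chunk representations would then be combined by a second-level model. The function name, chunk size, and overlap value are illustrative assumptions, not the exact configuration studied in the paper.

    # Hedged sketch of a hierarchical document-splitting strategy.
    # Chunk size and overlap are assumed values for illustration only.
    from typing import List

    def split_into_chunks(token_ids: List[int],
                          chunk_size: int = 512,
                          overlap: int = 64) -> List[List[int]]:
        """Split a long token sequence into overlapping chunks that each
        fit within a standard Transformer's maximum input length."""
        if chunk_size <= overlap:
            raise ValueError("chunk_size must be larger than overlap")
        stride = chunk_size - overlap
        chunks = []
        for start in range(0, len(token_ids), stride):
            chunks.append(token_ids[start:start + chunk_size])
            if start + chunk_size >= len(token_ids):
                break
        return chunks

    # Example: a 1,300-token document yields three overlapping chunks.
    doc = list(range(1300))
    print([len(c) for c in split_into_chunks(doc)])  # [512, 512, 404]

For the sparse-attention family, the analogous tuning point is the size of the local attention window (for instance, the attention_window setting in Hugging Face's Longformer configuration) together with the choice of which tokens, such as the [CLS] token, receive global attention.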
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: EMNLP 2022
Publisher: Association for Computational Linguistics
Publication date: 2022
Pages: 7212–7230
Publication status: Published - 2022


ID: 339145904