Text Rendering Strategies for Pixel Language Models

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling. However, recent approaches use text renderers that produce a large set of almost-equivalent input patches, which may prove sub-optimal for downstream tasks, due to redundancy in the input representations. In this paper, we investigate four approaches to rendering text in the PIXEL model (Rust et al., 2023), and find that simple character bigram rendering brings improved performance on sentence-level tasks without compromising performance on token-level or multilingual tasks. This new rendering strategy also makes it possible to train a more compact model with only 22M parameters that performs on par with the original 86M parameter model. Our analyses show that character bigram rendering leads to a consistently better model but with an anisotropic patch embedding space, driven by a patch frequency bias, highlighting the connections between image patch- and tokenization-based language models.
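To make the idea of character bigram rendering concrete, the following is a minimal sketch of the text-splitting step only, assuming each image patch receives exactly two consecutive characters. The function name `char_bigrams` and the space-padding rule for odd-length input are illustrative assumptions, not details taken from the paper, and the actual renderer may treat word boundaries, whitespace, and complex scripts differently.

```python
def char_bigrams(text: str, pad: str = " ") -> list[str]:
    """Split text into consecutive, non-overlapping character bigrams.

    Assumption: one bigram would be drawn into one image patch.
    Odd-length input is padded with `pad` so the last chunk is also
    two characters long (a hypothetical choice for this sketch).
    """
    if len(text) % 2:
        text += pad
    return [text[i:i + 2] for i in range(0, len(text), 2)]


chunks = char_bigrams("pixel models")
print(chunks)  # ['pi', 'xe', 'l ', 'mo', 'de', 'ls']
```

Because every patch would then hold one of a limited set of character pairs at a fixed position, the number of distinct patches is bounded, in contrast to the large set of almost-equivalent patches the abstract attributes to earlier renderers. Note that Python's `len` and slicing operate on code points, so scripts with combining marks would need grapheme-aware splitting.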

Original language	English
Title of host publication	Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Publisher	Association for Computational Linguistics (ACL)
Publication date	2023
Pages	10155–10172
DOIs	https://doi.org/10.18653/v1/2023.emnlp-main.628
Publication status	Published - 2023
Event	2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6 Dec 2023 → 10 Dec 2023

Conference

Conference	2023 Conference on Empirical Methods in Natural Language Processing
City	Singapore
Period	06/12/2023 → 10/12/2023

Research areas

cs.CL

ID: 379722543

Department of Computer Science