Text Rendering Strategies for Pixel Language Models

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Standard

Text Rendering Strategies for Pixel Language Models. / Lotz, Jonas F.; Salesky, Elizabeth; Rust, Phillip; Elliott, Desmond.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), 2023. pp. 10155–10172.

Harvard

Lotz, JF, Salesky, E, Rust, P & Elliott, D 2023, Text Rendering Strategies for Pixel Language Models. in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), pp. 10155–10172, 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 06/12/2023. https://doi.org/10.18653/v1/2023.emnlp-main.628

APA

Lotz, J. F., Salesky, E., Rust, P., & Elliott, D. (2023). Text Rendering Strategies for Pixel Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 10155–10172). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.628

Vancouver

Lotz JF, Salesky E, Rust P, Elliott D. Text Rendering Strategies for Pixel Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL). 2023. p. 10155–10172. https://doi.org/10.18653/v1/2023.emnlp-main.628

Author

Lotz, Jonas F. ; Salesky, Elizabeth ; Rust, Phillip ; Elliott, Desmond. / Text Rendering Strategies for Pixel Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), 2023. pp. 10155–10172

Bibtex

@inproceedings{e02b22a1d20d4053ba44385d80a1e37e,

title = "Text Rendering Strategies for Pixel Language Models",

abstract = "Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling. However, recent approaches use text renderers that produce a large set of almost-equivalent input patches, which may prove sub-optimal for downstream tasks, due to redundancy in the input representations. In this paper, we investigate four approaches to rendering text in the PIXEL model (Rust et al., 2023), and find that simple character bigram rendering brings improved performance on sentence-level tasks without compromising performance on token-level or multilingual tasks. This new rendering strategy also makes it possible to train a more compact model with only 22M parameters that performs on par with the original 86M parameter model. Our analyses show that character bigram rendering leads to a consistently better model but with an anisotropic patch embedding space, driven by a patch frequency bias, highlighting the connections between image patch- and tokenization-based language models.",

keywords = "cs.CL",

author = "Lotz, {Jonas F.} and Elizabeth Salesky and Phillip Rust and Desmond Elliott",

year = "2023",

doi = "10.18653/v1/2023.emnlp-main.628",

language = "English",

pages = "10155--10172",

booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",

publisher = "Association for Computational Linguistics (ACL)",

address = "United States",

note = "2023 Conference on Empirical Methods in Natural Language Processing ; Conference date: 6 December 2023 through 10 December 2023",

}

RIS

TY - GEN

T1 - Text Rendering Strategies for Pixel Language Models

AU - Lotz, Jonas F.

AU - Salesky, Elizabeth

AU - Rust, Phillip

AU - Elliott, Desmond

PY - 2023

Y1 - 2023

N2 - Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling. However, recent approaches use text renderers that produce a large set of almost-equivalent input patches, which may prove sub-optimal for downstream tasks, due to redundancy in the input representations. In this paper, we investigate four approaches to rendering text in the PIXEL model (Rust et al., 2023), and find that simple character bigram rendering brings improved performance on sentence-level tasks without compromising performance on token-level or multilingual tasks. This new rendering strategy also makes it possible to train a more compact model with only 22M parameters that performs on par with the original 86M parameter model. Our analyses show that character bigram rendering leads to a consistently better model but with an anisotropic patch embedding space, driven by a patch frequency bias, highlighting the connections between image patch- and tokenization-based language models.

AB - Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling. However, recent approaches use text renderers that produce a large set of almost-equivalent input patches, which may prove sub-optimal for downstream tasks, due to redundancy in the input representations. In this paper, we investigate four approaches to rendering text in the PIXEL model (Rust et al., 2023), and find that simple character bigram rendering brings improved performance on sentence-level tasks without compromising performance on token-level or multilingual tasks. This new rendering strategy also makes it possible to train a more compact model with only 22M parameters that performs on par with the original 86M parameter model. Our analyses show that character bigram rendering leads to a consistently better model but with an anisotropic patch embedding space, driven by a patch frequency bias, highlighting the connections between image patch- and tokenization-based language models.

KW - cs.CL

U2 - 10.18653/v1/2023.emnlp-main.628

DO - 10.18653/v1/2023.emnlp-main.628

M3 - Article in proceedings

SP - 10155

EP - 10172

BT - Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

PB - Association for Computational Linguistics (ACL)

T2 - 2023 Conference on Empirical Methods in Natural Language Processing

Y2 - 6 December 2023 through 10 December 2023

ER -

ID: 379722543

Department of Computer Science (Datalogisk Institut)