Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data. / Rethmeier, Nils; Augenstein, Isabelle.

In: Computer Sciences & Mathematics Forum, Vol. 3, 10, 2022.


Harvard

Rethmeier, N & Augenstein, I 2022, 'Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data', Computer Sciences & Mathematics Forum, vol. 3, 10. https://doi.org/10.3390/cmsf2022003010

APA

Rethmeier, N., & Augenstein, I. (2022). Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data. Computer Sciences & Mathematics Forum, 3, [10]. https://doi.org/10.3390/cmsf2022003010

Vancouver

Rethmeier N, Augenstein I. Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data. Computer Sciences & Mathematics Forum. 2022;3:10. https://doi.org/10.3390/cmsf2022003010

Author

Rethmeier, Nils; Augenstein, Isabelle. / Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data. In: Computer Sciences & Mathematics Forum. 2022; Vol. 3.

Bibtex

@article{8ec8aece54d24bc8a6cb895e4462140f,
title = "Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data",
abstract = "Preserving long-tail, minority information during model compression has been linked to algorithmic fairness considerations. However, this assumes that large models capture long-tail information and smaller ones do not, which raises two questions. One, how well do large pretrained language models encode long-tail information? Two, how can small language models be made to better capture long-tail information, without requiring a compression step? First, we study the performance of pretrained Transformers on a challenging new long-tail, web text classification task. Second, to train small long-tail capture models we propose a contrastive training objective that unifies self-supervised pretraining, and supervised long-tail fine-tuning, which markedly increases tail data-efficiency and tail prediction performance. Third, we analyze the resulting long-tail learning capabilities under zero-shot, few-shot and full supervision conditions, and study the performance impact of model size and self-supervision signal amount. We find that large pretrained language models do not guarantee long-tail retention and that much smaller, contrastively pretrained models better retain long-tail information while gaining data and compute efficiency. This demonstrates that model compression may not be the go-to method for obtaining good long-tail performance from compact models.",
author = "Nils Rethmeier and Isabelle Augenstein",
year = "2022",
doi = "10.3390/cmsf2022003010",
language = "English",
volume = "3",
journal = "Computer Sciences & Mathematics Forum ",
issn = "2813-0324",
publisher = "MDPI",

}

RIS

TY - JOUR

T1 - Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data

AU - Rethmeier, Nils

AU - Augenstein, Isabelle

PY - 2022

Y1 - 2022

N2 - Preserving long-tail, minority information during model compression has been linked to algorithmic fairness considerations. However, this assumes that large models capture long-tail information and smaller ones do not, which raises two questions. One, how well do large pretrained language models encode long-tail information? Two, how can small language models be made to better capture long-tail information, without requiring a compression step? First, we study the performance of pretrained Transformers on a challenging new long-tail, web text classification task. Second, to train small long-tail capture models we propose a contrastive training objective that unifies self-supervised pretraining, and supervised long-tail fine-tuning, which markedly increases tail data-efficiency and tail prediction performance. Third, we analyze the resulting long-tail learning capabilities under zero-shot, few-shot and full supervision conditions, and study the performance impact of model size and self-supervision signal amount. We find that large pretrained language models do not guarantee long-tail retention and that much smaller, contrastively pretrained models better retain long-tail information while gaining data and compute efficiency. This demonstrates that model compression may not be the go-to method for obtaining good long-tail performance from compact models.

AB - Preserving long-tail, minority information during model compression has been linked to algorithmic fairness considerations. However, this assumes that large models capture long-tail information and smaller ones do not, which raises two questions. One, how well do large pretrained language models encode long-tail information? Two, how can small language models be made to better capture long-tail information, without requiring a compression step? First, we study the performance of pretrained Transformers on a challenging new long-tail, web text classification task. Second, to train small long-tail capture models we propose a contrastive training objective that unifies self-supervised pretraining, and supervised long-tail fine-tuning, which markedly increases tail data-efficiency and tail prediction performance. Third, we analyze the resulting long-tail learning capabilities under zero-shot, few-shot and full supervision conditions, and study the performance impact of model size and self-supervision signal amount. We find that large pretrained language models do not guarantee long-tail retention and that much smaller, contrastively pretrained models better retain long-tail information while gaining data and compute efficiency. This demonstrates that model compression may not be the go-to method for obtaining good long-tail performance from compact models.

U2 - 10.3390/cmsf2022003010

DO - 10.3390/cmsf2022003010

M3 - Journal article

VL - 3

JO - Computer Sciences & Mathematics Forum

JF - Computer Sciences & Mathematics Forum

SN - 2813-0324

M1 - 10

ER -
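
Illustration

The abstract describes a contrastive training objective that unifies self-supervised pretraining with supervised long-tail fine-tuning, so that zero- and few-shot classification reduces to matching texts against label representations in a shared embedding space. For orientation only, below is a minimal, generic sketch of such a text-label contrastive (InfoNCE-style) loss in PyTorch. The function names, the temperature value, and the use of in-batch negatives are illustrative assumptions, not the authors' exact formulation.

    # Hedged sketch: generic InfoNCE-style text-label contrastive objective.
    # Assumptions (not from the paper): in-batch negatives, cosine similarity,
    # temperature 0.07, and the helper names used here.
    import torch
    import torch.nn.functional as F

    def contrastive_text_label_loss(text_emb, label_emb, temperature=0.07):
        """text_emb[i] should match label_emb[i]; other labels in the
        batch serve as negatives."""
        text_emb = F.normalize(text_emb, dim=-1)
        label_emb = F.normalize(label_emb, dim=-1)
        logits = text_emb @ label_emb.t() / temperature   # (B, B) similarities
        targets = torch.arange(text_emb.size(0), device=text_emb.device)
        return F.cross_entropy(logits, targets)

    def zero_shot_predict(text_emb, candidate_label_emb):
        """Zero-shot prediction as nearest-label retrieval in the shared space."""
        sims = F.normalize(text_emb, dim=-1) @ F.normalize(candidate_label_emb, dim=-1).t()
        return sims.argmax(dim=-1)

In this framing, self-supervised pretraining and supervised fine-tuning can share the same matching loss, with only the source of the (text, label) pairs changing, which is the unification the abstract refers to.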
