Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data
Research output: Contribution to journal › Journal article › Research › peer-review
Preserving long-tail, minority information during model compression has been linked to algorithmic fairness considerations. However, this assumes that large models capture long-tail information and smaller ones do not, which raises two questions: (1) how well do large pretrained language models encode long-tail information, and (2) how can small language models be made to capture long-tail information better, without requiring a compression step? First, we study the performance of pretrained Transformers on a challenging new long-tail web text classification task. Second, to train small models that capture the long tail, we propose a contrastive training objective that unifies self-supervised pretraining and supervised long-tail fine-tuning, which markedly increases tail data efficiency and tail prediction performance. Third, we analyze the resulting long-tail learning capabilities under zero-shot, few-shot, and fully supervised conditions, and study how model size and the amount of self-supervision signal affect performance. We find that large pretrained language models do not guarantee long-tail retention, and that much smaller, contrastively pretrained models retain long-tail information better while gaining data and compute efficiency. This demonstrates that model compression may not be the go-to method for obtaining good long-tail performance from compact models.
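The abstract does not give the paper's exact objective, but the idea of a contrastive loss that unifies self-supervised pretraining with supervised fine-tuning can be illustrated with a generic supervised contrastive loss (in the style of Khosla et al., 2020): anchors are pulled toward all other samples sharing their label, and when every sample's only label-mate is its own augmented view, the same formula reduces to the self-supervised InfoNCE objective. The function below is a hedged NumPy sketch of that generic loss, not the authors' formulation; all names and the temperature value are illustrative assumptions.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (sketch, not the paper's exact objective).

    Positives for each anchor are the other samples with the same label;
    with unique labels per augmented pair this reduces to self-supervised
    InfoNCE, illustrating how one loss can cover both training regimes.
    """
    # L2-normalise so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    # Exclude self-similarity from both positives and the softmax denominator.
    logits_mask = ~np.eye(n, dtype=bool)
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask
    # Numerically stable log-softmax over all other samples.
    sim_max = np.max(np.where(logits_mask, sim, -np.inf), axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * logits_mask
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))
    # Average log-probability of the positives, per anchor with >=1 positive.
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    mean_log_prob_pos = (log_prob * pos_mask).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

As a sanity check, embeddings that cluster by label yield a lower loss than the same embeddings with mismatched labels, which is the signal the objective exploits for tail classes with few examples.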
| Original language | English |
|---|---|
| Article number | 10 |
| Journal | Computer Sciences & Mathematics Forum |
| Volume | 3 |
| Number of pages | 18 |
| ISSN | 2813-0324 |
| DOIs | |
| Publication status | Published - 2022 |