Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data
Research output: Contribution to journal › Journal article › Research › peer-review
Preserving long-tail, minority information during model compression has been linked to algorithmic fairness considerations. However, this assumes that large models capture long-tail information and smaller ones do not, which raises two questions: (1) how well do large pretrained language models encode long-tail information, and (2) how can small language models be made to capture long-tail information better, without requiring a compression step? First, we study the performance of pretrained Transformers on a challenging new long-tail web text classification task. Second, to train small models that capture the long tail, we propose a contrastive training objective that unifies self-supervised pretraining and supervised long-tail fine-tuning, which markedly increases tail data efficiency and tail prediction performance. Third, we analyze the resulting long-tail learning capabilities under zero-shot, few-shot, and fully supervised conditions, and study how model size and the amount of self-supervision signal affect performance. We find that large pretrained language models do not guarantee long-tail retention, and that much smaller, contrastively pretrained models retain long-tail information better while gaining data and compute efficiency. This demonstrates that model compression may not be the go-to method for obtaining good long-tail performance from compact models.
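The abstract does not give the paper's exact objective, but the idea of a contrastive loss that unifies self-supervised pretraining with supervised fine-tuning can be illustrated with a generic supervised contrastive loss (in the style of Khosla et al., 2020): anchors are pulled toward all other samples sharing their label, and when every sample's only label-mate is its own augmented view, the same formula reduces to the self-supervised InfoNCE objective. The function below is a hedged NumPy sketch of that generic loss, not the authors' formulation; all names and the temperature value are illustrative assumptions.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (sketch, not the paper's exact objective).

    Positives for each anchor are the other samples with the same label;
    with unique labels per augmented pair this reduces to self-supervised
    InfoNCE, illustrating how one loss can cover both training regimes.
    """
    # L2-normalise so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    # Exclude self-similarity from both positives and the softmax denominator.
    logits_mask = ~np.eye(n, dtype=bool)
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask
    # Numerically stable log-softmax over all other samples.
    sim_max = np.max(np.where(logits_mask, sim, -np.inf), axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * logits_mask
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))
    # Average log-probability of the positives, per anchor with >=1 positive.
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    mean_log_prob_pos = (log_prob * pos_mask).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

As a sanity check, embeddings that cluster by label yield a lower loss than the same embeddings with mismatched labels, which is the signal the objective exploits for tail classes with few examples.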
| Original language | English |
|---|---|
| Article number | 10 |
| Journal | Computer Sciences & Mathematics Forum |
| Volume | 3 |
| Number of pages | 18 |
| ISSN | 2813-0324 |
| DOIs | |
| Publication status | Published - 2022 |