Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data. / Rethmeier, Nils; Augenstein, Isabelle.
In: Computer Sciences & Mathematics Forum, Vol. 3, 10, 2022.
RIS
TY - JOUR
T1 - Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data
AU - Rethmeier, Nils
AU - Augenstein, Isabelle
PY - 2022
Y1 - 2022
N2 - Preserving long-tail, minority information during model compression has been linked to algorithmic fairness considerations. However, this assumes that large models capture long-tail information and smaller ones do not, which raises two questions. One, how well do large pretrained language models encode long-tail information? Two, how can small language models be made to better capture long-tail information, without requiring a compression step? First, we study the performance of pretrained Transformers on a challenging new long-tail, web text classification task. Second, to train small long-tail capture models we propose a contrastive training objective that unifies self-supervised pretraining and supervised long-tail fine-tuning, which markedly increases tail data-efficiency and tail prediction performance. Third, we analyze the resulting long-tail learning capabilities under zero-shot, few-shot and full supervision conditions, and study the performance impact of model size and self-supervision signal amount. We find that large pretrained language models do not guarantee long-tail retention and that much smaller, contrastively pretrained models better retain long-tail information while gaining data and compute efficiency. This demonstrates that model compression may not be the go-to method for obtaining good long-tail performance from compact models.
U2 - 10.3390/cmsf2022003010
DO - 10.3390/cmsf2022003010
M3 - Journal article
VL - 3
JO - Computer Sciences & Mathematics Forum
JF - Computer Sciences & Mathematics Forum
SN - 2813-0324
M1 - 10
ER -
ID: 339336056