What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
- https://arxiv.org/abs/2204.05832
- Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel
Instruction tuning, LLMs, Zero-shot learning
Our experiments show that causal decoder-only models trained with an autoregressive language modeling objective exhibit the strongest zero-shot generalization after purely unsupervised pretraining. However, models with non-causal visibility on their input, trained with a masked language modeling objective and then multitask finetuned, perform best overall in our experiments.
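To make the architectural distinction concrete, here is a minimal sketch (not from the paper's code) of the two attention-visibility patterns being compared: a causal decoder masks out all future tokens, while a non-causal (prefix-LM style) model attends bidirectionally over its input and causally over the target. The `prefix_len` split and helper names are illustrative assumptions.

```python
# Sketch of the attention visibility patterns compared in the paper:
# causal (decoder-only) vs. non-causal visibility on the input (prefix-LM).
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """True where position i may attend to position j (only j <= i)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def non_causal_prefix_mask(seq_len: int, prefix_len: int) -> np.ndarray:
    """Causal mask, except the first `prefix_len` (input) tokens are
    visible to every position, i.e. non-causal visibility on the input."""
    mask = causal_mask(seq_len)
    mask[:, :prefix_len] = True
    return mask

if __name__ == "__main__":
    # 6 tokens, the first 3 forming the input/prefix.
    print(causal_mask(6).astype(int))
    print(non_causal_prefix_mask(6, prefix_len=3).astype(int))
```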