Improving Language Understanding by Generative Pre-Training

GPT

We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model … followed by discriminative fine-tuning on each specific task.

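Approach: a Transformer decoder trained as a generative language model, first pre-trained without supervision on unlabeled text and then fine-tuned with supervision on each task.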
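
As a rough illustration of the generative pre-training step, the sketch below shows two ingredients commonly associated with a decoder-style language model: a causal mask, so that each position can attend only to itself and earlier positions, and a next-token negative log-likelihood loss. The function names (causal_mask, log_softmax, next_token_nll) and the toy logits are assumptions made for this example and are not taken from the paper; plain Python lists stand in for tensors.

```python
import math


def causal_mask(n):
    """Return an n x n mask; mask[i][j] is True when position i may attend to position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_nll(logits_per_position, tokens):
    """Average negative log-likelihood of tokens[t + 1] given the logits at position t.

    logits_per_position[t] holds one score per vocabulary entry (hypothetical model output).
    """
    total = 0.0
    count = 0
    for t in range(len(tokens) - 1):
        log_probs = log_softmax(logits_per_position[t])
        total -= log_probs[tokens[t + 1]]
        count += 1
    return total / count


# Toy usage: a 4-token vocabulary with uniform (all-zero) logits at every position.
tokens = [0, 1, 2]
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in tokens]
loss = next_token_nll(uniform_logits, tokens)
print(causal_mask(3))
print(loss)
```

With uniform logits over four tokens, the loss equals log 4 (about 1.386), which is the value an untrained model would be expected to start near. Pre-training lowers this loss on unlabeled text; fine-tuning then adapts the same network to a labeled task.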