How to pre-train transformer models?

Transformer model, BERT, GPT, T5



How much does it cost to train a BERT model? Sharir2020cost estimates the cost for models of different parameter sizes.

Clova’s LaRva team[1] says that pre-training BERT costs about $7,000 on 16 v2 TPUs over 4 days. "The Staggering Cost of Training SOTA AI Models" gives $6,912 for a one-time BERT pre-training, or about $500 for the BERT-Base model.
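The two cost figures are easier to compare once the arithmetic is written out. 16 devices for 4 days is 16 × 96 = 1,536 device-hours, so $7,000 implies roughly $4.56 per device-hour. If you assume an on-demand rate of $4.50 per TPU v2 device-hour (an assumption, not a price stated in the sources), the same run comes to exactly $6,912. The helper names below are hypothetical; this is a back-of-the-envelope sketch, not a pricing tool.

```python
def training_cost(num_devices: int, days: float, price_per_device_hour: float) -> float:
    """Total rental cost in dollars: devices x hours x hourly price."""
    device_hours = num_devices * days * 24
    return device_hours * price_per_device_hour


def implied_hourly_price(total_cost: float, num_devices: int, days: float) -> float:
    """Back out the per-device hourly price implied by a reported total cost."""
    return total_cost / (num_devices * days * 24)


if __name__ == "__main__":
    # Reported LaRva figures: about $7,000 for 16 v2 TPUs over 4 days.
    print(16 * 4 * 24)                                     # device-hours
    print(round(implied_hourly_price(7000, 16, 4), 2))     # implied $/device-hour
    # ASSUMED rate of $4.50 per device-hour, not taken from the source.
    print(training_cost(16, 4, 4.50))
```

Running it prints 1536, 4.56 and 6912.0. The same two functions work for GPU budgets, e.g. eight GPUs for one day is `training_cost(8, 1, price)` with whatever hourly price you actually pay.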

Izsak2021how proposes a recipe for training BERT in 24 hours on eight 12 GB GPUs.
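A 12 GB card holds only a small batch at a time, so a common way to reach a large effective batch on such hardware is gradient accumulation. With this technique, effective batch = per-GPU micro-batch × number of GPUs × accumulation steps. This note does not say how Izsak2021how implements its batching. The sketch below is a hypothetical, framework-free illustration on a 1-D linear model. It shows that summing micro-batch gradients, each weighted by its share of the batch, reproduces the full-batch gradient.

```python
from __future__ import annotations


def mse_grad(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient w.r.t. w of the mean squared error of the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list[float], ys: list[float],
                     micro_batch_size: int) -> float:
    """Accumulate gradients over micro-batches instead of one large batch.

    Each micro-batch gradient is a mean over its own examples, so it is
    weighted by len(micro_batch) / total; the sum then equals the full-batch
    mean even when the last micro-batch is smaller.
    """
    total = len(xs)
    grad = 0.0
    for start in range(0, total, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        grad += mse_grad(w, mx, my) * len(mx) / total
    return grad


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [3.0 * x + 1.0 for x in xs]
    w = 0.5
    print(mse_grad(w, xs, ys))             # one batch of 10 examples
    print(accumulated_grad(w, xs, ys, 3))  # micro-batches of 3, 3, 3 and 1
```

In a real framework, the same idea means calling backward on each micro-batch's scaled loss and stepping the optimizer only after the last micro-batch.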


It’s important to be able to monitor progress during training[1]. The LaRva team also built a dynamic data pipeline that generates the training set on the fly.
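The note does not describe what the LaRva pipeline generates. For BERT, one common form of on-the-fly generation is dynamic masking, popularized by RoBERTa. Instead of masking the corpus once during preprocessing, fresh masks are drawn every time a sequence is served. The sketch below assumes that approach and uses BERT's masking rule: each position is selected with probability 0.15, and of the selected positions, 80% become [MASK], 10% a random token and 10% stay unchanged. `MASK_ID = 103` (the [MASK] id in the bert-base-uncased vocabulary) and the `-100` "ignore" label are assumed conventions. The function names are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

MASK_ID = 103   # ASSUMED [MASK] token id (bert-base-uncased vocabulary)
IGNORE = -100   # label for positions that are not prediction targets


def mask_tokens(tokens: Sequence[int], vocab_size: int, rng: random.Random,
                mask_prob: float = 0.15) -> Tuple[List[int], List[int]]:
    """BERT-style masking; labels hold the original id only at selected positions."""
    inputs = list(tokens)
    labels = [IGNORE] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                      # not selected: no loss at this position
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID           # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token in the input


    return inputs, labels


def dynamic_mlm_stream(corpus: Sequence[Sequence[int]], vocab_size: int,
                       seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Cycle over the corpus forever, drawing fresh masks on every pass,
    so no fixed masked copy of the data set has to be stored."""
    if not corpus:
        raise ValueError("corpus must not be empty")
    rng = random.Random(seed)
    while True:
        for tokens in corpus:
            yield mask_tokens(tokens, vocab_size, rng)


if __name__ == "__main__":
    from itertools import islice
    toy_corpus = [[11, 12, 13, 14, 15, 16, 17, 18], [21, 22, 23]]  # hypothetical ids
    for inputs, labels in islice(dynamic_mlm_stream(toy_corpus, vocab_size=30522), 4):
        print(inputs, labels)
```

Because masks are drawn when a sequence is requested, the same sentence gets different masked positions each epoch.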

How to scale the BERT Training with Nvidia GPUs?


Google’s Cloud TPU can be used for training; there is a tutorial with source code.

Hugging Face has a tutorial on training a new language model from scratch.
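The Hugging Face tutorial first trains a byte-level BPE tokenizer on the new corpus with the `tokenizers` library, and only then trains the model. The toy learner below is a rough, hypothetical illustration of what that tokenizer-training step learns. It is not the library's implementation, and it works on characters rather than bytes for readability. It repeatedly merges the most frequent adjacent symbol pair, counting each pair once per occurrence of its word.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
Vocab = Dict[Tuple[str, ...], int]


def pair_counts(vocab: Vocab) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts: Counter = Counter()
    for symbols, freq in vocab.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge_pair(vocab: Vocab, pair: Pair) -> Vocab:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged: Vocab = {}
    for symbols, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge rules from a word-frequency table."""
    vocab: Vocab = {tuple(word): count for word, count in word_counts.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(counts, key=lambda p: (counts[p], p))
        vocab = merge_pair(vocab, best)
        merges.append(best)
    return merges


if __name__ == "__main__":
    toy = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    print(learn_bpe(toy, 4))  # → [('s', 't'), ('e', 'st'), ('o', 'w'), ('l', 'ow')]
```

The learned merge list is what a BPE tokenizer later replays, in order, to split new text into subword units.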

Training can also be done on AWS (see "Amazon Web Services achieves fastest training times for BERT and Mask R-CNN"), and there are a couple of offerings in the AWS Marketplace.