How to pre-train transformer models?

Implementations, How to train your model

Methods

New LLM Pre-training and Post-training Paradigms by Sebastian Raschka

Cost

How much does it cost to train a BERT model? Sharir2020cost estimates the cost of training models of different parameter sizes.
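A common back-of-the-envelope estimate, not necessarily the methodology of Sharir2020cost, puts the training compute of a dense transformer at roughly 6 FLOPs per parameter per training token (forward plus backward pass). Turning that into dollars requires an accelerator's peak throughput, the fraction of peak actually achieved, and an hourly price. All of the inputs in the example call below are hypothetical placeholders, not figures from the paper.

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training compute as ~6 FLOPs per parameter per token
    (rule of thumb for dense transformers: forward + backward pass)."""
    return 6.0 * num_params * num_tokens


def training_cost_usd(
    num_params: float,
    num_tokens: float,
    peak_flops_per_sec: float,   # hypothetical peak throughput of one device
    utilization: float,          # hypothetical fraction of peak achieved (0..1]
    usd_per_device_hour: float,  # hypothetical rental price per device-hour
) -> float:
    """Rough dollar cost of one pre-training run (illustrative only)."""
    effective_flops_per_sec = peak_flops_per_sec * utilization
    device_seconds = training_flops(num_params, num_tokens) / effective_flops_per_sec
    return device_seconds / 3600.0 * usd_per_device_hour


# Hypothetical inputs: a BERT-Base-sized model (~110M parameters), 1e11 tokens,
# a 100 TFLOP/s device running at 30% utilization, rented at $2 per hour.
cost = training_cost_usd(110e6, 1e11, peak_flops_per_sec=100e12,
                         utilization=0.3, usd_per_device_hour=2.0)
print(f"~${cost:,.0f}")  # prints ~$1,222
```

The estimate scales linearly with both parameter count and token budget, which is why cost ranges for larger models grow so quickly; real runs also pay for failed experiments and hyperparameter search, which this sketch ignores.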

Clova's LaRva¹ team reports that pre-training BERT costs about $7,000 on 16 TPU v2 devices (4 days). [The Staggering Cost of Training SOTA AI Models](https://syncedreview.com/2019/06/27/the-staggering-cost-of-training-sota-ai-models/#:~:text=Google%20suggests%20researchers%20with%20tight,cost%20of%20about%20US%24500) cites $6,912 for a one-time BERT pre-training run, or about $500 for the BERT-Base model.
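Assuming the $6,912 figure refers to the same 16-TPU, 4-day setup, the two numbers agree: 16 devices for 96 hours is 1,536 device-hours, which implies $4.50 per TPU-hour. That hourly rate is derived from the quoted totals here, not stated by either source. A tiny helper makes the arithmetic explicit:

```python
def implied_price_per_device_hour(total_usd: float, num_devices: int, days: float) -> float:
    """Hourly price per device implied by a total cost, a device count and a duration."""
    device_hours = num_devices * days * 24  # 16 devices * 4 days * 24 h = 1,536
    return total_usd / device_hours


print(implied_price_per_device_hour(6912, num_devices=16, days=4))  # prints 4.5
```

Running the same helper on the rounded $7,000 figure gives about $4.56 per device-hour, so the LaRva estimate and the Synced number describe essentially the same bill.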

Izsak2021how proposes a recipe for pre-training BERT in 24 hours on eight 12 GB GPUs.
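On small 12 GB cards the per-GPU micro-batch is usually much smaller than the large global batch favored for BERT pre-training. A standard workaround (a general technique, not a claim about the exact setup in Izsak2021how) is gradient accumulation: run several micro-batches, add up their gradients, and take one optimizer step. The sketch below uses a toy one-parameter linear model so it runs without a deep-learning framework; the batch sizes are hypothetical.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs for a toy 1-D linear model y ~ w * x


def accumulation_steps(global_batch: int, micro_batch_per_gpu: int, num_gpus: int) -> int:
    """Micro-batches each GPU must process per optimizer step so that
    micro_batch_per_gpu * num_gpus * steps == global_batch."""
    per_pass = micro_batch_per_gpu * num_gpus
    if global_batch % per_pass != 0:
        raise ValueError(f"{global_batch} is not a multiple of {per_pass}")
    return global_batch // per_pass


def mse_grad(w: float, batch: Batch) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: Batch, micro_batch: int) -> float:
    """Gradient accumulation: process the batch in micro-batches, sum each
    micro-batch gradient weighted by its size, then divide by the full batch
    size. The result equals the full-batch mean gradient, but only one
    micro-batch has to be held in memory at a time."""
    total = 0.0
    for start in range(0, len(batch), micro_batch):
        chunk = batch[start:start + micro_batch]
        total += mse_grad(w, chunk) * len(chunk)
    return total / len(batch)


# Hypothetical numbers: 8 GPUs, 32 sequences fit per 12 GB GPU, target batch 4096.
print(accumulation_steps(4096, micro_batch_per_gpu=32, num_gpus=8))  # prints 16

data = [(float(i), 3.0 * i + 1.0) for i in range(10)]
print(accumulated_grad(0.5, data, micro_batch=3), mse_grad(0.5, data))
```

Weighting each micro-batch by its size keeps the result exact even when the last micro-batch is shorter; in a framework the same effect is usually achieved by scaling each micro-batch loss before calling backward.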

Tips

It is important to be able to monitor progress during training¹. The LaRva team also built a dynamic data pipeline that generates the training set on the fly.
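The notes do not describe how LaRva's pipeline works. As one illustrative interpretation of generating training data on the fly, the generator below samples a fresh BERT-style masked-language-model mask every time a sequence is drawn (dynamic masking, the approach popularized by RoBERTa) instead of storing pre-masked copies on disk. The 15% selection rate and 80/10/10 replacement split follow the original BERT recipe; `MASK_ID`, `VOCAB_SIZE`, `IGNORE_INDEX` and the function name are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple

MASK_ID = 103        # hypothetical [MASK] token id
VOCAB_SIZE = 30522   # hypothetical vocabulary size
IGNORE_INDEX = -100  # label for positions the loss should skip


def dynamic_mlm(
    sequences: Iterable[List[int]],
    mask_prob: float = 0.15,
    seed: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (input_ids, labels) with a freshly sampled MLM mask per sequence.

    Each token is selected with probability mask_prob. A selected token is
    replaced by MASK_ID 80% of the time, by a random token 10% of the time,
    and kept unchanged 10% of the time; its label is the original token.
    Unselected positions keep their token and get IGNORE_INDEX as label.
    Simplification: special tokens such as [CLS]/[SEP] are not excluded here.
    """
    rng = random.Random(seed)
    for seq in sequences:
        inputs, labels = list(seq), [IGNORE_INDEX] * len(seq)
        for i, tok in enumerate(seq):
            if rng.random() >= mask_prob:
                continue  # not selected for prediction
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token as input
        yield inputs, labels


# Using a different seed per epoch gives each epoch a different masking pattern.
corpus = [[1000 + i for i in range(50)] for _ in range(4)]
for epoch in range(2):
    n_targets = sum(l != IGNORE_INDEX for _, labels in dynamic_mlm(corpus, seed=epoch) for l in labels)
    print(f"epoch {epoch}: {n_targets} masked targets")
```

Because masks are drawn at read time, the same raw corpus yields new training examples every epoch without extra storage, and the per-epoch counter doubles as a minimal progress signal.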

How to scale the BERT Training with Nvidia GPUs?

Options

Google's Cloud TPUs can be used for training. There is a tutorial with source code: https://towardsdatascience.com/pre-training-bert-from-scratch-with-cloud-tpu-6e2f71028379 and https://colab.research.google.com/drive/1nVn6AFpQSzXBt8_ywfx6XR8ZfQXlKGAz

Hugging Face has a tutorial for training a new language model from scratch: https://huggingface.co/blog/how-to-train

Training can also be done on AWS; see the post "Amazon Web Services achieves fastest training times for BERT and Mask R-CNN". There are also several BERT offerings in the AWS Marketplace: https://aws.amazon.com/marketplace/search/results?x=0&y=0&searchTerms=BERT

¹ Running a Giant Language-Model Factory (엄청 큰 언어모델 공장 가동기) - LaRva: Language representation by Clova