How to Train BERT with an Academic Budget

We present a recipe for training a BERT-like masked language model in 24 hours, using only 8 Nvidia Titan-V GPUs (12GB each).
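The masked language modeling objective referenced above can be illustrated with a minimal sketch of BERT-style input corruption: roughly 15% of positions are selected as prediction targets, and of those, 80% are replaced with a `[MASK]` token, 10% with a random token, and 10% left unchanged. The function name, the `-100` ignore-label convention, and the exact token ids here are illustrative assumptions, not the paper's implementation.

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mlm_prob=0.15, rng=None):
    """BERT-style masking sketch (assumed, not the paper's code).

    Selects ~mlm_prob of positions as prediction targets; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged.
    Labels are -100 at non-target positions (ignored by the loss).
    """
    rng = rng or random.Random(0)
    input_ids = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_prob:
            labels[i] = tok          # predict the original token here
            r = rng.random()
            if r < 0.8:
                input_ids[i] = mask_id               # 80%: [MASK]
            elif r < 0.9:
                input_ids[i] = rng.randrange(vocab_size)  # 10%: random
            # else 10%: keep the original token
    return input_ids, labels
```

The model is then trained with cross-entropy loss only at the positions where `labels` is not `-100`.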