Scaling Laws for Neural Language Models

Test loss follows a remarkably clean power law in dataset size, model size, and compute. There is no apparent saturation: the loss keeps decreasing as each of these is scaled up.
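
A minimal sketch of what this relationship looks like for model size, assuming the power-law form L(N) = (N_c / N)^{alpha_N}; the constants `N_C` and `ALPHA_N` below are illustrative placeholders, not fitted values, and the same form applies to dataset size and compute.

```python
import numpy as np

# Power-law scaling of test loss with model size N.
# Constants are placeholders chosen for illustration only.
N_C = 8.8e13      # assumed "critical" parameter count (placeholder)
ALPHA_N = 0.076   # assumed scaling exponent (placeholder)

def loss_from_params(n_params: float) -> float:
    """Predicted test loss L(N) = (N_c / N) ** alpha_N."""
    return (N_C / n_params) ** ALPHA_N

# The predicted loss keeps decreasing as N grows -- no saturation in this form.
for n in [1e6, 1e8, 1e10, 1e12]:
    print(f"N = {n:.0e}  ->  predicted loss ~ {loss_from_params(n):.3f}")
```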

