Temporal Scaling Law for Large Language Models
- https://arxiv.org/abs/2404.17785
- Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Jianwei Niu, Guiguang Ding
We propose the novel concept of Temporal Scaling Law and study the loss of LLMs from the temporal dimension. … The temporal scaling law reveals that LLMs learn uniformly on different token positions.