Temporal Scaling Law for Large Language Models

Scaling law, LLMs

We propose the novel concept of Temporal Scaling Law and study the loss of LLMs along the temporal dimension. … the temporal scaling law reveals that LLMs learn uniformly across different token positions.