Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

LLMs,

See also Zelikman2024Quiet STaR