Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Sample more, verify better.