
LLM evaluation

AI leaderboards are no longer useful. It’s time to switch to Pareto curves
Evaluating Large Language Models Using “Counterfactual Tasks” by Melanie Mitchell
