Utility is in the Eye of the User: A Critique of NLP Leaderboards

Model evaluation, Human evaluation

See also Ethayarajh2022authenticity