MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation

- https://openreview.net/forum?id=1Fs1LvjYQW
- https://github.com/snap-stanford/MLAgentBench
Qian Huang, Jian Vora, Percy Liang, Jure Leskovec

LLMs, Agent, AI Team, Automated machine learning, Auto GPT

Benchmark of 13 ML experimentation tasks, each asking an agent to improve an ML model. It measures how well language agents can carry out these improvements.
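Below is a minimal sketch of how per-run results on such a benchmark might be turned into a success rate. It assumes a run counts as a success when the agent's final score beats the starter baseline by a relative threshold. The 10% default, `RunResult`, `relative_improvement`, and `success_rate` are hypothetical illustrations, not the MLAgentBench repository's API.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One agent run on one task (hypothetical structure, for illustration)."""
    task: str                      # illustrative task identifier
    baseline: float                # score of the unmodified starter code
    final: float                   # score after the agent's changes
    higher_is_better: bool = True  # False for error metrics such as RMSE


def relative_improvement(r: RunResult) -> float:
    """Fractional improvement of the final score over the baseline."""
    if r.baseline == 0:
        raise ValueError("baseline score must be non-zero")
    # Flip the sign for metrics where lower values are better.
    delta = r.final - r.baseline if r.higher_is_better else r.baseline - r.final
    return delta / abs(r.baseline)


def success_rate(runs: list[RunResult], threshold: float = 0.10) -> float:
    """Fraction of runs whose relative improvement meets the threshold.

    The 0.10 default is an assumed value, not taken from this note.
    """
    if not runs:
        return 0.0
    hits = sum(1 for r in runs if relative_improvement(r) >= threshold)
    return hits / len(runs)
```

For example, an accuracy rising from 0.50 to 0.56 is a 12% relative gain and counts as a success. A rise to 0.52 (4%) does not. Normalizing by the baseline lets tasks with different metrics and scales be aggregated into one rate.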