MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
- https://openreview.net/forum?id=1Fs1LvjYQW
- Qian Huang, Jian Vora, Percy Liang, Jure Leskovec
LLMs, Agent, AI Team, Automated machine learning, AutoGPT
A benchmark of 13 tasks that tests how well a language agent can improve ML models.