The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas

https://arxiv.org/abs/2506.20803
Chenglei Si, Tatsunori Hashimoto, Diyi Yang

This study asks human experts not only to score the effectiveness of AI- and human-generated ideas, but also to execute them.

AI-generated research ideas sounded better before execution (higher estimated effectiveness scores), but their scores dropped after execution.

Usage of AI in Science