Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

Training large language models with reinforcement learning, LLMs, Reinforcement learning