Guiding Generative Protein Language Models with Reinforcement Learning
- https://arxiv.org/abs/2412.12979
- Filippo Stocco, Maria Artigues-Lleixa, Andrea Hunklinger, Talal Widatalla, Marc Guell, Noelia Ferruz
Autoregressive protein language models (pLMs; see also Language models for genetics) have been successful, but they tend to generate sequences that stay close to the distribution of their training set. This paper addresses that limitation with reinforcement learning, inspired by the way LLMs are aligned with human feedback (Reinforcement learning from human feedback).
The proposed method uses Direct preference optimization (DPO), which fine-tunes the policy directly on preference-ranked pairs of generated sequences instead of training a separate reward model.
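As a rough illustration of the preference-optimization step, here is a minimal sketch of the standard DPO loss applied to summed sequence log-probabilities from a causal pLM. The function names, tensor shapes, and the beta value are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def sequence_logprobs(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Summed token log-probabilities for each sequence under a causal LM.

    logits: (batch, seq_len, vocab) from the model; input_ids: (batch, seq_len).
    Logits at position t predict the token at position t+1, hence the one-step shift.
    """
    logprobs = F.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:].unsqueeze(-1)
    token_logprobs = torch.gather(logprobs, dim=-1, index=targets).squeeze(-1)
    return token_logprobs.sum(dim=-1)

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO objective over a batch of (preferred, dispreferred) sequence pairs.

    Each argument is a 1-D tensor of summed sequence log-probs from either the
    trainable policy or the frozen reference model. beta controls the strength
    of the implicit KL penalty that keeps the policy near the reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()
```

In the pLM setting, the "preferred" and "dispreferred" sequences would typically come from ranking generated proteins with an external reward (e.g. a predicted property) rather than from human annotators.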