Reducing Hallucinations in Vision-Language Models via Latent Space Steering
- https://arxiv.org/abs/2410.15778
- Sheng Liu, Haotian Ye, Lei Xing, James Zou
LVLM hallucination, Latent space steering
We identify that hallucinations often arise from the sensitivity of text decoders to vision inputs, a natural phenomenon when image encoders and text decoders are pre-trained separately.
The Visual and Textual Intervention (VTI) method is introduced; it steers latent space representations during inference.
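The following is a minimal sketch of generic latent-space steering, included only for intuition. It is not the authors' VTI implementation. Several pieces are assumptions that do not come from the source. Latent states are plain NumPy arrays of shape (n_examples, hidden_dim). Each example is paired with a "preferred" latent state, for instance one computed from a perturbed or more faithful input. The steering direction is taken as the top principal direction of the per-example shifts. At inference, steering adds alpha times that direction to every hidden state. The names steering_direction, steer, and alpha are hypothetical.

```python
import numpy as np


def steering_direction(clean: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Estimate a unit steering direction from paired latent states.

    clean, target: shape (n_examples, hidden_dim). Row i of `target` is the
    latent state we would prefer for example i (hypothetical pairing).
    """
    shifts = target - clean  # per-example shift vectors
    # Top right-singular vector of the (uncentered) shifts: the single
    # direction that best explains the shared shift across examples.
    _, _, vt = np.linalg.svd(shifts, full_matrices=False)
    direction = vt[0]
    # SVD sign is arbitrary, so orient the direction along the average shift.
    if direction @ shifts.mean(axis=0) < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)


def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Add the scaled steering direction to every hidden state (broadcast over rows)."""
    return hidden + alpha * direction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(32, 16))
    # Synthetic data: every "preferred" state is shifted along axis 3.
    true_dir = np.zeros(16)
    true_dir[3] = 1.0
    target = clean + 2.0 * true_dir + 0.01 * rng.normal(size=(32, 16))
    d = steering_direction(clean, target)
    print(np.round(d, 2))
    print(steer(np.zeros((4, 16)), d, alpha=0.5).shape)
```

In an actual LVLM, the same addition would presumably be applied to hidden states inside the vision encoder or text decoder layers, for example through forward hooks. Here alpha controls how strongly the representations are pushed along the estimated direction.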