Reducing Hallucinations in Vision-Language Models via Latent Space Steering

Keywords: LVLM hallucination, latent space steering

We identify that hallucinations often arise from the sensitivity of text decoders to vision inputs, a natural consequence of image encoders and text decoders being pre-trained separately.

We introduce Visual and Textual Intervention (VTI), a method that steers latent space representations during inference to improve the stability of vision features and reduce hallucinations.
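As a rough illustration of the general idea of latent-space steering (not the paper's exact VTI procedure), one can estimate a steering direction from the difference between hidden states under clean and perturbed inputs, then shift hidden states along that direction at inference time. The function names, the mean-difference construction, and the scaling factor `alpha` below are all illustrative assumptions:

```python
import numpy as np

def steering_direction(stable_states, perturbed_states):
    """Estimate a steering direction in latent space.

    Assumed construction: the difference between the mean hidden state
    under clean inputs and the mean under perturbed inputs, normalized
    to unit length. Each argument is an (n_samples, dim) array.
    """
    direction = stable_states.mean(axis=0) - perturbed_states.mean(axis=0)
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 0 else direction

def apply_steering(hidden, direction, alpha=1.0):
    """Shift a hidden state along the steering direction at inference.

    `alpha` (hypothetical knob) controls the intervention strength.
    """
    return hidden + alpha * direction

# Toy demonstration with random "hidden states"
rng = np.random.default_rng(0)
dim = 8
stable = rng.normal(loc=1.0, size=(32, dim))      # stand-in for clean-input states
perturbed = rng.normal(loc=0.0, size=(32, dim))   # stand-in for perturbed-input states

d = steering_direction(stable, perturbed)
h = rng.normal(size=dim)
h_steered = apply_steering(h, d, alpha=2.0)
```

In a real model this shift would typically be applied to intermediate decoder activations (e.g. via forward hooks) rather than to a standalone vector, but the arithmetic of the intervention is the same.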