Reducing Hallucinations in Vision-Language Models via Latent Space Steering

Keywords: LVLM hallucination, latent space steering

We identify that hallucinations often arise from the sensitivity of text decoders to vision inputs, a natural consequence of image encoders and text decoders being pre-trained separately.

We introduce Visual and Textual Intervention (VTI), a method that steers latent space representations during inference to improve the stability of vision features and reduce hallucinations.
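As a rough illustration of the general idea of latent-space steering (not the paper's exact VTI procedure), one can estimate a steering direction from the difference between hidden states under clean and perturbed inputs, then shift hidden states along that direction at inference time. The function names, the mean-difference construction, and the scaling factor `alpha` below are all illustrative assumptions:

```python
import numpy as np

def steering_direction(stable_states, perturbed_states):
    """Estimate a steering direction in latent space.

    Assumed construction: the difference between the mean hidden state
    under clean inputs and the mean under perturbed inputs, normalized
    to unit length. Each argument is an (n_samples, dim) array.
    """
    direction = stable_states.mean(axis=0) - perturbed_states.mean(axis=0)
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 0 else direction

def apply_steering(hidden, direction, alpha=1.0):
    """Shift a hidden state along the steering direction at inference.

    `alpha` (hypothetical knob) controls the intervention strength.
    """
    return hidden + alpha * direction

# Toy demonstration with random "hidden states"
rng = np.random.default_rng(0)
dim = 8
stable = rng.normal(loc=1.0, size=(32, dim))      # stand-in for clean-input states
perturbed = rng.normal(loc=0.0, size=(32, dim))   # stand-in for perturbed-input states

d = steering_direction(stable, perturbed)
h = rng.normal(size=dim)
h_steered = apply_steering(h, d, alpha=2.0)
```

In a real model this shift would typically be applied to intermediate decoder activations (e.g. via forward hooks) rather than to a standalone vector, but the arithmetic of the intervention is the same.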