Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Detection

Capybara problem:

Prompts exert a strong influence on downstream text, yet the prompt is typically unknown to the detector. On one hand, the prompt “1, 2, 3,” might result in the very low perplexity completion “4, 5, 6.” On the other hand, the prompt “Can you write a few sentences about a capybara that is an astrophysicist?” will yield a response that seems far stranger. In the presence of the prompt, the response may be unsurprising (low perplexity), but in the absence of the prompt, a response containing the curious words “capybara” and “astrophysicist” in the same sentence will have high perplexity, resulting in the false determination that the text was written by a human; see the example in Figure 2. Clearly, certain contexts will result in high perplexity and others in low perplexity, regardless of whether the author is human or machine. We refer to this dilemma as “the capybara problem”: in the absence of the prompt, LLM detection seems difficult, and naive perplexity-based detection fails.
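The failure mode above can be made concrete with a small sketch. The helper below computes perplexity from per-token log-probabilities; the two hard-coded score lists are purely hypothetical values standing in for what a scoring model might assign to the same capybara completion when the prompt is visible versus hidden.

```python
import math

def perplexity(token_logprobs):
    # Perplexity = exp of the average negative log-probability per token.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs for the same completion about an
# astrophysicist capybara (illustrative numbers, not real model outputs).
with_prompt = [-0.4, -0.6, -0.5, -0.3]     # prompt in context: tokens unsurprising
without_prompt = [-3.1, -4.0, -3.5, -2.8]  # prompt hidden: tokens look bizarre

# Scored without the prompt, the identical machine-written text crosses any
# reasonable perplexity threshold and is misread as human.
assert perplexity(without_prompt) > perplexity(with_prompt)
```

The point of the sketch is that the detector controls neither set of scores: the gap between the two perplexities is induced entirely by the unseen prompt, so no single threshold on raw perplexity can separate human from machine text.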