LLM detection
Is it possible to detect AI-generated writing? Tang2024science is a recent review in CACM.
Desaire2023distinguishing argues that it is possible, reporting 99% accuracy. OpenAI, however, says that AI-generated writing cannot be reliably detected. Google took a different route and developed SynthID, a system that watermarks AI-generated content (text, images, …).
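To sketch how statistical watermarking works: SynthID's actual scheme is more elaborate (this is not its API), but a toy "green-list" watermark in the style of Kirchenbauer et al. (2023) captures the core idea of secretly biasing token choices so a detector can later test for that bias. All function names below are made up for illustration.

```python
# Toy green-list watermark (NOT SynthID's actual algorithm): bias generation
# toward a pseudo-random "green" half of the vocabulary, keyed on the previous
# token, then detect by counting how often tokens land in their green list.
import hashlib
import torch

def green_ids(prev_token_id: int, vocab_size: int, fraction: float = 0.5) -> torch.Tensor:
    """Pseudo-randomly select the 'green' subset, seeded by the previous token."""
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16) % (2**31)
    g = torch.Generator().manual_seed(seed)
    return torch.randperm(vocab_size, generator=g)[: int(fraction * vocab_size)]

def watermark_logits(logits: torch.Tensor, prev_token_id: int, delta: float = 2.0) -> torch.Tensor:
    """Boost green tokens' logits before sampling (applied at every decoding step)."""
    out = logits.clone()
    out[green_ids(prev_token_id, logits.shape[-1])] += delta
    return out

def green_fraction(token_ids: list[int], vocab_size: int) -> float:
    """Detector: ~0.5 for unwatermarked text, noticeably higher if watermarked."""
    hits = sum(
        tok in set(green_ids(prev, vocab_size).tolist())
        for prev, tok in zip(token_ids, token_ids[1:])
    )
    return hits / max(len(token_ids) - 1, 1)

# Toy check with uniform logits over a GPT-2-sized vocabulary:
boosted = watermark_logits(torch.zeros(50257), prev_token_id=42)
print((boosted > 0).float().mean())  # ~0.5 of the vocabulary got the boost
```

The crucial property is that detection needs only the key (the hash scheme), not the model itself, which is why watermarking at generation time scales better than post-hoc detection.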
AI-generated text may have low perplexity due to a self-averaging effect: because LLMs are trained on a huge amount of text, their output is like the average of numerous individuals' writing, whereas a single individual's writing is more distinctive and idiosyncratic. However, perplexity can easily be increased by replacing some words with less likely ones.
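As a concrete illustration, here is a minimal sketch of measuring perplexity with GPT-2 via Hugging Face transformers (the model choice and example sentences are arbitrary, not taken from any of the cited detectors):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
# Swapping one word for a less likely one raises the score:
print(perplexity("The quick brown fox jumps over the lazy volcano."))
```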
Another important issue is that the detector usually cannot see the prompt. If the text was generated from an elaborate, detailed prompt, its perplexity without the prompt can be high even though its perplexity given the prompt is drastically lower. Hans2024spotting calls this the "Capybara problem".
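A sketch of the same measurement with and without a hypothetical prompt (the `conditional_perplexity` helper and the example text are illustrative, not the method of Hans2024spotting itself):

```python
# Sketch of the Capybara problem: the same reply looks "surprising" (high
# perplexity) to a detector that cannot see the prompt, but unsurprising
# once the prompt is conditioned on.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def conditional_perplexity(text: str, prompt: str = "") -> float:
    """Perplexity of `text`, optionally conditioned on a preceding `prompt`."""
    ids = tokenizer(prompt + text, return_tensors="pt").input_ids
    labels = ids.clone()
    if prompt:
        n_prompt = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
        labels[:, :n_prompt] = -100  # exclude prompt tokens from the loss
    with torch.no_grad():
        loss = model(ids, labels=labels).loss
    return torch.exp(loss).item()

reply = "The capybara adjusted its tiny helmet and gazed at the lunar horizon."
print(conditional_perplexity(reply))  # high: odd text out of context
print(conditional_perplexity(reply, prompt="Write a story about a capybara astronaut. "))
# typically much lower once the prompt is taken into account
```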
The high accuracy reported for some detectors may be an artifact of the limited set of texts they were evaluated on.
A fundamental problem, especially regarding false positives, is that much of the text on the web has already been used to train LLMs. As a result, any new text that resembles what is on the web, especially passages duplicated many times, will inevitably resemble LLM output.
Methods
Articles
- The Science of Detecting LLM-Generated Texts
- Detecting LLM-Generated Texts
- Liang2024monitoring - a study of peer review at AI conferences.
- Latona2024ai argues that AI-assisted peer reviewers may favor AI-generated writing.