Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information

Transformer model

Through circuit analysis, we discover temporal heads: specific attention heads primarily responsible for processing temporal knowledge.

TKC: temporal knowledge circuit.

Feed the model prompts containing temporal information, like “In 2003, the president of South Korea was …”, then ablate attention heads one at a time to identify which heads the temporal recall depends on.
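A minimal sketch of head ablation, assuming a toy multi-head attention layer (random weights, one prompt) rather than a real model: zero out each head's output in turn and rank heads by how much the layer's output changes.

```python
import numpy as np

rng = np.random.default_rng(0)
H, T, d = 4, 5, 8          # heads, sequence length, per-head dimension

# Toy per-head outputs for one temporal prompt (stand-in for real activations).
head_out = rng.normal(size=(H, T, d))
W_O = rng.normal(size=(H * d, d))        # output projection

def forward(outputs):
    # Concatenate head outputs and project, as in multi-head attention.
    return outputs.transpose(1, 0, 2).reshape(T, H * d) @ W_O

baseline = forward(head_out)

# Zero-ablate each head in turn; a large output change marks an important head.
scores = []
for h in range(H):
    ablated = head_out.copy()
    ablated[h] = 0.0
    scores.append(np.linalg.norm(forward(ablated) - baseline))

ranking = np.argsort(scores)[::-1]       # heads sorted by ablation effect
print(ranking)
```

In practice the same loop runs over a real model's heads (e.g. via forward hooks), and the effect is measured on the logit of the correct time-specific answer rather than on raw layer output.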

Circuit analysis treats the transformer’s computation as a DAG whose nodes are the attention heads, the MLP modules, an input node, and an output node, with edges carrying information between them. A circuit is a subgraph of this DAG that explains a specific behavior.
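The idea above can be sketched as a graph-pruning step, with hypothetical node names and made-up ablation-effect scores: keep only the nodes whose ablation strongly affects the behavior, and the surviving subgraph is the circuit.

```python
# Toy computation DAG: input, two attention heads, one MLP, output.
edges = {
    "input":    ["head_0.0", "head_0.1", "mlp_0"],
    "head_0.0": ["mlp_0", "output"],
    "head_0.1": ["output"],
    "mlp_0":    ["output"],
    "output":   [],
}

# Hypothetical per-node ablation effects on the temporal behavior.
effect = {"input": 1.0, "head_0.0": 0.9, "head_0.1": 0.05,
          "mlp_0": 0.6, "output": 1.0}

# The circuit keeps only high-impact nodes and the edges among them.
keep = {n for n, s in effect.items() if s > 0.1}
circuit = {n: [m for m in nbrs if m in keep]
           for n, nbrs in edges.items() if n in keep}
print(circuit)
```

Here `head_0.0` survives the pruning while `head_0.1` does not; in the paper's setting, the heads that survive pruning on temporal prompts are the temporal heads forming the TKC.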