Zoom In: An Introduction to Circuits

https://distill.pub/2020/circuits/zoom-in/
Chris Olah

AI: Interpretability, Mechanistic interpretability

Content

Basic idea

Fields such as biology and neuroscience advanced by “zooming in” and identifying basic units such as cells and neurons. Can we do something similar for artificial neural networks in deep learning?

One of the earliest articulations of something approaching modern cell theory was three claims by Theodor Schwann — who you may know for Schwann cells — in 1839:

The cell is the unit of structure, physiology, and organization in living things.
The cell retains a dual existence as a distinct entity and a building block in the construction of organisms.
Cells form by free-cell formation, similar to the formation of crystals.

By analogy with these three claims about cells, the paper proposes three speculative claims about neural networks:

Features: Features are the fundamental unit of neural networks. They correspond to directions in activation space, that is, linear combinations of the neurons in a layer. These features can be rigorously studied and understood.
Circuits: Features are connected by weights, forming circuits. These circuits can also be rigorously studied and understood.
Universality: Analogous features and circuits form across models and tasks.
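
The first two claims can be made concrete with a toy sketch. The snippet below is only an illustration under stated assumptions: the activations, directions, and weights are made-up numbers, and the helper names (feature_activation, circuit_output) are hypothetical, not taken from the paper. It shows a feature read out as a direction over the neurons of a layer, and a circuit as weights that combine earlier features into a later one.

```python
# Toy illustration with made-up numbers; nothing here comes from a real model.

def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def relu(x):
    """Rectified linear unit: negative values are clipped to zero."""
    return max(0.0, x)

def feature_activation(layer_acts, direction):
    """How strongly a feature (a direction over neurons) is present."""
    return dot(layer_acts, direction)

def circuit_output(feature_acts, weights, bias=0.0):
    """A later feature computed from earlier features via weights and a ReLU."""
    return relu(dot(feature_acts, weights) + bias)

# Activations of a hypothetical 3-neuron layer for one input.
layer_acts = [2.0, 0.0, 1.0]

# A single neuron is the special case of a one-hot direction.
single_neuron = [1.0, 0.0, 0.0]
# A feature can also be spread across several neurons (unit-length direction).
mixed_direction = [0.6, 0.0, 0.8]

# Hypothetical circuit weights: positive weights excite, negative ones inhibit.
# Here each neuron of the layer is treated as one earlier feature.
weights = [1.0, -1.0, 0.5]

print(feature_activation(layer_acts, single_neuron))
print(feature_activation(layer_acts, mixed_direction))
print(circuit_output(layer_acts, weights))
print(circuit_output([0.0, 3.0, 1.0], weights))  # inhibition dominates here
```

Reading the weights is the point of the circuit claim: in this sketch, the sign and size of each weight says which earlier features excite or inhibit the later one.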

Examples

The paper uses feature visualization, among other lines of evidence, to argue that particular neurons implement a specific function (e.g., curve detection).
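
Feature visualization can be sketched as optimizing an input so that it maximally activates a chosen unit. The snippet below is a minimal sketch, assuming a single linear unit with hypothetical weights and an input constrained to unit length. Real feature visualization works on deep networks with automatic differentiation and extra regularization, none of which is shown here. The function name visualize_feature and all numbers are illustrative assumptions.

```python
import math

def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Rescale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def visualize_feature(weights, steps=200, lr=0.1):
    """Projected gradient ascent on the input of a toy linear unit.

    The unit's pre-activation is dot(weights, x), so its gradient with
    respect to x is simply `weights`. After each step the input is
    projected back onto the unit sphere so it cannot grow without bound.
    """
    # Deterministic starting point so the example is reproducible.
    x = normalize([1.0] * len(weights))
    for _ in range(steps):
        x = [xi + lr * wi for xi, wi in zip(x, weights)]
        x = normalize(x)
    return x

# Hypothetical weights of one unit (made-up values).
unit_weights = [1.0, -2.0, 1.0]

optimized_input = visualize_feature(unit_weights)
# Cosine similarity between the optimized input and the weight direction.
cosine = dot(optimized_input, normalize(unit_weights))
print(optimized_input, cosine)
```

For this toy unit the optimized input converges toward the direction of the weights, which is the simplest case of an input that shows what the unit responds to.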