Bias in language models

Bias (Racial bias, Gender bias), Language model, Word embedding

StereoSet is “a dataset that measures stereotype bias in language models”.

Joseph2020when studies bias in word embeddings by relating them to surveys of beliefs.

Bias is often assumed to be linear in the embedding space. Could it instead lie on a non-linear manifold? If so, how can we probe the manifold of bias? Joaquín Goñi’s work on Riemannian geometry (e.g., abbas2021geodesic and upcoming work) may be an inspiration.
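To make the linear assumption concrete, here is a minimal sketch of the standard linear probe: define a bias axis as the difference of a paired prompt (e.g., “he” − “she”) and project other words onto it. The embeddings below are random placeholders, not real vectors; in practice one would load pretrained embeddings (e.g., GloVe or word2vec).

```python
import numpy as np

# Hypothetical toy embeddings; stand-ins for real pretrained word vectors.
rng = np.random.default_rng(0)
dim = 50
emb = {w: rng.normal(size=dim) for w in ["he", "she", "doctor", "nurse"]}

# The linear assumption: bias lies along a single direction in embedding
# space, here the normalized difference between a gendered word pair.
bias_dir = emb["he"] - emb["she"]
bias_dir /= np.linalg.norm(bias_dir)

def bias_score(word: str) -> float:
    """Cosine of the angle between a word vector and the bias direction."""
    v = emb[word]
    return float(v @ bias_dir / np.linalg.norm(v))

for w in ["doctor", "nurse"]:
    print(w, round(bias_score(w), 3))
```

A non-linear (manifold) probe would replace the single direction `bias_dir` with a curved subspace, which is where the Riemannian-geometry angle comes in.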