Bias in language models
Bias (Racial bias, Gender bias), Language model, Word embedding
StereoSet is “a dataset that measures stereotype bias in language models”: https://stereoset.mit.edu
Joseph2020when relates the bias measured in the Word embedding to surveys of beliefs.
The bias is often assumed to be linear in the embedding space, but it may instead lie on a non-linear Manifold. If so, how can we probe the manifold of bias? Joaquín Goñi's work on Riemannian geometry (e.g., abbas2021geodesic and upcoming work) may be an inspiration.
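The linear assumption can be made concrete with a small sketch: estimate a single bias direction from definitional word pairs (in the style of linear bias probes on word embeddings) and score other words by their signed projection onto it. The 4-d vectors below are hypothetical toy values, not real embeddings; the pair choice and scoring function are illustrative assumptions. If such a linear probe fails to capture the bias, that would be evidence for the manifold hypothesis above.

```python
import numpy as np

# Toy 4-d vectors standing in for real word embeddings (hypothetical values).
emb = {
    "he":     np.array([ 1.0, 0.1, 0.0, 0.2]),
    "she":    np.array([-1.0, 0.1, 0.0, 0.2]),
    "man":    np.array([ 0.9, 0.3, 0.1, 0.0]),
    "woman":  np.array([-0.9, 0.3, 0.1, 0.0]),
    "doctor": np.array([ 0.4, 0.5, 0.2, 0.1]),
    "nurse":  np.array([-0.5, 0.5, 0.2, 0.1]),
}

# Linear assumption: bias lies along one direction, estimated here as the
# mean of difference vectors over definitional pairs, then normalized.
pairs = [("he", "she"), ("man", "woman")]
diffs = np.stack([emb[a] - emb[b] for a, b in pairs])
bias_dir = diffs.mean(axis=0)
bias_dir /= np.linalg.norm(bias_dir)

def bias_score(word):
    """Signed projection onto the estimated bias direction (a linear probe)."""
    return float(emb[word] @ bias_dir)
```

Under this toy setup, `bias_score("doctor")` comes out positive and `bias_score("nurse")` negative, i.e., the linear probe separates the stereotyped pair. A manifold-based probe would replace the single direction with a curved subspace and project geodesically instead.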