Large pre-trained language models contain human-like biases of what is right and wrong to do
Moral judgment, Continuous embedding, Neural language models
Large language models carry moral judgment as a “bias”. This bias can be extracted by formulating prompts that signal moral judgment and applying PCA to the resulting embeddings. The first principal component correlates strongly with human judgments.
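The extraction step can be illustrated with a minimal sketch. It assumes the prompt embeddings are already available as plain vectors. The three-dimensional vectors, the action labels, and the human ratings below are made-up toy values, not model outputs or study data, and the helper names are hypothetical. In practice the vectors would come from a sentence encoder applied to prompts that signal moral judgment. The first principal component is found here by power iteration on the covariance matrix, a stand-in for whatever PCA routine is actually used.

```python
import math

def first_principal_component(rows, iters=200):
    """Return (unit-length first principal component, mean) of the rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Unnormalized covariance matrix (d x d); scaling does not change the direction.
    cov = [[sum(r[i] * r[j] for r in centered) for j in range(d)] for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary starting direction
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v, mean

def project(row, component, mean):
    """Score one embedding by its coordinate along the component."""
    return sum((x - m) * c for x, m, c in zip(row, mean, component))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy stand-ins for embeddings of prompts about six actions (hypothetical values).
toy_embeddings = {
    "help":  [0.9, 0.1, 0.2],
    "love":  [0.8, 0.2, 0.1],
    "greet": [0.5, 0.1, 0.3],
    "lie":   [-0.6, 0.2, 0.2],
    "steal": [-0.8, 0.1, 0.1],
    "kill":  [-1.0, 0.2, 0.3],
}
# Hypothetical human ratings on a -1 (wrong) to +1 (right) scale.
toy_human_ratings = [1.0, 0.9, 0.6, -0.5, -0.8, -1.0]

vectors = list(toy_embeddings.values())
component, mean = first_principal_component(vectors)
scores = [project(v, component, mean) for v in vectors]
r = pearson(scores, toy_human_ratings)
print("correlation with toy ratings:", round(r, 3))
```

The sign of a principal component is arbitrary, so only the magnitude of the correlation is meaningful unless the direction is oriented against a few known examples.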