Negative sampling

A common method to estimate the conditional probability (Language model) in word2vec and similar methods.

It is actually a biased estimator. See Noise Contrasive Estimation suggested by Gutmann2010noise. But due to this bias, it estimates another reasonable conditional probability that takes into account the frequency of the target word.

Yang2020understanding


word2vec < > Freqnecy bias