word2vec

A method for learning continuous vector embeddings of words. See also GloVe, wiki2vec, WordRank, sense2vec.

The majority of LLMs are transformers, and what they do is obtain token vectors, which is essentially what word2vec does. Of course, the token vectors obtained by LLMs are context-aware, and that's what matters, but at the most basic level they do the same thing: given a sequence of tokens, get a vector representation of each token.
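As a minimal sketch of that basic operation, here is how one could train a small word2vec model and look up a token's vector. The library choice (gensim) and the toy corpus are assumptions for illustration, not something from these notes:

    from gensim.models import Word2Vec

    # Toy corpus; in practice word2vec is trained on a large tokenized corpus.
    sentences = [
        ["the", "king", "rules", "the", "kingdom"],
        ["the", "queen", "rules", "the", "kingdom"],
        ["paris", "is", "the", "capital", "of", "france"],
        ["rome", "is", "the", "capital", "of", "italy"],
    ]

    # sg=1 selects the skip-gram variant; sg=0 would train CBOW.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, seed=42)

    # One static (context-independent) vector per token type.
    print(model.wv["king"].shape)  # (50,)

Unlike a transformer, this lookup returns the same vector for "king" regardless of the surrounding tokens; adding context-awareness is exactly what attention does.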

This paper by Bengio et al. was an early proposal to use neural networks to learn such embeddings: https://proceedings.neurips.cc/paper_files/paper/2000/hash/728f206c2a01bf572b5940d7d9a8fa4c-Abstract.html

This whole idea is tied to more classical distributional models: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00134/43264/Improving-Distributional-Similarity-with-Lessons

word2vec Parameter Learning Explained

Here are some other papers for understanding the word2vec model better.

https://proceedings.neurips.cc/paper/2014/hash/feab05aa91085b7a8012516bc3533958-Abstract.html

https://proceedings.neurips.cc/paper/2021/hash/ca9541826e97c4530b07dda2eba0e013-Abstract.html

And cool applications:

https://doi.org/10.1038/s41586-019-1335-8

https://journals.sagepub.com/doi/full/10.1177/0003122419877135

http://yongyeol.com/2021/02/13/paper-mobilityembedding.html

Building on the idea of word2vec, people realized it would be useful to take the surrounding context into account. That led to attention mechanisms, which in turn led to transformers (see the sketch after the link below).

https://lilianweng.github.io/posts/2018-06-24-attention/ 
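To make the "take the context into account" step concrete, here is a minimal numpy sketch of scaled dot-product self-attention, the core operation the post above describes. The random weight matrices and dimensions are illustrative assumptions:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d) static token embeddings, e.g. word2vec-style vectors.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])     # pairwise token affinities
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)          # softmax over the sequence
        return w @ V                                # each output mixes in its context

    rng = np.random.default_rng(0)
    d = 8
    X = rng.normal(size=(5, d))                     # 5 tokens, d-dim embeddings
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)      # (5, 8): one vector per token

The output vector for each token now depends on the other tokens in the sequence, which is precisely what the static word2vec lookup lacks.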

Bias in language models

Software and Libraries

Intro and Theory

It was recently shown that word vectors capture many linguistic regularities; for example, the vector operation vector(‘Paris’) - vector(‘France’) + vector(‘Italy’) results in a vector that is very close to vector(‘Rome’), and vector(‘king’) - vector(‘man’) + vector(‘woman’) is close to vector(‘queen’) [3, 1].
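Those analogies can be reproduced as a quick sketch with gensim's pretrained vectors (the specific model name below is an assumption; it is one of the sets bundled with gensim's downloader, fetched over the network on first use):

    import gensim.downloader as api

    # Pretrained Google News word2vec vectors, downloaded on first call.
    wv = api.load("word2vec-google-news-300")

    # vector('king') - vector('man') + vector('woman') is closest to vector('queen').
    print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
    # e.g. [('queen', 0.71...)]

gensim's most_similar performs exactly this kind of vector addition and subtraction, then ranks words by cosine similarity to the result.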

Methods

Tutorials

Articles

Presentations

Topics

Paragraph vectors

Comparing multiple sets of vectors

Optimal dimension

Sentiment analysis

Application to networks

Scalability

Gender bias and Bias in language models

Clinical concepts