word2vec
See also GloVe, wiki2vec, WordRank, sense2vec. A method for learning continuous vector embeddings of words.
The majority of LLMs are transformers, and what they do is obtain token vectors, which is essentially what word2vec does. Of course, the token vectors obtained by LLMs are context-aware, and that is what matters, but at the most basic level they do the same thing: given a sequence of tokens, get a vector representation of each token.
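As a concrete illustration, here is a minimal sketch of training word2vec on a toy corpus and looking up a token's vector with the Gensim library (the corpus and parameter values are made up for illustration):

```python
# Minimal sketch: train a skip-gram word2vec model on a toy corpus and
# look up the learned vector for a token. Corpus and parameters are
# illustrative only.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["paris", "is", "the", "capital", "of", "france"],
    ["rome", "is", "the", "capital", "of", "italy"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the embedding
    window=2,         # context window size
    min_count=1,      # keep every token in this tiny corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
)

vec = model.wv["king"]   # one fixed vector per token type
print(vec.shape)         # (50,)
print(model.wv.most_similar("king", topn=3))
```

Unlike an LLM, the vector for "king" here is the same no matter what context the word appears in.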
This paper by Bengio et al. was the early idea of using neural networks to obtain such embeddings: https://proceedings.neurips.cc/paper_files/paper/2000/hash/728f206c2a01bf572b5940d7d9a8fa4c-Abstract.html
The whole idea is tied to more classical distributional models: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00134/43264/Improving-Distributional-Similarity-with-Lessons
word2vec Parameter Learning Explained
Here are some other papers to understand the word2vec model better.
https://proceedings.neurips.cc/paper/2014/hash/feab05aa91085b7a8012516bc3533958-Abstract.html
https://proceedings.neurips.cc/paper/2021/hash/ca9541826e97c4530b07dda2eba0e013-Abstract.html
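One key theoretical result here (Levy & Goldberg, 2014, also listed under Intro and Theory below) is that skip-gram with negative sampling implicitly factorizes a shifted word–context PMI matrix; roughly, the learned word and context vectors satisfy:

```latex
% SGNS as implicit matrix factorization (Levy & Goldberg, 2014),
% where k is the number of negative samples:
\vec{w} \cdot \vec{c} = \mathrm{PMI}(w, c) - \log k,
\qquad
\mathrm{PMI}(w, c) = \log \frac{P(w, c)}{P(w)\,P(c)}
```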
And cool applications:
https://doi.org/10.1038/s41586-019-1335-8
https://journals.sagepub.com/doi/full/10.1177/0003122419877135
http://yongyeol.com/2021/02/13/paper-mobilityembedding.html
Building on the idea of word2vec, people thought it would be great to take the context into account. That led to attention mechanisms, which led to transformers.
https://lilianweng.github.io/posts/2018-06-24-attention/
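For reference, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of transformers (the toy shapes and input are made up for illustration); the point is that each token's output vector becomes a context-dependent mixture of all the token vectors:

```python
# Minimal sketch of scaled dot-product attention (single head, no masking).
# Each output row is a context-dependent combination of the value vectors,
# weighted by query-key similarity. Toy shapes for illustration only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_tokens, n_tokens)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # (n_tokens, d_v)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))   # 4 tokens, 8-dim embeddings (e.g. from word2vec)
out = scaled_dot_product_attention(X, X, X)   # self-attention over the tokens
print(out.shape)              # (4, 8): one context-aware vector per token
```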
Software and Libraries
Intro and Theory
- A Neural Probabilistic Language Model
- Distributed Representations of Words and Phrases and their Compositionality
It was recently shown that the word vectors capture many linguistic regularities, for example vector operations vector(‘Paris’) - vector(‘France’) + vector(‘Italy’) results in a vector that is very close to vector(‘Rome’), and vector(‘king’) - vector(‘man’) + vector(‘woman’) is close to vector(‘queen’) [3, 1]. (See the analogy sketch after this list.)
- Levy2014: Neural Word Embedding as Implicit Matrix Factorization
- Levy2014a: Linguistic Regularities in Sparse and Explicit Word Representations
- word2vec Parameter Learning Explained
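A quick sketch of the analogy arithmetic quoted above, again using Gensim (the pretrained model name is an assumption; any word2vec-style KeyedVectors would do, and the download is large):

```python
# Sketch of the king - man + woman ≈ queen analogy using Gensim's vector
# arithmetic. The pretrained model name is an assumption; any
# word2vec-compatible KeyedVectors can be substituted.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")   # pretrained word2vec vectors

# most_similar adds the "positive" vectors and subtracts the "negative" ones
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# expected to be close to "queen"
print(wv.most_similar(positive=["Italy", "Paris"], negative=["France"], topn=1))
# expected to be close to "Rome"
```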
Methods
Tutorials
- http://rare-technologies.com/word2vec-tutorial/
- Deep Learning Basics: Neural Networks, Backpropagation and Stochastic Gradient Descent
- Korean meets NLTK and Gensim (한국어와 NLTK, Gensim의 만남)
- Demystifying Word2Vec
- Jurafsky and Martin: Vector Semantics, Part II
- Chris McCormick Word2Vec Tutorial - The Skip-Gram Model
- TensorFlow Vector Representations of Words
Articles
- http://deeplearning4j.org/word2vec.html#crazy
- What does the output vector of a word in word2vec represent?
Presentations
- Machine Perception with Neural Networks by Ilya Sutskever
- Text By the Bay 2015: Chris Moody, A Word is Worth a Thousand Vectors