Tokens, the oft-overlooked appetizer: Large language models, the distributional hypothesis, and meaning
- https://arxiv.org/abs/2412.10924
- Julia Witte Zimmerman, Denis Hudon, Kathryn Cramer, Alejandro J. Ruiz, Calla Beauregard, Ashley Fehr, Mikaela Irene Fudolig, Bradford Demarest, Yoshi Meke Bird, Milo Z. Trujillo, Christopher M. Danforth, Peter Sheridan Dodds
The importance of the tokenizer in LLMs.
Includes a lot of fun figures. This is an abstracted excerpt from first author Julia Witte Zimmerman's dissertation.