Tokens, the oft-overlooked appetizer: Large language models, the distributional hypothesis, and meaning

The importance of the tokenizer in LLMs.

Lots of fun figures. This is an abstracted excerpt from first author Julia Witte Zimmerman's dissertation.