LLM Tokenizer

Bluesky thread: What are your favorite papers on tokenizers (for pretraining data processing) and their downstream effects?