LLM embeddings

Embedding

An LLM (a Transformer model) converts tokens into embeddings (continuous representations), does its computation in that continuous space, and then projects the result back into tokens. In a way, this may limit its ability to think freely: at every output step a rich hidden vector is collapsed into a single discrete token, so the model is confined to exact tokens rather than more nuanced intermediate states. Indeed, some papers show that it is possible to let LLMs reason in embedding space without collapsing back to tokens at each step. See Latent reasoning.
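
Below is a minimal, self-contained sketch contrasting the two loops described above, using toy numpy stand-ins rather than a real Transformer. Everything in it (toy_transformer, the tied projection W_out, n_latent_steps) is illustrative and not taken from any particular paper; the point is only to show where the collapse to a discrete token happens in the standard loop, and how a latent loop can feed the hidden state back as the next input embedding instead.

```python
# Toy sketch: standard token loop vs. latent-reasoning loop.
# All components are illustrative stand-ins, not a real model.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 100, 16                      # toy vocabulary and embedding size
E = rng.normal(size=(VOCAB, DIM))         # token embedding table
W_out = E.T                               # tied output projection (DIM -> VOCAB)

def toy_transformer(h):
    """Stand-in for the Transformer stack: any map from a sequence of
    embeddings (seq, DIM) to a single hidden state (DIM,)."""
    return np.tanh(h.mean(axis=0))

def standard_step(token_ids):
    """Token loop: embed -> compute -> project -> pick a discrete token."""
    h = E[token_ids]                      # tokens -> continuous embeddings
    hidden = toy_transformer(h)           # computation in embedding space
    logits = hidden @ W_out               # back to the discrete vocabulary
    return int(np.argmax(logits))         # collapses to one token id

def latent_steps(token_ids, n_latent_steps=3):
    """Latent loop: feed the hidden state back as the next input embedding,
    skipping the collapse to a token for a few steps."""
    h = E[token_ids]
    for _ in range(n_latent_steps):
        hidden = toy_transformer(h)       # stays a full continuous vector
        h = np.vstack([h, hidden])        # appended as the next "input embedding"
    logits = toy_transformer(h) @ W_out   # only now project back to tokens
    return int(np.argmax(logits))

prompt = [3, 14, 15]
print("standard next token:", standard_step(prompt))
print("latent   next token:", latent_steps(prompt))
```

The difference being illustrated is where the argmax happens: in the standard loop the hidden state is quantized to one token id at every step, while in the latent loop it is carried forward as a continuous vector for a few steps before being projected back to the vocabulary.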