The Expressive Power of Transformers with Chain of Thought

See also Deng2024from, which claimed that LLMs can be trained to "internalize" chain-of-thought reasoning. Does this paper prove that such internalization is not possible in general?