ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

A lighter language model derived from BERT. It reduces BERT's parameter count (space complexity) in two ways:

  1. Factorized embedding parameterization: the WordPiece embedding matrix is split into two smaller matrices, decoupling the size of the context-independent token embeddings from the size of the context-dependent hidden-layer representations (see the first sketch below).
  2. Cross-layer parameter sharing: all transformer layers reuse the same set of parameters (see the second sketch below).
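
A minimal PyTorch sketch of the factorization idea. The class name `FactorizedEmbedding` and the sizes (vocab 30,000, embedding 128, hidden 768) are illustrative assumptions, not taken from the paper's exact configuration:

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Factorized embedding: a V x E lookup followed by an E x H projection.

    BERT ties the embedding size to the hidden size (one V x H matrix).
    ALBERT picks E << H, cutting embedding parameters from V*H to V*E + E*H.
    """

    def __init__(self, vocab_size=30000, embed_size=128, hidden_size=768):
        super().__init__()
        # Context-independent representation: one small vector per token.
        self.word_embedding = nn.Embedding(vocab_size, embed_size)
        # Projection up to the transformer's hidden size, where the
        # context-dependent representations are computed.
        self.projection = nn.Linear(embed_size, hidden_size)

    def forward(self, token_ids):
        return self.projection(self.word_embedding(token_ids))
```

With these illustrative sizes, a full V x H embedding would hold 30,000 × 768 ≈ 23M parameters, while the factorized version holds 30,000 × 128 + 128 × 768 ≈ 3.9M.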
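
A minimal sketch of cross-layer parameter sharing, again with assumed names and sizes; `nn.TransformerEncoderLayer` stands in for ALBERT's actual block:

```python
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    """Encoder that applies one transformer layer repeatedly.

    A single layer's weights are reused at every depth, so the encoder's
    parameter count is independent of the number of layers.
    """

    def __init__(self, hidden_size=768, num_heads=12, num_layers=12):
        super().__init__()
        self.num_layers = num_layers
        # One set of weights, shared across all layers.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True)

    def forward(self, hidden_states):
        # Same module applied num_layers times; contrast with BERT,
        # which allocates a distinct layer at each depth.
        for _ in range(self.num_layers):
            hidden_states = self.shared_layer(hidden_states)
        return hidden_states
```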