Multi-view representation learning

Multi-view (or multi-modal) learning aims to learn a joint representation from data described by multiple “views”, such as images, text, and audio. In the context of networks, a view can correspond to one layer of a multilayer network.

See also a survey: Li2018survey

How is it connected to classical methods such as canonical correlation analysis (CCA)?
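
As a starting point for this question, it may help to recall what classical CCA computes. Given two views of the same samples, linear CCA finds one projection per view such that the projected views are maximally correlated. The sketch below is a minimal NumPy version, not a reference implementation. The function name `linear_cca`, the ridge term `reg`, and the synthetic two-view data are assumptions made for illustration.

```python
import numpy as np

def linear_cca(X, Y, k=1, reg=1e-6):
    """Linear CCA for two views X (n x p) and Y (n x q).

    Returns projection matrices A (p x k), B (q x k) and the
    top-k canonical correlations.
    """
    n = X.shape[0]
    # Center each view.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Within-view and cross-view covariances. The small ridge term
    # `reg` is an assumption added for numerical stability.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each view with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # T = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original coordinates.
    A = np.linalg.solve(Lx.T, U[:, :k])
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return A, B, s[:k]

# Synthetic example: two noisy views generated from one shared latent variable z.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ np.array([[1.0, -0.5, 0.3, 2.0]]) + 0.1 * rng.normal(size=(500, 4))
Y = z @ np.array([[0.7, 1.5, -1.0]]) + 0.1 * rng.normal(size=(500, 3))

A, B, corrs = linear_cca(X, Y, k=1)
print("top canonical correlation:", corrs[0])
```

Here `X @ A` and `Y @ B` are the one-dimensional representations of the two views. In this sense CCA can be read as a linear, two-view instance of learning a shared representation; how the methods in the survey relate to it in detail is left open here.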