Multi-view representation learning
Multi-view (or multi-modal) learning aims to learn representations that combine multiple “views” of the same entities, such as images, text, and audio. In the context of networks, the views can be the different layers of a multilayer network.
See also a survey: Li2018survey
How is it connected to classical methods such as canonical correlation analysis (CCA)?
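As a reference point for the classical side of this question, here is a minimal sketch of linear CCA on two views. CCA finds one linear projection per view such that the projected views are maximally correlated, so it can be read as a two-view, linear case of learning a shared representation. The function name `linear_cca`, the ridge term `reg`, and the synthetic data are assumptions made for illustration; none of them come from the survey.

```python
import numpy as np

def linear_cca(X, Y, k=1, reg=1e-6):
    """Linear CCA for two views X (n x p) and Y (n x q).

    Returns projection matrices A (p x k), B (q x k) and the top-k
    canonical correlations. `reg` is a small ridge term added for
    numerical stability (an illustrative choice).
    """
    # Center each view.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Within-view and cross-view covariance matrices.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    A = Kx @ U[:, :k]
    B = Ky @ Vt.T[:, :k]
    return A, B, s[:k]

# Synthetic example: both views contain a noisy copy of the same latent z,
# hidden among unrelated noise columns.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([rng.normal(size=(500, 2)), -z + 0.1 * rng.normal(size=(500, 1))])

A, B, corrs = linear_cca(X, Y, k=1)
print("top canonical correlation:", corrs[0])
```

In this toy setting the first canonical pair should recover the shared latent signal. Nonlinear multi-view methods can be seen as replacing the linear projections with learned encoders, although how closely a given method follows this pattern depends on the method.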