KEYNOTE: Least Squares Support Vector Machines and Deep Learning
Johan Suykens, Katholieke Universiteit Leuven, BELGIUM
CIS
IEEE Members: Free
Non-members: Free
Length: 01:07:40
ABSTRACT: "While powerful architectures have been proposed in deep learning, with support vector machines and kernel-based methods solid foundations have been obtained from the perspective of statistical learning theory and optimization. Simple core models were obtained within the least squares support vector machines framework, related to classification, regression, kernel principal component analysis, kernel canonical correlation analysis, kernel spectral clustering, recurrent models, approximate solutions to partial differential equations and optimal control problems, etc. The representations of the models are understood in terms of primal and dual representations, respectively related to feature maps and kernels. The insights have been exploited for tailoring representations to given data characteristics, both for high dimensional input data and large scale data sets. One can either work with explicit feature maps (such as e.g. convolutional feature maps) or implicit feature maps through the kernel functions.
Within this talk we will mainly focus on new insights connecting deep learning and least squares support vector machines. Related to Restricted Boltzmann machines and Deep Boltzmann machines, we show how least squares support vector machine models can be transformed into so-called Restricted Kernel Machine representations. This enables conceiving new deep kernel machines, generative models, and multi-view and tensor-based models with latent space exploration, and yields improved robustness and explainability. In most recent work, we explain how the attention mechanism in transformers can be seen within the least squares support vector machine framework. More precisely, it can be represented as an asymmetric kernel singular value decomposition with primal and dual model representations, related to two feature maps (queries and keys) and an asymmetric kernel. In the resulting method of "Primal-Attention", a regularized loss is employed to achieve low-rank representations for efficient training in the primal.
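The following toy sketch (an assumption-laden illustration, not the talk's implementation) shows the asymmetric-kernel reading of standard softmax attention: queries and keys act as two different feature maps, the attention matrix is the induced asymmetric kernel matrix, and its singular value decomposition exposes the low-rank structure that Primal-Attention exploits for training in the primal. The dimensions and random weights are toy choices.

```python
# Sketch of the asymmetric-kernel view of softmax attention.
# Assumptions: random token features and projection weights, toy sizes n=8, d=4.
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 4                      # sequence length, head dimension (toy values)
X = rng.standard_normal((n, d))  # token representations
W_q = rng.standard_normal((d, d))
W_k = rng.standard_normal((d, d))

Q, K = X @ W_q, X @ W_k          # two feature maps: queries and keys
S = Q @ K.T / np.sqrt(d)         # asymmetric similarity, since Q != K in general
A = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)   # row-wise softmax

# Singular value decomposition of the (asymmetric) attention matrix:
# a fast-decaying spectrum indicates the low-rank representation that a
# regularized primal formulation can train efficiently.
U, s, Vt = np.linalg.svd(A)
print("singular values:", np.round(s, 3))
```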
Finally, these newly obtained synergies are very promising for arriving at a bigger, unifying picture. Several future challenges will be outlined from this perspective.