I have a naive understanding of things so far.
For example, suppose we’re talking about classification problems. A typical (linear) SVM treats classification as the search for a linear boundary separating the data (image from wiki):
But in practice the boundary is usually not linear, which is why kernel methods (kernel SVM, kernel logistic regression) were introduced: the idea is to map the data into a higher-dimensional feature space where the nonlinear boundary becomes (approximately) linear:
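To make the “map/stretch” idea concrete, here is a toy sketch of my own (not from any library’s kernel implementation): points inside a circle cannot be separated from points outside it by a straight line in 2-D, but under the explicit quadratic feature map they become linearly separable. The data and the feature map `phi` are hypothetical illustrations, assuming the standard degree-2 polynomial kernel expansion.

```python
import math

# Hypothetical toy data: class +1 on a circle of radius 0.5, class -1 on a
# circle of radius 1.5. No straight line in the original 2-D space separates
# them, since one class surrounds the other.
inside  = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in [i * 0.3 for i in range(20)]]
outside = [(1.5 * math.cos(t), 1.5 * math.sin(t)) for t in [i * 0.3 for i in range(20)]]

def phi(x1, x2):
    """Explicit feature map of the quadratic kernel: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

# In feature space, z1 + z2 = x1^2 + x2^2 = r^2, so the *linear* rule
# "z1 + z2 < 1" exactly recovers the circular boundary r < 1.
assert all(sum(phi(x1, x2)[:2]) < 1 for x1, x2 in inside)
assert all(sum(phi(x1, x2)[:2]) > 1 for x1, x2 in outside)
print("circular boundary is linear in feature space")
```

The kernel trick in an actual kernel SVM avoids computing `phi` explicitly, but the geometric picture is the same: a curved boundary in the input space corresponds to a flat one in the feature space.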
Deep learning, on the other hand, approaches the problem from a totally different direction. Instead of “stretching” the space so that the boundary becomes linear, a deep network directly parameterizes and approximates the nonlinear boundary itself.
I’m not an expert in kernel SVMs or DL, so this is just my high-level understanding. Hopefully I can explain things more clearly in future posts.
Here’s a good example of how DL does the approximation by stacking multiple layers:
Theoretically, this idea is backed up by the universal approximation theorem (Goodfellow’s DL book, section 6.4.1): a feedforward network with a single hidden layer and a suitable nonlinearity can approximate any continuous function on a compact set to arbitrary accuracy, although the theorem gives no useful bound on how many hidden units are needed.
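A small demonstration of the flavor of this theorem (my own toy construction, not the proof in the book): a one-hidden-layer ReLU network can reproduce any piecewise-linear interpolant, so with enough hidden units it approximates any continuous function on an interval. The helper `build_relu_net` below is a hypothetical name; it solves for the weights directly rather than training them.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_net(f, a, b, n_units):
    """Place n_units ReLU 'kinks' at evenly spaced knots on [a, b] so that
    net(x) = bias + sum_i w_i * relu(x - k_i) interpolates f at the knots."""
    step = (b - a) / n_units
    knots = [a + step * i for i in range(n_units)]
    ys = [f(k) for k in knots] + [f(b)]
    # Slope of f on each segment; each unit's weight is the *change* in slope.
    slopes = [(ys[i + 1] - ys[i]) / step for i in range(n_units)]
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_units)]
    bias = ys[0]
    def net(x):
        return bias + sum(w * relu(x - k) for w, k in zip(weights, knots))
    return net

# Approximate sin(x) on [0, pi] with 50 hidden units; the max error shrinks
# as n_units grows, but the theorem itself gives no bound on how many we need.
net = build_relu_net(math.sin, 0.0, math.pi, n_units=50)
grid = [math.pi * i / 200 for i in range(201)]
max_err = max(abs(net(x) - math.sin(x)) for x in grid)
print(f"max error with 50 hidden units: {max_err:.4f}")
```

In practice the weights are found by gradient descent rather than by this closed-form interpolation, and stacking layers tends to be far more parameter-efficient than widening a single layer.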