I have a naive understanding of things so far.

For example, suppose we’re talking about classification problems. For a typical SVM, the classification problem becomes a search for a linear boundary between the two classes of data (image from wiki):
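To make "linear boundary" concrete, here's a minimal sketch (weights hand-picked for illustration, not learned by an SVM): a linear classifier is just the sign of an affine function of the input.

```python
import numpy as np

# A linear decision boundary is sign(w . x + b): here, points above
# the line x1 + x2 = 1 are class +1, points below are class -1.
w = np.array([1.0, 1.0])
b = -1.0

def classify(x):
    return np.sign(w @ x + b)

print(classify(np.array([2.0, 2.0])))  # -> 1.0
print(classify(np.array([0.0, 0.0])))  # -> -1.0
```

An SVM picks `w` and `b` to maximize the margin between the classes, but the form of the boundary is exactly this.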

But usually the boundary is not linear, which is why kernel methods (kernel SVM, kernel logistic regression) are introduced, in the hope of mapping/stretching the nonlinear boundary into a linear one:
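A tiny sketch of that "stretching" idea (data and feature map are my own toy example, not a real kernel SVM): two concentric rings are not linearly separable in the original coordinates, but after mapping each point to its squared radius, a single threshold separates them perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two concentric rings: class 0 near radius 1, class 1 near radius 3.
n = 200
angles = rng.uniform(0, 2 * np.pi, n)
radii = np.concatenate([np.full(n // 2, 1.0), np.full(n // 2, 3.0)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

# No straight line in (x1, x2) separates the rings, but the feature
# map phi(x) = x1^2 + x2^2 (the squared radius) makes the problem
# linear: a threshold at 4 classifies every point correctly.
phi = X[:, 0] ** 2 + X[:, 1] ** 2
pred = (phi > 4.0).astype(float)
print((pred == y).mean())  # -> 1.0
```

A kernel method does this implicitly: instead of computing the feature map, it works with inner products in the stretched space.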

Deep learning, on the other hand, approaches the problem in a totally different way. Instead of “stretching” the boundary, a deep network parameterizes and approximates the boundary directly.

I’m not an expert in kernel SVMs or DL, so this is just my high-level understanding. Hopefully I can explain things more clearly in future posts.

Here’s a good example of how DL does the approximation by stacking multiple layers:

https://www.r-bloggers.com/a-primer-on-universal-function-approximation-with-deep-learning-in-torch-and-r/

Theoretically, this idea is backed by the universal approximation theorem (Goodfellow’s DL book, section 6.4.1): a feedforward network with a single hidden layer can approximate any continuous function on a compact set to arbitrary accuracy, although the theorem gives no useful bound on the network size required (the width may need to grow exponentially).
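To see the theorem in a tiny concrete case, here's a one-hidden-layer ReLU network with four hand-picked units (my own illustration, nothing trained) that approximates f(x) = x² on [0, 1] by piecewise-linear interpolation: each unit contributes the slope change needed at its knot.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# One hidden layer of 4 ReLU units, weights chosen by hand so the
# network linearly interpolates x^2 at the knots 0, 0.25, 0.5, 0.75.
knots = np.array([0.0, 0.25, 0.5, 0.75])
weights = np.array([0.25, 0.5, 0.5, 0.5])  # slope increments of x^2

def net(x):
    return sum(w * relu(x - k) for w, k in zip(weights, knots))

x = np.linspace(0.0, 1.0, 1001)
err = np.max(np.abs(net(x) - x ** 2))
print(err)  # worst case is about 0.0156 (= max|f''| * h^2 / 8, h = 0.25)
```

More units mean finer knots and a smaller error, which is the intuition behind the theorem: with enough hidden units, the piecewise-linear surface can hug any continuous target.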
