There's a very well-known result in machine learning stating that a single hidden layer is enough to approximate any smooth, bounded function (the paper, ["Multilayer feedforward networks are universal approximators"](http://portal.acm.org/citation.cfm?id=70408), is now almost 20 years old). There are several things to note, however.

- The single hidden layer may need to be arbitrarily wide.
- This says nothing about the ease with which an approximation may be found; in general, large networks are hard to train properly and fall victim to overfitting quite frequently (the exception is so-called "convolutional neural networks", which are really only meant for vision problems).
- It also says nothing about the efficiency of the representation. Some functions require an exponential number of hidden units if done with one layer, but scale much more nicely with more layers (for more discussion, read [Scaling Learning Algorithms Towards AI](http://yann.lecun.com/exdb/publis/pdf/bengio-lecun-07.pdf)).

The problem with deep neural networks is that they're even harder to train. You end up with very small gradients being backpropagated to the earlier hidden layers, so the learning doesn't really go anywhere, especially if the weights are initialized to be small (if you initialize them with larger magnitudes you frequently get stuck in bad local minima). There are some "pre-training" techniques, like the ones discussed in this [Google tech talk](http://www.youtube.com/watch?v=AyzOUbkUf3M) by Geoff Hinton, which attempt to get around this.
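To make the vanishing-gradient point concrete, here is a minimal NumPy sketch (my own illustration, not taken from the answer or the cited papers; the depth, width, and weight scale are arbitrary assumptions). It stacks sigmoid layers with small random weights, backpropagates a dummy gradient, and prints the gradient norm at each layer; the norms shrink by orders of magnitude toward the earliest layers, which is why learning there stalls.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth, width = 8, 32   # hypothetical sizes, chosen only for illustration
scale = 0.1            # "small" initial weights, as discussed above

weights = [rng.normal(scale=scale, size=(width, width)) for _ in range(depth)]

# Forward pass on a random batch of inputs.
x = rng.normal(size=(16, width))
activations = [x]
for W in weights:
    activations.append(sigmoid(activations[-1] @ W))

# Backward pass starting from a dummy gradient of 1 at every output unit.
grad = np.ones_like(activations[-1])
norms = []
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = grad * a * (1.0 - a)   # multiply by the sigmoid derivative
    norms.append(np.linalg.norm(grad))
    grad = grad @ W.T             # push the gradient to the previous layer

# Print from the first hidden layer to the last: the earliest layers
# receive gradients that are orders of magnitude smaller.
for i, g in enumerate(reversed(norms), start=1):
    print(f"layer {i:2d}: gradient norm = {g:.3e}")
```

Adding more layers to this toy stack makes the decay even more dramatic, which is part of the motivation for the pre-training schemes Hinton describes in the talk linked above.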
 
