The random weights given to a neural network often immediately restrict the portion of the search space that will be reachable during learning. This is particularly true when learning rates are small.

However, in the XOR case (using a 3-3-1 topology) there should not be any local minima.

My recommendation, since the network is so tiny, is to print the edge weights whenever it seems stuck in a local minimum. You should be able to evaluate quickly whether the weights look correct and how far the values are from giving you a perfect network.

One trick that made a large difference for me was, instead of updating the weights immediately after each piece of training data, to batch the errors up and update the weights at the end of an epoch (see the first sketch below). That prevented my network from being swayed early on when the first half of my input data belonged to the same classification bucket.

Which brings me to my next point: are you sure you have an evenly distributed set of training examples? If you give a neural network 900 positive classification results but only 100 negative ones, the network sometimes decides it is easier to say everything belongs to the positive class, because doing so costs it only a 10% error rate (see the second sketch below). Many learning algorithms are extremely good at finding these kinds of shortcuts.

Lastly, the activation function should make little to no difference in whether you hit a local minimum. The activation function serves primarily to project the domain of the reals onto a much smaller known range: (0, 1) for the sigmoid and (-1, 1) for the hyperbolic tangent (see the third sketch below). You can think of this as a way of enforcing comparable scales across the learned features at a given layer (a.k.a. feature scaling). Since the input domain is not known beforehand, it is not as simple as ordinary feature scaling for linear regression, which is why activation functions are used, but their effect is otherwise compensated for when computing the errors during backpropagation.
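To make the epoch-batched update concrete, here is a minimal sketch in Python. It assumes NumPy, sigmoid activations, squared error, and a 3-3-1 layout with an explicit bias input plus a hidden-layer bias; the learning rate, epoch count, and variable names are illustrative choices, not taken from the answer above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# XOR training set. A constant bias input is appended as a third column,
# giving the 3 input units of the assumed 3-3-1 layout.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
X = np.hstack([X, np.ones((4, 1))])          # shape (4, 3)

W1 = rng.normal(scale=0.5, size=(3, 3))      # 3 inputs -> 3 hidden units
W2 = rng.normal(scale=0.5, size=(4, 1))      # 3 hidden + bias -> 1 output

lr = 0.5
for epoch in range(5000):
    grad_W1 = np.zeros_like(W1)
    grad_W2 = np.zeros_like(W2)

    # Accumulate the gradient over the whole epoch first ...
    for x, t in zip(X, y):
        h = sigmoid(x @ W1)                  # hidden activations, shape (3,)
        h_b = np.append(h, 1.0)              # hidden layer plus bias unit
        o = sigmoid(h_b @ W2)                # output, shape (1,)

        delta_o = (o - t) * o * (1 - o)      # squared-error delta at the output
        delta_h = (W2[:3, :] @ delta_o) * h * (1 - h)

        grad_W2 += np.outer(h_b, delta_o)
        grad_W1 += np.outer(x, delta_h)

    # ... and only then touch the weights, once per epoch.
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

# Outputs should end up close to 0, 1, 1, 0 (not guaranteed for every seed).
for x, t in zip(X, y):
    h_b = np.append(sigmoid(x @ W1), 1.0)
    print(x[:2], t, sigmoid(h_b @ W2).round(3))
```

The per-example (online) variant would simply apply the two weight updates inside the inner loop instead of accumulating them; where the update happens is the only thing being changed here.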
 
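Second, the 900/100 arithmetic from the class-balance paragraph, just to show why the imbalance is a trap; the label list and names here are made up for illustration.

```python
from collections import Counter

# Hypothetical labels: 900 positive examples, 100 negative.
labels = [1] * 900 + [0] * 100

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]

# A "model" that always predicts the majority class is wrong only on the
# minority examples: 100 / 1000 = a 10% error rate, without learning anything.
baseline_error = 1 - majority_count / len(labels)
print(counts)                                                  # Counter({1: 900, 0: 100})
print(f"always-predict-{majority_class} error rate: {baseline_error:.0%}")   # 10%
```

If your trained network's error rate is not meaningfully better than that baseline, it may simply have learned to predict the majority class.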

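Third, a tiny sketch of the range-squashing behaviour described in the last paragraph; only the (0, 1) and (-1, 1) ranges come from the answer itself, the sample inputs are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pre-activations drawn from a wide, unknown input domain ...
z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])

print(sigmoid(z))    # ... sigmoid squashes them onto (0, 1)
print(np.tanh(z))    # ... tanh squashes them onto (-1, 1)
```

However large or small the pre-activations get, every unit's output lives on the same small interval, which is the feature-scaling-like effect described above.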