It is no wonder that each of the separate networks performs best on the training set it was trained on. But these prediction error values are misleading, because minimizing the error on a training set is an *ill-posed* problem. Your ultimate goal is to maximize the generalization performance of your model, i.e. how well it performs on new data it has not seen during training. Imagine a network that just memorizes each of the characters and thus works more like a hash table: it would yield 0 errors on the training data but would perform badly on any other data.

One way to measure generalization performance is to extract a fraction (e.g. 10%) of your available data and use it as a *test set*. You do not use this test set during training, only for measurement (see the sketch further down).

Further, you should check the topology of your network: how many hidden layers and how many neurons per hidden layer do you use? Make sure your topology is large enough to handle the complexity of your problem.

Also have a look at other techniques to improve the generalization performance of your network, like *L1 regularization* (subtracting a small fixed amount from the absolute value of your weights after each training step), *L2 regularization* (subtracting a small percentage of your weights after each training step) or [Dropout](http://arxiv.org/pdf/1207.0580.pdf) (randomly turning off hidden units during training and halving the weight vector as soon as training is finished). You should also consider more efficient training algorithms like *RPROP-* or *RMSProp* rather than plain backpropagation (see [Geoffrey Hinton's coursera course on neural networks](https://www.coursera.org/course/neuralnets)). Finally, consider testing your setup on the MNIST dataset of handwritten digits 0-9 (you should easily achieve fewer than 300 misclassifications on its test set).

To answer your original question on how to omit certain output neurons: you could create your own layer module. Have a look at `SoftmaxLayer`, but before applying the softmax activation function, set all output neurons that belong to the classes you want to omit to 0. You need to manipulate the `outbuf` variable in `_forwardImplementation`. If you want to use this during training, also make sure to set the error signal to zero for those classes before backpropagating the error to the previous layer (by manipulating `_backwardImplementation`). This can be useful, e.g., if you have incomplete data and do not want to throw away every sample that contains just one NaN value. But in your case you actually do not need this. (A sketch of such a layer follows at the end of this answer.)
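To make the test-set measurement concrete, here is a minimal sketch using PyBrain's built-in tools. It assumes your samples already live in a `ClassificationDataSet` called `alldata`; the 10% split, the hidden-layer size of 50 and the trainer settings are illustrative choices, not recommendations.

```python
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import SoftmaxLayer
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.utilities import percentError

# alldata is assumed to be a ClassificationDataSet holding all your samples
testdata, traindata = alldata.splitWithProportion(0.10)  # hold out 10% as test set
traindata._convertToOneOfMany()                          # one output neuron per class
testdata._convertToOneOfMany()

net = buildNetwork(traindata.indim, 50, traindata.outdim, outclass=SoftmaxLayer)

# weightdecay corresponds to the L2 regularization mentioned above: a small
# percentage of each weight is subtracted after every training step
trainer = BackpropTrainer(net, dataset=traindata, learningrate=0.01,
                          momentum=0.1, weightdecay=0.0001)

for epoch in range(20):
    trainer.train()
    trn_error = percentError(trainer.testOnClassData(), traindata['class'])
    tst_error = percentError(trainer.testOnClassData(dataset=testdata), testdata['class'])
    print('epoch %2d  train error: %5.2f%%  test error: %5.2f%%'
          % (epoch, trn_error, tst_error))
```

If you want to try RPROP instead of plain backpropagation, PyBrain's `RPropMinusTrainer` (also in `pybrain.supervised.trainers`) should work as a drop-in replacement for `BackpropTrainer` in the loop above.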
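And here is a rough sketch of the custom layer idea from the last paragraph. The class name `MaskedSoftmaxLayer` and the `masked_classes` argument are my own naming, not part of PyBrain; the forward and backward passes follow the structure of PyBrain's `SoftmaxLayer`.

```python
from pybrain.structure.modules.neuronlayer import NeuronLayer
from pybrain.tools.functions import safeExp


class MaskedSoftmaxLayer(NeuronLayer):
    """Softmax layer that forces a chosen set of output classes to zero."""

    def __init__(self, dim, masked_classes=(), name=None):
        NeuronLayer.__init__(self, dim, name)
        # indices of the output neurons (classes) to omit -- hypothetical
        # attribute, represent it however suits your data
        self.masked_classes = list(masked_classes)

    def _forwardImplementation(self, inbuf, outbuf):
        outbuf[:] = safeExp(inbuf)
        # zero the omitted classes before normalizing, so the remaining
        # classes still form a proper probability distribution
        for i in self.masked_classes:
            outbuf[i] = 0.0
        outbuf /= sum(outbuf)

    def _backwardImplementation(self, outerr, inerr, outbuf, inbuf):
        inerr[:] = outerr
        # do not backpropagate an error signal for the omitted classes
        for i in self.masked_classes:
            inerr[i] = 0.0
```

You would then plug this in as the output layer when assembling the network by hand (e.g. with `FeedForwardNetwork` and `addOutputModule`), since `buildNetwork` cannot pass the extra constructor argument.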