
High error on the training set when predicting on the training set, but average loss during training is low
I am training a model using Vowpal Wabbit and noticed something very strange. During training, the reported average loss is very low, around 0.06. However, when I ask the trained model to predict labels on the same training data, the average loss is high, around 0.66, and the model performs poorly at predicting labels even for the training data. My initial conjecture was that the model suffered from high bias, so I increased its complexity to 300 hidden nodes in the first layer, but the problem persists.

I would greatly appreciate pointers on what could be going on.

The tutorial slides for VW mention: "If you test on the train set, does it work? (no => something crazy)"

So something very crazy seems to be happening, and I am trying to understand where I should dig deeper.

More details: I am using Vowpal Wabbit for a named entity recognition task where the features are word representations. I am trying several neural network models with different numbers of hidden units and evaluating them. All of my trained models exhibit a high average loss when tested on the training data itself, which I find very odd.

Here is one way to reproduce the problem.

Output of training:

```
vw -d ~/embeddings/eng_train_4.vw --loss_function logistic --oaa 6 --nn 32 -l 10 --random_weights 1 -f test_3.model --passes 4 -c
final_regressor = test_3.model
Num weight bits = 18
learning rate = 10
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = /home/vvkulkarni/embeddings/eng_train_4.vw.cache
ignoring text input in favor of cache input
num sources = 1
average    since        example    example   current  current  current
loss       last         counter     weight     label  predict  features
0.666667   0.666667           3        3.0         1        1       577
0.833333   1.000000           6        6.0         1        2       577
0.818182   0.800000          11       11.0         4        4       577
0.863636   0.909091          22       22.0         1        4       577
0.636364   0.409091          44       44.0         1        1       577
0.390805   0.139535          87       87.0         1        1       577
0.258621   0.126437         174      174.0         1        1       577
0.160920   0.063218         348      348.0         1        1       577
0.145115   0.129310         696      696.0         1        1       577
0.138649   0.132184        1392     1392.0         1        1       577
0.122486   0.106322        2784     2784.0         1        1       577
0.097522   0.072557        5568     5568.0         1        1       577
0.076875   0.056224       11135    11135.0         1        1       577
0.058647   0.040417       22269    22269.0         1        1       577
0.047803   0.036959       44537    44537.0         1        1       577
0.038934   0.030066       89073    89073.0         1        1       577
0.036768   0.034601      178146   178146.0         1        1       577
0.032410   0.032410      356291   356291.0         1        1       577 h
0.029782   0.027155      712582   712582.0         1        1       577 h

finished run
number of examples per pass = 183259
passes used = 4
weighted example sum = 733036
weighted label sum = 0
average loss = 0.0276999
best constant = 0
total feature number = 422961744
```
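One detail I notice: the last two progress lines above end with an `h`, which I believe means vw is reporting loss on automatically held-out examples once multiple passes over a cache file are used (that would also be consistent with training seeing 183259 examples per pass while the test run below sees 203621). As a sanity check, here is a sketch of a rerun with holdout disabled, which I would expect to report the raw progressive training loss (the model filename is just a placeholder):

```
# Sketch: same training invocation, but with --holdout_off so the
# reported average loss is computed on all training examples instead
# of the examples vw holds out during multi-pass learning.
# (test_3_noholdout.model is a placeholder filename.)
vw -d ~/embeddings/eng_train_4.vw --loss_function logistic --oaa 6 \
   --nn 32 -l 10 --random_weights 1 --passes 4 -c --holdout_off \
   -f test_3_noholdout.model
```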
Now, when I evaluate the model above using the same data it was trained on:

```
vw -t ~/embeddings/eng_train_4.vw -i test_3.model -p test_3.pred
only testing
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
predictions = test_3.pred
using no cache
Reading datafile = /home/vvkulkarni/embeddings/eng_train_4.vw
num sources = 1
average    since        example    example   current  current  current
loss       last         counter     weight     label  predict  features
0.333333   0.333333           3        3.0         1        1       577
0.500000   0.666667           6        6.0         1        4       577
0.636364   0.800000          11       11.0         6        3       577
0.590909   0.545455          22       22.0         1        1       577
0.500000   0.409091          44       44.0         4        1       577
0.482759   0.465116          87       87.0         1        1       577
0.528736   0.574713         174      174.0         1        3       577
0.500000   0.471264         348      348.0         1        3       577
0.517241   0.534483         696      696.0         6        1       577
0.536638   0.556034        1392     1392.0         4        4       577
0.560345   0.584052        2784     2784.0         1        5       577
0.560884   0.561422        5568     5568.0         6        2       577
0.586349   0.611820       11135    11135.0         1        1       577
0.560914   0.535477       22269    22269.0         1        1       577
0.557200   0.553485       44537    44537.0         1        1       577
0.568938   0.580676       89073    89073.0         1        2       577
0.560568   0.552199      178146   178146.0         1        1       577

finished run
number of examples per pass = 203621
passes used = 1
weighted example sum = 203621
weighted label sum = 0
average loss = 0.557428  <<< This is what is tricky.
best constant = -4.91111e-06
total feature number = 117489309
```

Things I have tried:

1. Increasing the number of hidden nodes to 600, to no avail.
2. Using quadratic features with 300 hidden nodes, which did not help either.

The rationale behind trying 1) and 2) was to increase model complexity, on the assumption that the high training error was due to high bias.

Update: Even more interestingly, if I specify the number of passes to be 4 in the testing phase (even though I assumed the model would already have learned a decision boundary), the problem goes away. I am trying to understand why.

```
vvkulkarni@einstein:/scratch1/vivek/test$ vw -t ~/embeddings/eng_train_4.vw -i test_3.model -p test_3_1.pred --passes 4 -c
only testing
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
decay_learning_rate = 1
predictions = test_3_1.pred
using cache_file = /home/vvkulkarni/embeddings/eng_train_4.vw.cache
ignoring text input in favor of cache input
num sources = 1
average    since        example    example   current  current  current
loss       last         counter     weight     label  predict  features
0.333333   0.333333           3        3.0         1        1       577
0.166667   0.000000           6        6.0         1        1       577
0.090909   0.000000          11       11.0         4        4       577
0.045455   0.000000          22       22.0         1        1       577
0.022727   0.000000          44       44.0         1        1       577
0.011494   0.000000          87       87.0         1        1       577
0.017241   0.022989         174      174.0         1        1       577
0.022989   0.028736         348      348.0         1        1       577
0.020115   0.017241         696      696.0         1        1       577
0.043822   0.067529        1392     1392.0         1        1       577
0.031968   0.020115        2784     2784.0         1        1       577
0.031968   0.031968        5568     5568.0         1        1       577
0.032959   0.033950       11135    11135.0         1        1       577
0.029952   0.026944       22269    22269.0         1        1       577
0.029212   0.028471       44537    44537.0         1        1       577
0.030481   0.031750       89073    89073.0         1        1       577
0.028673   0.026866      178146   178146.0         1        1       577
0.034001   0.034001      356291   356291.0         1        1       577 h
0.034026   0.034051      712582   712582.0         1        1       577 h
```
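One more observation about the run above: it prints "ignoring text input in favor of cache input", so with `--passes 4 -c` this test run is reading the cache file left over from training rather than the datafile I passed in. A sketch of a check I plan to run, using `-k` to kill and rebuild the cache (`test_3_2.pred` is a placeholder name):

```
# Sketch: -k deletes any existing cache, so the multi-pass test run
# cannot silently reuse the cache built during training.
vw -t ~/embeddings/eng_train_4.vw -i test_3.model -p test_3_2.pred --passes 4 -k -c
```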
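Independently of the loss vw reports, the 0/1 error on the training set can be checked directly from the predictions file. A minimal sketch, assuming `test_3.pred` holds one predicted class per line in the same order as the examples in `eng_train_4.vw` (in vw's multiclass format the true label is the first token of each example line):

```
# Sketch: compare true labels (first token of each vw example line)
# against the predictions written by -p, and print the error rate.
paste <(awk '{print $1}' ~/embeddings/eng_train_4.vw) test_3.pred \
  | awk '{n++; if (int($1) != int($2)) wrong++} END {print (wrong + 0) / n}'
```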
 
