
[Model selection](http://en.wikipedia.org/wiki/Model_selection) using [cross validation](http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29) may be what you need.

### Cross validation

Split your dataset into k non-overlapping subsets (folds), train a model on k-1 of the folds, and measure its performance on the fold you left out. Repeat this for every choice of held-out fold (leave the 1st fold out, then the 2nd, ..., then the kth, training on the remaining folds each time). When you are done, estimate the mean performance across the folds (and perhaps also the variance/standard deviation of the performance).

How to choose k depends on how much time you have. Usual values are 3, 5, 10, or even N, where N is the size of your data (the latter is the same as *leave-one-out cross validation*). I prefer 5 or 10.

### Model selection

Let's say you have 5 methods (ANN, SVM, KNN, etc.) and 10 parameter combinations for each method (depending on the method). You simply run cross validation for each method and parameter combination (5 * 10 = 50 runs) and select the best method and parameters. Then you re-train with the best method and parameters on all your data, and that is your final model.

There are a few more things to say. If, for example, you try a *lot of methods and parameter combinations*, it is very likely you will overfit the selection itself. In cases like these, you have to use *nested cross validation*.

### Nested cross validation

In *nested cross validation*, you perform cross validation on the model selection procedure itself.

Again, you first split your data into k folds. At each step you take k-1 folds as your training data and the remaining fold as your test data. You then run model selection (the procedure explained above) on the training folds, for each of the k possible splits. When you finish, you have k selected models, one per split. You evaluate each of them on its held-out test fold and choose the best one. Finally, you train a new model with that method and those parameters on all the data you have. That's your final model.

Of course, there are many variations of these methods and other things I didn't mention. If you need more information, look for publications on these topics.
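To make the cross-validation and model-selection steps concrete, here is a minimal sketch, assuming scikit-learn is available; the candidate estimators, their parameter values, and the iris dataset are placeholders for your own methods and data.

```python
# Minimal sketch: k-fold cross validation used to pick among a few
# method + parameter combinations. Candidates and data are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in for your own dataset

# Each entry is one "method + parameter combination" to evaluate.
candidates = [
    ("svm_rbf_C1",  SVC(kernel="rbf", C=1.0)),
    ("svm_rbf_C10", SVC(kernel="rbf", C=10.0)),
    ("knn_k3",      KNeighborsClassifier(n_neighbors=3)),
    ("knn_k7",      KNeighborsClassifier(n_neighbors=7)),
]

results = {}
for name, estimator in candidates:
    # 5-fold CV: train on 4 folds, score on the held-out fold, 5 times.
    scores = cross_val_score(estimator, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")

# Pick the candidate with the best mean CV score ...
best_name = max(results, key=results.get)
best_estimator = dict(candidates)[best_name]
# ... and re-train it on ALL the data to obtain the final model.
final_model = best_estimator.fit(X, y)
```

The mean CV score is only used to choose between candidates here; the final model is always re-fit on the full dataset, as described above.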
 

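For nested cross validation, a common pattern (again assuming scikit-learn; the SVM and its parameter grid are placeholders) is to wrap the model-selection step in an outer cross-validation loop, so the score you report reflects the whole selection procedure rather than a single lucky choice:

```python
# Sketch of nested cross validation: the inner loop (GridSearchCV) does
# model selection, the outer loop estimates how well that selection
# procedure generalises, guarding against overfitting the selection.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in for your own dataset

param_grid = {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.1, 0.01]}

# Inner loop: choose the best parameters with 5-fold CV.
inner_search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)

# Outer loop: evaluate the whole selection procedure with another 5-fold CV.
outer_scores = cross_val_score(inner_search, X, y, cv=5)
print(f"nested CV estimate: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")

# Final model: run the selection once on all the data and keep the winner.
final_model = inner_search.fit(X, y).best_estimator_
```

The outer scores are what you report as the expected performance; the final model comes from running the selection one last time on all available data.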

 