The optimal parameters will depend on your data. Your best (and perhaps only) option is to try multiple parameter sets in succession and see which one gives you the best performance, by whichever metric you choose.

As for plotting the training and test errors: a good way to assess a classifier is to use the F-measure as the performance metric. It accounts for both false positives and false negatives, and lets you weight them as appropriate to your particular domain. If you mean something else by plotting the training and test errors, please clarify.

EDIT: in response to your comment

LibSVM doesn't know how to optimize its own parameters either; that's why you have to pass them as an argument to the svm_train function. You need to tune the parameters experimentally, and for that you need a single quantitative measure of performance. I'm not sure what you mean by 30-value problems, but you should still be able to use the F-measure by suitably redefining true positives, false positives, true negatives, and false negatives.

You have two options: one more comprehensive, one computationally cheaper. Either use a three-layer nested loop to test combinations of gamma, C, and epsilon, choosing the parameters that give the highest performance on held-out data (I advise cross-validation to avoid overfitting to one specific test set), or optimize each parameter in succession: first, with some bland default C and epsilon, iterate through many gamma values until you find the best; then do the same for C and for epsilon.

To improve the second method, when optimizing each parameter, hold the other parameters at their best values found so far rather than at defaults, and make several passes over the parameters so that each is re-tuned against successively better values of the others.

To make either method more precise (always at the cost of potential overfitting, remember), use a telescoping search. Say you first search from 1 to 101 with a step size of 10, so you try 1, 11, 21, ..., 101. If the best value turns out to be, say, 51, search 46 through 56 in steps of 1 on the next pass, so you reuse the same information but become more precise.

To make either method less sensitive to random fluctuations (for example, in the random folds you generate for cross-validation), run several cross-validation tests with the default parameters (good defaults are probably 1.0 for C and 1E-9 for epsilon; I'm not sure about gamma) and record the mean and standard deviation of your performance measure. You can then tell whether a given result is statistically significantly better than the second-best, or than simply using the defaults.
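
A minimal sketch of the first, more comprehensive option: a nested grid search over C, gamma, and the stopping tolerance, scored by cross-validated macro-F1. It uses scikit-learn's SVC (a wrapper around libsvm) rather than calling svm_train directly, and the parameter grids, fold count, and placeholder data are illustrative assumptions, not values from the question:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    # Placeholder data; replace with however you load your own features/labels.
    X = np.random.rand(200, 10)
    y = np.random.randint(0, 3, size=200)

    best_score, best_params = -np.inf, None

    # Exhaustive (three-layer nested loop) search over C, gamma, and tolerance.
    for C in [0.1, 1.0, 10.0, 100.0]:
        for gamma in [1e-3, 1e-2, 1e-1, 1.0]:
            for tol in [1e-9, 1e-6, 1e-3]:
                clf = SVC(C=C, gamma=gamma, tol=tol, kernel='rbf')
                # 5-fold cross-validated macro-F1 guards against overfitting
                # the parameters to one particular test split.
                scores = cross_val_score(clf, X, y, cv=5, scoring='f1_macro')
                if scores.mean() > best_score:
                    best_score, best_params = scores.mean(), (C, gamma, tol)

    print('best macro-F1: %.3f with (C, gamma, tol) = %s' % (best_score, best_params))

To apply the last paragraph's suggestion about random fluctuations, you can run the same evaluation several times with the default parameters and reshuffled folds (for example, passing StratifiedKFold(n_splits=5, shuffle=True) as the cv argument) and compare a candidate's score against the resulting mean and standard deviation before declaring it a real improvement.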
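And a sketch of the telescoping refinement, shown for a single parameter (C) with the others held fixed; the evaluate helper, the search window, and the step sizes are hypothetical choices for illustration:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def evaluate(C, X, y, gamma=0.01, tol=1e-9):
        # Cross-validated macro-F1 for one candidate C, other parameters fixed.
        clf = SVC(C=C, gamma=gamma, tol=tol, kernel='rbf')
        return cross_val_score(clf, X, y, cv=5, scoring='f1_macro').mean()

    def telescoping_search(X, y, lo=1.0, hi=101.0, step=10.0, rounds=2):
        # Coarse-to-fine: scan lo..hi at the given step, then re-scan a
        # narrow window around the winner with a ten-times-finer step.
        best_C, best_score = lo, -np.inf
        for _ in range(rounds):
            for C in np.arange(lo, hi + step / 2, step):
                score = evaluate(C, X, y)
                if score > best_score:
                    best_C, best_score = C, score
            lo = max(best_C - step, 1e-6)   # keep C strictly positive
            hi = best_C + step
            step = step / 10.0
        return best_C, best_score

The same loop structure extends to the coordinate-wise variant described in the answer: wrap it in an outer loop over the parameters and, on each pass, hold the other parameters at their best values found so far.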