There are two broad schools of classification:

1) **Discriminative**: here we try to learn a decision boundary from the training examples. Then, based on which part of the space the test example lies in, as determined by the decision boundary, we assign it a class. The state-of-the-art algorithm is the [SVM](http://en.wikipedia.org/wiki/Support_vector_machine), but you need kernels if your data can't be separated by a line (e.g. if it is separable by a circle).

Modifications to the SVM for multi-class (there are many ways of doing this; here's one):

Let the jth (of k) training example xj be in class i (of N). Then its label is yj = i.

a) Feature vector: if xj is a training example belonging to class i (of N), then the feature vector corresponding to xj is phi(xj, yj) = [0 0 ... X ... 0].

- Note: X (the D features of xj) sits in the ith "position", so phi has D×N components in total; e.g. a picture of an onion has D = 640×480 greyscale integers.
- Note: for any other class p, i.e. y = p, phi(xj, y) has X in position p and zeros everywhere else.

b) Constraints: minimize ||W||^2 (as in the vanilla SVM) such that, for each training example xj (j = 1, ..., k) and every label y except yj:

W.phi(xj, yj) >= W.phi(xj, y) + 1

- Note: the intuition here is that W.phi(xj, yj) scores higher than W.phi(xj, y) for every other label y.
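To make this concrete, here is a minimal NumPy sketch of the feature map and the margin check. The names (`phi`, `margin_violations`) and the toy data are illustrative choices; a real implementation would hand these constraints to a QP solver rather than merely checking them.

```python
import numpy as np

def phi(x, y, n_classes):
    """Joint feature map: a D*N vector that is zero everywhere
    except the block for class y, which holds x itself."""
    D = x.shape[0]
    out = np.zeros(D * n_classes)
    out[y * D:(y + 1) * D] = x
    return out

def margin_violations(W, X, labels, n_classes):
    """Count examples that break the constraint
    W.phi(xj, yj) >= W.phi(xj, y) + 1 for every label y != yj."""
    violations = 0
    for x, y_true in zip(X, labels):
        score_true = W @ phi(x, y_true, n_classes)
        for y in range(n_classes):
            if y != y_true and score_true < W @ phi(x, y, n_classes) + 1:
                violations += 1
                break  # count each example at most once
    return violations

# Toy usage: 6 examples, 4 features, 3 classes. W here is random;
# training would minimize ||W||^2 subject to zero violations.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
labels = [0, 1, 2, 0, 1, 2]
W = rng.normal(size=4 * 3)
print(margin_violations(W, X, labels, n_classes=3))
```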
2) **Generative**: here we ASSUME (which may turn out to be nonsense) that each example was generated by a probability distribution for its class, like one Gaussian for male faces and another for female faces, which works well in practice. We try to learn the parameters of each distribution (its mean and covariance) by computing the mean and covariance of the training examples belonging to that class. Then for a test example we see which distribution gives the highest probability and classify accordingly.

Neither approach uses N yes-no classifiers.
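A minimal sketch of the generative approach, assuming SciPy is available and equal class priors; the small ridge on the covariance is just there to keep it invertible on small samples.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussians(X, labels, n_classes):
    """Estimate one (mean, covariance) pair per class from that
    class's training examples."""
    params = []
    for c in range(n_classes):
        Xc = X[np.asarray(labels) == c]
        mean = Xc.mean(axis=0)
        # Small ridge keeps the covariance invertible on tiny samples.
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        params.append((mean, cov))
    return params

def classify(x, params):
    """Pick the class whose Gaussian gives x the highest density
    (implicitly assumes equal class priors)."""
    densities = [multivariate_normal.pdf(x, mean=m, cov=c) for m, c in params]
    return int(np.argmax(densities))

# Toy usage: three well-separated 2-D Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 1.0, size=(20, 2)) for c in range(3)])
labels = np.repeat([0, 1, 2], 20)
params = fit_gaussians(X, labels, n_classes=3)
print(classify(np.array([2.0, 2.0]), params))  # most likely class 2
```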
The discriminative method works better in practice for classification, but it can't model probabilistic answers. It also needs a large number of training examples for the optimization step (minimizing ||W||^2) to converge. There is a technique that combines the two while avoiding kernels, called Maximum Entropy Discrimination.

To answer your other question:

> what do I do about a picture that gets high scores from both? Is there some way to get a single, mushroom-or-onion classifier that somehow knows that there is no overlap between these two classes of vegetation?

This is more a problem with the input data than with the learning algorithm itself, which just works on a matrix of numbers. It could reflect noise/uncertainty in the domain (can humans tell mushrooms apart from onions perfectly?). It may be fixed by a larger/better training dataset. Or maybe you picked a bad distribution to model, in the generative case.

Most people would pre-process the raw images before classification, in a stage called feature selection. One feature selection technique could be to capture the silhouette of the vegetable, since mushrooms and onions have different shapes and the rest of the image may be "noise". In other domains like natural language processing, you could drop prepositions and retain a count of the different nouns. But sometimes performance may not improve, because the learning algorithm might not look at all the features anyway. It really depends on what you're trying to capture; creativity is involved. Feature selection algorithms also exist.
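For instance, a toy version of the silhouette idea might look like this; the threshold of 128, the dark-object-on-light-background assumption, and the two shape descriptors are illustrative choices, not a standard recipe.

```python
import numpy as np

def silhouette_features(img, threshold=128):
    """Reduce a greyscale image (2-D array of 0-255 ints) to two crude
    shape numbers, assuming a dark vegetable on a lighter background."""
    mask = img < threshold                # the "silhouette"
    ys, xs = np.nonzero(mask)
    if xs.size == 0:                      # nothing below threshold
        return np.zeros(2)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    fill = mask.sum() / (height * width)  # fraction of bounding box filled
    aspect = height / width               # tall (onion?) vs. squat (mushroom?)
    return np.array([fill, aspect])
```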
For a good machine learning resource, see [Tony Jebara's courses](http://www.cs.columbia.edu/~jebara/courses.html) at Columbia University.