There are two broad schools of classification:

1) **Discriminative**: here we try to learn a decision boundary from the training examples. Then, based on which part of the space the test example lies in, as determined by the decision boundary, we assign it a class. The state-of-the-art algorithm is the [SVM](http://en.wikipedia.org/wiki/Support_vector_machine), but you need kernels if your data can't be separated by a line (e.g. if it is separable by a circle).

Modifications to the SVM for multi-class (there are many ways of doing this; here's one):

Let the jth (of k) training example xj be in class i (of N). Then its label is yj = i.

a) Feature vector: if xj is a training example belonging to class i (of N), then the feature vector corresponding to xj is phi(xj, yj) = [0 0 ... X ... 0].

- Note: X (the D features of xj) sits in the ith "position", so phi has D×N components in total; e.g. a picture of an onion has D = 640×480 greyscale integers.
- Note: for any other class p, i.e. y = p, phi(xj, y) has X in position p and zeros everywhere else.

b) Constraints: minimize ||W||^2 (as in the vanilla SVM) such that, for each training example xj (j = 1, ..., k) and every label y except yj:

W.phi(xj, yj) >= W.phi(xj, y) + 1

- Note: the intuition here is that W.phi(xj, yj) scores higher than W.phi(xj, y) for every other label y.
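To make this concrete, here is a minimal NumPy sketch of the feature map and the margin check. The names (`phi`, `margin_violations`) and the toy data are illustrative choices; a real implementation would hand these constraints to a QP solver rather than merely checking them.

```python
import numpy as np

def phi(x, y, n_classes):
    """Joint feature map: a D*N vector that is zero everywhere
    except the block for class y, which holds x itself."""
    D = x.shape[0]
    out = np.zeros(D * n_classes)
    out[y * D:(y + 1) * D] = x
    return out

def margin_violations(W, X, labels, n_classes):
    """Count examples that break the constraint
    W.phi(xj, yj) >= W.phi(xj, y) + 1 for every label y != yj."""
    violations = 0
    for x, y_true in zip(X, labels):
        score_true = W @ phi(x, y_true, n_classes)
        for y in range(n_classes):
            if y != y_true and score_true < W @ phi(x, y, n_classes) + 1:
                violations += 1
                break  # count each example at most once
    return violations

# Toy usage: 6 examples, 4 features, 3 classes. W here is random;
# training would minimize ||W||^2 subject to zero violations.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
labels = [0, 1, 2, 0, 1, 2]
W = rng.normal(size=4 * 3)
print(margin_violations(W, X, labels, n_classes=3))
```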
2) **Generative**: here we ASSUME (which may turn out to be nonsense) that each example was generated by a probability distribution for its class, like one Gaussian for male faces and another for female faces, which works well in practice. We try to learn the parameters of each distribution (its mean and covariance) by computing the mean and covariance of the training examples belonging to that class. Then for a test example we see which distribution gives the highest probability and classify accordingly.

Neither approach uses N yes-no classifiers.
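A minimal sketch of the generative approach, assuming SciPy is available and equal class priors; the small ridge on the covariance is just there to keep it invertible on small samples.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussians(X, labels, n_classes):
    """Estimate one (mean, covariance) pair per class from that
    class's training examples."""
    params = []
    for c in range(n_classes):
        Xc = X[np.asarray(labels) == c]
        mean = Xc.mean(axis=0)
        # Small ridge keeps the covariance invertible on tiny samples.
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        params.append((mean, cov))
    return params

def classify(x, params):
    """Pick the class whose Gaussian gives x the highest density
    (implicitly assumes equal class priors)."""
    densities = [multivariate_normal.pdf(x, mean=m, cov=c) for m, c in params]
    return int(np.argmax(densities))

# Toy usage: three well-separated 2-D Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 1.0, size=(20, 2)) for c in range(3)])
labels = np.repeat([0, 1, 2], 20)
params = fit_gaussians(X, labels, n_classes=3)
print(classify(np.array([2.0, 2.0]), params))  # most likely class 2
```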
The discriminative method works better in practice for classification, but it can't model probabilistic answers. It also needs a large number of training examples for the optimization step (minimizing ||W||^2) to converge. There is a technique that combines the two while avoiding kernels, called Maximum Entropy Discrimination.

To answer your other question:

> what do I do about a picture that gets high scores from both? Is there some way to get a single, mushroom-or-onion classifier that somehow knows that there is no overlap between these two classes of vegetation?

This is more a problem with the input data than with the learning algorithm itself, which just works on a matrix of numbers. It could reflect noise/uncertainty in the domain (can humans tell mushrooms apart from onions perfectly?). It may be fixed by a larger/better training dataset. Or maybe you picked a bad distribution to model, in the generative case.

Most people would pre-process the raw images before classification, in a stage called feature selection. One feature selection technique could be to capture the silhouette of the vegetable, since mushrooms and onions have different shapes and the rest of the image may be "noise". In other domains like natural language processing, you could drop prepositions and retain a count of the different nouns. But sometimes performance may not improve, because the learning algorithm might not look at all the features anyway. It really depends on what you're trying to capture; creativity is involved. Feature selection algorithms also exist.
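For instance, a toy version of the silhouette idea might look like this; the threshold of 128, the dark-object-on-light-background assumption, and the two shape descriptors are illustrative choices, not a standard recipe.

```python
import numpy as np

def silhouette_features(img, threshold=128):
    """Reduce a greyscale image (2-D array of 0-255 ints) to two crude
    shape numbers, assuming a dark vegetable on a lighter background."""
    mask = img < threshold                # the "silhouette"
    ys, xs = np.nonzero(mask)
    if xs.size == 0:                      # nothing below threshold
        return np.zeros(2)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    fill = mask.sum() / (height * width)  # fraction of bounding box filled
    aspect = height / width               # tall (onion?) vs. squat (mushroom?)
    return np.array([fill, aspect])
```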
For a good machine learning resource, see [Tony Jebara's courses](http://www.cs.columbia.edu/~jebara/courses.html) at Columbia University.