StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
4207875
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2010-11-17T18:37:42
FavoriteCount
0
LastActivityDate
2016-06-16T05:54:49.787
LastEditDate
2016-06-16T05:54:49.787
LastEditorUserId
66549
OwnerUserId
66549
ParentId
4207057
PostTypeId
2
Score
8
ViewCount
0
LastEditorDisplayName
text
Body
There are several conventional techniques by which words are mapped to features (columns in a 2D data matrix in which the rows are the individual data vectors) for input to machine learning models.<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.17.6513" rel="nofollow">classification</a>: <ul> <li>a Boolean field which encodes the presence or absence of that word in a given document;</li> <li>a frequency histogram of a predetermined set of words, often the X most commonly occurring words from among all documents comprising the training data (more about this one in the last paragraph of this Answer);</li> <li>the juxtaposition of two or more words (e.g., 'alternative' and 'lifestyle' in consecutive order have a meaning not related either component word); this juxtaposition can either be captured in the data model itself, eg, a boolean feature that represents the presence or absence of two particular words directly adjacent to one another in a document, or this relationship can be exploited in the ML technique, as a naive Bayesian classifier would do in this instanceemphasized text;</li> <li>words as raw data to extract latent features, eg, <a href="http://en.wikipedia.org/wiki/Latent_semantic_analysis" rel="nofollow">LSA</a> or Latent Semantic Analysis (also sometimes called LSI for Latent Semantic Indexing). LSA is a matrix decomposition-based technique which derives latent variables from the text not apparent from the words of the text itself.</li> </ul> A common reference data set in machine learning is comprised of frequencies of 50 or so of the most common words, aka "stop words" (e.g., a, an, of, and, the, there, if) for published works of Shakespeare, London, Austen, and Milton. A basic multi-layer perceptron with a single hidden layer can separate this data set with 100% accuracy. This data set and variations on it are widely available in ML Data Repositories and <a href="http://escholarship.org/uc/item/9w56v25f.pdf" rel="nofollow">academic papers</a> presenting classification results are likewise common.
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow to include words as numerical feature in classification
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USdoug
UserOwnerUserId
1. USdoug
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POHow to include words as numerical feature in classification
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.