StackOverflow2013


plurals

PO
primarykey
Id
1032647
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2009-06-23T13:40:53.427
FavoriteCount
0
LastActivityDate
2009-06-23T13:40:53.427
LastEditDate
LastEditorUserId
0
OwnerUserId
15842
ParentId
1014927
PostTypeId
2
Score
1
ViewCount
0
LastEditorDisplayName
text
Body
This answer isn't directly to the poster's question, but to the meta question of how to autotag news items. The OP mentions Named Entity Recognition, but I believe they mean something more along the lines of autotagging. If they really mean NER, then this response is hogwash :) Given these constraints (600 items / day, 100-200 characters / item) with divergent sources, here are some tagging options: <ol> <li>By hand. An analyst could easily do 600 of these per day, probably in a couple of hours. Something like Amazon's Mechanical Turk, or making users do it, might also be feasible. Having some number of hand-tagged items, even if it's only 50 or 100, will be a good basis for comparing whatever the autogenerated methods below get you. </li> <li>Dimensionality reduction, using LSA, topic models (Latent Dirichlet Allocation), and the like. I've had really poor luck with LSA on real-world data sets and I'm unsatisfied with its statistical basis. I find LDA much better, and it has an <a href="https://lists.cs.princeton.edu/mailman/listinfo/topic-models" rel="nofollow noreferrer">incredible mailing list</a> that has the best thinking on how to assign topics to texts.</li> <li>Simple heuristics. If you have actual news items, then exploit the structure of the news item. Focus on the first sentence, toss out all the common words (stop words) and select the best 3 nouns from the first two sentences. Or heck, take all the nouns in the first sentence, and see where that gets you. If the texts are all in English, then do part-of-speech analysis on the whole shebang, and see what that gets you. With structured items, like news reports, LSA and other order-independent methods (tf-idf) throw out a lot of information. </li> </ol> Good luck! (if you like this answer, maybe retag the question to fit it)
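The "simple heuristics" option could look roughly like the sketch below. It is only an illustration, not a tested tagger: the `STOP_WORDS` list is a tiny hypothetical stand-in for a real one, the function names are made up for this example, and because the standard library has no part-of-speech tagger, every non-stop word is treated as a candidate instead of picking out nouns.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real list would be much longer.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "on", "in", "at", "to",
    "for", "by", "with", "is", "are", "was", "were", "be", "been", "it",
    "its", "this", "that", "as", "from",
}


def first_sentences(text, n=2):
    """Return the first n sentences, splitting naively on ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(parts[:n])


def candidate_tags(text, max_tags=3, sentences=2):
    """Pick up to max_tags distinct non-stop words from the lead sentences.

    Assumption: without a part-of-speech tagger we cannot select nouns,
    so words are kept in order of first appearance, whatever their role.
    """
    lead = first_sentences(text, sentences)
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", lead)
    tags = []
    for word in words:
        lowered = word.lower()
        if lowered in STOP_WORDS or lowered in tags:
            continue
        tags.append(lowered)
    return tags[:max_tags]


if __name__ == "__main__":
    item = ("The senate passed a budget bill on Tuesday. "
            "Critics said the bill was rushed. More follows.")
    print(candidate_tags(item))
```

Note that a verb such as "passed" slips through here, which is exactly where a part-of-speech step would help. Comparing the output against a small hand-tagged sample is a cheap way to see whether the heuristic is good enough.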
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PO LSA - Latent Semantic Analysis - How to code it in PHP?
 singulars
 PostTypePostTypeId
 PT Question
PostTypePostTypeId
1. PT Answer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. US Gregg Lind
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
CommentsPostId
1. CO Thank you very much. You're right, I meant autotagging. But I definitely don't want to manually tag articles (1). Approach 3 is too simple and gives too poor results (already tried this out). But approach 2 sounds good and this is what my question is about. ;) I want to autotag (I didn't use this word, but other words which are wrong, maybe) news articles with LSA. LDA sounds good, too, but it's a method for classification, not for tagging I think.
 singulars
 PostPostId
 PO
 UserUserId
 US caw
2. CO LDA works for tagging too. All of these techniques are attempts to reduce the dimensionality (the basis) of the document space.
 singulars
 PostPostId
 PO
 UserUserId
 US Gregg Lind
