The first thing to notice is that this can be approximated as a local problem. That is, whether a word is "trending" really depends only on recent data. So you can immediately truncate your data to the most recent `N` days, where `N` is some experimentally determined optimal value. This significantly cuts down on the amount of data you have to look at.

In fact, the [NPR article](http://www.npr.org/2011/12/07/143013503/how-twitters-trending-algorithm-picks-its-topics) suggests this.

Then you need to somehow look at growth, and this is precisely what the derivative captures. The first thing to do is normalize the data: divide all your data points by the value of the first data point. This makes it so that the large growth of an infrequent word isn't drowned out by the relatively small growth of a popular word.

For the first derivative, do something like this:

```
d[i] = (data[i] - data[i+k])/k
```

for some experimentally determined value of `k` (which, in this case, is a number of days). Similarly, the second derivative can be expressed as:

```
d2[i] = (data[i] - 2*data[i+k] + data[i+2k])/(k*k)
```

Higher derivatives can also be expressed like this. Then you need to assign some kind of weighting system to these derivatives. This is a purely experimental procedure that really depends on what you want to consider "trending." For example, you might want to give the acceleration of growth half as much weight as the velocity. Another thing to note is that you should try your best to remove noise from your data, because derivatives are very sensitive to noise. You can do this by carefully choosing your value of `k` as well as by discarding words with very low frequencies altogether.

I also notice that you multiply by the log sum of the frequencies. I presume this is to give the growth of popular words more weight, because more popular words are less likely to trend in the first place. The standard way of measuring how popular a word is is to look at its [inverse document frequency](http://en.wikipedia.org/wiki/Tf%E2%80%93idf) (IDF). I would divide by the IDF of a word to give the growth of more popular words more weight:

```
IDF[word] = log(D/df[word])
```

where `D` is the total number of documents (e.g. for Twitter it would be the total number of tweets) and `df[word]` is the number of documents containing `word` (e.g. the number of tweets containing that word).

A high IDF corresponds to an unpopular word, whereas a low IDF corresponds to a popular word. A sketch of the whole pipeline follows below.
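Putting the pieces together, here is a rough, self-contained sketch of the approach in Python. The window size `N`, step `k`, derivative weights, minimum-frequency cutoff, and the final scoring formula are all illustrative assumptions to be tuned experimentally, and `counts`, `doc_freq`, and `total_docs` are hypothetical inputs you would build from your own data. The series is stored most-recent-first to match the indexing in the formulas above, and "first data point" is interpreted as the most recent day.

```python
import math


def trending_scores(counts, doc_freq, total_docs,
                    N=14, k=2,
                    w_velocity=1.0, w_acceleration=0.5,
                    min_frequency=5):
    """Score words by how strongly they are trending.

    counts[word]   : list of daily counts, most recent day first
                     (counts[word][0] is today, [1] is yesterday, ...).
    doc_freq[word] : number of documents containing the word.
    total_docs     : total number of documents (e.g. tweets).

    All parameter defaults are illustrative assumptions, not values
    recommended above; tune them experimentally.
    """
    scores = {}
    for word, series in counts.items():
        # Truncate to the most recent N days (the "local" window).
        data = series[:N]

        # Discard very infrequent words: derivatives are too noisy for them.
        if sum(data) < min_frequency:
            continue

        # Need indices up to 2k for the second difference, and a nonzero
        # first data point to normalize by.
        if len(data) < 2 * k + 1 or data[0] == 0:
            continue

        # Normalize by the first (most recent) data point so relative
        # growth is compared, not absolute counts.
        base = data[0]
        data = [x / base for x in data]

        # Finite differences over a step of k days (data[0] is newest).
        velocity = (data[0] - data[k]) / k
        acceleration = (data[0] - 2 * data[k] + data[2 * k]) / (k * k)

        # Weighted combination of the derivatives.
        growth = w_velocity * velocity + w_acceleration * acceleration

        # Divide by IDF so popular (low-IDF) words get more weight.
        df = doc_freq.get(word, 0)
        if df == 0:
            continue
        idf = math.log(total_docs / df)
        if idf > 0:
            growth /= idf

        scores[word] = growth

    # Highest score first = most "trending".
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Using only the first two derivatives keeps the scheme simple; if you add higher-order differences, give them correspondingly smaller weights, since they amplify noise the most.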