StackOverflow2013


plurals

PO
primarykey
Id: 13885734
data
AcceptedAnswerId: 0
AnswerCount: 0
ClosedDate:
CommentCount: 4
CommunityOwnedDate:
CreationDate: 2012-12-14T20:24:47.093
FavoriteCount: 0
LastActivityDate: 2012-12-16T20:49:43.587
LastEditDate: 2012-12-16T20:49:43.587
LastEditorUserId: 1667256
OwnerUserId: 1667256
ParentId: 13880174
PostTypeId: 2
Score: 2
ViewCount: 0
LastEditorDisplayName:
text
Body
The data in <code>[publicdata:samples.github_timeline]</code> looks like snapshots of every repository taken at different timestamps. If that is the case, I don't think you should use <code>SUM(repository_forks)</code> to calculate the monthly change in forks per repository. Instead, take the first and the last snapshot of each month and subtract the first from the last to get the <code>delta</code>. This query shows the raw snapshots for one repository:
<pre class="lang-sql prettyprint-override"><code>SELECT repository_name, created_at, repository_forks
FROM [publicdata:samples.github_timeline]
WHERE repository_name = 'Bukkit'
ORDER BY created_at;
</code></pre>
<img src="https://i.stack.imgur.com/Ydauk.png" alt="Snapshots of repository_forks for Bukkit">
However, I don't understand why the <code>repository_forks</code> value for <code>Bukkit</code> is zero at <code>2012-03-11 08:30:21</code>. It might be a data error. If so, I treat such rows as outliers and filter them out with a threshold: the query below uses <code>where repository_forks > 10</code> to skip the bad data. As long as a repository's fork count only grows within a month, its first and last snapshots are also its monthly minimum and maximum, which is why the query computes the delta as <code>max(repository_forks) - min(repository_forks)</code>.
<pre class="lang-sql prettyprint-override"><code>SELECT top100.repository_name,
       substr(created_at, 0, 7) month,
       max(repository_forks) - min(repository_forks) monthly_increase,
       min(repository_forks) monthly_begin_at,
       max(repository_forks) monthly_end_with
FROM [githubarchive:github.timeline] timeline
JOIN (SELECT repository_name, MAX(repository_forks) as forks
      FROM [githubarchive:github.timeline]
      WHERE (created_at CONTAINS "2012-04-01")
      GROUP BY repository_name
      ORDER BY forks DESC
      LIMIT 100) top100
ON timeline.repository_name = top100.repository_name
WHERE repository_forks > 10
GROUP BY top100.repository_name, month
ORDER BY top100.repository_name, month;
</code></pre>
The result looks like this:
<img src="https://i.stack.imgur.com/U7edO.png" alt="Monthly fork increase for the top 100 repositories">
If I am wrong and <code>repository_forks</code> already records a change rather than a running total, you can go ahead and sum it as you did. That is actually easier:
<pre class="lang-sql prettyprint-override"><code>SELECT repository_name,
       substr(created_at, 0, 7) as month,
       SUM(repository_forks) as forks
FROM [publicdata:samples.github_timeline] timeline
JOIN (SELECT repository_url, MAX(repository_forks) as forks
      FROM [publicdata:samples.github_timeline]
      WHERE (created_at CONTAINS "2012-04-01")
      GROUP BY repository_url
      ORDER BY forks DESC
      LIMIT 100) top100
ON timeline.repository_url = top100.repository_url
GROUP BY repository_name, month
ORDER BY repository_name, month DESC;
</code></pre>
<img src="https://i.stack.imgur.com/AhkSu.png" alt="Monthly sum of repository_forks for the top 100 repositories">
<h2>Update:</h2>
Yes. I switched the dataset to <code>githubarchive:github.timeline</code>, which has data through December 2012, and updated the corresponding SQL and results. However, the data quality is still not good: I still see a lot of outlier data points.
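For comparison, here is a sketch (not run against the BigQuery datasets above) that computes the delta literally as last snapshot minus first snapshot, so it does not rely on the fork count only growing within a month. It uses standard-SQL window functions (<code>ROW_NUMBER() OVER (...)</code>) rather than the legacy BigQuery dialect used above, and it assumes a placeholder table named <code>timeline</code> with the columns <code>repository_url</code>, <code>created_at</code> (text such as <code>2012-03-11 08:30:21</code>) and <code>repository_forks</code>. Keying by <code>repository_url</code> mirrors the sum query above; the <code>> 10</code> threshold is the same outlier filter.
<pre class="lang-sql prettyprint-override"><code>-- Sketch: monthly fork delta = last snapshot - first snapshot, per repository.
-- Assumption: "timeline" is a placeholder table holding the snapshot rows,
-- with created_at stored as text like '2012-03-11 08:30:21'.
WITH snapshots AS (
  SELECT repository_url,
         SUBSTR(created_at, 1, 7) AS month,   -- e.g. '2012-03' (1-based SUBSTR)
         created_at,
         repository_forks
  FROM timeline
  WHERE repository_forks > 10                 -- drop suspected outliers first
),
ranked AS (
  SELECT repository_url,
         month,
         repository_forks,
         -- rn_first = 1 marks the earliest snapshot of the month
         ROW_NUMBER() OVER (PARTITION BY repository_url, month
                            ORDER BY created_at ASC)  AS rn_first,
         -- rn_last = 1 marks the latest snapshot of the month
         ROW_NUMBER() OVER (PARTITION BY repository_url, month
                            ORDER BY created_at DESC) AS rn_last
  FROM snapshots
)
SELECT repository_url,
       month,
       MAX(CASE WHEN rn_first = 1 THEN repository_forks END) AS monthly_begin_at,
       MAX(CASE WHEN rn_last = 1 THEN repository_forks END)  AS monthly_end_with,
       MAX(CASE WHEN rn_last = 1 THEN repository_forks END)
         - MAX(CASE WHEN rn_first = 1 THEN repository_forks END) AS monthly_increase
FROM ranked
GROUP BY repository_url, month
ORDER BY repository_url, month;
</code></pre>
This form should run as-is in SQLite 3.25+ or PostgreSQL once <code>timeline</code> exists. To restrict it to the top 100 repositories, you could join <code>snapshots</code> against the same top-100 subquery used above.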
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PO Tracing the growth of top 100 repositories on GitHub?
 singulars
 PostTypePostTypeId
 PT Question
PostTypePostTypeId
1. PT Answer
UserLastEditorUserId
1. US greeness
UserOwnerUserId
1. US greeness
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. PO Tracing the growth of top 100 repositories on GitHub?
 singulars
 PostTypePostTypeId
 PT Question
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT AcceptedByOriginator
CommentsPostId
