StackOverflow2013


plurals

PO
primarykey
Id
10464686
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2012-05-05T18:19:52.767
FavoriteCount
0
LastActivityDate
2012-05-05T18:31:42.167
LastEditDate
2012-05-05T18:31:42.167
LastEditorUserId
368896
OwnerUserId
368896
ParentId
10464184
PostTypeId
2
Score
0
ViewCount
0
LastEditorDisplayName
text
Body
Here's the germ of an algorithm. It certainly isn't fleshed out or tested, and it may not be complete; I'm just throwing it out here as a possible starting point. The most challenging issue seems to be the time required to run the algorithm over billions of rows, followed perhaps by memory limitations. I also believe the fundamental task in this problem is the single operation of "comparing one set of numbers with another" to locate a shared set. Therefore, I suggest the following (rough) approach to tackle both time and memory:
<pre><code>(1) Consolidate multiple sets into a single, larger set. </code></pre>
That is, take 100 consecutive sets (in your example, <code>23, 67, 34, 23, 54</code>, <code>23, 54</code>, <code>78, 96, 23</code>, and the following 97 sets) and simply merge them into a single set, ignoring duplicates.
<pre><code>(2) Give each *consolidated* set from (1) a label (or index), and then map this set (by its label) to the original sets that compose it. </code></pre>
In this way, you will be able to retrieve (look up) the original individual sets <code>23, 67, 34, 23, 54</code>, etc.
<pre><code>(3) The data is now denormalized: there is a much smaller number of sets, and each set is much larger. </code></pre>
Now the algorithm moves on to a new stage.
<pre><code>(4) Develop an algorithm to look for matching sequences between any two of these larger sets. </code></pre>
There will be many false positives; however, hopefully the nature of your data is such that the false positives will not "ruin" the efficiency gained by this approach. I don't provide an algorithm to perform the matching between two individual sets here; I assume you can come up with one yourself (sort both sets, etc.).
<pre><code>(5) For every possible matching sequence found in (4), iterate through the individual sets that compose the two larger sets being compared, weeding out false positives. </code></pre>
I suspect this step could be optimized significantly, but this is the basic idea. At this point, you will have all of the matching sequences between all original sets that compose the two larger sets being compared.
<pre><code>(6) Execute steps (4) and (5) for every pair of large sets constructed in (2), including each large set paired with itself (otherwise matches between two original sets in the same group are missed). </code></pre>
Now you will have ALL matching sequences, with duplicates.
<pre><code>(7) Remove duplicates from the set of matching sequences. </code></pre>
Just a thought.
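A minimal Python sketch of steps (1) through (7), under assumptions that are not part of the answer: each row is a collection of integers, a "matching sequence" is taken to mean the set of values two rows have in common, and only overlaps of at least `min_shared` values count. The names `consolidate`, `find_matches`, `group_size` and `min_shared` are hypothetical. It uses Python's built-in hash-set intersection instead of the sort-based comparison suggested for step (4), and it keeps everything in memory, so it only illustrates the control flow, not a billion-row implementation.

```python
from itertools import combinations_with_replacement


def consolidate(sets, group_size=100):
    """Steps (1) and (2): merge each run of `group_size` consecutive sets into
    one larger set, and remember the indices of the original sets behind it."""
    groups = []
    for start in range(0, len(sets), group_size):
        members = list(range(start, min(start + group_size, len(sets))))
        merged = set()
        for i in members:
            merged.update(sets[i])  # duplicates are ignored automatically
        # The list position of each tuple serves as the consolidated set's label.
        groups.append((frozenset(merged), members))
    return groups


def find_matches(rows, group_size=100, min_shared=2):
    """Return every distinct set of at least `min_shared` values shared by two
    of the original rows (ASSUMED meaning of "matching sequence")."""
    originals = [frozenset(r) for r in rows]
    groups = consolidate(originals, group_size)
    matches = set()  # step (7): a set of frozensets drops duplicates for free

    # Step (6): visit every pair of consolidated sets, including each set paired
    # with itself, so that rows inside the same group are compared too.
    for a, b in combinations_with_replacement(range(len(groups)), 2):
        merged_a, members_a = groups[a]
        merged_b, members_b = groups[b]

        # Step (4): a cheap test on the large sets. Any match between two
        # original rows must lie inside this overlap, so a small overlap rules
        # out the whole block of row pairs; a large one may be a false positive.
        if len(merged_a & merged_b) < min_shared:
            continue

        # Step (5): weed out false positives by checking the original rows.
        for i in members_a:
            for j in members_b:
                if i >= j:
                    continue  # each unordered pair of rows is checked once
                common = originals[i] & originals[j]
                if len(common) >= min_shared:
                    matches.add(common)
    return matches


if __name__ == "__main__":
    # The three example rows from the question; group_size=2 so that the
    # consolidation step actually merges something in such a tiny input.
    rows = [[23, 67, 34, 23, 54], [23, 54], [78, 96, 23]]
    found = find_matches(rows, group_size=2, min_shared=2)
    print(sorted(sorted(m) for m in found))  # [[23, 54]]
```

In the worst case this still touches every pair of rows; the grouping only pays off when most pairs of consolidated sets share fewer than `min_shared` values, which is the same caveat about false positives raised in step (4).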
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POGetting trends from raw data
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USDan Nissenbaum
UserOwnerUserId
1. USDan Nissenbaum
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTDownMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
