StackOverflow2013

plurals

PO
primarykey
Id
3320965
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2010-07-23T17:53:34.383
FavoriteCount
0
LastActivityDate
2010-07-23T17:53:34.383
LastEditDate
LastEditorUserId
0
OwnerUserId
258625
ParentId
3320306
PostTypeId
2
Score
1
ViewCount
0
LastEditorDisplayName
text
Body
Data Storage: I would say a database is a good idea (it sounds like there is potential for a rather large data set). I don't know how many questions you plan on having, but to simplify the analysis (including your SQL queries) you may want to group answers to similar questions in separate tables. I agree that a numerical value (a byte, 0-2) is a better route than a boolean or something else: you are computing a similarity score, so you might as well start with numbers.

Comparison: For the comparison itself, I would suggest creating a class SimilarQuestionAnswers that contains a list of bytes and a class UserAnswers that contains a list of these SimilarQuestionAnswers. This sets up your clusters for the cluster analysis method you mentioned, so you can add weight to certain clusters (cluster A is important, so its score is multiplied by 20, whereas cluster B is less important, so its score is only multiplied by 10). It also lets you apply a different comparison to each cluster if needed. I know you said the questions aren't related, but you can still at least group them by importance. I think the sequence analysis will still work; granted, your similarity matrix will be all 1's, which simplifies the problem a bit, but the rest of the math associated with it should be sufficient.

Comparison Applied: This is where the database back end comes in handy. Use SQL queries to minimize the dataset you are dealing with. If you are comparing one person with everyone else, you can use the SQL SUM function on their answers to get a quick-and-dirty comparison within each cluster and pull only those within a certain threshold. This may result in some overlap, but that can be eliminated easily. Another thought is to have a table with a row for each user and a column for each cluster, holding a comparison to a fake user who has answered true to every question. Then you could just query that table for a range around the current user's scores for each cluster. This may be faster but less accurate. Either way, in the end you will still need to do the comparison against each of the users returned by that query, so the faster you can make that comparison the better. Try to stick to a formula that involves only +, -, *, and /; most of the Math.Whatever() methods can add a lot of time over a large number of calls.

Sorry this was so long; most of the questions were pretty open ended and I had to assume a few details. I hope this helps.
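The weighted-cluster comparison and the cheap pre-filter described in this answer could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the answer does not say what the byte values 0-2 mean, so the encoding here (0 = false, 1 = true, 2 = unanswered) is a guess, and the names Cluster, cluster_score, similarity, true_counts, and prefilter are hypothetical. The in-memory prefilter stands in for what the answer proposes doing with an SQL SUM query.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cluster:
    """Answers to one group of questions.

    Assumed encoding (not given in the answer): 0 = false, 1 = true, 2 = unanswered.
    """
    answers: List[int]
    weight: float = 1.0


def cluster_score(a: Cluster, b: Cluster) -> int:
    """Count questions that both users answered and answered the same way."""
    return sum(1 for x, y in zip(a.answers, b.answers) if x == y and x != 2)


def similarity(user_a: List[Cluster], user_b: List[Cluster]) -> float:
    """Weighted sum of per-cluster scores, using only + and *."""
    return sum(ca.weight * cluster_score(ca, cb) for ca, cb in zip(user_a, user_b))


def true_counts(user: List[Cluster]) -> List[int]:
    """Number of 'true' answers per cluster (what SUM would give for 0/1 data)."""
    return [sum(1 for x in c.answers if x == 1) for c in user]


def prefilter(target: List[Cluster],
              candidates: Dict[str, List[Cluster]],
              threshold: int) -> List[str]:
    """Keep candidates whose per-cluster true-counts are all within threshold."""
    t = true_counts(target)
    return [
        name
        for name, user in candidates.items()
        if all(abs(a - b) <= threshold for a, b in zip(t, true_counts(user)))
    ]


# Example data: cluster weights 20 and 10, as in the answer's illustration.
alice = [Cluster([1, 0, 1], 20), Cluster([1, 1], 10)]
bob = [Cluster([1, 1, 1], 20), Cluster([1, 2], 10)]
carol = [Cluster([0, 0, 0], 20), Cluster([0, 0], 10)]
```

With this data, prefilter(alice, {"bob": bob, "carol": carol}, 1) keeps only bob, and the full similarity(alice, bob) then only has to run on the reduced set.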
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow do I write a function to compare and rank many sets of boolean (true/false) answers?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USJack
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COThanks, some really useful ideas in there. I think there's potential in the idea of using a "fake user" or "control user" as a way of quickly comparing distance (similarity). However two users might have the same value of d (distance from control) yet have answered very differently. I think you might need to compare every user individually in order to build up a true comparison.
 singulars
 PostPostId
 PO
 UserUserId
 USgomezuk
2. COI agree that you still need to do a final comparison; I only meant the control-user comparison as a rough cut, to make the dataset for the final comparison smaller and more manageable. I assume no one user is really going to look at all n comparisons, probably just the top 5%, if that.
 singulars
 PostPostId
 PO
 UserUserId
 USJack
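
The point raised in these two comments can be checked with a small Python sketch. The function names and the sample users are made up for illustration, and plain booleans are assumed instead of the 0-2 byte encoding. It shows two users at the same distance from an all-true control user who nevertheless disagree on every question, which is why the control-user score only works as a rough cut.

```python
from typing import List


def distance_from_control(answers: List[bool]) -> int:
    """Hamming distance to a control user who answered True to every question."""
    return sum(1 for a in answers if not a)


def hamming(a: List[bool], b: List[bool]) -> int:
    """Number of questions on which two users disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical users: same distance from the control, opposite answers.
u1 = [True, True, False, False]
u2 = [False, False, True, True]

d1 = distance_from_control(u1)
d2 = distance_from_control(u2)
actual = hamming(u1, u2)

print(d1, d2, actual)
```

Because Hamming distance satisfies the triangle inequality, the true distance between two users always lies between abs(d1 - d2) and d1 + d2. A range query on the control distance therefore never discards a genuinely close user, but the users it returns still need the full pairwise comparison.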
