StackOverflow2013


plurals

PO
primarykey
Id
2550383
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2010-03-31T03:48:47.560
FavoriteCount
0
LastActivityDate
2010-03-31T13:42:48.777
LastEditDate
2010-03-31T13:42:48.777
LastEditorUserId
23478
OwnerUserId
23478
ParentId
2550229
PostTypeId
2
Score
1
ViewCount
0
LastEditorDisplayName
text
Body
I think that from a big-O standpoint, your implementation is as good as it gets. The dominant cost is the sort, which is O(N log N). One possibility, though, would be to build a new vector containing the duplicate entries rather than reusing the existing vector and deleting the non-duplicates from it. However, this would only be better if the number of distinct duplicated values is small relative to the total number of entries.

Consider the extreme example: if the original array consisted of 1,000 entries with only one duplicate, the output would be a vector with just one value. It might be a bit more efficient to create a new vector with one entry than to delete the other 999 entries from the original vector. I suspect, however, that in real-world testing the savings from that change would be difficult to measure.

Edit: I was just thinking about this as an "interview" question. In other words, this is not a terribly useful answer. But it would be possible to solve this in O(N) (linear time) as opposed to O(N log N), by using storage space instead of CPU. Create two "bit" arrays, both initially cleared. Loop through your vector of integer values and look up each value in the first bit array. If its bit is not set, set it (to 1). If it is already set, set the corresponding bit in the second array (indicating a duplicate). After all vector entries are processed, scan through the second array and output the integers that are duplicates (indicated by the bits set in the second bit array).

The reason for using bit arrays is just space efficiency. If dealing with 4-byte integers, the raw space required is <code>(2 * 2^32 / 8 )</code> bytes, which is 1 GiB. But this could actually be turned into a usable algorithm by making it a sparse array.
The very pseudo pseudo-code would be something like this:
<pre><code>bitarray1[infinite_size];
bitarray2[infinite_size];

clear/zero bitarrays

// NOTE - do not need to sort the input
foreach value in original vector {
    if ( bitarray1[value] )
        // duplicate
        bitarray2[value] = 1
    bitarray1[value] = 1
}

// At this point, bitarray2 contains a 1 for all duplicate values.
// Scan it and create the new vector with the answer
for i = 0 to maxvalue
    if ( bitarray2[i] )
        print/save/keep i
</code></pre>
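One way the sparse-array idea might look in C++ is sketched below. It is only an illustration under stated assumptions: the input is taken to be a `std::vector` of 32-bit integers, `std::unordered_set` stands in for each sparse bit array, and the name `find_duplicates` is made up for this example. Hashing gives expected (not worst-case) linear time, and the final sort, which only mimics scanning the second bit array in ascending order, adds O(D log D) for D distinct duplicated values.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns every value that occurs more than once in
// `values`, each listed once, in ascending order.
// Assumption: hash sets replace the two "infinite" bit arrays, so memory
// grows with the number of distinct values instead of the full 2^32 range.
std::vector<std::int32_t> find_duplicates(const std::vector<std::int32_t>& values) {
    std::unordered_set<std::int32_t> seen;  // plays the role of bitarray1
    std::unordered_set<std::int32_t> dups;  // plays the role of bitarray2

    // The input does not need to be sorted.
    for (std::int32_t v : values) {
        // insert() reports false in .second when v was already present,
        // which means this occurrence is a duplicate.
        if (!seen.insert(v).second) {
            dups.insert(v);
        }
    }

    // Copy the duplicates out. Sorting is optional; it only reproduces the
    // ascending order that a scan over the bit array would give.
    std::vector<std::int32_t> result(dups.begin(), dups.end());
    std::sort(result.begin(), result.end());
    return result;
}
```

Called as `find_duplicates({1, 2, 2, 3, 3, 3, 4})`, this would return a vector holding 2 and 3. If the value range is known to be small, two plain bit arrays (for example `std::vector<bool>`) indexed by value would follow the pseudo-code more literally.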
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow to keep only duplicates efficiently?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USMark Wilkins
UserOwnerUserId
1. USMark Wilkins
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.
