StackOverflow2013

plurals

POExtract first N unique integers from an Array
primarykey
Id
666528
data
AcceptedAnswerId
666569
AnswerCount
8
ClosedDate
CommentCount
6
CommunityOwnedDate
CreationDate
2009-03-20T15:02:12.993
FavoriteCount
0
LastActivityDate
2009-03-24T15:48:27.137
LastEditDate
2009-03-24T15:48:27.137
LastEditorUserId
15955
OwnerUserId
15955
ParentId
0
PostTypeId
1
Score
3
ViewCount
2270
LastEditorDisplayName
Nils Pipenbrinck
text
Body
I have a large list of integers (thousands), and I want to extract the first N (on the order of 10-20) unique elements from it. Each integer in the list occurs roughly three times. Writing an algorithm to do this is trivial, but I wonder what the most speed- and memory-efficient way to do it is. There are some additional constraints in my case:
<ul>
<li>In my use case I extract the uniques from the array multiple times, each time skipping some elements from the beginning. The number of elements that I skip is not known during unique extraction. I don't even have an upper bound. Therefore sorting is not speed efficient (I have to preserve the order of the array).</li>
<li>The integers are all over the place, so a bit-array as a lookup solution is not feasible.</li>
<li>I want to avoid temporary allocations during the search at all costs.</li>
</ul>
My current solution looks roughly like this:
<pre><code>int num_uniques = 0;
int uniques[16];
int startpos = 0;

while ((num_uniques != N) && (startpos < array_length))
{
  // a temporary used later.
  int insert_position;

  // Get next element.
  int element = array[startpos++];

  // Check if the element exists. If the element is not found,
  // return the position where it could be inserted while keeping
  // the uniques array sorted.
  if (!binary_search (uniques, element, num_uniques, &insert_position))
  {
    // Insert the new unique element while preserving
    // the sorted order of the uniques array.
    insert_into_array (uniques, element, insert_position);
    num_uniques++;
  }
}
</code></pre>
The binary_search / insert_into_array algorithm gets the job done, but the performance is not great. The insert_into_array call moves elements around a lot, and this slows everything down. Any ideas?
<hr>
EDIT Great answers, guys! Everyone deserves an accepted answer, but I can give only one. I'll implement a bunch of your ideas and do a performance shoot-out with some typical data. The one with the idea that led to the quickest implementation gets the accepted answer.
I'll run the code on a modern PC and an embedded Cortex-A8 CPU and I'll weigh the results somehow. I will post the results as well.
<hr>
EDIT: Results of the shoot-out
Timings on a Core-Duo, 100 iterations over a 160kb test dataset.
<pre><code>Bruteforce (Pete):             203 ticks
Hash and Bruteforce (Antti):   219 ticks
Inplace Binary Tree (Steven):  390 ticks
Binary-Search (Nils):          438 ticks
</code></pre>
<a href="http://torus.untergrund.net/code/unique_search_shootout.zip" rel="nofollow noreferrer">http://torus.untergrund.net/code/unique_search_shootout.zip</a> (C source and test data)
Additional remarks:
<ul>
<li>The Inplace Binary Tree absolutely rocks for truly random distributions (my test data has a tendency to be ascending).</li>
<li>The Binary-Search works very well on my test data for more than 32 uniques. Its running time grows almost linearly.</li>
</ul>
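For reference, the brute-force idea named in the timings can be sketched as a plain linear scan over the uniques found so far. This is only an illustration under assumptions: the actual shoot-out code is in the linked archive and is not reproduced here, and the function name `first_uniques` and its parameters are hypothetical. With only 10-20 uniques, the inner scan touches a handful of contiguous ints and never shifts elements, and no temporary allocation is needed.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from the shoot-out archive).
 * Collects up to max_uniques distinct values from array[start..length),
 * in order of first appearance, into the caller-provided buffer uniques.
 * Returns the number of values stored. */
size_t first_uniques(const int *array, size_t length, size_t start,
                     int *uniques, size_t max_uniques)
{
    size_t count = 0;

    for (size_t i = start; i < length && count < max_uniques; ++i) {
        int element = array[i];

        /* Linear search through the uniques collected so far. */
        size_t j = 0;
        while (j < count && uniques[j] != element)
            ++j;

        /* Not found: append it. Appending keeps first-seen order
         * and requires no shifting of existing entries. */
        if (j == count)
            uniques[count++] = element;
    }
    return count;
}
```

A caller would pass a stack buffer such as `int uniques[16];` together with the skip offset as `start`, and call the function again with a different `start` for each extraction.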
Tags
<algorithm><sorting><search>
Title
Extract first N unique integers from an Array
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USNils Pipenbrinck
UserOwnerUserId
1. USNils Pipenbrinck
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POExtract first N unique integers from an Array
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POExtract first N unique integers from an Array
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POExtract first N unique integers from an Array
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
