StackOverflow2013

plurals

PO
primarykey
Id
15729001
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2013-03-31T11:39:43.710
FavoriteCount
0
LastActivityDate
2013-03-31T14:18:52.397
LastEditDate
2013-03-31T14:18:52.397
LastEditorUserId
2065432
OwnerUserId
2065432
ParentId
15728266
PostTypeId
2
Score
1
ViewCount
0
LastEditorDisplayName
text
Body
Let's say that you have N vectors of length K, and only M of them are unique.
<ul> <li>Hashing + hashmap</li> </ul>
You can calculate the hash of every vector in O(K) time; checking whether the hashmap already contains such a vector and inserting a new vector both take O(1) expected time once the hash is known. For the hash function you can simply use a polynomial hash without a modulus, just storing hashes in a 64-bit type and ignoring overflows. The implementation is very simple, and it will work in O(N*K) time, requiring O(M*K) memory. If you need to sort the elements of each vector first, the time will be O(N*K*log(K)).
<ul> <li>Radix tree</li> </ul>
I think you should not use a radix tree here, because you will still need to look through each element of each vector. That is because if the tree doesn't contain such a vector, you'll need to insert all of its elements, and if it does, you'll need to go down to a leaf of the tree to confirm that you really inserted such a vector before. So the asymptotics remain the same, but you'll need to implement the tree yourself, and that is not a very good idea :)
<hr>
It looks easy to show that you need at least to read all the elements of the vectors. That is because at every moment you have two possibilities: either you have seen the current vector before and you need to read all its elements to the end to identify it, or you haven't seen the current vector before and you need to read all its elements to sort and save them. Yet if the vectors were already sorted, you would need to read elements only up to the first mismatch. But let's imagine that the first 30000 vectors were unique; then you'll need to read all the other vectors to the end to determine that they are not unique, no matter what algorithm or data structure you use. And finally we get that you need to read almost all the vectors to the end :)
If your values are really in the range (-100, 100) and there are only 30 values in a vector, you can notice that each value fits in 8 bits, so such a vector can be stored in four 64-bit integers, because it holds only <code>8*30 = 240</code> bits of data. But it is just another idea to play with, and I don't think that any implementation using it will work faster than hashing + hashmap.
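The two ideas above (polynomial hash + hashmap, and packing a small-valued vector into four 64-bit integers) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the question: the names `poly_hash`, `dedupe` and `pack`, the multiplier `BASE`, and the `sort_first` flag are all hypothetical, and the 64-bit overflow is emulated with a mask because Python integers do not overflow. `pack` additionally assumes that all vectors have the same length.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

MASK64 = (1 << 64) - 1  # emulates "store in a 64-bit type and ignore overflows"
BASE = 1_000_003        # hypothetical multiplier, chosen only for this sketch


def poly_hash(vec: Sequence[int]) -> int:
    """Polynomial hash: one multiplication and one addition per element."""
    h = 0
    for x in vec:
        h = (h * BASE + (x & MASK64)) & MASK64
    return h


def dedupe(vectors: Iterable[Sequence[int]],
           sort_first: bool = False) -> List[Tuple[int, ...]]:
    """Return the unique vectors in first-seen order.

    Hashing costs O(K) per vector; the full comparison inside a bucket
    only happens when two hashes are equal.
    """
    buckets: Dict[int, List[Tuple[int, ...]]] = {}
    unique: List[Tuple[int, ...]] = []
    for vec in vectors:
        # Sorting first treats vectors as multisets (adds the log(K) factor).
        key = tuple(sorted(vec)) if sort_first else tuple(vec)
        bucket = buckets.setdefault(poly_hash(key), [])
        if key not in bucket:  # guards against hash collisions
            bucket.append(key)
            unique.append(key)
    return unique


def pack(vec: Sequence[int], lo: int = -100) -> Tuple[int, int, int, int]:
    """Pack up to 32 values, each with 0 <= x - lo < 256, into four 64-bit words."""
    if len(vec) > 32:
        raise ValueError("at most 32 eight-bit values fit in four 64-bit words")
    words = [0, 0, 0, 0]
    for i, x in enumerate(vec):
        byte = x - lo
        if not 0 <= byte < 256:
            raise ValueError("value does not fit in 8 bits")
        words[i // 8] |= byte << (8 * (i % 8))
    return tuple(words)
```

For example, `dedupe([[1, 2, 3], [3, 2, 1], [1, 2, 3]])` keeps two vectors, while passing `sort_first=True` keeps one. The packed tuples from `pack` can be used directly as dictionary keys, which is one way to try the "four 64-bit integers" idea.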
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POFast ways to remove duplicates from a list of integer vectors
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USMikhail Melnik
UserOwnerUserId
1. USMikhail Melnik
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COI agree that most of the time I will have to do 30 comparisons to find that the vector was already there. However, computing the hash function takes a lot longer than 30 comparisons. In my experiments the trie is by far the fastest.
 singulars
 PostPostId
 PO
 UserUserId
 USThomas
2. COWow. Hashing requires one addition and one multiplication per symbol, whereas working with the trie requires some work with pointers, unless you implement your trie with an array. So you say that one addition and one multiplication is slower than getting something from memory?
 singulars
 PostPostId
 PO
 UserUserId
 USMikhail Melnik
