Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Let's say that you have N vectors of length K, and there are only M unique of them. </p> <ul> <li>Hashing + hashmap</li> </ul> <p>You can calculate the hash of every vector in O(K) time, check whether you already have such a vector in your hashmap and inserting new vector in O(1) time both. For hash function you can simply use polinomial hash without modulus, just storing hashes in 64-bit type and ignoring overflows. Implementation is very simple and it will work in O(N*K) time requiring O(M*K) memory. If you need to sort the elements first, the time will be O(N*K*log(K))</p> <ul> <li>Radix tree</li> </ul> <p>I think you should not use radix tree here because you will still need to look through each element of each vector. That is so because if you don't have such a vector in a tree you'll need to insert all of its elements, and if you have such a vector you'll need to go down to the leaf of the tree to see that you have really inserted such a vector before. So the asymptotiсs remain the same, but you'll need to implement the tree by yourself and it is not a very good idea :)</p> <hr> <p>Looks like it is easy to show that you need at least to read all the elements of vectors. That is so because in every moment you have two possibilities - you have found current vector before and you need to read all its elements to the end to identify it, or you haven't found current vector before and you need to read all its elements to sort and save them. Yet if vectors were already sorted, you will need to read elements only to the first mismatch. But lets imagine that first 30000 vectors were unique, then you'll need to read all others vectors to the end to determine that they are not unique, no matter what algorithm or data structure you'll use. And finally we get that you need to read almost all the vectors to the end :)</p> <p>If your values are really in range (-100, 100) and there are only 30 values in vector, you can notice that such vector can be saved in four 64-bit integers because you have only <code>8*30 = 240</code> bits of data in it. But it is just another idea to play with, and I don't think that any implementation using it will work faster than hashing + hashmap.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload