  1. Hash a sequence of positive/negative integers
    <p>I have a file with millions of lines (in fact it is an online stream of data, so we receive it line by line). Each line is an unsorted array of integers, positive and negative. There is no limit on the magnitude of each number, the lines have different lengths, and a single line may contain duplicate values.</p> <p>I want to remove the <code>duplicate lines</code> (two lines are duplicates if they contain the same values, regardless of order). Is there a good hashing function for this?</p> <p>We want to do this in <code>O(n)</code>, where <code>n</code> is the number of lines (we can assume that the maximum possible number of elements per line is a constant, e.g. at most 100 elements per line).</p> <p>I've read some of the questions posted here on Stack Overflow and I also googled it. Most of them cover cases where the arrays have the same length, or the integers are positive, or they are even sorted. Is there any way to solve this in the general case?</p> <p>My solution: First, we sort each line using an <code>O(n)</code> sorting algorithm, e.g. <code>Counting sort</code>, then we join the sorted values into a string and hash it with <code>md5</code> so it can go into a hash set. If the hash is not in the set, we add it; if it is already in the set, we compare the arrays that share that hash value.</p> <p>Problem with the solution: <code>Counting Sort</code> takes a lot of space because there is no limit on the numbers, and hash collisions are possible.</p>
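A minimal sketch of the sort-then-hash idea, under stated assumptions rather than as a definitive answer. It swaps counting sort for Python's built-in comparison sort, since with at most 100 elements per line the sort cost is bounded by a constant and the total work stays linear in the number of lines. It also swaps the `md5` string for a tuple stored in a built-in `set`, which resolves hash collisions by comparing the tuples themselves. The names `canonical_key` and `unique_lines` are hypothetical, and the sketch assumes that repeated values within a line count (so `[3, 3, -1]` and `[3, -1]` are different lines).

```python
from typing import Iterable, Iterator, List, Sequence, Set, Tuple


def canonical_key(values: Sequence[int]) -> Tuple[int, ...]:
    """Return an order-independent key for one line.

    Sorting puts every permutation of the same values into the same order.
    Python integers are arbitrary precision, so unbounded values are fine.
    """
    return tuple(sorted(values))


def unique_lines(lines: Iterable[Sequence[int]]) -> Iterator[List[int]]:
    """Yield each line the first time its multiset of values is seen."""
    seen: Set[Tuple[int, ...]] = set()
    for values in lines:
        key = canonical_key(values)
        # Set membership hashes the tuple and falls back to an equality
        # check on collision, so no manual collision handling is needed.
        if key not in seen:
            seen.add(key)
            yield list(values)


if __name__ == "__main__":
    stream = [[3, -1, 2], [2, 3, -1], [3, 3, -1], [-1, 3, 3], [10**30, -5]]
    for line in unique_lines(stream):
        print(line)
```

Because `unique_lines` is a generator, it can consume a stream line by line. Memory still grows with the number of distinct lines, since every key is kept in `seen`. If repeated values within a line should be ignored instead, `frozenset(values)` could serve as the key in place of the sorted tuple.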