Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>OK, I see the definition of the Extent class. First of all, it violates the rule that if <code>obj1.Equals(obj2)==true</code> then <code>obj1.GetHashCode()==obj2.GetHashCode()</code>. But that's besides the point and can be fixed (if you won't, the algorithms which depend on hashing, like a <code>HashSet</code> will fail).</p> <p>Now, if the only operation that one can do on the Extent object is to compare for equality, then it will not be possible to get the worst case performance above <strong>O(N*M)</strong> (where N is the size of first collection, and M is the size of the second collection). That's because you will ultimately have to compare every element with every element.</p> <p>This can be made better by the use of <code>GetHashCode()</code> and the fact that objects with different hash codes will also be different themselves. Other people have suggested to use the <code>HashSet</code> class, that would be such a solution. The best case performance in this case would be <strong>O(N+M)</strong>, and the worst case - <strong>O(N+N*M)</strong>. On average though you should win, unless the <code>GetHashCode()</code> method is very poorly implemented and returns the same hash codes for many objects.</p> <p>I myself prefer a more stable solution. If the extent class could be sorted reliably (that is, if you could compare two Extent objects to see which one was bigger and which one was smaller), then you could sort both lists and the performance could be brought down to <strong>O(sorting+M+N)</strong>. The idea is that when the lists are sorted, you can go through them both <em>simultaneously</em> and look for equal elements there.</p> <p>Now the sorting performance is the tricky thing here. If you only implement the comparison operation (as in, the <code>IComparable</code> interface), you will be able to sort both lists in time <strong>O(N*logN+M*logM)</strong>. The standard <code>List.Sort()</code> method should do that for you. All in all, the total performance would be <strong>O(N*logN+M*logM+N+M)</strong>. You should note however that this uses the QuickSort algorithm which performs poorly on nearly-sorted lists. The worst case is a completely sorted list in which case it is <strong>O(N*M)</strong>. If your lists are close to being sorted already, you should consider another sorting algorithm (and implement it yourself).</p> <p>The ultimate in reliable speed would be if you could convert each Extent to an integer (or more generally, some string) with the property that if the strings are equal, the Extents are equal as well, and if the strings are not equal, then the Extents are not equal either. The thing with strings is that they can be sorted in linear time with algorithms like <a href="http://en.wikipedia.org/wiki/Radix_sort" rel="nofollow noreferrer">radix sort</a>, <a href="http://en.wikipedia.org/wiki/Radix_tree" rel="nofollow noreferrer">radix tree</a>, etc. Then the sorting would take only the time of <strong>O(N+M)</strong>. In fact, if you constructed a Radix tree, you would only have to sort the first list and you could search for strings in it directly (with every search taking <strong>O(1)</strong> time). All in all, the total performance would be <strong>O(N+M)</strong> which is the best available.</p> <p>One thing you should always keep in mind though - big algorithms have big constants. The radix approach might look the best on paper, but it will be quite tricky to implement and generally slower than the simpler approaches for small amounts of data. Only if your lists have elements in the ranges of thousands and tens of thousands should you start to think about this. Also, these algorithms require to create a lot of new objects and the cost of each <code>new()</code> operation becomes significant as well. You should think carefully to minimize the number of allocations required.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload