Note that there are some explanatory texts on larger screens.

plurals
  1. PO.net collection for fast filtering (sorted collection)
    primarykey
    data
    text
    <p>When profiling a very slow method, I discovered the lag is on searching and filtering of a collection.</p> <p>The method does the following (in order). According to profiler, 80% of time are spend on step 1-3.</p> <ol> <li>Read sorted collection from a file and deserialize using Protobuf-net (v2)</li> <li>From a sorted collection, filter based on a start and end integer (name <code>.RangeFromTo()</code>)</li> <li>From the same sorted collection, get the next element of the collection (name .<code>Right()</code>)</li> <li>Perform some task...</li> </ol> <p><code>.RangeFromTo()</code> filters for a given range, for example:</p> <pre><code>[3,7,9,12].RangeFromTo(2,9) -&gt; [3,7,9] [3,7,9,12].RangeFromTo(2,8) -&gt; [3,7] [3,7,9,12].RangeFromTo(7,13) -&gt; [7,9,12] [3,7,9,12].RangeFromTo(13,14) -&gt; [ ] </code></pre> <p><code>.Right()</code> finds an element in the collection and gives you the next on in the list. If the element doesn't exist it gives you the closest one counting to the right. For example:</p> <pre><code>[3,7,9,12].Right(0) -&gt; 3 [3,7,9,12].Right(3) -&gt; 7 [3,7,9,12].Right(4) -&gt; 7 [3,7,9,12].Right(12) -&gt; null </code></pre> <p>Currently the collection is using <code>SortedArray</code> from C5 (<a href="https://github.com/sestoft/C5/" rel="nofollow">https://github.com/sestoft/C5/</a>). Is there a more suitable collection that I can use?</p> <p>Note: Step 1. takes around 30% of the total time. If I use a List instead, protobuf actually takes 40% less time deserializing! I guess when inserting into an SortedArray the collection doesn't know the data is already sorted and is doing a whole bunch of work. The ideal collection (if exist) should also be able to bypass that.</p> <p><strong>Edit:</strong> To clarify, the list are around 1000-5000 and there are 90k different collections! The method in question needs to load all the collections in memory to perform some business task. </p> <p><strong>Edit 2:</strong> I have added some sample benchmark here:</p> <p><a href="https://github.com/cchanpromys/so_19188345" rel="nofollow">https://github.com/cchanpromys/so_19188345</a></p> <p>It compares <code>SortedArray</code> from C5 with <code>SortedSet</code> from .Net. So far the results are as follows:</p> <pre><code>C5 sorted array deserialize took 904 Sorted set deserialize took 1040 C5 sorted array .Right() took 5 Sorted set .Right() took 798 &lt;--- Not sure what happened here... C5 sorted array .RangeFromTo() took 217 Sorted set .RangeFromTo() took 140 </code></pre> <p><strong>Edit 3</strong> This is out of my expectations but I ended up with a custom implementation of a list. </p> <p>The problem that I had is that SortedArray's Find operation (in general) takes O(Log(N)), while I want it to be an O(1) operation.</p> <p>Also, the list is sorted by nature, you will never add to the middle of the list.</p> <p>So I ended up implementing a list that has an internal indexer array, for example:</p> <p>For example:</p> <pre><code>indexer: [0,0,0,0,1,1,1,1,2,2] list: [3,7,9] </code></pre> <p>So <code>.Right(3)</code> would be <code>list[indexer[3]++]</code>.</p> <p>The code can be found <a href="https://github.com/cchanpromys/so_19188345/blob/master/CollectionTest/Collection/SortedIndexedArray.cs" rel="nofollow">here</a>.</p> <p>It is hard to believe that this type of list is not already implemented somewhere on the internet. If possible, I would like to use a library so I won't have to manage my own list. </p> <p>Do such implementation exist on the internet?</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload