How to speed up search in a huge dictionary
<p>I have a very large dictionary whose content looks like this<br> (the headers are not part of the dictionary):</p>
<pre><code>(code)      (names)
------------------------------
910235487   Diabetes, tumors, sugar sick, .....
</code></pre>
<p>The dictionary holds more than 150K pairs of this kind.</p>
<p><strong>The user input is keywords (diagnosis names), so I cannot search the dictionary by key.</strong></p>
<p><strong>Here is the code:</strong></p>
<pre><code>var relevantIDs = this.dic.Where(ele =&gt; ele.Value.Contains(keyword)).Select(n =&gt; Convert.ToUInt64(n.Key));
</code></pre>
<p>The dictionary is a <code>Dictionary&lt;string, string&gt;</code>. I have to use <code>string</code> as the key type because the codes can sometimes contain non-numeric characters. The names column holds a list of related diagnosis names, so I cannot change that data type either.</p>
<p>I think the problem is that I run <code>Contains</code> on the value of every pair, which slows down the whole process, but I cannot find an alternative.</p>
<p>This is what I did in order to find the matching codes.<br> But the performance of this code is terrible (this single line takes around 5 minutes to finish).</p>
<p>Can someone help?</p>
<hr>
<p><strong>Update and simplest solution:</strong></p>
<p>I finally found the reason why the search was so slow, and solved it like this:</p>
<pre><code>var relevantStringIDs = this.dic.Where(ele =&gt; ele.Value.Contains(keyword)).ToList();
var relevantUlongIDs = relevantStringIDs.Select(n =&gt; Convert.ToUInt64(n.Key)).ToList();
</code></pre>
<p>The reason it was so slow is that <code>this.dic.Where(ele =&gt; ele.Value.Contains(keyword))</code> is not executed when it is defined; it is executed again every time the query built on it is enumerated (this is a feature of <code>IEnumerable&lt;T&gt;</code>; I believe the term for it is deferred execution).
So I use <code>ToList()</code> to turn the deferred query into a concrete list in memory, so that the result can be reused when converting the strings to <code>ulong</code> values rather than running the query again each time the IDs are needed.<br> Please correct me if you find something wrong in this explanation.</p>
<p>By the way, although this may not be the best solution, the performance of the changed code is quite satisfactory. The first statement takes only 169 ms, which is quick enough for me.</p>
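<p>The deferred-execution effect described above can be reproduced with a minimal sketch. The sample entries, the class name <code>DeferredExecutionDemo</code> and the helper <code>CountPredicateCalls</code> are made up for illustration and are not part of the original code. The helper counts how often the <code>Where</code> predicate runs when the result is enumerated several times, with and without <code>ToList()</code>. Note that a single pass over <code>Where(...).Select(...)</code> calls the predicate only once per entry; the repeated work shows up when the deferred query is enumerated more than once.</p>
<pre><code>using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns how many times the Where predicate ran while the
    // resulting IDs were enumerated `passes` times.
    public static int CountPredicateCalls(
        Dictionary&lt;string, string&gt; dic, string keyword, bool materialize, int passes)
    {
        int calls = 0;

        // Defining the query runs nothing yet (deferred execution).
        IEnumerable&lt;ulong&gt; ids = dic
            .Where(ele =&gt; { calls++; return ele.Value.Contains(keyword); })
            .Select(n =&gt; Convert.ToUInt64(n.Key));

        // ToList() scans the dictionary once and stores the result.
        if (materialize)
        {
            ids = ids.ToList();
        }

        // Enumerate the result several times, as later code might do.
        for (int i = 0; i &lt; passes; i++)
        {
            foreach (ulong id in ids) { /* consume the sequence */ }
        }

        return calls;
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for the real 150K entries.
        var dic = new Dictionary&lt;string, string&gt;
        {
            ["910235487"] = "Diabetes, tumors, sugar sick",
            ["910235488"] = "Asthma, bronchitis",
            ["910235489"] = "Diabetes type 2",
        };

        // Deferred: every pass rescans all 3 entries.
        Console.WriteLine(CountPredicateCalls(dic, "Diabetes", false, 3)); // 9

        // Materialized: one scan, then the list is reused.
        Console.WriteLine(CountPredicateCalls(dic, "Diabetes", true, 3));  // 3
    }
}
</code></pre>
<p>With 150K entries, each extra enumeration of the deferred query repeats the full <code>Contains</code> scan, which is why materializing once with <code>ToList()</code> can make such a difference when the IDs are used more than once.</p>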