Note that there are some explanatory texts on larger screens.

plurals
  1. POc# SortedList<string, TValue>.ContainsKey for successfully added key returns false
    text
    copied!<p><strong>CHECK UPDATE 3 below</strong> I found out the issue I ran into is related to a known serious problem with c# string comparers for .Net 4.0, 4.0 client and 4.5, that will lead to inconsistent sort order of lists of strings (causing the output to depend on the order in the input and the sort algorithm used). The problem was reported to Microsoft in Dec 2012 and closed as "won't be fixed". A work around is available, but is so much slower that it is hardly practical for large collections.</p> <p>While implementing an immutable PatriciaTrie, I wanted to compare its performance with a System.Collections.Generic.SortedList. I used the following file <a href="https://github.com/rkapsi/patricia-trie/blob/master/src/test/resources/org/ardverk/collection/hamlet.txt" rel="noreferrer">https://github.com/rkapsi/patricia-trie/blob/master/src/test/resources/org/ardverk/collection/hamlet.txt</a> to create an input wordlist for testing.</p> <p>When inserting each of the words in the c# SortedList, using either <code>Comparer&lt;string&gt;.Default</code> or <code>StringComparer.InvariantCulture</code> as the key comparer, a number of the entries that are successfully inserted cannot be retrieved using the normal search methods (e.g. <code>ContainsKey</code> returns false) but the key is present in the list as is observed by iterating the list. </p> <p>Even more curious, the comparer returns the value '0' when comparing the key retrieved from the sorted list with the search key that could not be found using <code>ContainsKey</code>.</p> <p>The complete example below demonstrates this issue on my system. </p> <pre><code>using System; using System.IO; using System.Linq; using System.Collections.Generic; class Program { static void Main(string[] args) { // the problem is possibly related to comparison. var fail = true; var comparer = fail ? StringComparer.InvariantCulture : StringComparer.Ordinal; // read hamlet (contains duplicate words) var words = File .ReadAllLines("hamlet.txt") .SelectMany(l =&gt; l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)) .Select(w =&gt; w.Trim()) .Where(w =&gt; !string.IsNullOrEmpty(w)) .Distinct(comparer) .ToArray(); // insert hamlet's words in the sorted list. var list = new SortedList&lt;string, int&gt;(comparer); var ndx = 0; foreach (var word in words) list[word] = ndx++; // search for each of the added words. foreach (var keyToSearch in words) { if (!list.ContainsKey(keyToSearch)) { // was inserted, but cannot be retrieved. Console.WriteLine("Error - Key not found: \"{0}\"", keyToSearch); // however when we iterate over the list, we see that the entry is present var prefix = keyToSearch.Substring(0, Math.Min(keyToSearch.Length, 3)); foreach (var wordCloseToSearchKey in list.Keys.Where(s =&gt; s.StartsWith(prefix))) { // and using the SortedList's supplied comparison returns 0, signaling equality var comparisonResult = list.Comparer.Compare(wordCloseToSearchKey, keyToSearch); Console.WriteLine("{0} - comparison result = {1}", wordCloseToSearchKey, comparisonResult); } } } // Check that sort order of List.Keys is correct var keys = list.Keys.ToArray(); BinarySearchAll("list.Keys", keys, list.Comparer); CheckCorrectSortOrder("list.Keys", keys, list.Comparer); // Check that sort order of Array.Sort(List.Keys) is correct var arraySortedKeys = CopySortSearchAndCheck("Array.Sort(List.Keys)", keys, list.Comparer); // Check that sort order of the Array.Sort(input words) is correct var sortedInput = CopySortSearchAndCheck("Array.Sort(input words)", words, list.Comparer); Console.ReadLine(); } static string[] CopySortSearchAndCheck(string arrayDesc, string[] input, IComparer&lt;string&gt; comparer) { // copy input var sortedInput = new string[input.Length]; Array.Copy(input, sortedInput, sortedInput.Length); // sort it Array.Sort(sortedInput, comparer); // check that we can actually find the keys in the array using bin. search BinarySearchAll(arrayDesc, sortedInput, comparer); // check that sort order is correct CheckCorrectSortOrder(arrayDesc, sortedInput, comparer); return sortedInput; } static void BinarySearchAll(string arrayDesc, string[] sortedInput, IComparer&lt;string&gt; comparer) { // check that each key in the input can be found using bin. search foreach (var word in sortedInput) { var ix = Array.BinarySearch(sortedInput, word, comparer); if (ix &lt; 0) // and it appears it cannot! Console.WriteLine("Error - {0} - Key not found: \"{1}\"", arrayDesc, word); } } static void CheckCorrectSortOrder(string arrayDesc, string[] sortedKeys, IComparer&lt;string&gt; comparer) { for (int n = 0; n &lt; sortedKeys.Length; n++) { for (int up = n + 1; up &lt; sortedKeys.Length; up++) { var cmp = comparer.Compare(sortedKeys[n], sortedKeys[up]); if (cmp &gt;= 0) { Console.WriteLine( "{0}[{1}] = \"{2}\" not &lt; than {0}[{3}] = \"{4}\" - cmp = {5}", arrayDesc, n, sortedKeys[n], up, sortedKeys[up], cmp); } } for (int down = n - 1; down &gt; 0; down--) { var cmp = comparer.Compare(sortedKeys[n], sortedKeys[down]); if (cmp &lt;= 0) { Console.WriteLine( "{0}[{1}] = \"{2}\" not &gt; than {0}[{3}] = \"{4}\" - cmp = {5}", arrayDesc, n, sortedKeys[n], down, sortedKeys[down], cmp); } } } } } </code></pre> <p>Does anyone have an explanation for this unexpected and odd behaviour?</p> <p>When changing the comparer used by the SortedList to <code>StringComparer.Ordinal</code> (by e.g. changing <code>fail</code> to <code>false</code> in the above example) the problem disappears, which seems to point to a comparison issue, but I don't quite understand why.</p> <p><strong>UPDATED</strong> As noted by Sébastien the issue described here does not show up on .Net 3.5 and 3.5 client profiles. It does on .Net 4.0, 4.0 client and 4.5.</p> <p>After some more digging, I noticed that if I take the sorted keys from the list and I run <code>Array.BinarySearch</code> on those keys, it also returns negative (not found) values for the same keys that are not found using <code>SortedList.ContainsKey</code>. Thus this would suggest that the sort order of the keys is not correct. </p> <p>If I take the already sorted keys from the list and sort them using <code>Array.Sort</code> the sort order of the output is different for the keys that were problematic.</p> <p>So I added a function to check (using the list's comparer) if the sort order of a given array is correct (i.e. a preceding key is always smaller, a succeeding key is always larger) and restricted the input to words that are distinct according to the comparer. I applied this function on 3 different inputs (all using the same comparer):</p> <ol> <li>the SortedList's Keys collection.</li> <li>the output of Array.Sort on these keys.</li> <li>the output of Array.Sort on the input from the file.</li> </ol> <p>The output of (2) and (3) is identical and different from (1). However performing Array.BinarySearch on the Array.Sort outputs of (2) and (3) again fails to find the same keys (returning &lt; 0). Also the function that checks for correct sort order indicates that for all 3 cases, the sort order around the involved problematic keys is not correct.</p> <p>At this point I am just hoping I did something incredibly stupid and there is a simple explanation. Hopefully someone can point that out to me.</p> <p>The example code is updated with my additional troubleshooting experiments, a screenshot of the output can be found here <a href="http://imgur.com/DU8SCsA" rel="noreferrer">http://imgur.com/DU8SCsA</a>.</p> <p><strong>UPDATE 2</strong> Ok, I have narrowed the problem down to what seems to me like a very serious problem with c# string comparers introduced as of .Net 4.0.</p> <p>In summary, suppose we have 3 values: a1, a2 and a3. For any kind of sorting to work correctly, we would expect that if <code>a1 &lt; a2</code> and <code>a2 &lt; a3</code> that in order for comparison to be consistent, as a consequence the following also holds: <code>a1 &lt; a3</code>. </p> <p>This is however not the case with the c# string comparers (at least for <code>Comparer&lt;string&gt;.Default</code> and <code>StringComparer.InvariantCulture</code>).</p> <p>The little program below illustrates this exact problem:</p> <pre><code>class Program { static void Main(string[] args) { var comparer = StringComparer.InvariantCulture; var a1 = "A"; var a2 = "a\'"; var a3 = "\'a"; PrintComparison("a1", a1, "a2", a2, comparer); PrintComparison("a2", a2, "a3", a3, comparer); PrintComparison("a1", a1, "a3", a3, comparer); Console.ReadLine(); } public static void PrintComparison(string firstSymbol, string first, string secondSymbol, string second, IComparer&lt;string&gt; comparer) { var cmp = comparer.Compare(first, second); var result = cmp == 0 ? "=" : cmp &lt; 0 ? "&lt;" : "&gt;"; Console.WriteLine("{0} {1} {2} ({3} {1} {4})", firstSymbol, result, secondSymbol, first, second); } } </code></pre> <p>This is its output:</p> <pre><code>a1 &lt; a2 (A &lt; a') a2 &lt; a3 (a' &lt; 'a) a1 &gt; a3 (A &gt; 'a) </code></pre> <p>The conclusion would seem to be that it is not safe to rely on sort order determined using c# string comperators, or am I missing something?</p> <p><strong>UPDATE 3</strong> This issue seems to have been reported to MS in December 2012, and is closed with status "won't be fixed", which is rather disappointing; refer to link posted in comments below (it appears I can't post in here due to my limited reputation points). This also lists a workaround, that I have implemented and used to verify that this indeed resolves the problems observed with the standard comparers. </p> <pre><code>public class WorkAroundStringComparer : StringComparer { private static readonly Func&lt;CompareInfo, string, CompareOptions, int&gt; _getHashCodeOfString; private readonly CompareInfo _compareInfo; private readonly CompareOptions _compareOptions; static WorkAroundStringComparer() { // Need this internal method to compute hashcode // as an IEqualityComparer implementation. _getHashCodeOfString = BuildGetHashCodeOfStringDelegate(); } static Func&lt;CompareInfo, string, CompareOptions, int&gt; BuildGetHashCodeOfStringDelegate() { var compareInfoType = typeof(CompareInfo); var argTypes = new[] { typeof(string), typeof(CompareOptions) }; var flags = BindingFlags.NonPublic | BindingFlags.Instance; var methods = compareInfoType.GetMethods(flags).ToArray(); ; var method = compareInfoType.GetMethod("GetHashCodeOfString", flags, null, argTypes, null); var instance = Expression.Parameter(compareInfoType, "instance"); var stringArg = Expression.Parameter(typeof(string), "string"); var optionsArg = Expression.Parameter(typeof(CompareOptions), "options"); var methodCall = Expression.Call(instance, method, stringArg, optionsArg); var expr = Expression.Lambda&lt;Func&lt;CompareInfo, string, CompareOptions, int&gt;&gt;(methodCall, instance, stringArg, optionsArg); return expr.Compile(); } public WorkAroundStringComparer() : this(CultureInfo.InvariantCulture) { } public WorkAroundStringComparer(CultureInfo cultureInfo, CompareOptions compareOptions = CompareOptions.None) { if (cultureInfo == null) throw new ArgumentNullException("cultureInfo"); this._compareInfo = cultureInfo.CompareInfo; this._compareOptions = compareOptions; } public override int Compare(string x, string y) { if (ReferenceEquals(x, y)) return 0; if (ReferenceEquals(x, null)) return -1; if (ReferenceEquals(y, null)) return 1; var sortKeyFor_x = _compareInfo.GetSortKey(x, _compareOptions); var sortKeyFor_y = _compareInfo.GetSortKey(y, _compareOptions); return SortKey.Compare(sortKeyFor_x, sortKeyFor_y); } public override bool Equals(string x, string y) { return Compare(x, y) == 0; } public override int GetHashCode(string obj) { return _getHashCodeOfString(_compareInfo, obj, _compareOptions); } } </code></pre> <p>The problem with this workaround is that it is hardly practical for sizable collections, because it is an order of magnitude slower than e.g. <code>StringComparer.InvariantCulture</code>.</p> <p>Time taken when sorting the given word list 1000 times using both comparers:</p> <pre><code>StringComparer.InvariantCulture : 00:00:15.3120013 WorkAroundStringComparer : 00:01:35.8322409 </code></pre> <p>So I am still hoping that either Microsoft reconsiders or someone knows a viable alternative. Otherwise the only option that remains is falling back on using <code>StringComparer.Ordinal</code>.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload