Note that there are some explanatory texts on larger screens.

plurals
  1. POHow to improve hashing for short strings to avoid collisions?
    primarykey
    data
    text
    <p>I am having a problem with hash collisions using short strings in .NET4.<br> <strong>EDIT:</strong> I am using the <em>built-in</em> string hashing function in .NET.</p> <p>I'm implementing a cache using objects that store the direction of a conversion like this<br></p> <pre><code>public class MyClass { private string _from; private string _to; // More code here.... public MyClass(string from, string to) { this._from = from; this._to = to; } public override int GetHashCode() { return string.Concat(this._from, this._to).GetHashCode(); } public bool Equals(MyClass other) { return this.To == other.To &amp;&amp; this.From == other.From; } public override bool Equals(object obj) { if (obj == null) return false; if (this.GetType() != obj.GetType()) return false; return Equals(obj as MyClass); } } </code></pre> <p>This is direction dependent and the <code>from</code> and <code>to</code> are represented by short strings like "AAB" and "ABA".</p> <p>I am getting sparse hash collisions with these small strings, I have tried something simple like adding a salt (did not work). </p> <p><strong>The problem</strong> is that too many of my small strings like "AABABA" collides its hash with the reverse of "ABAAAB" (Note that these are not real examples, I have no idea if AAB and ABA actually cause collisions!)</p> <p>and I have gone heavy duty like implementing MD5 (which works, but is MUCH slower)</p> <p>I have also implemented the suggestion from Jon Skeet here:<br> <a href="https://stackoverflow.com/questions/957588/should-i-use-a-concatenation-of-my-string-fields-as-a-hash-code">Should I use a concatenation of my string fields as a hash code?</a> This works but I don't know how dependable it is with my various 3-character strings.</p> <p>How can I improve and stabilize the hashing of small strings without adding too much overhead like MD5?</p> <p><strong>EDIT:</strong> In response to a few of the answers posted... the cache is implemented using concurrent dictionaries keyed from <code>MyClass</code> as stubbed out above. If I replace the <code>GetHashCode</code> in the code above with something simple like @JonSkeet 's code from the link I posted:</p> <pre><code>int hash = 17; hash = hash * 23 + this._from.GetHashCode(); hash = hash * 23 + this._to.GetHashCode(); return hash; </code></pre> <p>Everything functions as expected. It's also worth noting that in this particular use-case the cache is not used in a multi-threaded environment so there is no race condition. </p> <p><strong>EDIT:</strong> I should also note that this misbehavior is platform dependant. It works as intended on my fully updated Win7x64 machine but does not behave properly on a non-updated Win7x64 machine. I don't know the extend of what updates are missing but I know it doesn't have Win7 SP1... so I would assume there may also be a framework SP or update it's missing as well. </p> <p><strong>EDIT:</strong> As susggested, my issue was not caused by a problem with the hashing function. I had an elusive race condition, which is why it worked on some computers but not others and also why a "slower" hashing method made things work properly. The answer I selected was the most useful in understanding why my problem <em>was not</em> hash collisions in the dictionary.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload