Note that there are some explanatory texts on larger screens.

plurals
  1. PORegarding the example in Oracle's java online tutorial that uses HashMap to store anagram
    primarykey
    data
    text
    <p>I was reading the example in Oracle's online java tutorial that uses HashMap to store anagrams:</p> <pre><code> // Read words from file and put into a simulated multimap Map&lt;String, List&lt;String&gt;&gt; m = new HashMap&lt;String, List&lt;String&gt;&gt;(); try { Scanner s = new Scanner(new File(args[0])); while (s.hasNext()) { String word = s.next(); String alpha = alphabetize(word); List&lt;String&gt; l = m.get(alpha); if (l == null) m.put(alpha, l=new ArrayList&lt;String&gt;()); l.add(word); } } catch (IOException e) { System.err.println(e); System.exit(1); } // Print all permutation groups above size threshold for (List&lt;String&gt; l : m.values()) if (l.size() &gt;= minGroupSize) System.out.println(l.size() + ": " + l); } private static String alphabetize(String s) { char[] a = s.toCharArray(); Arrays.sort(a); return new String(a); } </code></pre> <p>}</p> <p>Since HashMap is implemented with Hash Table, I think that each sorted alphabetized string should have a unique hash code after compression(otherwise the linked list that stores the values in the HashMap will store a value that's not a anagram of the sorted alphabetized string).</p> <p>I am not exactly sure how Java's implementation of HashMap can satisfy this - I assume that they use the hash code of a string (a1*31^n-1 + a2*31^n-2 + ... + an). This might guarantee the uniqueness of the hash code if we are talking about strings with only lower case chars. However, one also has to compress the hash code before putting the value of the key to the bucket in the hash table (otherwise you would have a hugggggge hash table that can't be handled in memory, just thinking of how big 31^10 is). Among this compression, I would think that there will be collision. In other words, two different strings that are not true anagram will end up being stored in the same bucket (which should be only used to store a list of true anagrams)...</p> <p>Could anyone help me to understand what I might be missing? Or if the online tutorial is missing something? </p> <p>thanks!</p> <p>Jason</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. COhttp://stackoverflow.com/questions/4980757/how-do-hashtables-deal-with-collisions and in particular: http://stackoverflow.com/a/4980796/829571
      singulars
    2. COThanks for pointing me to that post. I can understand that hash code is used to locate bucket and the equals() method will be used later to retrieve the correct entry. However, I can't image how to store multiple nodes in a linked list without a next node pointer. If there are 2+ nodes in a linked list without a next node pointer, then the list is no longer "linking" all the nodes and thus that shouldn't be a linked list, right?
      singulars
    3. COThe other question after reading that explanation of how HashMap works is that, for two strings (non anagrams) with same hash code, the example provided in the java tutorial is going to store them in the same bucket. So you are saving something like "abc", "cab" and "def" into the same linked list. Although later you may be able to just print out "abc" and "cab", the "def" is stored at the wrong place and another bucket that expected this "def" is missing the anagram entry. Not sure how this will be addresses by the hashMap implementation. Or maybe that's just the bug from the example?
      singulars
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload