<p>I think the main point to understand here is the distinction between the <code>String</code> Java object and its contents: the <code>char[]</code> stored in the <a href="http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/lang/String.java#String.0value" rel="noreferrer">private <code>value</code> field</a>. <code>String</code> is basically a wrapper around a <code>char[]</code> array, encapsulating it and making it impossible to modify so that the <code>String</code> can remain immutable. The <code>String</code> class also remembers which part of this array is actually used (see below). All of this means that you can have two different <code>String</code> objects (which are quite lightweight) pointing to the same <code>char[]</code>.</p>
<p>I will show you a few examples, together with the <code>System.identityHashCode()</code> of each <code>String</code> and the <code>hashCode()</code> of its internal <code>char[] value</code> field (I will call it <em>text</em> to distinguish it from the string). Finally I'll show the <code>javap -c -verbose</code> output, together with the constant pool for my test class. Please do not confuse the class constant pool with the string literal pool. They are not quite the same.
See also <a href="https://stackoverflow.com/questions/5546280">Understanding javap&#39;s output for the Constant Pool</a>.</p>
<h1><em>Prerequisites</em></h1>
<p>For testing purposes I created the following utility method, which breaks <code>String</code> encapsulation:</p>
<pre><code>import java.lang.reflect.Field;

// Reads the private String.value field reflectively and returns its hash code.
private int showInternalCharArrayHashCode(String s) throws ReflectiveOperationException {
    final Field value = String.class.getDeclaredField("value");
    value.setAccessible(true);
    return value.get(s).hashCode();
}
</code></pre>
<p>It returns the <code>hashCode()</code> of <code>char[] value</code>, which helps us understand whether a particular <code>String</code> points to the same <code>char[]</code> text as another one or not.</p>
<h1>Two string literals in a class</h1>
<p>Let's start with the simplest example.</p>
<h2>Java code</h2>
<pre><code>String one = "abc";
String two = "abc";
</code></pre>
<p>BTW if you simply write <code>"ab" + "c"</code>, the Java compiler will perform the concatenation at compile time and the generated code will be exactly the same. This only works if all the strings are compile-time constants.</p>
<h2>Class constant pool</h2>
<p>Each class has its own <a href="http://en.wikipedia.org/wiki/Java_class_file#The_constant_pool" rel="noreferrer">constant pool</a>: a list of constant values that can be reused if they occur several times in the source code. It includes common strings, numbers, method names, etc.</p>
<p>Here are the contents of the constant pool for the example above.</p>
<pre><code>const #2 = String #38; // abc
//...
const #38 = Asciz abc;
</code></pre>
<p>The important thing to note is the distinction between the <code>String</code> constant object (<code>#2</code>) and the Unicode-encoded text <code>"abc"</code> (<code>#38</code>) that the string points to.</p>
<h2>Byte code</h2>
<p>Here is the generated byte code.
Note that both the <code>one</code> and <code>two</code> references are assigned the same <code>#2</code> constant, which points to the <code>"abc"</code> string:</p>
<pre><code>ldc #2; //String abc
astore_1 //one
ldc #2; //String abc
astore_2 //two
</code></pre>
<h2>Output</h2>
<p>For each example I am printing the following values:</p>
<pre><code>System.out.println(showInternalCharArrayHashCode(one));
System.out.println(showInternalCharArrayHashCode(two));
System.out.println(System.identityHashCode(one));
System.out.println(System.identityHashCode(two));
</code></pre>
<p>No surprise that both pairs are equal:</p>
<pre><code>23583040
23583040
8918249
8918249
</code></pre>
<p>This means that not only do both objects point to the same <code>char[]</code> (the same text underneath), so the <code>equals()</code> test will pass, but <code>one</code> and <code>two</code> are also the exact same reference! So <code>one == two</code> is true as well. Obviously, if <code>one</code> and <code>two</code> point to the same object, then <code>one.value</code> and <code>two.value</code> must be equal.</p>
<h1>Literal and <code>new String()</code></h1>
<h2>Java code</h2>
<p>Now the example we have all been waiting for: one string literal and one new <code>String</code> created from the same literal. How will this work?</p>
<pre><code>String one = "abc";
String two = new String("abc");
</code></pre>
<p>The fact that the <code>"abc"</code> constant is used twice in the source code should give you a hint...</p>
<h2>Class constant pool</h2>
<p>Same as above.</p>
<h2>Byte code</h2>
<pre><code>ldc #2; //String abc
astore_1 //one
new #3; //class java/lang/String
dup
ldc #2; //String abc
invokespecial #4; //Method java/lang/String."&lt;init&gt;":(Ljava/lang/String;)V
astore_2 //two
</code></pre>
<p>Look carefully! The first object is created the same way as above, no surprise. It just takes a constant reference to the already created <code>String</code> (<code>#2</code>) from the constant pool.
However, the second object is created via a normal constructor call. But! The first <code>String</code> is passed as an argument. This can be decompiled to:</p>
<pre><code>String two = new String(one);
</code></pre>
<h2>Output</h2>
<p>The output is a bit surprising. The second pair, representing references to the <code>String</code> objects, is understandable: we created two <code>String</code> objects, one created for us in the constant pool and the second created manually for <code>two</code>. But why on earth does the first pair suggest that both <code>String</code> objects point to the same <code>char[] value</code> array?!</p>
<pre><code>41771
41771
8388097
16585653
</code></pre>
<p>It becomes clear when you look at how the <a href="http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/lang/String.java#String.%3Cinit%3E(java.lang.String)" rel="noreferrer"><code>String(String)</code> constructor works</a> (greatly simplified here):</p>
<pre><code>public String(String original) {
    this.offset = original.offset;
    this.count = original.count;
    this.value = original.value;
}
</code></pre>
<p>See? When you create a new <code>String</code> object based on an existing one, it <strong>reuses</strong> the <code>char[] value</code>. Since <code>String</code>s are immutable, there is no need to copy a data structure that is known never to be modified.</p>
<p>I think this is the key to your problem: even if you have two <code>String</code> objects, they might still point to the same contents. And as you can see, the <code>String</code> object itself is quite small.
</p>
<h1>Runtime modification and <code>intern()</code></h1>
<h2>Java code</h2>
<p>Let's say you initially used two different strings, but after some modifications they end up the same:</p>
<pre><code>String one = "abc";
String two = "?abc".substring(1); //also two = "abc"
</code></pre>
<p>The Java compiler (at least mine) is not clever enough to perform such an operation at compile time. Have a look:</p>
<h2>Class constant pool</h2>
<p>Suddenly we end up with two constant strings pointing to two different constant texts:</p>
<pre><code>const #2 = String #44; // abc
const #3 = String #45; // ?abc
const #44 = Asciz abc;
const #45 = Asciz ?abc;
</code></pre>
<h2>Byte code</h2>
<pre><code>ldc #2; //String abc
astore_1 //one
ldc #3; //String ?abc
iconst_1
invokevirtual #4; //Method String.substring:(I)Ljava/lang/String;
astore_2 //two
</code></pre>
<p>The first string is constructed as usual. The second is created by first loading the constant <code>"?abc"</code> string and then calling <code>substring(1)</code> on it.</p>
<h2>Output</h2>
<p>No surprise here: we have two different strings, pointing to two different <code>char[]</code> texts in memory:</p>
<pre><code>27379847
7615385
8388097
16585653
</code></pre>
<p>Well, the texts aren't really <em>different</em>; the <code>equals()</code> method will still yield <code>true</code>. We simply have two unnecessary copies of the same text.</p>
<p>Now we should run two exercises. First, try running:</p>
<pre><code>two = two.intern();
</code></pre>
<p>before printing the hash codes. Not only do both <code>one</code> and <code>two</code> point to the same text, but they are also the same reference!</p>
<pre><code>11108810
11108810
15184449
15184449
</code></pre>
<p>This means both the <code>one.equals(two)</code> and <code>one == two</code> tests will pass.
We also saved some memory, because the <code>"abc"</code> text appears only once in memory (the second copy will be garbage collected).</p>
<p>The second exercise is slightly different. Check this out:</p>
<pre><code>String one = "abc";
String two = "abc".substring(1);
</code></pre>
<p>Obviously <code>one</code> and <code>two</code> are two different objects, pointing to two different texts. But how come the output suggests that they both point to the same <code>char[]</code> array?!?</p>
<pre><code>23583040
23583040
11108810
8918249
</code></pre>
<p>I'll leave the answer to you. It'll teach you how <code>substring()</code> works, what the advantages of this approach are, and when it can <a href="https://stackoverflow.com/questions/1281549">lead to big trouble</a>.</p>
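<p>If you want to repeat the reference comparisons above without breaking <code>String</code> encapsulation (reflective access to the private <code>value</code> field may be refused on recent JDKs), here is a minimal sketch that only uses <code>==</code>, <code>equals()</code> and <code>intern()</code>. The class name <code>StringIdentityDemo</code> and its method names are my own, purely for illustration. It checks object identity only and says nothing about whether the underlying arrays are shared.</p>

```java
// Hypothetical demo class; all names here are illustrative, not part of any API.
final class StringIdentityDemo {

    // Two identical literals resolve to the same interned String object.
    static boolean literalsShareReference() {
        String one = "abc";
        String two = "abc";
        return one == two;
    }

    // "ab" + "c" is a compile-time constant, so javac folds it into "abc".
    static boolean constantConcatenationIsFolded() {
        String one = "abc";
        String two = "ab" + "c";
        return one == two;
    }

    // new String(...) allocates a separate object whose contents are equal.
    static boolean newStringIsDistinctButEqual() {
        String one = "abc";
        String two = new String("abc");
        return one != two && one.equals(two);
    }

    // substring(1) produces a new String at run time;
    // intern() then returns the pooled instance that the literal refers to.
    static boolean internRestoresSharedReference() {
        String one = "abc";
        String two = "?abc".substring(1);
        boolean distinctBeforeIntern = one != two;
        two = two.intern();
        return distinctBeforeIntern && one == two;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareReference());
        System.out.println(constantConcatenationIsFolded());
        System.out.println(newStringIsDistinctButEqual());
        System.out.println(internRestoresSharedReference());
    }
}
```

<p>Each method is expected to return <code>true</code>, matching the second pair of numbers (the identity hash codes) in the corresponding example above.</p>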
 
