<p><strong>Short Answer</strong></p>
<p>The first time that <code>calloc</code> is called, it explicitly zeroes out the memory. The next time it is called, it assumes that the memory returned from <code>mmap</code> is already zeroed out.</p>
<p><strong>Details</strong></p>
<p>Here are some of the things I checked to reach this conclusion; you can try them yourself if you want:</p>
<ol>
<li><p>Insert a <code>calloc</code> call before your first <code>ALLOC</code> call. After this, the times for Time A and Time B are the same.</p></li>
<li><p>Use the <code>clock()</code> function to check how long each of the <code>ALLOC</code> calls takes. When both use <code>calloc</code>, the first call takes much longer than the second one.</p></li>
<li><p>Use <code>time</code> to measure the execution time of the <code>calloc</code> version and the <code>USE_MMAP</code> version. When I did this, the execution time for <code>USE_MMAP</code> was consistently slightly lower.</p></li>
<li><p>I ran with <code>strace -tt -T</code>, which shows both when each system call was made and how long it took. Here is part of the output:</p></li>
</ol>
<p>Strace output:</p>
<pre><code>21:29:06.127536 mmap(NULL, 2000015360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fff806fd000 &lt;0.000014&gt;
21:29:07.778442 mmap(NULL, 2000015360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fff093a0000 &lt;0.000021&gt;
21:29:07.778563 times({tms_utime=63, tms_stime=102, tms_cutime=0, tms_cstime=0}) = 4324241005 &lt;0.000011&gt;
</code></pre>
<p>You can see that the first <code>mmap</code> call took <code>0.000014</code> seconds, but that about <code>1.65</code> seconds elapsed before the next system call. Then the second <code>mmap</code> call took <code>0.000021</code> seconds, and was followed by the <code>times</code> call roughly a hundred microseconds later.</p>
<p>I also stepped through part of the application's execution with <code>gdb</code> and saw that the first call to <code>calloc</code> resulted in numerous calls to <code>memset</code>, while the second call to <code>calloc</code> did not make any calls to <code>memset</code>. You can see the source code for <code>calloc</code> <a href="http://www.eglibc.org/cgi-bin/viewvc.cgi/trunk/libc/malloc/malloc.c?revision=17701&amp;view=markup" rel="nofollow">here</a> (look for <code>__libc_calloc</code>) if you are interested. As for why <code>calloc</code> does the <code>memset</code> on the first call but not on subsequent ones, I don't know. But I feel fairly confident that this explains the behavior you asked about.</p>
<p>As for why the array that was zeroed with <code>memset</code> performs better, my guess is that it is because of values being loaded into the TLB rather than the cache, since it is a very large array. Regardless, the specific reason for the performance difference that you asked about is that the two <code>calloc</code> calls behave differently when they are executed.</p>