x64 memset core, is passed buffer address truncated?
<p><strong>1. Problem Background</strong></p>
<p>Recently a core dump occurred on one of our online search servers. The core happened in <code>memset()</code>: it attempted to write to an invalid address and therefore received SIGSEGV. The following information is from dmesg:</p>
<p><code>is_searcher_ser[17405]: segfault at 000000002c32a668 rip 0000003da0a7b006 rsp 0000000053abc790 error 6</code></p>
<p>The environment of our online servers is as follows:</p>
<ul>
<li>OS: RHEL 5.3</li>
<li>Kernel: 2.6.18-131.el5.custom, x86_64 (64-bit)</li>
<li>GCC: 4.1.2 20080704 (Red Hat 4.1.2-44)</li>
<li>Glibc: glibc-2.5-49.6</li>
</ul>
<p>The following is the relevant code snippet:</p>
<pre><code>CHashMap&lt;…&gt;::CHashMap(…)
{
    …
    typedef HashEntry *HashEntryPtr;
    m_ppEntry = new HashEntryPtr[m_nHashSize]; // m_nHashSize is 389 at the time of the core
    assert(m_ppEntry != NULL);
    memset(m_ppEntry, 0x0, m_nHashSize*sizeof(HashEntryPtr)); // Core in this memset() invocation
    …
}
</code></pre>
<p>The assembly for the above code is:</p>
<pre><code>…
0x000000000091fe9e &lt;+110&gt;: callq 0x502638 &lt;_Znam@plt&gt;  // new HashEntryPtr[m_nHashSize]
0x000000000091fea3 &lt;+115&gt;: mov 0xc(%rbx),%edx           // Get the value of m_nHashSize
0x000000000091fea6 &lt;+118&gt;: mov %rax,%rdi                // Put the m_ppEntry pointer in %rdi for the later memset invocation
0x000000000091fea9 &lt;+121&gt;: mov %rax,0x20(%rbx)          // Store the pointer in the m_ppEntry member variable (%rbx holds the this pointer)
0x000000000091fead &lt;+125&gt;: xor %esi,%esi                // Generate 0
0x000000000091feaf &lt;+127&gt;: shl $0x3,%rdx                // m_nHashSize*sizeof(HashEntryPtr)
0x000000000091feb3 &lt;+131&gt;: callq 0x502b38 &lt;memset@plt&gt; // Call the memset() function
…
</code></pre>
<p>In the core dump, the assembly of <code>memset@plt</code> is:</p>
<pre><code>(gdb) disassemble 0x502b38
Dump of assembler code for function memset@plt:
0x0000000000502b38 &lt;+0&gt;: jmpq *0x771b92(%rip) # 0xc746d0 &lt;memset@got.plt&gt;
0x0000000000502b3e &lt;+6&gt;: pushq $0x53
0x0000000000502b43 &lt;+11&gt;: jmpq 0x5025f8
End of assembler dump.
(gdb) x/ag 0x0000000000502b3e+0x771b92
0xc746d0 &lt;memset@got.plt&gt;: 0x3da0a7acb0 &lt;memset&gt;
(gdb) disassemble 0x3da0a7acb0
Dump of assembler code for function memset:
0x0000003da0a7acb0 &lt;+0&gt;: cmp $0x1,%rdx
0x0000003da0a7acb4 &lt;+4&gt;: mov %rdi,%rax
…
</code></pre>
<p>From the above GDB analysis, we know that the address of <code>memset()</code> has already been resolved in the GOT entry used by the PLT stub (<code>memset@got.plt</code>). That is to say, the first <code>jmpq *0x771b92(%rip)</code> jumps directly to the first instruction of <code>memset()</code>. Besides, the program had been running online for nearly a day, so the address of <code>memset()</code> should have been resolved much earlier.</p>
<p><strong>2. Weird phenomenon</strong></p>
<p>The core fired at the instruction <code>=&gt; 0x0000003da0a7b006 &lt;+854&gt;: mov %rdx,-0x8(%rdi)</code> in <code>memset()</code>. This is the instruction in <code>memset()</code> that stores the <code>0</code> at the very beginning of the buffer, i.e. the buffer passed as the first parameter of <code>memset()</code>.</p>
<p>At the time of the core, in frame 0, the value of <code>$rdi</code> is <code>0x2c32a670</code> and <code>$rax</code> is <code>0x2c32a668</code>. From the assembly analysis and an offline test, <code>$rax</code> should hold the destination buffer of <code>memset</code>, i.e. the first parameter of <code>memset()</code> (see <code>mov %rdi,%rax</code> at the start of <code>memset</code>).</p>
<p>So, in our example, <code>$rax</code> should be the same as the value of <code>m_ppEntry</code>, which is stored in the <code>this</code> object (the <code>this</code> pointer is held in <code>%rbx</code>) before the buffer is zeroed by <code>memset</code>. However, the value of <code>m_ppEntry</code> is <code>0x2ab02c32a668</code>.</p>
<p>Checking with the GDB command <code>info files</code> shows that the address <code>0x2c32a668</code> is indeed invalid (not mapped), while the address <code>0x2ab02c32a668</code> is valid.</p>
<p><strong>3. Why is it weird?</strong></p>
<p>The weird part of this core is this: if the real address of <code>memset</code> has already been resolved (which is very probable), then there are only a handful of instructions between storing the pointer value into <code>m_ppEntry</code> and the attempt to <code>memset</code> it. The value of register <code>$rax</code> (holding the passed buffer address) is not changed at all during these instructions. So how can <code>m_ppEntry</code> not be equal to <code>$rax</code>?</p>
<p>What is <em>more</em> weird is this: at the time of the core, the value of <code>$rax</code> (<code>0x2c32a668</code>) is exactly the lower 4 bytes of <code>m_ppEntry</code> (<code>0x2ab02c32a668</code>). If there is indeed a relationship between the two values, is the <code>m_ppEntry</code> argument being truncated when it is passed to <code>memset</code>? However, the instructions involved all use <code>%rax</code> rather than <code>%eax</code>. By the way, I cannot reproduce this issue offline.</p>
<p>So,</p>
<p>1) Which address is valid? If <code>0x2c32a668</code> is the valid one, was the heap corrupted between those few instructions? And how can we explain that the value of <code>m_ppEntry</code> is <code>0x2ab02c32a668</code>, and why are the low 4 bytes of the two values the same?</p>
<p>2) If <code>0x2ab02c32a668</code> is the valid one, why is the address truncated when passed into the 64-bit <code>memset()</code>? Under which conditions can this error occur? I cannot reproduce it offline. Is this a known bug? I didn't find it through Google.</p>
<p>3) Or is it due to some hardware or power issue that zeroed the 4 higher bytes of the <code>%rdi</code> passed to <code>memset</code>? (I’m very reluctant to believe this.)</p>
<p><strong>Finally, any comments on this core are appreciated.</strong></p>
<p>Thanks,</p>
<p>Gary Hu</p>
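<p>The arithmetic behind the observations above can be double-checked with a small compile-time sketch. It is not part of the original program: the constants are copied from the dmesg line and the GDB session, while the helper names <code>low32</code> and <code>store_target</code> are hypothetical and exist only for illustration. It shows only that the numbers are mutually consistent, not why the upper bytes were lost.</p>

```cpp
#include <cstdint>

// Values copied from the core dump described above.
constexpr std::uint64_t kStoredPtr  = 0x2ab02c32a668ULL; // m_ppEntry as read back from the object
constexpr std::uint64_t kRaxAtCrash = 0x2c32a668ULL;     // $rax in frame 0
constexpr std::uint64_t kRdiAtCrash = 0x2c32a670ULL;     // $rdi in frame 0
constexpr std::uint64_t kFaultAddr  = 0x2c32a668ULL;     // "segfault at" address from dmesg
constexpr std::uint64_t kHashSize   = 389;               // m_nHashSize at the time of the core

// Hypothetical helper: keep only the low 32 bits (4 bytes) of a 64-bit value.
constexpr std::uint64_t low32(std::uint64_t v) { return v & 0xffffffffULL; }

// Hypothetical helper: address written by `mov %rdx,-0x8(%rdi)`.
constexpr std::uint64_t store_target(std::uint64_t rdi) { return rdi - 8; }

// $rax at the crash equals the low 4 bytes of the stored m_ppEntry.
static_assert(low32(kStoredPtr) == kRaxAtCrash,
              "rax should match the low 32 bits of m_ppEntry");

// The faulting store targets rdi - 8, which is the address reported by dmesg.
static_assert(store_target(kRdiAtCrash) == kFaultAddr,
              "store target should match the segfault address");

// `shl $0x3,%rdx` multiplies m_nHashSize by 8, the pointer size on x86_64.
static_assert((kHashSize << 3) == 3112,
              "memset length should be 389 * 8 bytes");
```

<p>The file has no <code>main()</code>; compiling it (for example with <code>g++ -std=c++11 -c</code>) is the check, since every <code>static_assert</code> must hold for the build to succeed.</p>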